How to tune the Spark executor number?


I submit a Spark Streaming calculation task to a standalone Spark cluster. The submit command is below:

./bin/spark-submit \
  --master spark://es01:7077 \
  --executor-memory 4g \
  --num-executors 1 \
  /opt/flowspark/sparkstream/latest5min.py 1>a.log 2>b.log

Note that I use --num-executors 1 because I want only one executor.
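In Spark 1.6, --num-executors is honored only on YARN; a standalone cluster sizes the application by cores instead. A hedged sketch of how to cap the application at a single executor on standalone (these are documented spark-submit flags; the paths and master URL mirror the command above, and the 1-core values are illustrative):

```shell
# Standalone mode ignores --num-executors (a YARN-only option in Spark 1.6).
# Instead, bound the cores the application may take:
#   number of executors ≈ total-executor-cores / executor-cores
./bin/spark-submit \
  --master spark://es01:7077 \
  --executor-memory 4g \
  --executor-cores 1 \
  --total-executor-cores 1 \
  /opt/flowspark/sparkstream/latest5min.py 1>a.log 2>b.log
```

Equivalently, spark.cores.max and spark.executor.cores can be set in spark-defaults.conf.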

Then the ps command shows the output below.

[root@es01 ~]# ps -ef | grep java | grep -v grep | grep spark
root 11659     1  0 Apr19 ?     00:48:25 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4g -Xmx4g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip es01 --port 7077 --webui-port 8080
root 11759     1  0 Apr19 ?     00:42:59 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4g -Xmx4g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://es01:7077
root 18538 28335 38 16:13 pts/1 00:01:52 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://es01:7077 --executor-memory 4g --num-executors 1 /opt/flowspark/sparkstream/latest5min.py
root 18677 11759 46 16:13 ?     00:02:14 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 0 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012
root 18679 11759 46 16:13 ?     00:02:13 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 1 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012
root 18723 11759 47 16:13 ?     00:02:14 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 2 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012

From my understanding:

11659 and 11759 are the standalone Spark cluster processes (the master and the worker).

18538 is the driver program (SparkSubmit).

18677, 18679, and 18723 should be the executor processes (CoarseGrainedExecutorBackend).

Why are there still three of them, when I used --num-executors 1?

I checked spark.executor.cores in spark-defaults. The documentation says:

The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.
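The arithmetic implied by that paragraph can be sketched as a toy model (this is my illustration, not Spark's actual scheduler code; the function name is made up):

```python
def executors_per_worker(worker_cores, executor_cores=None):
    """Toy model of standalone-mode executor placement per the docs above.

    If spark.executor.cores is unset, the application gets a single
    executor on the worker that takes all its cores; if it is set, the
    worker can host worker_cores // executor_cores executors for the app.
    """
    if executor_cores is None:
        return 1  # one executor per application per worker
    return worker_cores // executor_cores

# With spark.executor.cores=1 on a worker offering 3 cores, the app gets
# 3 executors -- matching the three CoarseGrainedExecutorBackend
# processes (each started with --cores 1) in the ps output above.
print(executors_per_worker(worker_cores=3, executor_cores=1))  # → 3
```

So if spark.executor.cores=1 is set in spark-defaults, the standalone master keeps launching 1-core executors until the application's core budget (spark.cores.max, unlimited by default) or the worker's cores run out, which would explain the three executors despite --num-executors 1.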

http://spark.apache.org/docs/latest/configuration.html#execution-behavior

