How to tune the Spark executor number?
I submit a Spark Streaming calculation task to a standalone Spark cluster. The submit command is below:
./bin/spark-submit \
  --master spark://es01:7077 \
  --executor-memory 4g \
  --num-executors 1 \
  /opt/flowspark/sparkstream/latest5min.py 1>a.log 2>b.log
Note that I use --num-executors 1 because I want only one executor.
Then the ps command shows the following output:
[root@es01 ~]# ps -ef | grep java | grep -v grep | grep spark
root 11659 1 0 Apr19 ? 00:48:25 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4g -Xmx4g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip es01 --port 7077 --webui-port 8080
root 11759 1 0 Apr19 ? 00:42:59 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4g -Xmx4g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://es01:7077
root 18538 28335 38 16:13 pts/1 00:01:52 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://es01:7077 --executor-memory 4g --num-executors 1 /opt/flowspark/sparkstream/latest5min.py
root 18677 11759 46 16:13 ? 00:02:14 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 0 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012
root 18679 11759 46 16:13 ? 00:02:13 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 1 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012
root 18723 11759 47 16:13 ? 00:02:14 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096m -Xmx4096m -Dspark.driver.port=55652 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.79.148.184:55652 --executor-id 2 --hostname 10.79.148.184 --cores 1 --app-id app-20160509161303-0048 --worker-url spark://Worker@10.79.148.184:35012
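A quick way to confirm how many executors were actually launched is to count just the executor JVMs in the ps output; given the class names above, something like:

ps -ef | grep CoarseGrainedExecutorBackend | grep -v grep | wc -l

Here that prints 3, one line per CoarseGrainedExecutorBackend process.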
From my understanding:

11659 and 11759 are the standalone Spark cluster processes (the Master and the Worker).
18538 is the driver program.
18677, 18679, and 18723 should be the executor processes.

Why are there still three of them when I used --num-executors 1?
Check spark.executor.cores in your spark-defaults.conf. From the documentation:

"The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker."
http://spark.apache.org/docs/latest/configuration.html#execution-behavior
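Also note that --num-executors is a YARN-only option, so a standalone master ignores it. In standalone mode the executor count falls out of spark.cores.max (--total-executor-cores) divided by spark.executor.cores (--executor-cores). A sketch of a submit command that caps this application at a single executor, assuming one core is enough for your job:

./bin/spark-submit \
  --master spark://es01:7077 \
  --executor-memory 4g \
  --executor-cores 1 \
  --total-executor-cores 1 \
  /opt/flowspark/sparkstream/latest5min.py 1>a.log 2>b.log

Keep in mind that a receiver-based streaming job needs more cores than it has receivers, so you may want --executor-cores 2 --total-executor-cores 2 instead; that still yields exactly one executor.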