Spark Configuration official documentation

Spark Configuration Chinese documentation


System configuration:


  1. Spark properties: control most application parameters and can be set with a SparkConf object or through Java system properties.
  2. Environment variables: set per-machine settings such as the IP address and ports through the conf/spark-env.sh script on each node.
  3. Logging: configured through log4j.properties (a minimal sketch follows this list).
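
For logging, Spark ships conf/log4j.properties.template, which can be copied to conf/log4j.properties and edited. A minimal sketch that reduces console noise (the WARN level is an assumption; the template ships with INFO):

# Log to the console at WARN level instead of the template's default INFO
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n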


1. Spark Properties



These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application with two threads as follows:
Note that we run with local[2], meaning two threads. This represents “minimal” parallelism, which can help detect bugs that only exist when we run in a distributed context.



import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)
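
The set() method mentioned above takes arbitrary key-value pairs and returns the SparkConf itself, so calls chain. A minimal sketch (the property values are only illustrative, matching the spark-defaults.conf example below):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
  // Arbitrary properties are plain string key-value pairs
  .set("spark.executor.memory", "4g")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")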


bin/spark-submit will also read configuration options from conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. For example:




spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer





Priority: properties set directly on the SparkConf take the highest precedence, then flags passed on the spark-submit command line, then values in spark-defaults.conf.



SparkConf > CLI > spark-defaults.conf
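
For example, a --conf flag passed to spark-submit overrides the same key from spark-defaults.conf, but still loses to a value hard-coded via set() in the application. A sketch (the main class and JAR name are placeholders):

# Overrides spark.executor.memory from spark-defaults.conf for this submission;
# a conf.set("spark.executor.memory", ...) inside the application would still win.
bin/spark-submit \
  --class com.example.CountingSheep \
  --conf spark.executor.memory=2g \
  counting-sheep.jar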





cat spark-env.sh
JAVA_HOME=/data/jdk1.8.0_111
SCALA_HOME=/data/scala-2.11.8
SPARK_MASTER_IP=192.168.1.10
HADOOP_CONF_DIR=/data/hadoop-2.6.5/etc/hadoop
SPARK_LOCAL_DIRS=/data/spark-1.6.3-bin-hadoop2.6/spark_data
SPARK_WORKER_DIR=/data/spark-1.6.3-bin-hadoop2.6/spark_data/spark_works





cat slaves
master
slave1
slave2


cat spark-defaults.conf
spark.master spark://master:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
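
To verify which values actually took effect, the application web UI at http://<driver>:4040 lists Spark properties on the Environment tab. Programmatically, a minimal sketch:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf())
// toDebugString prints every explicitly set property, one key=value per line
println(sc.getConf.toDebugString)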



