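First, a few concepts.
Spark job submission modes
Spark on YARN supports two submission modes, yarn-client and yarn-cluster. They differ as follows:
In yarn-cluster mode, the ApplicationMaster (AM) starts on any one NodeManager and contains the driver, so AM memory = driver.memory + driver.memoryOverhead.
In yarn-client mode, the driver starts on the node that submits the job and keeps running until the whole job finishes, while the ApplicationMaster starts on any NodeManager. In other words, the driver and the AM are separate.
In this mode, AM memory = spark.yarn.am.memory + spark.yarn.am.memoryOverhead.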
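As a rough illustration of the two cases above, the sketch below picks which settings size the AM container. The function names `overhead_mb` and `am_request_mb` are hypothetical, the 1g and 512m defaults are assumed from Spark's documented defaults for spark.driver.memory and spark.yarn.am.memory, and the overhead rule (7% with a 384m floor) is the one this post uses. YARN's rounding of the request is not modeled here.

```python
def overhead_mb(memory_mb, factor=0.07, minimum_mb=384):
    # Overhead rule used in this post: a fraction of the heap, floored at 384m.
    return max(int(memory_mb * factor), minimum_mb)


def am_request_mb(deploy_mode, driver_memory_mb=1024, am_memory_mb=512):
    """Memory requested for the AM container, before any YARN rounding."""
    if deploy_mode == "cluster":
        # The driver runs inside the AM, so the driver settings size the AM.
        return driver_memory_mb + overhead_mb(driver_memory_mb)
    if deploy_mode == "client":
        # The driver stays on the submitting host; the AM uses its own settings.
        return am_memory_mb + overhead_mb(am_memory_mb)
    raise ValueError(f"unknown deploy mode: {deploy_mode}")


print(am_request_mb("cluster"))  # 1408
print(am_request_mb("client"))   # 896
```

In client mode the driver's heap never counts against the queue's AM limit, because only the small AM container runs inside YARN.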
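memoryOverhead
After a Spark job is submitted, the driver, the executors, and the AM are each given extra memory on top of what was requested. This extra memory is called memoryOverhead. For example, submitting with --executor-memory 1g requests
1g + max(1g * 0.07, 384m) = 1024m + 384m = 1408m. In this setup the container allocation increment is 512m, so the executor container is actually given 1536m. (The 0.07 factor comes from older Spark releases; newer releases document 0.10, which gives the same result here because the 384m minimum dominates.)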
- spark.yarn.executor.memoryOverhead: executorMemory * 0.07, with a minimum of 384
- spark.yarn.driver.memoryOverhead: driverMemory * 0.07, with a minimum of 384
- spark.yarn.am.memoryOverhead: AM memory * 0.07, with a minimum of 384
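The overhead rule and the rounding to the allocation increment can be written out as a small calculation. This is a minimal sketch, not Spark's or YARN's actual code: the helper names are hypothetical, and the 0.07 factor, 384m minimum, and 512m increment are the values quoted in this post.

```python
OVERHEAD_FACTOR = 0.07   # fraction of requested memory (value from this post)
MIN_OVERHEAD_MB = 384    # floor for the overhead (value from this post)
INCREMENT_MB = 512       # container allocation increment in this cluster


def memory_overhead_mb(memory_mb):
    # max(memory * factor, 384), truncated to whole megabytes.
    return max(int(memory_mb * OVERHEAD_FACTOR), MIN_OVERHEAD_MB)


def container_mb(memory_mb):
    # Requested size is heap plus overhead; round it up to the next
    # multiple of the increment using integer ceiling division.
    requested = memory_mb + memory_overhead_mb(memory_mb)
    return -(-requested // INCREMENT_MB) * INCREMENT_MB


print(memory_overhead_mb(1024))  # 384
print(container_mb(1024))        # 1536
```

The floor only stops mattering for large heaps: with these values, 8192m gives an overhead of 573m rather than 384m.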
Verification
We have a queue, root.users.leo, whose maximum resources are 2 cores and 3 GB, with an AM memory cap of 0.5 * 3 GB = 1536m. The examples below submit in yarn-cluster mode:
spark-submit \
--principal leo@HAOHAOZHU.HADOOP \
--keytab /root/leo.keytab \
--class SparkWordCount \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
--queue root.users.leo \
--master yarn --deploy-mode cluster kerberos-java-1.0-SNAPSHOT.jar
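AM memory = driver memory + overhead = 1g + max(1g * 0.07, 384m) = 1408m, which is allocated as 1536m. The largest driver.memory that can be specified is therefore 1536m - 384m = 1152m.
Executor memory = 1g + max(1g * 0.07, 384m) = 1408m, which is also allocated as 1536m. The largest executor.memory that can be specified is likewise 1536m - 384m = 1152m.
The total memory actually allocated to the whole Spark job is 1536m + 1536m = 3072m.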
Check the YARN console.
Now set the driver memory to 1153 MB, so that the AM memory exceeds the queue limit:
spark-submit \
--principal leo@HAOHAOZHU.HADOOP \
--keytab /root/leo.keytab \
--class SparkWordCount \
--driver-memory 1153m \
--executor-memory 1g \
--executor-cores 1 \
--queue root.users.leo \
--master yarn --deploy-mode cluster kerberos-java-1.0-SNAPSHOT.jar
At startup, YARN reports that the AM request exceeds the maximum allowed:
[Wed Oct 16 18:10:53 +0800 2019] Application is added to the scheduler and is not yet activated.
(Resource request: <memory:2048, vCores:1> exceeds maximum AM resource allowed)
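The console shows the job stuck in the ACCEPTED state, with no container initialized and no memory allocated. Even with just 1 MB more, the ApplicationMaster cannot start. Why does the console report 2048? That puzzled me at first, but it is consistent with the rounding described earlier: 1153m + 384m = 1537m, which rounds up to the next 512m multiple, 2048m. What is certain is that with driver.memory set to 1152m, YARN allocates a 1536m container to start the AM.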
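The two driver settings can be compared side by side with a short calculation. This is a sketch under the assumptions stated in this post (384m minimum overhead, 0.07 factor, 512m increment, 1536m AM cap); the function name `am_container_mb` is hypothetical and the code does not reproduce YARN's scheduler logic.

```python
INCREMENT_MB = 512   # container allocation increment in this cluster (from the post)
AM_LIMIT_MB = 1536   # 0.5 * 3072, the queue's AM cap (from the post)


def am_container_mb(driver_memory_mb):
    # In yarn-cluster mode the AM container holds the driver heap plus overhead,
    # rounded up to the next multiple of the allocation increment.
    requested = driver_memory_mb + max(int(driver_memory_mb * 0.07), 384)
    return -(-requested // INCREMENT_MB) * INCREMENT_MB


for driver_mb in (1152, 1153):
    size = am_container_mb(driver_mb)
    verdict = "fits" if size <= AM_LIMIT_MB else "exceeds AM limit"
    print(driver_mb, size, verdict)
# 1152 1536 fits
# 1153 2048 exceeds AM limit
```

A single extra megabyte pushes the request across a 512m boundary, which matches the `<memory:2048, vCores:1>` in the scheduler message.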
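One more experiment. Set driver-memory to 1152m, the maximum for the driver; once started, the AM occupies 1536 MB. The queue's maximum memory is 3 GB, that is, 3072 MB,
so the executor memory setting cannot exceed 3072 - 1536 - 384 = 1152 MB. The submission below asks for 1153m: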
spark-submit \
--principal leo@HAOHAOZHU.HADOOP \
--keytab /root/leo.keytab \
--class SparkWordCount \
--driver-memory 1152m \
--executor-memory 1153m \
--executor-cores 1 \
--queue root.users.leo \
--master yarn --deploy-mode cluster kerberos-java-1.0-SNAPSHOT.jar
After submission, the client console keeps printing RUNNING log lines:
19/10/16 18:22:00 INFO yarn.Client: Application report for application_1571134079417_0076 (state: RUNNING)
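However, the YARN console shows that only one container has been allocated.
The Nodes container list shows that only the AM started; the executor never started. The reason is clear: the memory the executor needs exceeds what is left in the queue.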
19/10/16 18:22:03 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources;
check your cluster UI to ensure that workers are registered and have sufficient resources
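The budget for this experiment can be sketched as follows: subtract the AM container from the queue maximum and see how many executor containers fit in the remainder. The helper names are hypothetical and the constants (3072m queue, 512m increment, 0.07 factor, 384m minimum) are the values used in this post; real scheduling also depends on vcores and cluster state, which this sketch ignores.

```python
QUEUE_MB = 3072      # queue memory limit (from the post)
INCREMENT_MB = 512   # container allocation increment (from the post)


def container_mb(memory_mb):
    # Heap plus overhead, rounded up to the allocation increment.
    requested = memory_mb + max(int(memory_mb * 0.07), 384)
    return -(-requested // INCREMENT_MB) * INCREMENT_MB


def executors_that_fit(driver_memory_mb, executor_memory_mb):
    # Memory left in the queue after the AM (which holds the driver) starts.
    remaining = QUEUE_MB - container_mb(driver_memory_mb)
    return remaining // container_mb(executor_memory_mb)


print(executors_that_fit(1152, 1153))  # 0
print(executors_that_fit(1152, 1152))  # 1
```

With 1153m the executor container rounds up to 2048m, which is more than the 1536m left after the AM, so no executor can be placed.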
As long as the executor memory is at most 1152 MB, the job runs successfully. Hopefully this helps in understanding these three parameters.
End