Spark Source Code Analysis: YARN Deployment Flow (ApplicationMaster)
In the previous post, [Spark Source Code Analysis: YARN Deployment Flow (SparkSubmit)], we saw that createContainerLaunchContext builds the context used to run the ApplicationMaster, and that the actual submission happens in:
yarnClient.submitApplication(appContext)
RM: ResourceManager. As the YARN documentation puts it:
At this point, the RM will have accepted the application and in the background, will go through the process of allocating a container with the required specifications and then eventually setting up and launching the AM on the allocated container.
The server-side implementation in the RM:
resourcemanager#ClientRMService.java#submitApplication
// RMAppManager.java#submitApplication
rmAppManager.submitApplication(submissionContext, System.currentTimeMillis(), user);
RMAppManager.java#submitApplication
// 1. In RMAppManager.java#submitApplication:
RMAppImpl application = createAndPopulateNewRMApp(
submissionContext, submitTime, user, false, -1, null);
// 2. In createAndPopulateNewRMApp:
// Create RMApp
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReqs, placementContext,
startTime);
if (UserGroupInformation.isSecurityEnabled()) {
// ...
} else {
// Send the START event to RMAppImpl
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
// 3. In the RMAppImpl constructor, the state machine is built:
this.stateMachine = stateMachineFactory.make(this);
// Later, RMAppImpl#handle drives it:
this.stateMachine.doTransition(event.getType(), event);
This walks through the state transition defined in:
RMAppImpl.java#StateMachineFactory
// On the START event, invoke RMAppNewlySavingTransition#doTransition and move RMAppImpl from NEW to NEW_SAVING
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
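The pattern behind stateMachineFactory is a transition table: each (current state, event type) pair maps to a transition hook plus a target state. A minimal sketch of that idea in Scala (names illustrative, not YARN's actual API):

```scala
object RMAppState extends Enumeration { val NEW, NEW_SAVING, SUBMITTED = Value }
object RMAppEventType extends Enumeration { val START, APP_NEW_SAVED = Value }

// one edge of the state machine: run the hook, then land on the target state
final case class Transition(target: RMAppState.Value, hook: String => Unit)

val transitions: Map[(RMAppState.Value, RMAppEventType.Value), Transition] = Map(
  (RMAppState.NEW, RMAppEventType.START) ->
    Transition(RMAppState.NEW_SAVING, appId => println(s"saving state of $appId")),
  (RMAppState.NEW_SAVING, RMAppEventType.APP_NEW_SAVED) ->
    Transition(RMAppState.SUBMITTED, appId => println(s"submitting $appId"))
)

// doTransition: look up the edge for (state, event), fire the hook, move on
def doTransition(state: RMAppState.Value, event: RMAppEventType.Value,
    appId: String): RMAppState.Value = {
  val t = transitions((state, event))
  t.hook(appId)
  t.target
}
```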
A series of further event handling follows (omitted here); the key step is the transition to the SCHEDULED state in:
RMAppAttemptImpl#ScheduleTransition
where the scheduler allocates resources via allocate:
// AM resource has been checked when submission
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(...);
The CapacityScheduler side of the allocation:
CapacityScheduler.java#allocate
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
List<ContainerId> release, List<String> blacklistAdditions,
List<String> blacklistRemovals, ContainerUpdates updateRequests) {
// ...
// Update application requests
// The key call here is updateResourceRequests
if (application.updateResourceRequests(ask) || application
.updateSchedulingRequests(schedulingRequests)) {
updateDemandForQueue = (LeafQueue) application.getQueue();
}
// ...
}
// This leads to AppSchedulingInfo.java#updateResourceRequests:
// The ApplicationMaster is updating resource requirements for the
// application, by asking for more resources and releasing resources acquired
// by the application.
// and then to LocalityAppPlacementAllocator.java#updatePendingAsk:
// Update resource requests
for (ResourceRequest request : requests) {
// Update asks: park the request in a map until a NodeManager heartbeat arrives
resourceRequestMap.put(resourceName, request);
}
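This is YARN's pull model: nothing is assigned at the moment the ask arrives; containers are only matched when a node heartbeats in. A minimal sketch of the idea (all names illustrative):

```scala
import scala.collection.mutable

// an ask keyed by locality: a host name, a rack name, or "*" (ANY)
final case class ResourceRequest(resourceName: String, memoryMb: Int, containers: Int)

val resourceRequestMap = mutable.Map.empty[String, ResourceRequest]

// updatePendingAsk: the AM's ask is only recorded, not satisfied
def updatePendingAsk(requests: Seq[ResourceRequest]): Unit =
  requests.foreach(r => resourceRequestMap(r.resourceName) = r)

// on NODE_UPDATE: the heartbeating node tries node-local asks first,
// then rack-local, then ANY
def matchOnHeartbeat(host: String, rack: String): Option[ResourceRequest] =
  Seq(host, rack, "*").flatMap(resourceRequestMap.get).headOption
```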
When a NodeManager heartbeat comes in, CapacityScheduler's handle method fires:
@Override
public void handle(SchedulerEvent event) {
// ...
case NODE_UPDATE:
{
NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
nodeUpdate(nodeUpdatedEvent.getRMNode());
}
break;
// ...
}
// nodeUpdate calls allocateContainersToNode(rmNode.getNodeID(), true);
// which calls the overload allocateContainersToNode(candidates, withNodeHeartbeat),
// then allocateContainerOnSingleNode(candidates, node, withNodeHeartbeat),
// then submitResourceCommitRequest(getClusterResource(), assignment);
// submitResourceCommitRequest commits asynchronously or synchronously:
tryCommit(cluster, request, true);
// tryCommit then calls
app.apply(cluster, request, updatePending)
// and apply sends RMContainerImpl an RMContainerEvent of type RMContainerEventType.START:
rmContainer.handle(
new RMContainerEvent(containerId, RMContainerEventType.START));
// Yet another long chain of event handling follows... bear with it...
// ApplicationMasterLauncher.java#handle
case LAUNCH:
launch(application);
break;
// AMLauncher#launch starts the AM: it talks to the target NodeManager and launches the AM container
public class AMLauncher implements Runnable {...}
// ContainerManagerImpl.java#startContainers方法启动Container
// Initialize the AMRMProxy service instance only if the container is of
// type AM and if the AMRMProxy service is enabled
if (amrmProxyEnabled && containerTokenIdentifier.getContainerType()
.equals(ContainerType.APPLICATION_MASTER)) {
this.getAMRMProxyService().processApplicationStartRequest(request);
}
performContainerPreStartChecks(nmTokenIdentifier, request,
containerTokenIdentifier);
// startContainerInternal eventually executes the actual command that starts the ApplicationMaster; the container launch path is detailed at the end of this post
startContainerInternal(containerTokenIdentifier, request);
succeededContainers.add(containerId);
The main class launched in that container:
org.apache.spark.deploy.yarn.ApplicationMaster
2. YARN Deployment Flow (ApplicationMaster)
2.1 Running the ApplicationMaster
Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
def main(args: Array[String]): Unit = {
SignalUtils.registerLogger(log)
// 1. Wrap the command-line arguments
val amArgs = new ApplicationMasterArguments(args)
// ...
// 2. Create an ApplicationMaster
master = new ApplicationMaster(amArgs, sparkConf, yarnConf)
val ugi = ...
ugi.doAs(new PrivilegedExceptionAction[Unit]() {
// 3. Run master.run(), i.e. the actual ApplicationMaster logic
override def run(): Unit = System.exit(master.run())
})
}
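Note that master.run() is wrapped in ugi.doAs, so everything that follows executes under the submitting user's identity. A minimal sketch of that wrapping (the user name is illustrative):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Run a block of code as a given Hadoop user, mirroring how main()
// wraps master.run(); the return value propagates out of doAs.
val ugi = UserGroupInformation.createRemoteUser("spark")
val exitCode: Int = ugi.doAs(new PrivilegedExceptionAction[Int] {
  override def run(): Int = {
    // the real code calls master.run() here
    0
  }
})
```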
2.1.1 Wrapping the arguments: ApplicationMasterArguments
new ApplicationMasterArguments(args)
Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterArguments.scala
The parseArgs method is self-explanatory (e.g. --jar and --class):
private def parseArgs(inputArgs: List[String]): Unit = {
// ...
while (!args.isEmpty) {
// ...
args match {
// --jar
case ("--jar") :: value :: tail =>
userJar = value
args = tail
// --class
case ("--class") :: value :: tail =>
userClass = value
args = tail
// ...
case _ =>
printUsageAndExit(1, args)
}
}
// ...
userArgs = userArgsBuffer.toList
}
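The technique here is worth noting: the argument array is treated as a List, and `--flag value` pairs are peeled off by pattern matching on the cons cells. A minimal self-contained version of the same idea:

```scala
// parse(List("--jar", "app.jar", "--class", "com.example.Main"))
// => Map(jar -> app.jar, class -> com.example.Main)
def parse(inputArgs: List[String]): Map[String, String] = {
  var args = inputArgs
  var parsed = Map.empty[String, String]
  while (args.nonEmpty) {
    args match {
      case "--jar" :: value :: tail =>
        parsed += ("jar" -> value); args = tail
      case "--class" :: value :: tail =>
        parsed += ("class" -> value); args = tail
      case unknown :: _ =>
        sys.error(s"Unknown or incomplete argument: $unknown")
    }
  }
  parsed
}
```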
2.1.2 Executing master.run
master.run is a final method. In cluster mode it runs the user's Driver program.
final def run(): Int = {
try {
val attemptID = ...
new CallerContext(
"APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()
logInfo("ApplicationAttemptId: " + appAttemptId)
// This shutdown hook should run *after* the SparkContext is shut down.
// ...
if (isClusterMode) {
// Run the Driver-side program
runDriver()
} else {
runExecutorLauncher()
}
} catch {
// ...
} finally {
// ...
}
exitCode
}
runDriver()
private def runDriver(): Unit = {
addAmIpFilter(None, System.getenv(ApplicationConstants.APPLICATION_WEB_PROXY_BASE_ENV))
// 1. Start the user application (the main method of the uploaded jar)
// The "Driver" thread is already running once this returns
userClassThread = startUserApplication()
// ... wait for the Spark context to be initialized
logInfo("Waiting for spark context initialization...")
val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
try {
val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
Duration(totalWaitTime, TimeUnit.MILLISECONDS))
if (sc != null) {
// Set up the Driver's RPC environment
val rpcEnv = sc.env.rpcEnv
val userConf = sc.getConf
// Host of the node where the ApplicationMaster is running
val host = userConf.get(DRIVER_HOST_ADDRESS)
// Port the ApplicationMaster's RPC endpoint listens on for this launch
val port = userConf.get(DRIVER_PORT)
// 2. Register the ApplicationMaster (host + port) with the ResourceManager
// sc.ui.map(_.webUrl)
// The trackable web URL the ApplicationMaster exposes; users can check the application's status through it
registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId)
// RPC Endpoint
val driverRef = rpcEnv.setupEndpointRef(
RpcAddress(host, port),
YarnSchedulerBackend.ENDPOINT_NAME)
// 3. Allocate containers for the executors managed by this ApplicationMaster
createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf)
} else {
// ...
throw new IllegalStateException("User did not initialize spark context!")
}
resumeDriver()
// The main thread waits for the Driver thread
userClassThread.join()
} catch {
// ...
} finally {
resumeDriver()
}
}
The resumeDriver method:
private def resumeDriver(): Unit = {
// When initialization in runDriver happened the user class thread has to be resumed.
sparkContextPromise.synchronized {
sparkContextPromise.notify()
}
}
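There is a small two-thread handshake here: the user ("Driver") thread publishes the SparkContext through sparkContextPromise and then wait()s on it, while the AM thread awaits the future, registers the AM and creates the allocator, and finally notify()s the user thread via resumeDriver. A minimal sketch of that handshake (simplified; the real promise carries a SparkContext):

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

val sparkContextPromise = Promise[String]()

// User ("Driver") thread: publish the context and park, as
// sparkContextInitialized(sc) does in the real code
val userThread = new Thread(() => {
  sparkContextPromise.synchronized {
    sparkContextPromise.success("sc")
    sparkContextPromise.wait() // parked until resumeDriver()
  }
  println("user class continues: submit jobs")
})
userThread.start()

// AM main thread: wait for the context, do registerAM/createAllocator,
// then wake the user thread (resumeDriver)
val sc = Await.result(sparkContextPromise.future, 10.seconds)
println(s"got $sc: registerAM + createAllocator happen here")
sparkContextPromise.synchronized { sparkContextPromise.notify() }
userThread.join()
```

No wake-up can be lost here: the user thread holds the promise's monitor while completing it, so the AM thread cannot acquire the monitor to notify() until the user thread has entered wait().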
Starting the user thread: startUserApplication.
The Driver is a thread: a new thread is started to run the main method of the Driver class we wrote ourselves.
/**
* Start the user class, which contains the spark driver, in a separate Thread.
* If the main routine exits cleanly or exits with System.exit(N) for any N
* we assume it was successful, for all other cases we assume failure.
*
* Returns the user thread that was started.
*/
private def startUserApplication(): Thread = {
logInfo("Starting the user application in a separate Thread")
var userArgs = args.userArgs
// ...
// Look up the user Driver class's main method via reflection; no new JVM process is forked
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
// Create the thread
val userThread = new Thread {
override def run(): Unit = {
try {
if (!Modifier.isStatic(mainMethod.getModifiers)) {
logError(s"Could not find static main method in object ${args.userClass}")
finish(FinalApplicationStatus.FAILED, ApplicationMaster.EXIT_EXCEPTION_USER_CLASS)
} else {
// Invoke the Driver class's main method
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running user class")
}
} catch {
// ...
} finally {
// ...
sparkContextPromise.trySuccess(null)
}
}
}
userThread.setContextClassLoader(userClassLoader)
// Thread name: "Driver"
userThread.setName("Driver")
// Start the thread right away
userThread.start()
// Return the running thread
userThread
}
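Stripped of the Spark specifics, the reflective launch looks like this (class name illustrative); because main is invoked on the current thread's stack, the user code shares the AM's JVM:

```scala
import java.lang.reflect.Modifier

// Load a class by name and invoke its static main inside this JVM.
def invokeMain(className: String, args: Array[String]): Unit = {
  val mainMethod = Thread.currentThread().getContextClassLoader
    .loadClass(className)
    .getMethod("main", classOf[Array[String]])
  require(Modifier.isStatic(mainMethod.getModifiers), s"$className.main is not static")
  mainMethod.invoke(null, args) // null receiver: main is static
}

// invokeMain("com.example.MySparkApp", Array("--input", "hdfs://..."))
```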
registerAM(host, port, userConf, sc.ui.map(_.webUrl))
private def registerAM(
host: String,
port: Int,
_sparkConf: SparkConf,
uiAddress: Option[String],
appAttempt: ApplicationAttemptId): Unit = {
// ...
// Here, client is: private val client = new YarnRMClient()
// spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala
client.register(host, port, yarnConf, _sparkConf, uiAddress, historyAddress)
registered = true
}
YarnRMClient#register
The Scaladoc and parameter descriptions are clear enough on their own.
/**
* Registers the application master with the RM.
*
* @param driverHost Host name where driver is running.
* @param driverPort Port where driver is listening.
* @param conf The Yarn configuration.
* @param sparkConf The Spark configuration.
* @param uiAddress Address of the SparkUI.
* @param uiHistoryAddress Address of the application on the History Server.
*/
def register(
driverHost: String,
driverPort: Int,
conf: YarnConfiguration,
sparkConf: SparkConf,
uiAddress: Option[String],
uiHistoryAddress: String): Unit = {
amClient = AMRMClient.createAMRMClient()
amClient.init(conf)
amClient.start()
this.uiHistoryAddress = uiHistoryAddress
val trackingUrl = uiAddress.getOrElse {
if (sparkConf.get(ALLOW_HISTORY_SERVER_TRACKING_URL)) uiHistoryAddress else ""
}
logInfo("Registering the ApplicationMaster")
synchronized {
// Register the ApplicationMaster with the ResourceManager
// private var amClient: AMRMClient[ContainerRequest] = _
// AMRMClientImpl
amClient.registerApplicationMaster(driverHost, driverPort, trackingUrl)
registered = true
}
}
With the AM registered, the next step is allocating containers for the executors the ApplicationMaster manages:
createAllocator(driverRef, userConf)
private def createAllocator(
driverRef: RpcEndpointRef,
_sparkConf: SparkConf,
rpcEnv: RpcEnv,
appAttemptId: ApplicationAttemptId,
distCacheConf: SparkConf): Unit = {
// Obtain the allocator
// client is: private val client = new YarnRMClient()
allocator = client.createAllocator(
yarnConf,
_sparkConf,
appAttemptId,
driverUrl,
driverRef,
securityMgr,
localResources)
// Allocate resources
allocator.allocateResources()
// ...
}
allocateResources
The number of containers is capped at maxExecutors.
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
/**
* Request resources such that, if YARN gives us all we ask for, we'll have a number of containers
* equal to maxExecutors.
*
* Deal with any containers YARN has granted to us by possibly launching executors in them.
*
* This must be synchronized because variables read in this method are mutated by other methods.
*/
def allocateResources(): Unit = synchronized {
updateResourceRequests()
val progressIndicator = 0.1f
// Poll the ResourceManager. This doubles as a heartbeat if there are no pending container requests.
// The ApplicationMaster requests resources (in units of containers) from the ResourceManager over RPC
val allocateResponse = amClient.allocate(progressIndicator)
// The containers (resources) granted in this round
val allocatedContainers = allocateResponse.getAllocatedContainers()
allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)
// If any containers were granted
if (allocatedContainers.size > 0) {
logDebug(("Allocated containers: %d. Current executor count: %d. " +
"Launching executor count: %d. Cluster resources: %s.")
.format(
allocatedContainers.size,
getNumExecutorsRunning,
getNumExecutorsStarting,
allocateResponse.getAvailableResources))
// Handle the granted containers: launch an executor in each, which then waits for tasks
handleAllocatedContainers(allocatedContainers.asScala)
}
// ...
}
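In the real ApplicationMaster this call sits inside a reporter thread, since the allocate RPC doubles as the AM-to-RM heartbeat. A minimal sketch of that loop (the interval is illustrative; cf. spark.yarn.scheduler.heartbeat.interval-ms):

```scala
// Periodically renew asks and collect granted containers; the call
// itself keeps the AM alive from the RM's point of view.
val reporter = new Thread(() => {
  try {
    while (true) {
      // allocator.allocateResources() in the real code
      Thread.sleep(3000)
    }
  } catch {
    case _: InterruptedException => // shutting down
  }
})
reporter.setDaemon(true)
reporter.setName("Reporter")
reporter.start()
```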
handleAllocatedContainers
Running the granted containers.
/**
* Handle containers granted by the RM by launching executors on them.
*
* Due to the way the YARN allocation protocol works, certain healthy race conditions can result
* in YARN granting containers that we no longer need. In this case, we release them.
*
* Visible for testing.
*/
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)
// ...
// Run the matched containers; this is where executors are effectively launched "remotely"
runAllocatedContainers(containersToUse)
logInfo("Received %d containers from YARN, launching executors on %d of them."
.format(allocatedContainers.size, containersToUse.size))
}
runAllocatedContainers
Launching executors in the allocated containers.
/**
* Launches executors in the allocated containers.
*/
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = synchronized {
// Outer loop over the usable containers
for (container <- containersToUse) {
// ...
if (rpRunningExecs < getOrUpdateTargetNumExecutorsForRPId(rpId)) {
getOrUpdateNumExecutorsStartingForRPId(rpId).incrementAndGet()
if (launchContainers) {
// private val launcherPool = ThreadUtils.newDaemonCachedThreadPool("ContainerLauncher", sparkConf.get(CONTAINER_LAUNCH_MAX_THREADS))
// Each launch is submitted to the newDaemonCachedThreadPool
launcherPool.execute(() => {
try {
new ExecutorRunnable(
Some(container),
conf,
sparkConf,
driverUrl,
executorId,
executorHostname,
containerMem,
containerCores,
appAttemptId.getApplicationId.toString,
securityMgr,
localResources,
rp.id
).run()
updateInternalState()
} catch {
// ...
}
})
} else {
// For test only
updateInternalState()
}
} else {
// ...
}
}
}
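Each container launch is a blocking RPC to a NodeManager, so it is farmed out to a daemon cached thread pool rather than done on the allocator thread. A minimal sketch of such a pool (simplified: the real helper also caps the thread count via CONTAINER_LAUNCH_MAX_THREADS):

```scala
import java.util.concurrent.{Executors, ThreadFactory}

// Daemon threads: a stuck container launch can never block JVM exit.
val daemonFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "ContainerLauncher")
    t.setDaemon(true)
    t
  }
}
val launcherPool = Executors.newCachedThreadPool(daemonFactory)

// one task per allocated container
launcherPool.execute(() => {
  // new ExecutorRunnable(...).run() in the real code
  println("launching executor container")
})
```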
new ExecutorRunnable(xxx).run
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
private[yarn] class ExecutorRunnable(
container: Option[Container],
conf: YarnConfiguration,
sparkConf: SparkConf,
masterAddress: String,
executorId: String,
hostname: String,
executorMemory: Int,
executorCores: Int,
appId: String,
securityMgr: SecurityManager,
localResources: Map[String, LocalResource],
resourceProfileId: Int) extends Logging {
var rpc: YarnRPC = YarnRPC.create(conf)
// YARN NodeManager client
var nmClient: NMClient = _
def run(): Unit = {
logDebug("Starting Executor Container")
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
// Ask the NodeManager to start the container
startContainer()
}
// ...
}
startContainer
def startContainer(): java.util.Map[String, ByteBuffer] = {
  // 1. Prepare the launch command
  val commands = prepareCommand()
  // ... (the ContainerLaunchContext ctx, carrying the commands, environment
  // and local resources, is assembled here; elided)
  // 2. Send the start request to the ContainerManager
  try {
    nmClient.startContainer(container.get, ctx)
  } catch {
    // ...
  }
}
Preparing the command-line script that runs the executor.
prepareCommand
This builds the (Yarn)CoarseGrainedExecutorBackend process you later see in jps.
private def prepareCommand(): List[String] = {
// Extra options for the JVM
// The final command line to execute
val commands = prefixEnv ++
Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
javaOpts ++
Seq("org.apache.spark.executor.YarnCoarseGrainedExecutorBackend",
"--driver-url", masterAddress,
"--executor-id", executorId,
"--hostname", hostname,
"--cores", executorCores.toString,
"--app-id", appId,
"--resourceProfileId", resourceProfileId.toString) ++
userClassPath ++
Seq(
s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")
// TODO: it would be nicer to just make sure there are no null commands here
commands.map(s => if (s == null) "null" else s).toList
}
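Rendered on a real node, the final command looks roughly like (all values illustrative): `$JAVA_HOME/bin/java -server -Xmx4096m ... org.apache.spark.executor.YarnCoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@<driver-host>:<port> --executor-id 1 --hostname <nm-host> --cores 4 --app-id application_..._0001 --resourceProfileId 0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr`.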
Actually running the container (the AM/Driver, or an executor)
The container is actually run at the following location in the Hadoop source:
org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
Parameters, such as the ContainerLaunchContext object, are extracted from the StartContainerRequest.
First, in the ContainerManagerImpl constructor:
// ContainerManager level dispatcher.
dispatcher = new AsyncDispatcher();
containersLauncher = createContainersLauncher(context, exec);
dispatcher.register(ContainersLauncherEventType.class, containersLauncher);
Then look at the startContainerInternal method:
LOG.info("Start request for " + containerIdStr + " by user " + user);
// 1. Get the ContainerLaunchContext from the request
ContainerLaunchContext launchContext = request.getContainerLaunchContext();
// Security credentials
Credentials credentials = parseCredentials(launchContext);
// 2. Create the Container
Container container =
new ContainerImpl(getConfig(), this.dispatcher,
launchContext, credentials, metrics, containerTokenIdentifier,
context);
// 3. Initialize the Container
// Container state changes are driven by its state machine:
// ContainerImpl.java#StateMachineFactory: .addTransition(..., ContainerEventType.INIT_CONTAINER, new RequestResourcesTransition())
// which reaches ContainerImpl.java#RequestResourcesTransition#transition -> container.sendLaunchEvent();
// finally, ContainersLauncher#handle processes the ContainersLauncherEvent
dispatcher.getEventHandler().handle(
new ApplicationContainerInitEvent(container))
ContainersLauncher#handle
switch (event.getType()) {
// Launch the container
case LAUNCH_CONTAINER:
Application app =
context.getApplications().get(
containerId.getApplicationAttemptId().getApplicationId());
// public class ContainerLaunch implements Callable<Integer> {...}
ContainerLaunch launch =
new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
event.getContainer(), dirsHandler, containerManager);
// launch is a Callable
// containerLauncher is an Executors.newCachedThreadPool
containerLauncher.submit(launch);
running.put(containerId, launch);
break;
Finally, ContainerLaunch#call executes:
// Write out the environment; this includes the command that starts the actual executor (or AM) process
exec.writeLaunchEnv(containerScriptOutStream, environment, localResources,
launchContext.getCommands());
The writeLaunchEnv method:
public void writeLaunchEnv(OutputStream out, Map<String, String> environment,
    Map<Path, List<String>> resources, List<String> command) throws IOException {
// Create the shell script builder
// return Shell.WINDOWS ? new WindowsShellScriptBuilder() : new UnixShellScriptBuilder();
ContainerLaunch.ShellScriptBuilder sb = ContainerLaunch.ShellScriptBuilder.create();
if (environment != null) {
for (Map.Entry<String,String> env : environment.entrySet()) {
sb.env(env.getKey().toString(), env.getValue().toString());
}
}
if (resources != null) {
for (Map.Entry<Path,List<String>> entry : resources.entrySet()) {
for (String linkName : entry.getValue()) {
sb.symlink(entry.getKey(), new Path(linkName));
}
}
}
// Emit the command that the shell will exec; in UnixShellScriptBuilder:
// line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");
// e.g. $JAVA_HOME/bin/java -server ...
sb.command(command);
PrintStream pout = null;
try {
pout = new PrintStream(out, false, "UTF-8");
sb.write(pout);
} finally {
if (out != null) {
out.close();
}
}
}
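In other words, on a Unix node the NodeManager materializes a launch script that first exports the environment, then symlinks the localized resources, and ends with a line like `exec /bin/bash -c "$JAVA_HOME/bin/java -server ... org.apache.spark.executor.YarnCoarseGrainedExecutorBackend ..."` (contents illustrative); executing that script is what actually starts the AM or executor JVM.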
Appendix: ContainerState
public enum ContainerState {
NEW, LOCALIZING, LOCALIZATION_FAILED, LOCALIZED, RUNNING, EXITED_WITH_SUCCESS,
EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL,
CONTAINER_RESOURCES_CLEANINGUP, DONE
}
Execution then moves on to:
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend
Notes:
- Invoking a main method obtained via reflection runs the program inside the current process; no new process is created.
- Running an executable class via java -server creates a new process; jps -l shows that process under its fully qualified main class name, while the original parent process may simply run to completion. A sketch contrasting the two follows.
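A minimal sketch of the contrast (class name and classpath are illustrative):

```scala
// (1) In-process (as in startUserApplication): mainMethod.invoke(null, args)
// runs on a thread of this JVM; jps still shows only the parent process.

// (2) New process: fork a JVM; jps -l now lists com.example.Main as its
// own process, independent of the parent.
val proc = new ProcessBuilder("java", "-cp", "app.jar", "com.example.Main")
  .inheritIO() // reuse this process's stdin/stdout/stderr
  .start()
proc.waitFor()
```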