Spark Source Code Analysis: YARN Deployment Flow (ApplicationMaster)

Previous article: [Spark Source Code Analysis: YARN Deployment Flow (SparkSubmit)]

createContainerLaunchContext
It builds the launch context used to run the ApplicationMaster.
The actual submission happens in:

yarnClient.submitApplication(appContext)

RM: ResourceManager
At this point, the RM will have accepted the application and in the background, will go through the process of allocating a container with the required specifications and then eventually setting up and launching the AM on the allocated container.

RM server-side implementation
resourcemanager#ClientRMService.java#submitApplication

// RMAppManager.java#submitApplication
rmAppManager.submitApplication(submissionContext, System.currentTimeMillis(), user);

RMAppManager.java#submitApplication

// 1. The RMAppManager.java#submitApplication method
RMAppImpl application = createAndPopulateNewRMApp(
        submissionContext, submitTime, user, false, -1, null);

// 2. The createAndPopulateNewRMApp method
// Create RMApp
RMAppImpl application =
    new RMAppImpl(applicationId, rmContext, this.conf,
        submissionContext.getApplicationName(), user,
        submissionContext.getQueue(),
        submissionContext, this.scheduler, this.masterService,
        submitTime, submissionContext.getApplicationType(),
        submissionContext.getApplicationTags(), amReqs, placementContext,
        startTime); 

if (UserGroupInformation.isSecurityEnabled()) {
	// ...
} else {
	// Send a START event to RMAppImpl
	this.rmContext.getDispatcher().getEventHandler()
	    .handle(new RMAppEvent(applicationId, RMAppEventType.START));
}        

// 3. The RMAppImpl.java#RMAppImpl constructor
this.stateMachine = stateMachineFactory.make(this);

// Later, RMAppImpl's handle method drives the state machine
this.stateMachine.doTransition(event.getType(), event);

This leads into the state transitions:
RMAppImpl.java#StateMachineFactory

// On the START event, RMAppNewlySavingTransition#transition is invoked and the RMAppImpl state moves from NEW to NEW_SAVING
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
        RMAppEventType.START, new RMAppNewlySavingTransition())
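
The addTransition call above registers one arc in a transition table: a source state, a target state, the event that triggers the move, and a hook that runs on the way. A minimal sketch of that idea in plain Java follows. It is not YARN's StateMachineFactory; the class TransitionTableSketch, its Arc type, and the KILL event are hypothetical names used only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified transition table; YARN's real StateMachineFactory is more general.
class TransitionTableSketch {
    enum State { NEW, NEW_SAVING }
    enum EventType { START, KILL }

    /** One arc of the table: the state to move to plus a hook that runs during the move. */
    static final class Arc {
        final State target;
        final Runnable hook;
        Arc(State target, Runnable hook) {
            this.target = target;
            this.hook = hook;
        }
    }

    private final Map<State, Map<EventType, Arc>> table =
        new EnumMap<State, Map<EventType, Arc>>(State.class);
    private State current = State.NEW;

    /** Registers "from --on--> to" and returns this, so calls can be chained. */
    TransitionTableSketch addTransition(State from, State to, EventType on, Runnable hook) {
        table.computeIfAbsent(from, s -> new EnumMap<EventType, Arc>(EventType.class))
             .put(on, new Arc(to, hook));
        return this;
    }

    /** Looks up the arc for (current state, event), runs its hook, then switches state. */
    State doTransition(EventType type) {
        Map<EventType, Arc> arcs = table.get(current);
        Arc arc = (arcs == null) ? null : arcs.get(type);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + type + " in state " + current);
        }
        arc.hook.run();
        current = arc.target;
        return current;
    }
}
```

An event with no registered arc for the current state is rejected in this sketch, which is why the order of events matters as much as the events themselves.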

A series of further events follows (omitted here); the key state is SCHEDULED:
RMAppAttemptImpl#ScheduleTransition
The scheduler allocates resources via allocate.

// AM resource has been checked when submission
Allocation amContainerAllocation =
    appAttempt.scheduler.allocate(...);

The CapacityScheduler scheduling flow:
CapacityScheduler.java#allocate method

// CapacityScheduler.java#allocate method
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
  List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
  List<ContainerId> release, List<String> blacklistAdditions,
  List<String> blacklistRemovals, ContainerUpdates updateRequests) {

    // ...
    // Update application requests
    // The key call here is updateResourceRequests
    if (application.updateResourceRequests(ask) || application
        .updateSchedulingRequests(schedulingRequests)) {
      updateDemandForQueue = (LeafQueue) application.getQueue();
    }
    // ...
}      

// This then reaches the AppSchedulingInfo.java#updateResourceRequests method
// The ApplicationMaster is updating resource requirements for the
// application, by asking for more resources and releasing resources acquired
// by the application.

// Next is the LocalityAppPlacementAllocator.java#updatePendingAsk method
// Update resource requests
for (ResourceRequest request : requests) {
    // Update asks: put the request into a Map and wait for a NodeManager heartbeat
    resourceRequestMap.put(resourceName, request);
}

When a NodeManager heartbeat arrives, CapacityScheduler's handle method runs:

@Override
public void handle(SchedulerEvent event) {
	// ...
	case NODE_UPDATE:
    {
      NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
      nodeUpdate(nodeUpdatedEvent.getRMNode());
    }
    break;
    // ...
}

// Inside nodeUpdate, allocateContainersToNode(rmNode.getNodeID(), true) is called

// Inside allocateContainersToNode, allocateContainersToNode(candidates, withNodeHeartbeat) is called

// Then comes allocateContainerOnSingleNode(candidates, node, withNodeHeartbeat)

// Then submitResourceCommitRequest(getClusterResource(), assignment);

// Inside submitResourceCommitRequest, the commit happens asynchronously or synchronously
tryCommit(cluster, request, true);

// Inside tryCommit
app.apply(cluster, request, updatePending)

// Inside apply, an RMContainerEvent of type RMContainerEventType.START is sent to RMContainerImpl
rmContainer.handle(
              new RMContainerEvent(containerId, RMContainerEventType.START));
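
The pattern in this part of the flow is that a request is only recorded when it is submitted, and actual allocation happens later, when a node reports in. A minimal sketch of that pattern follows. It is a toy model, not CapacityScheduler: the class HeartbeatAllocationSketch, its methods, and the "free slots" notion are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "park the ask, allocate on heartbeat"; not YARN code.
class HeartbeatAllocationSketch {
    // resource name -> number of containers still wanted
    private final Map<String, Integer> pendingAsks = new LinkedHashMap<>();

    /** Submission side: only records (or overwrites) the ask, allocates nothing yet. */
    void updateResourceRequest(String resourceName, int numContainers) {
        pendingAsks.put(resourceName, numContainers);
    }

    /** Heartbeat side: a node reports free capacity and pending asks are served from it. */
    List<String> nodeUpdate(String nodeId, int freeSlots) {
        List<String> allocated = new ArrayList<>();
        for (Map.Entry<String, Integer> ask : pendingAsks.entrySet()) {
            while (freeSlots > 0 && ask.getValue() > 0) {
                ask.setValue(ask.getValue() - 1);
                freeSlots--;
                allocated.add("container@" + nodeId);
            }
        }
        return allocated;
    }

    /** Number of containers still outstanding for the given resource name. */
    int pending(String resourceName) {
        return pendingAsks.getOrDefault(resourceName, 0);
    }
}
```

In this model an ask for three containers may be satisfied across several heartbeats from different nodes, which mirrors why the AM container is not available immediately after submission.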

// Yet another chain of event handling... exhausting...
// ApplicationMasterLauncher.java#handle
case LAUNCH:
      launch(application);
      break;
// AMLauncher#launch starts the AM: it talks to the corresponding NodeManager and launches the AM
public class AMLauncher implements Runnable {...}

// The ContainerManagerImpl.java#startContainers method starts the Container
// Initialize the AMRMProxy service instance only if the container is of
// type AM and if the AMRMProxy service is enabled
if (amrmProxyEnabled && containerTokenIdentifier.getContainerType()
    .equals(ContainerType.APPLICATION_MASTER)) {
  this.getAMRMProxyService().processApplicationStartRequest(request);
}
performContainerPreStartChecks(nmTokenIdentifier, request,
  containerTokenIdentifier);
// startContainerInternal eventually executes the command that starts the ApplicationMaster; the container launch process is covered at the end of this article
startContainerInternal(containerTokenIdentifier, request);
succeededContainers.add(containerId);

main class:
org.apache.spark.deploy.yarn.ApplicationMaster

II. YARN Deployment Flow (ApplicationMaster)

2.1 Running the ApplicationMaster

Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala

def main(args: Array[String]): Unit = {
	SignalUtils.registerLogger(log)
	// 1. Wrap the various arguments
	val amArgs = new ApplicationMasterArguments(args)
	// ...
	// 2. Create an ApplicationMaster
	master = new ApplicationMaster(amArgs, sparkConf, yarnConf)

	val ugi = ...

	ugi.doAs(new PrivilegedExceptionAction[Unit]() {
	  // 3. Call master.run(), which is where the ApplicationMaster logic really starts
	  override def run(): Unit = System.exit(master.run())
	})
}
2.1.1 Wrapping the arguments: ApplicationMasterArguments

new ApplicationMasterArguments(args)
Source location:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterArguments.scala
The parseArgs method is self-explanatory (for example, --jar and --class)

private def parseArgs(inputArgs: List[String]): Unit = {
	// ...
    while (!args.isEmpty) {
      // ...
      args match {
      	// --jar
        case ("--jar") :: value :: tail =>
          userJar = value
          args = tail

        // --class
        case ("--class") :: value :: tail =>
          userClass = value
          args = tail

        // ...

        case _ =>
          printUsageAndExit(1, args)
      }
    }

    // ...

    userArgs = userArgsBuffer.toList
}
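
The Scala version consumes the argument list with pattern matching on `flag :: value :: tail`. The same flag/value walk can be sketched in Java with an index loop. This is only an illustration: the class AmArgsSketch is hypothetical, the `--arg` flag is included as an assumption about how user arguments are passed, and unknown flags raise an exception here instead of calling printUsageAndExit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of a flag/value parser; not Spark's ApplicationMasterArguments.
class AmArgsSketch {
    String userJar;
    String userClass;
    final List<String> userArgs = new ArrayList<>();

    /** Walks the input two tokens at a time: a flag followed by its value. */
    static AmArgsSketch parse(String... input) {
        AmArgsSketch parsed = new AmArgsSketch();
        int i = 0;
        while (i < input.length) {
            String flag = input[i];
            if (i + 1 >= input.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            String value = input[i + 1];
            switch (flag) {
                case "--jar":
                    parsed.userJar = value;
                    break;
                case "--class":
                    parsed.userClass = value;
                    break;
                case "--arg": // assumption: each user argument is passed as its own --arg pair
                    parsed.userArgs.add(value);
                    break;
                default:
                    // The Scala code prints usage and exits; the sketch throws instead.
                    throw new IllegalArgumentException("Unknown parameter: " + flag);
            }
            i += 2;
        }
        return parsed;
    }
}
```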
2.1.2 Running master.run

master.run is declared final.
In cluster mode it runs the user's Driver program.

final def run(): Int = {
    try {
      val attemptID = ...

      new CallerContext(
        "APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
        Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()

      logInfo("ApplicationAttemptId: " + appAttemptId)

      // This shutdown hook should run *after* the SparkContext is shut down.
      // ...

      if (isClusterMode) {
        // Run the Driver-side program
        runDriver()
      } else {
        runExecutorLauncher()
      }
    } catch {
      // ...
    } finally {
      // ...
    }

    exitCode
}

runDriver()

private def runDriver(): Unit = {
    addAmIpFilter(None, System.getenv(ApplicationConstants.APPLICATION_WEB_PROXY_BASE_ENV))
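    // 1. Start the user's application (the main method of the uploaded jar)
    // This starts the Driver thread (it is already running when the call returns)
    userClassThread = startUserApplication()

    // ... Wait for the Spark context to be initialized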
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
        // Configure the Driver's RPC environment
        val rpcEnv = sc.env.rpcEnv

        val userConf = sc.getConf
        // Host of the node on which the ApplicationMaster was started
        val host = userConf.get(DRIVER_HOST_ADDRESS)
        // RPC port exposed by this ApplicationMaster launch
        val port = userConf.get(DRIVER_PORT)
        // 2. Register the ApplicationMaster (host + port) with the ResourceManager
        // sc.ui.map(_.webUrl)
        // The ApplicationMaster exposes a trackable web URL through which users can check the application's execution status

        registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId)

        // RPC Endpoint
        val driverRef = rpcEnv.setupEndpointRef(
          RpcAddress(host, port),
          YarnSchedulerBackend.ENDPOINT_NAME)

        // 3. Allocate Containers for the Executors managed by the ApplicationMaster
        createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf)
      } else {
        // ...
        throw new IllegalStateException("User did not initialize spark context!")
      }
      resumeDriver()
      // The main thread waits for the Driver thread
      userClassThread.join()
    } catch {
      // ...
    } finally {
      resumeDriver()
    }
}

The resumeDriver method

private def resumeDriver(): Unit = {
	// When initialization in runDriver happened the user class thread has to be resumed.
	sparkContextPromise.synchronized {
	  sparkContextPromise.notify()
	}
}
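
runDriver and resumeDriver together form a two-way handshake: the main thread waits (with a timeout) for the user thread to publish its context, does its own setup, and then lets the user thread continue. A minimal sketch of such a handshake follows, assuming the user thread parks itself right after publishing the context. It uses a CompletableFuture and a CountDownLatch in place of the Promise and wait/notify seen above; DriverHandshakeSketch and all names in it are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical handshake between a "main" thread and a "Driver" thread; not Spark code.
class DriverHandshakeSketch {
    static String run(long maxWaitMillis) {
        CompletableFuture<String> contextPromise = new CompletableFuture<>();
        CountDownLatch resumeSignal = new CountDownLatch(1);
        StringBuffer trace = new StringBuffer(); // thread-safe event log

        Thread userThread = new Thread(() -> {
            contextPromise.complete("sc");       // publish the (fake) context
            try {
                resumeSignal.await();            // pause until the main thread is done
                trace.append("user-code-continues;");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Driver");
        userThread.start();

        try {
            // Bounded wait, like awaiting the promise with a total wait time.
            String sc = contextPromise.get(maxWaitMillis, TimeUnit.MILLISECONDS);
            trace.append("am-registered-with-").append(sc).append(';');
        } catch (Exception e) {
            throw new IllegalStateException("context was not initialized in time", e);
        } finally {
            resumeSignal.countDown();            // always release the user thread
        }
        try {
            userThread.join();                   // main thread waits for the Driver thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return trace.toString();
    }
}
```

Releasing the latch in a finally block plays the same role as the resumeDriver call in the finally clause of runDriver: the user thread must not stay parked if setup fails.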

Starting the user thread: startUserApplication
The Driver is a thread: a new thread is started to run the main method of the Driver class that we wrote ourselves.

/**
 * Start the user class, which contains the spark driver, in a separate Thread.
 * If the main routine exits cleanly or exits with System.exit(N) for any N
 * we assume it was successful, for all other cases we assume failure.
 *
 * Returns the user thread that was started.
 */
private def startUserApplication(): Thread = {
	logInfo("Starting the user application in a separate Thread")

	var userArgs = args.userArgs
	// ...
	// Look up the main method of the user's Driver class via reflection; no separate JVM process is started
	val mainMethod = userClassLoader.loadClass(args.userClass)
	  .getMethod("main", classOf[Array[String]])
	// Create a thread
	val userThread = new Thread {
	  override def run(): Unit = {
	    try {
	      if (!Modifier.isStatic(mainMethod.getModifiers)) {
	        logError(s"Could not find static main method in object ${args.userClass}")
	        finish(FinalApplicationStatus.FAILED, ApplicationMaster.EXIT_EXCEPTION_USER_CLASS)
	      } else {
	        // Invoke the main method of the Driver class
	        mainMethod.invoke(null, userArgs.toArray)
	        finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
	        logDebug("Done running user class")
	      }
	    } catch {
	      // ...
	    } finally {
	      // ...
	      sparkContextPromise.trySuccess(null)
	    }
	  }
	}
	userThread.setContextClassLoader(userClassLoader)
	// Thread name: Driver
	userThread.setName("Driver")
	// Start the thread right away
	userThread.start()
	// Return the running thread
	userThread
}
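
The key point of startUserApplication is that the user's main method is invoked reflectively on a thread inside the current JVM. A minimal sketch in Java follows: it loads a class by name, checks that main is static, and invokes it on a thread named "Driver". The classes ReflectiveMainSketch and UserApp are hypothetical, and the JVM name comparison is only there to make the "same process" claim observable.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo of invoking a user main method by reflection on a separate thread.
class ReflectiveMainSketch {
    /** Stand-in for the user's Driver class. */
    static class UserApp {
        static volatile String seenThread;
        static volatile String seenJvm;

        public static void main(String[] args) {
            seenThread = Thread.currentThread().getName();
            seenJvm = currentJvm();
        }
    }

    /** Returns an identifier of the running JVM (typically "pid@host"). */
    static String currentJvm() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    /** Runs className.main(args) on a thread named "Driver"; false if main is not static. */
    static boolean runUserMain(String className, String[] args) {
        try {
            Method mainMethod = Class.forName(className).getMethod("main", String[].class);
            if (!Modifier.isStatic(mainMethod.getModifiers())) {
                return false;
            }
            Thread userThread = new Thread(() -> {
                try {
                    // The cast keeps the array from being spread as varargs.
                    mainMethod.invoke(null, (Object) args);
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            });
            userThread.setName("Driver");
            userThread.start();
            userThread.join();
            return true;
        } catch (ReflectiveOperationException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

After runUserMain returns, UserApp.seenJvm equals currentJvm() of the caller: the "Driver" ran as a thread of the same process, not as a new one.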

registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId)

private def registerAM(
      host: String,
      port: Int,
      _sparkConf: SparkConf,
      uiAddress: Option[String],
      appAttempt: ApplicationAttemptId): Unit = {
    // ...

    // The client here is: private val client = new YarnRMClient()
    // spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala
    client.register(host, port, yarnConf, _sparkConf, uiAddress, historyAddress)
    registered = true
}

YarnRMClient#register
The comments and parameter descriptions are quite clear.

/**
 * Registers the application master with the RM.
 *
 * @param driverHost Host name where driver is running.
 * @param driverPort Port where driver is listening.
 * @param conf The Yarn configuration.
 * @param sparkConf The Spark configuration.
 * @param uiAddress Address of the SparkUI.
 * @param uiHistoryAddress Address of the application on the History Server.
 */
def register(
	  driverHost: String,
	  driverPort: Int,
	  conf: YarnConfiguration,
	  sparkConf: SparkConf,
	  uiAddress: Option[String],
	  uiHistoryAddress: String): Unit = {
	amClient = AMRMClient.createAMRMClient()
	amClient.init(conf)
	amClient.start()
	this.uiHistoryAddress = uiHistoryAddress

	val trackingUrl = uiAddress.getOrElse {
	  if (sparkConf.get(ALLOW_HISTORY_SERVER_TRACKING_URL)) uiHistoryAddress else ""
	}

	logInfo("Registering the ApplicationMaster")
	synchronized {
	  // Register the ApplicationMaster with the ResourceManager
	  // private var amClient: AMRMClient[ContainerRequest] = _
	  // AMRMClientImpl
	  amClient.registerApplicationMaster(driverHost, driverPort, trackingUrl)
	  registered = true
	}
}

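With the AM registered, the next step is to allocate Containers for the Executors that the ApplicationMaster manages:
createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf)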

private def createAllocator(
      driverRef: RpcEndpointRef,
      _sparkConf: SparkConf,
      rpcEnv: RpcEnv,
      appAttemptId: ApplicationAttemptId,
      distCacheConf: SparkConf): Unit = {
    // Create the allocator used to obtain resources
    // private val client = new YarnRMClient()
    allocator = client.createAllocator(
      yarnConf,
      _sparkConf,
      appAttemptId,
      driverUrl,
      driverRef,
      securityMgr,
      localResources)

    // Allocate resources
    allocator.allocateResources()
    // ...
}

allocateResources
The number of Containers requested is capped at maxExecutors.
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala

/**
 * Request resources such that, if YARN gives us all we ask for, we'll have a number of containers
 * equal to maxExecutors.
 *
 * Deal with any containers YARN has granted to us by possibly launching executors in them.
 *
 * This must be synchronized because variables read in this method are mutated by other methods.
 */
def allocateResources(): Unit = synchronized {
    updateResourceRequests()

    val progressIndicator = 0.1f
    // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container requests.
    // The ApplicationMaster requests resources from the ResourceManager over RPC (resources are granted in units of Containers)
    val allocateResponse = amClient.allocate(progressIndicator)
    // Get the containers (resources) that were allocated
    val allocatedContainers = allocateResponse.getAllocatedContainers()
    allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)
    // If any containers were granted
    if (allocatedContainers.size > 0) {
      logDebug(("Allocated containers: %d. Current executor count: %d. " +
        "Launching executor count: %d. Cluster resources: %s.")
        .format(
          allocatedContainers.size,
          getNumExecutorsRunning,
          getNumExecutorsStarting,
          allocateResponse.getAvailableResources))
      // Handle the granted Containers: launch an Executor in each one, which then waits to be assigned Tasks
      handleAllocatedContainers(allocatedContainers.asScala)
    }

    // ...
}

handleAllocatedContainers
Running the containers

/**
 * Handle containers granted by the RM by launching executors on them.
 *
 * Due to the way the YARN allocation protocol works, certain healthy race conditions can result
 * in YARN granting containers that we no longer need. In this case, we release them.
 *
 * Visible for testing.
 */
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // ...
    // Run the allocated containers; in effect this is where the Executors are started "remotely"
    runAllocatedContainers(containersToUse)

    logInfo("Received %d containers from YARN, launching executors on %d of them."
      .format(allocatedContainers.size, containersToUse.size))
}

runAllocatedContainers
Launches executors in the allocated containers

/**
 * Launches executors in the allocated containers.
 */
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = synchronized {
    // Outer for loop over the containers
    for (container <- containersToUse) {
      // ...
      if (rpRunningExecs < getOrUpdateTargetNumExecutorsForRPId(rpId)) {
        getOrUpdateNumExecutorsStartingForRPId(rpId).incrementAndGet()
        if (launchContainers) {
          // private val launcherPool = ThreadUtils.newDaemonCachedThreadPool("ContainerLauncher", sparkConf.get(CONTAINER_LAUNCH_MAX_THREADS))
          // Thread pool: newDaemonCachedThreadPool
          launcherPool.execute(() => {
            try {
              new ExecutorRunnable(
                Some(container),
                conf,
                sparkConf,
                driverUrl,
                executorId,
                executorHostname,
                containerMem,
                containerCores,
                appAttemptId.getApplicationId.toString,
                securityMgr,
                localResources,
                rp.id
              ).run()
              updateInternalState()
            } catch {
              // ...
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      } else {
        // ...
      }
    }
}
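
runAllocatedContainers hands each launch to a bounded pool of daemon threads so that a slow NodeManager call does not block the allocation loop. A rough Java sketch of that arrangement follows. The pool settings (core size equal to max size, 60 second keep-alive, idle core threads allowed to time out) are an assumption about what a "daemon cached thread pool with a thread cap" looks like, and LauncherPoolSketch with its counter is hypothetical.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical launcher pool; the real launch work (ExecutorRunnable.run) is replaced by a counter.
class LauncherPoolSketch {
    static int launchAll(List<String> containerIds, int maxThreads) {
        AtomicInteger started = new AtomicInteger();

        // Daemon threads do not keep the JVM alive once the main work is finished.
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "ContainerLauncher");
            t.setDaemon(true);
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(), daemonFactory);
        pool.allowCoreThreadTimeOut(true); // idle threads are released, as in a cached pool

        for (String containerId : containerIds) {
            pool.execute(() -> {
                // Placeholder for "start the executor in containerId", then update bookkeeping.
                started.incrementAndGet();
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return started.get();
    }
}
```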

new ExecutorRunnable(xxx).run
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala

private[yarn] class ExecutorRunnable(
    container: Option[Container],
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    masterAddress: String,
    executorId: String,
    hostname: String,
    executorMemory: Int,
    executorCores: Int,
    appId: String,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resourceProfileId: Int) extends Logging {

  var rpc: YarnRPC = YarnRPC.create(conf)
  // YARN NodeManager client
  var nmClient: NMClient = _

  def run(): Unit = {
    logDebug("Starting Executor Container")
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    // Ask the NodeManager to start the Container
    startContainer()
  }
  // ...
}  

startContainer

def startContainer(): java.util.Map[String, ByteBuffer] = {
    // 1. Prepare the command
    val commands = prepareCommand()
    // ... (ctx, the ContainerLaunchContext that carries these commands, is built here)

    // 2. Send the start request to the ContainerManager
    try {
      nmClient.startContainer(container.get, ctx)
    } catch {
      // ...
    }
}

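Preparing the command line that runs the Executor.
prepareCommand
The main class in this command is what jps later shows as the Executor process: YarnCoarseGrainedExecutorBackend (CoarseGrainedExecutorBackend in older Spark versions).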

private def prepareCommand(): List[String] = {
    // Extra options for the JVM
    // The final command line to be executed
    val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.YarnCoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId,
        "--resourceProfileId", resourceProfileId.toString) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
}
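
prepareCommand is essentially list concatenation: the java binary, JVM options, the main class with its flags, and finally the stdout/stderr redirections, with any null entry replaced by the string "null". The same assembly can be sketched in Java as below. ExecutorCommandSketch and its parameter list are hypothetical and cover only a subset of the flags shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical command builder mirroring the shape of prepareCommand; not Spark code.
class ExecutorCommandSketch {
    static List<String> prepareCommand(
            String javaHome, List<String> javaOpts, String mainClass,
            String driverUrl, String executorId, int cores, String logDir) {
        List<String> commands = new ArrayList<>();
        commands.add(javaHome + "/bin/java");
        commands.add("-server");
        commands.addAll(javaOpts);
        commands.addAll(Arrays.asList(
            mainClass,
            "--driver-url", driverUrl,
            "--executor-id", executorId,
            "--cores", String.valueOf(cores)));
        // Redirect the process output into the container's log directory.
        commands.add("1>" + logDir + "/stdout");
        commands.add("2>" + logDir + "/stderr");
        // Same null guard as the Scala code: a null entry becomes the literal string "null".
        return commands.stream()
            .map(s -> s == null ? "null" : s)
            .collect(Collectors.toList());
    }
}
```

Joining the returned list with spaces gives the single command line that ends up in the container's launch script.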

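Actually running a Container (AM (Driver) or Executor)

The code that actually runs a Container lives at the following location
in the Hadoop source:
org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
It extracts the relevant parameters from the StartContainerRequest,
for example the ContainerLaunchContext object.
First, in the ContainerManagerImpl constructor we see: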

// ContainerManager level dispatcher.
dispatcher = new AsyncDispatcher();

containersLauncher = createContainersLauncher(context, exec);

dispatcher.register(ContainersLauncherEventType.class, containersLauncher);

Next, look at the startContainerInternal method:

LOG.info("Start request for " + containerIdStr + " by user " + user);
// 1. Get the ContainerLaunchContext object from the request
ContainerLaunchContext launchContext = request.getContainerLaunchContext();
// Security credentials
Credentials credentials = parseCredentials(launchContext);

// 2. Create the Container
Container container =
    new ContainerImpl(getConfig(), this.dispatcher,
        launchContext, credentials, metrics, containerTokenIdentifier,
        context);

// 3. Initialize the Container
// Handle the Container's state changes
// ContainerImpl.java#StateMachineFactory#.addTransition#ContainerEventType.INIT_CONTAINER, new RequestResourcesTransition()
// This reaches ContainerImpl.java#RequestResourcesTransition#transition#container.sendLaunchEvent();
// Finally, ContainersLauncher#handle processes the ContainersLauncherEvent
dispatcher.getEventHandler().handle(
          new ApplicationContainerInitEvent(container));

ContainersLauncher#handle

switch (event.getType()) {
  // Launch the Container
  case LAUNCH_CONTAINER:
    Application app =
      context.getApplications().get(
          containerId.getApplicationAttemptId().getApplicationId());

    // public class ContainerLaunch implements Callable<Integer> {...}     
    ContainerLaunch launch =
        new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
          event.getContainer(), dirsHandler, containerManager);
    // launch is a Callable
    // containerLauncher is an Executors.newCachedThreadPool
    containerLauncher.submit(launch);

    running.put(containerId, launch);
    break;

Finally, the ContainerLaunch#call method is executed

// Write out the environment; this includes the command that starts the actual Executor process
exec.writeLaunchEnv(containerScriptOutStream, environment, localResources,
    launchContext.getCommands());

The writeLaunchEnv method

public void writeLaunchEnv(OutputStream out, Map<String, String> environment, Map<Path, List<String>> resources, List<String> command) throws IOException{
    // Create the shell script
	// return Shell.WINDOWS ? new WindowsShellScriptBuilder() : new UnixShellScriptBuilder();
    ContainerLaunch.ShellScriptBuilder sb = ContainerLaunch.ShellScriptBuilder.create();
    if (environment != null) {
      for (Map.Entry<String,String> env : environment.entrySet()) {
        sb.env(env.getKey().toString(), env.getValue().toString());
      }
    }
    if (resources != null) {
      for (Map.Entry<Path,List<String>> entry : resources.entrySet()) {
        for (String linkName : entry.getValue()) {
          sb.symlink(entry.getKey(), new Path(linkName));
        }
      }
    }

    // The shell command that is actually executed
    // UnixShellScriptBuilder
    // line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");
    // for example: $JAVA_HOME/bin/java -server xxxx
    sb.command(command);

    PrintStream pout = null;
    try {
      pout = new PrintStream(out, false, "UTF-8");
      sb.write(pout);
    } finally {
      if (out != null) {
        out.close();
      }
    }
}
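
Put together, writeLaunchEnv produces a small shell script: environment variables first, then a single exec line that runs the joined command. A stripped-down sketch of such a script builder follows. Only the exec line format is taken from the UnixShellScriptBuilder comment above; the shebang line, the export format, and the class LaunchScriptSketch are assumptions, and symlink handling is left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, minimal launch-script builder; the real builder also handles symlinks and quoting.
class LaunchScriptSketch {
    static String build(Map<String, String> environment, List<String> command) {
        StringBuilder sb = new StringBuilder("#!/bin/bash\n");
        // One export per environment entry (assumed format).
        for (Map.Entry<String, String> env : environment.entrySet()) {
            sb.append("export ").append(env.getKey())
              .append("=\"").append(env.getValue()).append("\"\n");
        }
        // exec replaces the shell with the command, so the container process is the JVM itself.
        sb.append("exec /bin/bash -c \"").append(String.join(" ", command)).append("\"\n");
        return sb.toString();
    }
}
```

For an environment containing SPARK_USER=alice and the command `$JAVA_HOME/bin/java -server Foo`, the sketch yields a three-line script ending in `exec /bin/bash -c "$JAVA_HOME/bin/java -server Foo"`.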

Appendix: ContainerState

public enum ContainerState {
  NEW, LOCALIZING, LOCALIZATION_FAILED, LOCALIZED, RUNNING, EXITED_WITH_SUCCESS,
  EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL,
  CONTAINER_RESOURCES_CLEANINGUP, DONE
}

After that, execution reaches:
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend

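Notes:

  1. When a program obtains a class via reflection and calls its main method, the code runs inside the original process; no new process is created.
  2. When a program runs an executable class via java -server, a new process is created; jps -l shows the fully qualified name of its Java main class, and the original process may then exit on its own.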