Spark thriftserver2: configuring the Spark driver_jar

To submit a job to a Spark cluster (standalone), we usually use a command like the following:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar

This command uses SparkSubmit to submit the job to the cluster; see the diagram above for the detailed submission flow.

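SparkSubmit first parses the command line and maps its arguments onto its own variables. The method that prepares the submission from the parsed arguments is prepareSubmitEnvironment(args):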

val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

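Pay particular attention to childMainClass: this is the class that actually submits the driver to the Master. Because we are targeting a standalone cluster, childMainClass is ClientApp. Note that this holds for cluster deploy mode (--deploy-mode cluster) when the REST submission gateway is not used; in client mode the driver runs inside the spark-submit process itself and childMainClass is the application's own main class.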
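The choice of childMainClass can be pictured as a small decision function. The sketch below is a simplified model, not Spark's actual code: the ClusterManager type, the chooseChildMainClass function, and its boolean flags are hypothetical names introduced for illustration, and the fully qualified names used for ClientApp and RestSubmissionClientApp are assumptions about what the corresponding classOf expressions resolve to.

```scala
// Simplified, hypothetical model of how a submit class might be selected.
object SubmitClassSketch {
  sealed trait ClusterManager
  case object Standalone extends ClusterManager
  case object Yarn extends ClusterManager
  case object Kubernetes extends ClusterManager

  val YarnClusterSubmitClass = "org.apache.spark.deploy.yarn.YarnClusterApplication"
  // Assumed fully qualified names for the two classOf[...] constants.
  val RestClusterSubmitClass = "org.apache.spark.deploy.rest.RestSubmissionClientApp"
  val StandaloneClusterSubmitClass = "org.apache.spark.deploy.ClientApp"
  val KubernetesClusterSubmitClass =
    "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"

  def chooseChildMainClass(
      manager: ClusterManager,
      clusterMode: Boolean,
      useRest: Boolean,
      userMainClass: String): String =
    if (!clusterMode) {
      // Client mode: the user's main class runs directly in the submitting JVM.
      userMainClass
    } else {
      manager match {
        case Standalone =>
          if (useRest) RestClusterSubmitClass else StandaloneClusterSubmitClass
        case Yarn => YarnClusterSubmitClass
        case Kubernetes => KubernetesClusterSubmitClass
      }
    }
}
```

With a standalone manager, cluster mode, and no REST gateway, the function returns the ClientApp class name, which is the case this article follows.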

The submit classes for the various cluster modes are defined as follows:

private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.yarn.YarnClusterApplication"
private[deploy] val REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
private[deploy] val STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
private[deploy] val KUBERNETES_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"

Once childMainClass has been determined, SparkSubmit uses reflection to obtain a ClientApp object and then calls its start method:

app.start(childArgs.toArray, sparkC
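The reflection step can be sketched in isolation. This is a minimal illustration under stated assumptions, not Spark's code: DemoApplication stands in for Spark's application trait, DemoClientApp for ClientApp, and launch and the Map-based configuration are hypothetical simplifications.

```scala
object ReflectionSketch {
  // Hypothetical stand-in for the trait that submit classes implement.
  trait DemoApplication {
    def start(args: Array[String], conf: Map[String, String]): String
  }

  // Hypothetical stand-in for ClientApp.
  class DemoClientApp extends DemoApplication {
    override def start(args: Array[String], conf: Map[String, String]): String =
      s"started with ${args.length} args, master=${conf.getOrElse("spark.master", "unset")}"
  }

  // Load the class by name, instantiate it through its no-arg constructor,
  // and call start, mirroring the flow described above.
  def launch(className: String, args: Array[String], conf: Map[String, String]): String = {
    val clazz = Class.forName(className)
    require(
      classOf[DemoApplication].isAssignableFrom(clazz),
      s"$className does not implement DemoApplication")
    val app = clazz.getConstructor().newInstance().asInstanceOf[DemoApplication]
    app.start(args, conf)
  }
}
```

Resolving the class by name at runtime is what lets one submit path serve several cluster managers without compile-time dependencies on all of them.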

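Inside ClientApp, a NettyRpcEnv is created first, then the masterEndpoint references used to communicate with the Master are obtained, and finally a ClientEndpoint is constructed and registered with the NettyRpcEnv. ClientApp sends the driver registration request to the Master through this ClientEndpoint: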

    // Create the NettyRpcEnv
    val rpcEnv =
      RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
    // Obtain the endpoint references used to talk to the Master
    val masterEndpoints = driverArgs.masters.map(RpcAddress.fromSparkURL).
      map(rpcEnv.setupEndpointRef(_, Master.ENDPOINT_NAME))
    // Register the ClientEndpoint with the NettyRpcEnv
    rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))

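Once the ClientEndpoint is registered, its onStart() method is invoked automatically. Inside onStart(), asyncSendToMasterAndForwardReply is called with a RequestSubmitDriver message to send the driver registration request to the Master: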

val driverDescription = new DriverDescription(
  driverArgs.jarUrl,
  driverArgs.memory,
  driverArgs.cores,
  driverArgs.supervise,
  command)
asyncSendToMasterAndForwardReply[SubmitDriverResponse](
  RequestSubmitDriver(driverDescription))
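The "register, then onStart, then asynchronous ask" pattern can be modelled with plain Scala futures. The following is only a sketch of the idea: FakeMasterRef, DemoClientEndpoint, and setupEndpoint are hypothetical stand-ins, the message classes are reduced to a few fields, and the real RPC layer is considerably more involved.

```scala
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

object ClientEndpointSketch {
  final case class RequestSubmitDriver(mainClass: String)
  final case class SubmitDriverResponse(
      success: Boolean, driverId: Option[String], message: String)

  // Hypothetical master reference that answers an ask asynchronously.
  class FakeMasterRef {
    def ask(msg: RequestSubmitDriver): Future[SubmitDriverResponse] =
      Future(SubmitDriverResponse(true, Some("driver-0001"), s"accepted ${msg.mainClass}"))
  }

  class DemoClientEndpoint(masters: Seq[FakeMasterRef], mainClass: String) {
    private val reply = Promise[SubmitDriverResponse]()
    def response: Future[SubmitDriverResponse] = reply.future

    // Invoked once after registration: ask every master and keep the first reply.
    def onStart(): Unit =
      masters.foreach { master =>
        master.ask(RequestSubmitDriver(mainClass)).foreach(r => reply.trySuccess(r))
      }
  }

  // Mimics the RPC environment: registering an endpoint triggers onStart().
  def setupEndpoint(endpoint: DemoClientEndpoint): DemoClientEndpoint = {
    endpoint.onStart()
    endpoint
  }
}
```

The caller never blocks inside onStart(); the reply arrives later and is forwarded to whoever is waiting on the response future.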

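Finally, let's look at how the Master processes the driver registration request. The Master handles requests that expect a reply in its receiveAndReply method. When the Master endpoint receives the RequestSubmitDriver request sent by the ClientEndpoint, it first creates the driver and then persists it with persistenceEngine; this persistence exists mainly so that the driver can be restored when the Master process recovers. Finally it calls schedule() and replies with a SubmitDriverResponse: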

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    // RequestSubmitDriver sent by the ClientEndpoint
    case RequestSubmitDriver(description) =>
      if (state != RecoveryState.ALIVE) {
        val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
          "Can only accept driver submissions in ALIVE state."
        context.reply(SubmitDriverResponse(self, false, None, msg))
      } else {
        logInfo("Driver submitted " + description.command.mainClass)
        val driver = createDriver(description)
        persistenceEngine.addDriver(driver)
        waitingDrivers += driver
        drivers.add(driver)
        schedule()
        // Reply to the ClientEndpoint with the driver submission response
        context.reply(SubmitDriverResponse(self, true, Some(driver.id),
          s"Driver successfully submitted as ${driver.id}"))
      }
    // ... remaining cases omitted
}
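The handling order (state check, create, persist, enqueue, schedule, reply) can be reproduced in a toy model. This sketch makes several assumptions: DemoMaster, handleSubmit, the in-memory persisted buffer standing in for persistenceEngine, and the driver id format are all hypothetical, and schedule() only counts its invocations.

```scala
import scala.collection.mutable

object MasterSketch {
  sealed trait RecoveryState
  case object Alive extends RecoveryState
  case object Standby extends RecoveryState

  final case class DriverInfo(id: String, mainClass: String)
  final case class SubmitDriverResponse(
      success: Boolean, driverId: Option[String], message: String)

  class DemoMaster(var state: RecoveryState) {
    // In-memory stand-in for the persistence engine.
    val persisted = mutable.ArrayBuffer.empty[DriverInfo]
    val waitingDrivers = mutable.ArrayBuffer.empty[DriverInfo]
    var scheduleCalls = 0
    private var nextDriverNumber = 0

    // Placeholder: a real scheduler would try to place waiting drivers here.
    private def schedule(): Unit = scheduleCalls += 1

    def handleSubmit(mainClass: String): SubmitDriverResponse =
      if (state != Alive) {
        // Only an active master may accept submissions.
        SubmitDriverResponse(false, None,
          s"State is $state. Can only accept driver submissions in ALIVE state.")
      } else {
        val driver = DriverInfo(f"driver-$nextDriverNumber%04d", mainClass)
        nextDriverNumber += 1
        persisted += driver       // persist first so recovery can restore it
        waitingDrivers += driver  // then queue it for scheduling
        schedule()
        SubmitDriverResponse(true, Some(driver.id),
          s"Driver successfully submitted as ${driver.id}")
      }
  }
}
```

Persisting before scheduling means a master that crashes right after accepting the request can still find the driver when it recovers.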

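In standalone mode, the driver process runs on a worker node. The real entry point for launching the driver is schedule(); besides launching drivers on workers, this method is also responsible for starting executors on workers. Here we only look at how it launches the driver. To start a driver on a worker, the Master has to send that worker a driver launch message, which is done by the following method: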

private def launchDriver(worker: WorkerInfo, driver: DriverInfo): Unit = {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // The worker that launches the driver process was picked at random
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
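How a worker ends up being chosen can be sketched as a shuffle followed by a round-robin search for a worker with enough free memory and cores. This is a simplified model under assumptions: WorkerSlot, PendingDriver, and placeDrivers are hypothetical names, and resource bookkeeping is reduced to two integers per worker.

```scala
import scala.collection.mutable
import scala.util.Random

object DriverPlacementSketch {
  final class WorkerSlot(val id: String, var memoryFree: Int, var coresFree: Int)
  final case class PendingDriver(id: String, mem: Int, cores: Int)

  // Returns driverId -> workerId for every driver that could be placed;
  // placed drivers are removed from the waiting buffer.
  def placeDrivers(
      workers: Seq[WorkerSlot],
      waiting: mutable.ArrayBuffer[PendingDriver],
      rng: Random): Map[String, String] = {
    val shuffled = rng.shuffle(workers).toIndexedSeq
    val placements = mutable.LinkedHashMap.empty[String, String]
    var curPos = 0
    for (driver <- waiting.toList) {
      var launched = false
      var visited = 0
      // Visit each worker at most once per driver, continuing from the last position.
      while (visited < shuffled.size && !launched) {
        val worker = shuffled(curPos)
        visited += 1
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placements(driver.id) = worker.id
          waiting -= driver
          launched = true
        }
        curPos = (curPos + 1) % shuffled.size
      }
    }
    placements.toMap
  }
}
```

A driver that fits on no worker simply stays in the waiting buffer until a later scheduling pass.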

When the worker receives the LaunchDriver request, it creates a DriverRunner object and calls its start method to launch the driver:

    // Launch the driver on the worker node
    case LaunchDriver(driverId, driverDesc) =>
      logInfo(s"Asked to launch driver $driverId")
      val driver = new DriverRunner(
        conf,
        driverId,
        workDir,
        sparkHome,
        driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
        self,
        workerUri,
        securityMgr)
      drivers(driverId) = driver
      // Start the driver
      driver.start()

Launching the driver essentially means building a Linux command line and starting a process on the worker:

runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
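The retry idea behind a supervised launch can be sketched without starting real processes. Everything here is an assumption made for illustration: runWithRetry, the runOnce callback standing in for "start the process and wait for its exit code", the injected sleep function, and in particular maxAttempts, which is added only to keep the example bounded.

```scala
object RetrySketch {
  // runOnce: hypothetical stand-in for launching the process and waiting for its exit code.
  // sleep: injected so that callers decide whether to really wait.
  // maxAttempts: illustrative bound, not part of the flow described in the text.
  def runWithRetry(
      runOnce: () => Int,
      supervise: Boolean,
      maxAttempts: Int,
      sleep: Int => Unit): Int = {
    var exitCode = -1
    var waitSeconds = 1
    var attempts = 0
    var keepTrying = maxAttempts > 0
    while (keepTrying) {
      exitCode = runOnce()
      attempts += 1
      // Retry only when supervision is on and the process exited abnormally.
      keepTrying = supervise && exitCode != 0 && attempts < maxAttempts
      if (keepTrying) {
        sleep(waitSeconds)
        waitSeconds *= 2 // back off exponentially between attempts
      }
    }
    exitCode
  }
}
```

Without supervision the command runs exactly once and its exit code is returned as is.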

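Of course, before the command is executed there are a few preparatory steps, such as creating a directory and downloading the jar; in other words, setting up the runtime environment: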

private[worker] def prepareAndRunDriver(): Int = {
  // Create the driver's working directory
  val driverDir = createWorkingDirectory()
  // Download the user jar
  val localJarFilename = downloadUserJar(driverDir)
  def substituteVariables(argument: String): String = argument match {
    case "{{WORKER_URL}}" => workerUrl
    case "{{USER_JAR}}" => localJarFilename
    case other => other
  }
  // TODO: If we add ability to submit multiple jars they should also be added here
  val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
    driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
  // Launch the driver; after that the driver initializes its SparkContext
  runDriver(builder, driverDir, driverDesc.supervise)
}
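The placeholder substitution is easy to try on its own. In this sketch the worker URL, the local jar path, and the argument list are made-up sample values, and the two values are passed in as parameters (an assumption made to keep the example self-contained) instead of being read from the enclosing object.

```scala
object SubstituteSketch {
  // Same matching logic as above, with the replacement values passed in explicitly.
  def substituteVariables(workerUrl: String, localJar: String)(argument: String): String =
    argument match {
      case "{{WORKER_URL}}" => workerUrl
      case "{{USER_JAR}}" => localJar
      case other => other
    }

  // Hypothetical sample values, for illustration only.
  val sampleArgs: Seq[String] = Seq("{{WORKER_URL}}", "{{USER_JAR}}", "org.example.Main")

  def resolvedSample: Seq[String] =
    sampleArgs.map(
      substituteVariables("spark://Worker@10.0.0.5:7078", "/work/driver-0001/examples.jar"))
}
```

Only arguments that are exactly equal to a placeholder are replaced; every other argument passes through unchanged.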

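Finally, the final state of the driver process is reported back: the DriverRunner sends it to the worker endpoint, which forwards it to the Master: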

worker.send(DriverStateChanged(driverId, finalState.get, finalException))
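How an exit code might be turned into a final state before being reported can be sketched as follows. This is an illustrative model built on assumptions: the state names, the report function, and the reduced DriverStateChanged case class are hypothetical simplifications.

```scala
import scala.util.{Failure, Success, Try}

object FinalStateSketch {
  sealed trait DriverState
  case object FINISHED extends DriverState
  case object KILLED extends DriverState
  case object FAILED extends DriverState
  case object ERROR extends DriverState

  final case class DriverStateChanged(
      driverId: String, state: DriverState, exception: Option[Throwable])

  // run: by-name stand-in for "prepare and run the driver, return its exit code".
  def report(driverId: String, killed: Boolean)(run: => Int): DriverStateChanged =
    Try(run) match {
      case Success(0) => DriverStateChanged(driverId, FINISHED, None)
      case Success(_) if killed => DriverStateChanged(driverId, KILLED, None)
      case Success(_) => DriverStateChanged(driverId, FAILED, None)
      // An exception while preparing or running is reported together with its cause.
      case Failure(e) => DriverStateChanged(driverId, ERROR, Some(e))
    }
}
```

Carrying the exception inside the message lets the receiver log why the driver could not run.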

At this point the driver has been launched on the worker.