To submit a job to a Spark standalone cluster, we usually use a command like the following:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar
This command uses SparkSubmit to submit the job to the cluster; the figure above shows the overall submission flow.
First, SparkSubmit parses the command line and maps its arguments onto SparkSubmit's own fields. The parsing and environment preparation are done by prepareSubmitEnvironment(args):
val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
Pay special attention to childMainClass: this is the class that actually submits the Driver to the Master. Because we are submitting to a standalone cluster in cluster deploy mode, childMainClass here is ClientApp. The submit classes used for the other cluster managers and submission paths are:
private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
"org.apache.spark.deploy.yarn.YarnClusterApplication"
private[deploy] val REST_CLUSTER_SUBMIT_CLASS = classOf[RestSubmissionClientApp].getName()
private[deploy] val STANDALONE_CLUSTER_SUBMIT_CLASS = classOf[ClientApp].getName()
private[deploy] val KUBERNETES_CLUSTER_SUBMIT_CLASS =
"org.apache.spark.deploy.k8s.submit.KubernetesClientApplication"
Once childMainClass is determined, SparkSubmit instantiates the ClientApp object via reflection and then calls its start method:
app.start(childArgs.toArray, sparkConf)
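This call sits inside SparkSubmit.runMain, which loads childMainClass and instantiates it as a SparkApplication via reflection. A simplified excerpt (based on the Spark 2.4+/3.x source):
// load childMainClass and instantiate it as a SparkApplication
val mainClass = Utils.classForName(childMainClass)
val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
  mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
} else {
  // classes that only expose a main() method are wrapped so they can be started the same way
  new JavaMainApplication(mainClass)
}
app.start(childArgs.toArray, sparkConf)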
Inside the ClientApp object, a NettyRpcEnv is created first, then the endpoint reference(s) for communicating with the Master are obtained, and finally a ClientEndpoint is constructed and registered with the NettyRpcEnv. ClientApp then sends the driver registration request to the Master through this ClientEndpoint.
// create the NettyRpcEnv
val rpcEnv =
RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
// obtain the endpoint reference(s) used to communicate with the Master
val masterEndpoints = driverArgs.masters.map(RpcAddress.fromSparkURL).
map(rpcEnv.setupEndpointRef(_, Master.ENDPOINT_NAME))
// register the ClientEndpoint with the NettyRpcEnv
rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))
Once the ClientEndpoint is registered, its onStart() method is invoked automatically; in onStart, asyncSendToMasterAndForwardReply(RequestSubmitDriver) is called to send the driver registration request to the Master:
val driverDescription = new DriverDescription(
driverArgs.jarUrl,
driverArgs.memory,
driverArgs.cores,
driverArgs.supervise,
command)
asyncSendToMasterAndForwardReply[SubmitDriverResponse](
RequestSubmitDriver(driverDescription))
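asyncSendToMasterAndForwardReply asks every configured Master asynchronously and forwards whatever reply comes back to the ClientEndpoint itself; a lightly simplified version of it looks like this:
// ask each Master and forward the reply back to this endpoint as a message
private def asyncSendToMasterAndForwardReply[T: ClassTag](message: Any): Unit = {
  for (masterEndpoint <- masterEndpoints) {
    masterEndpoint.ask[T](message).onComplete {
      case Success(v) => self.send(v)
      case Failure(e) => logWarning(s"Error sending messages to master $masterEndpoint", e)
    }(forwardMessageExecutionContext)
  }
}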
Finally, let's look at how the Master handles the driver registration request. Requests that expect a reply are handled by the Master in receiveAndReply. When the Master endpoint receives the RequestSubmitDriver message sent by the ClientEndpoint, it first creates the driver, then persists it with the persistenceEngine (this persistence is mainly there so the driver can be recovered when the master process is recovering), and finally calls schedule() and replies with a SubmitDriverResponse:
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
// handle the RequestSubmitDriver message sent by the ClientEndpoint
case RequestSubmitDriver(description) =>
if (state != RecoveryState.ALIVE) {
val msg = s"${Utils.BACKUP_STANDALONE_MASTER_PREFIX}: $state. " +
"Can only accept driver submissions in ALIVE state."
context.reply(SubmitDriverResponse(self, false, None, msg))
} else {
logInfo("Driver submitted " + description.command.mainClass)
val driver = createDriver(description)
persistenceEngine.addDriver(driver)
waitingDrivers += driver
drivers.add(driver)
schedule()
// reply to the ClientEndpoint with the driver submission response
context.reply(SubmitDriverResponse(self, true, Some(driver.id),
s"Driver successfully submitted as ${driver.id}"))
}
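createDriver itself is small: it wraps the submitted DriverDescription in a DriverInfo with a timestamp and a freshly generated driver id (excerpt from Master, lightly trimmed):
// wrap the submitted description in a DriverInfo with a new driver id
private def createDriver(desc: DriverDescription): DriverInfo = {
  val now = System.currentTimeMillis()
  val date = new Date(now)
  new DriverInfo(now, newDriverId(date), desc, date)
}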
In standalone mode, the driver process runs on a worker node. The real entry point for launching the driver is schedule(); besides launching drivers on workers, this method is also responsible for starting executors on them. Here we only look at how the driver is launched: the Master has to send a launch-driver message to the chosen worker, which is done in launchDriver:
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
logInfo("Launching driver " + driver.id + " on worker " + worker.id)
worker.addDriver(driver)
driver.worker = Some(worker)
// tell the worker (picked at random in schedule()) to launch the driver process
worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
driver.state = DriverState.RUNNING
}
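For reference, the random choice of worker happens in schedule(): alive workers are shuffled, and each waiting driver is placed on the first shuffled worker with enough free memory and cores. A simplified excerpt showing only the driver-scheduling part:
// simplified: only the driver-launching part of Master.schedule()
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) {
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // once the drivers are placed, hand out executors to workers
  startExecutorsOnWorkers()
}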
When the worker receives the LaunchDriver message, it creates a DriverRunner and calls its start method to launch the driver:
// launch the driver on the worker node
case LaunchDriver(driverId, driverDesc) =>
logInfo(s"Asked to launch driver $driverId")
val driver = new DriverRunner(
conf,
driverId,
workDir,
sparkHome,
driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
self,
workerUri,
securityMgr)
drivers(driverId) = driver
// start the driver
driver.start()
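DriverRunner.start() does its work on a separate thread: it prepares the environment, runs the driver process, maps the exit code to a final DriverState, and reports back to the worker. A trimmed-down sketch (shutdown-hook handling and the exact kill flag are simplified here):
// trimmed: the worker-side thread that runs the driver process and reports its final state
private[worker] def start() = {
  new Thread("DriverRunner for " + driverId) {
    override def run() {
      try {
        // create the working dir, download the user jar, build and run the command
        val exitCode = prepareAndRunDriver()
        // translate the exit code into a final driver state
        finalState =
          if (exitCode == 0) Some(DriverState.FINISHED)
          else if (killed) Some(DriverState.KILLED)
          else Some(DriverState.FAILED)
      } catch {
        case e: Exception =>
          kill()
          finalState = Some(DriverState.ERROR)
          finalException = Some(e)
      }
      // notify the worker of the final driver state (and the exception, if any)
      worker.send(DriverStateChanged(driverId, finalState.get, finalException))
    }
  }.start()
}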
Starting the driver boils down to building a command line and launching the driver process on the worker:
runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
Before that command is executed, there is some setup work (downloading the user jar and creating the working directory), i.e. preparing the runtime environment:
private[worker] def prepareAndRunDriver(): Int = {
// create the driver's working directory
val driverDir = createWorkingDirectory()
// download the user jar into that directory
val localJarFilename = downloadUserJar(driverDir)
def substituteVariables(argument: String): String = argument match {
case "{{WORKER_URL}}" => workerUrl
case "{{USER_JAR}}" => localJarFilename
case other => other
}
// TODO: If we add ability to submit multiple jars they should also be added here
val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
// launch the driver process; the driver then initializes the SparkContext
runDriver(builder, driverDir, driverDesc.supervise)
}
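runDriver is a thin wrapper: it points the ProcessBuilder at the driver directory, redirects the process's stdout and stderr into files in that directory, and then hands off to runCommandWithRetry. A simplified sketch (the real code also writes a header line into stderr):
// simplified: run the driver process in its working directory, redirecting output to files
private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
  builder.directory(baseDir)
  def initialize(process: Process): Unit = {
    // redirect the driver's stdout and stderr into files under the driver directory
    val stdout = new File(baseDir, "stdout")
    CommandUtils.redirectStream(process.getInputStream, stdout)
    val stderr = new File(baseDir, "stderr")
    CommandUtils.redirectStream(process.getErrorStream, stderr)
  }
  runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
}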
Finally, the DriverRunner reports the driver's final state to the worker, which forwards it on to the Master:
worker.send(DriverStateChanged(driverId, finalState.get, finalException))
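On the receiving side, the Worker mostly just relays the message, and the Master removes the driver from its bookkeeping once it reaches a terminal state. Roughly (simplified excerpts from Worker and Master):
// Worker.receive (sketch): clean up locally and forward the state to the Master
case driverStateChanged @ DriverStateChanged(driverId, state, exception) =>
  handleDriverStateChanged(driverStateChanged) // logs, removes the DriverRunner, then sendToMaster(driverStateChanged)

// Master.receive (sketch): terminal states lead to removeDriver
case DriverStateChanged(driverId, state, exception) =>
  state match {
    case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
      removeDriver(driverId, state, exception)
    case _ =>
      throw new Exception(s"Received unexpected state update for driver $driverId: $state")
  }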
At this point, the driver has been launched on the worker.