Table of Contents

  • Container Resource Request and Allocation
  • 1. ApplicationMaster.createAllocator()
  • 2. YarnAllocator
  • 1) allocateResources()
  • 2) updateResourceRequests()
  • splitPendingAllocationsByLocality()
  • requestTotalExecutorsWithPreferredLocalities()
  • localityOfRequestedContainers()
  • 3) handleAllocatedContainers()
  • Executor Launch
  • References


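The earlier article "Spark on YARN: SparkSubmit Initialization and ApplicationMaster Startup and Registration" covered what happens right after a Spark on YARN application is submitted: SparkSubmit is initialized, and the ApplicationMaster starts and registers. This article picks up from there. It looks at what the ApplicationMaster does after startup and registration: how containers are requested from and allocated by YARN, and how Executors are launched.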

Here is the source-code call flow diagram from that article again, so you can follow along.

[Figure: source-code call flow diagram]

Container Resource Request and Allocation

1. ApplicationMaster.createAllocator()

After the ApplicationMaster registers with the Driver, it creates a YarnAllocator instance. The allocator requests containers from the YARN ResourceManager and decides how to use the containers it receives.

private def createAllocator(driverRef: RpcEndpointRef, _sparkConf: SparkConf): Unit = {
  // Get the YARN ApplicationId, used mainly for logging and metrics
  val appId = client.getAttemptId().getApplicationId().toString()
  // Build the address of the driver's RPC endpoint named "CoarseGrainedScheduler"
  val driverUrl = RpcEndpointAddress(driverRef.address.host, driverRef.address.port,
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
  ...
  // Create a YarnAllocator through YarnRMClient (the AM's client for the YARN ResourceManager)
  allocator = client.createAllocator(
    yarnConf,
    _sparkConf,
    driverUrl,
    driverRef,
    securityMgr,
    localResources)
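  // credentialRenewer is an AMCredentialRenewer instance that periodically renews the tokens the application needs.
  // New tokens are created once 75% of the original tokens' renewal interval has elapsed.
  // Once the ApplicationMaster has registered with the Driver, newly created tokens are sent to the Driver endpoint.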
  credentialRenewer.foreach(_.setDriverRef(driverRef))
  
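  // Set up the AM endpoint only after the YarnAllocator has been initialized. This ensures that when the driver sends its initial executor request, the YarnAllocator is ready to serve it.
  // AMEndpoint is an inner class of ApplicationMaster used to communicate with the driver.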
  rpcEnv.setupEndpoint("YarnAM", new AMEndpoint(rpcEnv, driverRef))

  // This is the key step of this method: request resources from YARN
  allocator.allocateResources()
  
  // The rest of the method sets up metrics reporting
  val ms = MetricsSystem.createMetricsSystem("applicationMaster", sparkConf, securityMgr)
  val prefix = _sparkConf.get(YARN_METRICS_NAMESPACE).getOrElse(appId)
  ms.registerSource(new ApplicationMasterSource(prefix, allocator))
  // do not register static sources in this case as per SPARK-25277
  ms.start(false)
  metricsSystem = Some(ms)
  reporterThread = launchReporterThread()
}

2. YarnAllocator

YarnAllocator requests containers from the YARN ResourceManager and decides what to do with the containers YARN grants. Most of its work is done by calling the AMRMClient APIs. It interacts with AMRMClient in three ways:

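  1. Tell AMRMClient which containers we need, and update our local bookkeeping of container requests.
  2. Call AMRMClient's allocate(). This syncs our local container requests with the ResourceManager and returns the containers YARN has granted us. The call also serves as the heartbeat.
  3. Process the containers YARN granted and launch Executors in them.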

The main functions of YarnAllocator:

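  • requestTotalExecutorsWithPreferredLocalities(…): requests as many executors from the ResourceManager as needed to reach the desired total.
  • allocateResources(): requests resources. If YARN grants everything we asked for, we get as many containers as executors requested and launch an executor in each returned container.
  • updateResourceRequests(): updates the number of containers requested from the ResourceManager, based on the number of running executors and the total number of executors wanted.
  • handleAllocatedContainers(allocatedContainers: Seq[Container]): processes the containers obtained from the ResourceManager and launches executors in them.
  • matchContainerToRequest(…): looks for a pending request whose location matches the given container; if one is found, removes that request.
  • runAllocatedContainers(containersToUse: ArrayBuffer[Container]): launches executors in the allocated containers.
  • processCompletedContainers(completedContainers: Seq[ContainerStatus]): handles containers that have completed.
  • splitPendingAllocationsByLocality(…): splits the pending container requests into 3 groups, based on the locality of the currently queued tasks.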

1) allocateResources()

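Let's focus on the allocator.allocateResources() call in the code above, that is, YarnAllocator.allocateResources(). It requests resources from the ResourceManager. If YARN grants everything requested, we get as many containers as executors and launch an executor in each returned container.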

def allocateResources(): Unit = synchronized {
  // Update the number of containers requested from the ResourceManager, based on the running executors and the target total
  updateResourceRequests()

  val progressIndicator = 0.1f
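  // Request containers from the ResourceManager and get the allocation response
  val allocateResponse = amClient.allocate(progressIndicator)
  // Containers the ResourceManager has allocated to us
  val allocatedContainers = allocateResponse.getAllocatedContainers()
  // Blacklist tracking: record the number of nodes in the cluster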
  allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)

  // If any containers were allocated, process them
  if (allocatedContainers.size > 0) {
    ...
    // Process the containers obtained from the ResourceManager and launch executors in them
    handleAllocatedContainers(allocatedContainers.asScala)
  }
  // Containers that have completed (normally or with an error)
  val completedContainers = allocateResponse.getCompletedContainersStatuses()
  if (completedContainers.size > 0) {
    logDebug("Completed %d containers".format(completedContainers.size))
    // Handle the completed containers
    processCompletedContainers(completedContainers.asScala)
    logDebug("Finished processing %d completed containers. Current running executor count: %d."
      .format(completedContainers.size, runningExecutors.size))
  }
}

2) updateResourceRequests()

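The core of YARN container requesting lives in updateResourceRequests().

updateResourceRequests() syncs the number of containers requested from the ResourceManager with the number of running executors and the total number of executors wanted.

How updateResourceRequests() requests containers: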

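  1. First, the pending container requests are split into 3 groups, based on the tasks waiting to run on each node:
  • Locality-matched requests: requests that name at least one host on which pending tasks would like to run.
  • Locality-unmatched (stale) requests: requests that name specific hosts, none of which is preferred by any pending task any more.
  • Locality-free requests: requests that name no host at all and can be satisfied anywhere.
  2. The stale requests are cancelled and re-requested. Their locality is recomputed with the container placement strategy so that as many tasks as possible run locally. Locality-free requests are also taken into account, and some of them may be cancelled in favor of locality-aware requests.
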
def updateResourceRequests(): Unit = {
  // Pending container requests that have not been fulfilled yet
  val pendingAllocate = getPendingAllocate
  val numPendingAllocate = pendingAllocate.size
  
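  // missing: how many more executors we still need
  // targetNumExecutors: total number of executors wanted; without dynamic allocation this is the number given at submit time (--num-executors)
  // numExecutorsStarting: number of executors currently starting
  // runningExecutors.size: number of executors currently running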
  val missing = targetNumExecutors - numPendingAllocate -
    numExecutorsStarting.get - runningExecutors.size
  logDebug(s"Updating resource requests, target: $targetNumExecutors, " +
    s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
    s"executorsStarting: ${numExecutorsStarting.get}")
    
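  // Split the pending container requests into 3 groups: locality-matched, locality-unmatched (stale) and locality-free (any host).
  // Locality-matched requests are kept and taken into account when computing new locality preferences.
  // Stale requests no longer help locality, so they are cancelled and their placement is recomputed.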
  val (localRequests, staleRequests, anyHostRequests) = splitPendingAllocationsByLocality(
    hostToLocalTaskCounts, pendingAllocate)

  if (missing > 0) {
    logInfo(s"Will request $missing executor container(s), each with " +
      s"${resource.getVirtualCores} core(s) and " +
      s"${resource.getMemory} MB memory (including $memoryOverhead MB of overhead)")

    // Cancel the stale requests; they have already been sent to the ResourceManager
    staleRequests.foreach { stale =>
      amClient.removeContainerRequest(stale)
    }
    val cancelledContainers = staleRequests.size
    if (cancelledContainers > 0) {
      logInfo(s"Canceled $cancelledContainers container request(s) (locality no longer needed)")
    }

    // Number of containers we can now request: the missing ones plus the cancelled stale ones
    val availableContainers = missing + cancelledContainers

    // To maximize task locality, the locality-free requests are also taken into account
    val potentialContainers = availableContainers + anyHostRequests.size
    // Recompute node locality and rack locality (other nodes in the same rack) for each container
    val containerLocalityPreferences = containerPlacementStrategy.localityOfRequestedContainers(
      potentialContainers, numLocalityAwareTasks, hostToLocalTaskCounts,
        allocatedHostToContainersMap, localRequests)

    // Create new container requests from the computed locality preferences
    val newLocalityRequests = new mutable.ArrayBuffer[ContainerRequest]
    containerLocalityPreferences.foreach {
      case ContainerLocalityPreferences(nodes, racks) if nodes != null =>
        newLocalityRequests += createContainerRequest(resource, nodes, racks)
      case _ =>
    }

    if (availableContainers >= newLocalityRequests.size) { // the available slots cover all new locality requests
      // more containers are available than needed for locality, fill in requests for any host
      for (i <- 0 until (availableContainers - newLocalityRequests.size)) {
        newLocalityRequests += createContainerRequest(resource, null, null)
      }
    } else { // not enough slots for all new locality requests: cancel some locality-free requests so more locality-aware ones can be submitted
      val numToCancel = newLocalityRequests.size - availableContainers
      // cancel some requests without locality preferences to schedule more local containers
      anyHostRequests.slice(0, numToCancel).foreach { nonLocal =>
        amClient.removeContainerRequest(nonLocal)
      }
      if (numToCancel > 0) {
        logInfo(s"Canceled $numToCancel unlocalized container requests to resubmit with locality")
      }
    }
    // Submit the new container requests
    newLocalityRequests.foreach { request =>
      amClient.addContainerRequest(request)
    }
  ...
  } else if (numPendingAllocate > 0 && missing < 0) {
    val numToCancel = math.min(numPendingAllocate, -missing)
    logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " +
      s"total $targetNumExecutors executors.")
    // cancel pending allocate requests by taking locality preference into account
    val cancelRequests = (staleRequests ++ anyHostRequests ++ localRequests).take(numToCancel)
    cancelRequests.foreach(amClient.removeContainerRequest)
  }
}
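
The decision at the top of updateResourceRequests() comes down to arithmetic on four counters. The Scala sketch below isolates that arithmetic. MissingSketch and its Action types are hypothetical names for illustration. The sketch deliberately leaves out the locality handling that the real method performs when it requests or cancels containers.

```scala
// Hypothetical sketch of the counting logic at the top of updateResourceRequests().
// Locality handling (stale/any-host requests, placement strategy) is deliberately omitted.
object MissingSketch {
  sealed trait Action
  final case class RequestMore(containers: Int) extends Action
  final case class CancelPending(requests: Int) extends Action
  case object NoChange extends Action

  def decide(target: Int, pending: Int, starting: Int, running: Int): Action = {
    // Executors still missing after counting pending requests, starting and running executors
    val missing = target - pending - starting - running
    if (missing > 0) RequestMore(missing)
    // Too many outstanding requests: cancel some, but never more than are pending
    else if (pending > 0 && missing < 0) CancelPending(math.min(pending, -missing))
    // Running executors are never killed here, so there is nothing else to do
    else NoChange
  }
}
```

Consider a target of 4 with 3 pending requests and 3 running executors. Then missing is -2, so two pending requests are cancelled. When no requests are pending, a negative missing leads to no action at all: lowering the target never kills running executors.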

updateResourceRequests() only shows the coarse-grained logic. For the finer details, we look at these 3 functions:

  • splitPendingAllocationsByLocality(…)
  • requestTotalExecutorsWithPreferredLocalities(…)
  • localityOfRequestedContainers()

splitPendingAllocationsByLocality()

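splitPendingAllocationsByLocality() splits the pending container requests into 3 groups, according to the locality of the pending tasks:

  1. Requests that match a preferred host: the request names at least one host on which pending tasks would like to run. In my understanding, this roughly corresponds to the PROCESS_LOCAL and NODE_LOCAL data locality levels.
  2. Requests that do not match: the request names hosts, but none of them is preferred by any pending task. This should correspond to RACK_LOCAL and ANY.
  3. Requests with no locality preference (no hosts named). This should correspond to NO_PREF, meaning the data is equally accessible from any node.

Requests in the last two groups are the candidates for cancellation, with their placement recomputed by the container placement strategy.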

private def splitPendingAllocationsByLocality(
    hostToLocalTaskCount: Map[String, Int], // preferred host -> number of pending tasks that would like to run on it
    pendingAllocations: Seq[ContainerRequest] // all pending (not yet fulfilled) container requests
  ): (Seq[ContainerRequest], Seq[ContainerRequest], Seq[ContainerRequest]) = {
  val localityMatched = ArrayBuffer[ContainerRequest]()
  val localityUnMatched = ArrayBuffer[ContainerRequest]()
  val localityFree = ArrayBuffer[ContainerRequest]()

  val preferredHosts = hostToLocalTaskCount.keySet
  pendingAllocations.foreach { cr =>
    val nodes = cr.getNodes
    if (nodes == null) {
      localityFree += cr // request without locality preference
    } else if (nodes.asScala.toSet.intersect(preferredHosts).nonEmpty) {
      localityMatched += cr // request nodes intersect the preferred hosts: locality matched
    } else {
      localityUnMatched += cr // request nodes match no preferred host
    }
  }

  (localityMatched.toSeq, localityUnMatched.toSeq, localityFree.toSeq)
}
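
The three-way split can be seen in isolation in the dependency-free Scala sketch below. SplitSketch and Req are hypothetical names. Req stands in for YARN's ContainerRequest and keeps only the node list, with None playing the role of getNodes returning null. The classification rule follows the code above.

```scala
// Hypothetical, dependency-free version of splitPendingAllocationsByLocality().
object SplitSketch {
  // Stand-in for YARN's ContainerRequest: only the requested nodes matter here.
  // None corresponds to getNodes returning null (no locality preference).
  final case class Req(id: Int, nodes: Option[Seq[String]])

  /** Returns (locality-matched, locality-unmatched/stale, locality-free) requests. */
  def split(
      hostToLocalTaskCount: Map[String, Int],
      pending: Seq[Req]): (Seq[Req], Seq[Req], Seq[Req]) = {
    val preferredHosts = hostToLocalTaskCount.keySet
    // Requests naming no nodes at all are locality-free
    val (free, located) = pending.partition(_.nodes.isEmpty)
    // A request matches if at least one of its nodes is a preferred host
    val (matched, unmatched) =
      located.partition(_.nodes.exists(_.exists(preferredHosts.contains)))
    (matched, unmatched, free)
  }
}
```

A request naming both a preferred and a non-preferred host still counts as matched, because a single preferred host in the intersection is enough.
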
requestTotalExecutorsWithPreferredLocalities()

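requestTotalExecutorsWithPreferredLocalities() requests from the ResourceManager as many executors as needed to reach the requested total. If the requested total is smaller than the number of executors currently running, no executors are killed. As the code shows, the function itself only records the new target and the locality hints. The actual container requests are issued by the next allocateResources() call, through updateResourceRequests().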

def requestTotalExecutorsWithPreferredLocalities(
    requestedTotal: Int,
    localityAwareTasks: Int,
    hostToLocalTaskCount: Map[String, Int],
    nodeBlacklist: Set[String]): Boolean = synchronized {
  this.numLocalityAwareTasks = localityAwareTasks
  this.hostToLocalTaskCounts = hostToLocalTaskCount

  if (requestedTotal != targetNumExecutors) {
    logInfo(s"Driver requested a total number of $requestedTotal executor(s).")
    targetNumExecutors = requestedTotal
    allocatorBlacklistTracker.setSchedulerBlacklistedNodes(nodeBlacklist)
    true
  } else {
    false
  }
}

localityOfRequestedContainers()

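localityOfRequestedContainers() is a method of LocalityPreferredContainerPlacementStrategy. It computes the node locality and rack locality for each container to be requested.

  • LocalityPreferredContainerPlacementStrategy: a locality-preferring container placement strategy. It computes the best placement of YARN containers by considering the proportion of pending tasks on each node, the required cores/containers, and the locations of existing and pending containers. Its goal is to maximize the number of tasks that run locally.
    Consider this scenario. 20 tasks prefer hosts host1, host2 and host3, and 10 tasks prefer host1, host2 and host4. Each container has 2 cores and each task uses 1 CPU, so 15 containers are needed in total. The host ratio is (host1: 30, host2: 30, host3: 20, host4: 10), i.e. 3 : 3 : 2 : 1.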
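  1. If the number of containers requested (18) is greater than the number required (15), the requests are distributed as follows:
    request 5 containers on nodes (host1, host2, host3, host4);
    request 5 containers on nodes (host1, host2, host3);
    request 5 containers on nodes (host1, host2);
    the remaining 3 containers are requested without locality preference.
    The placement ratio in this case is 3 : 3 : 2 : 1.
  2. If the number of containers requested (10) is less than or equal to the number required (15), the requests are distributed as follows:
    request 4 containers on nodes (host1, host2, host3, host4);
    request 3 containers on nodes (host1, host2, host3);
    request 3 containers on nodes (host1, host2);
    The placement ratio in this case is 10 : 10 : 7 : 4, close to 3 : 3 : 2 : 1.
  3. If containers already exist but none of them matches the requested locality, rules 1 and 2 above apply.
  4. If containers already exist and some of them match the requested locality:
    For example, suppose each node has 1 matching container (host1: 1, host2: 1, host3: 1, host4: 1), while the expected number of containers per node is (host1: 5, host2: 5, host3: 4, host4: 2). The number of newly requested containers per node then becomes (host1: 4, host2: 4, host3: 3, host4: 1), 12 in total.
    4.1 If the number of containers requested (18) is greater than the newly required number (4+4+3+1=12), rule 1 applies with ratio 4 : 4 : 3 : 1.
    4.2 If the number of containers requested (10) is less than the newly required number (12), rule 2 applies with ratio 4 : 4 : 3 : 1.
  5. If containers already exist and the existing locality fully covers the requested locality:
    For example, if every node already has 5 containers (host1: 5, host2: 5, host3: 5, host4: 5), the current locality requirements are already met, so all containers are requested without locality preference.
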
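The per-host numbers in rules 4 and 5 can be reproduced with a small calculation. The sketch below is, as far as I know, modeled on the private helper expectedHostToContainerCount that localityOfRequestedContainers() calls (its body is not shown in this article). It rests on several assumptions. Each task needs cpusPerTask cores and each executor has coresPerExecutor cores. Unlike Spark, it ignores pending locality-matched requests and takes existing containers as plain per-host counts. ExpectedCountSketch and its parameter names are hypothetical.

```scala
// Hypothetical sketch of how many more containers each preferred host should get.
// Assumption: modeled on Spark's private expectedHostToContainerCount helper, but it
// ignores pending locality-matched requests and takes existing containers as plain counts.
object ExpectedCountSketch {
  def expectedHostToContainerCount(
      localityAwareTasks: Int,
      coresPerExecutor: Int,
      cpusPerTask: Int,
      hostToLocalTaskCount: Map[String, Int],
      existingContainersPerHost: Map[String, Int]): Map[String, Int] = {
    val totalLocalTaskNum = hostToLocalTaskCount.values.sum
    // Executors needed to run all locality-aware tasks at once
    val executorsNeeded =
      math.ceil(localityAwareTasks.toDouble * cpusPerTask / coresPerExecutor).toInt
    hostToLocalTaskCount.map { case (host, taskCount) =>
      // Share of the needed executors, proportional to this host's task count
      val expected = taskCount.toDouble * executorsNeeded / totalLocalTaskNum
      val existing = existingContainersPerHost.getOrElse(host, 0)
      // Only the part not already covered by existing containers is still required
      (host, math.max(0, (expected - existing).ceil.toInt))
    }
  }
}
```

Take the numbers from the scenario above: 30 locality-aware tasks, 2 cores per container, 1 CPU per task, and task counts 30/30/20/10. The sketch gives (5, 5, 4, 2) with no existing containers, (4, 4, 3, 1) with one existing container per host, and all zeros with five per host. This matches rules 4 and 5.
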
def localityOfRequestedContainers(
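    numContainer: Int, // number of containers to compute locality for
    numLocalityAwareTasks: Int, // number of tasks with locality preferences
    hostToLocalTaskCount: Map[String, Int], // preferred host -> number of tasks that may run on it
    allocatedHostToContainersMap: HashMap[String, Set[ContainerId]], // host -> containers already allocated on it; used to compute the expected preferred locations given existing containers
    localityMatchedPendingAllocations: Seq[ContainerRequest] // pending container requests whose locations match the current tasks' preferences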
  ): Array[ContainerLocalityPreferences] = {
  
  // Number of containers still needed on each host, taking already allocated containers into account
  val updatedHostToContainerCount = expectedHostToContainerCount(
    numLocalityAwareTasks, hostToLocalTaskCount, allocatedHostToContainersMap,
      localityMatchedPendingAllocations)
  val updatedLocalityAwareContainerNum = updatedHostToContainerCount.values.sum

  // The containers to request are split into two groups: with and without locality preference.
  val requiredLocalityFreeContainerNum =
    math.max(0, numContainer - updatedLocalityAwareContainerNum)
  val requiredLocalityAwareContainerNum = numContainer - requiredLocalityFreeContainerNum

  // Locality preferences, one entry per container to request
  val containerLocalityPreferences = ArrayBuffer[ContainerLocalityPreferences]()
  if (requiredLocalityFreeContainerNum > 0) {
    for (i <- 0 until requiredLocalityFreeContainerNum) {
      containerLocalityPreferences += ContainerLocalityPreferences(
        null.asInstanceOf[Array[String]], null.asInstanceOf[Array[String]])
    }
  }

  if (requiredLocalityAwareContainerNum > 0) {
    val largestRatio = updatedHostToContainerCount.values.max
    var preferredLocalityRatio = updatedHostToContainerCount.map { case(k, ratio) =>
      val adjustedRatio = ratio.toDouble * requiredLocalityAwareContainerNum / largestRatio
      (k, adjustedRatio.ceil.toInt)
    }

    for (i <- 0 until requiredLocalityAwareContainerNum) {
      // Keep only hosts whose ratio is still greater than 0, i.e. hosts that can still take a new container request
      val hosts = preferredLocalityRatio.filter(_._2 > 0).keys.toArray
      val racks = hosts.map { h =>
        resolver.resolve(yarnConf, h)
      }.toSet
      containerLocalityPreferences += ContainerLocalityPreferences(hosts, racks.toArray)

      // Decrement every host's ratio after each request; once a host's ratio reaches 0, its requests are all satisfied.
      preferredLocalityRatio = preferredLocalityRatio.map { case (k, v) => (k, v - 1) }
    }
  }
  containerLocalityPreferences.toArray
}
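
The second half of localityOfRequestedContainers() works in two steps. First, the per-host counts are scaled to the number of locality-aware containers. Then each new request lists every host whose counter is still positive. The self-contained Scala sketch below models that loop. PlacementSketch is a hypothetical name. The per-host counts are passed in directly instead of being derived from existing containers, and rack resolution is omitted.

```scala
// Hypothetical sketch of the locality-aware part of localityOfRequestedContainers().
// Per-host counts are given directly, and rack resolution is left out.
object PlacementSketch {
  /** One entry per container to request: None = no locality preference,
    * Some(hosts) = the hosts listed in that container request. */
  def localityOfRequestedContainers(
      numContainer: Int,
      hostToContainerCount: Map[String, Int]): Seq[Option[Set[String]]] = {
    val localityAwareNeeded = hostToContainerCount.values.sum
    // Containers beyond what the hosts need get no locality preference
    val freeNum = math.max(0, numContainer - localityAwareNeeded)
    val awareNum = numContainer - freeNum
    val free: Seq[Option[Set[String]]] = Seq.fill(freeNum)(None)

    val aware: Seq[Option[Set[String]]] =
      if (awareNum > 0) {
        // Scale each host's count so the busiest host appears in every request
        val largestRatio = hostToContainerCount.values.max
        var ratio = hostToContainerCount.map { case (host, count) =>
          (host, (count.toDouble * awareNum / largestRatio).ceil.toInt)
        }
        (0 until awareNum).map { _ =>
          // Hosts whose counter is still positive go into this request
          val hosts = ratio.filter(_._2 > 0).keySet
          // Every counter is decremented after each request
          ratio = ratio.map { case (host, v) => (host, v - 1) }
          Some(hosts)
        }
      } else Seq.empty
    free ++ aware
  }
}
```

Feeding it counts in the ratio 3 : 3 : 2 : 1 with 10 containers reproduces rule 2: 4 requests on (host1, host2, host3, host4), 3 on (host1, host2, host3) and 3 on (host1, host2).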

3) handleAllocatedContainers()

handleAllocatedContainers() launches executors on the containers granted by the ResourceManager.

def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
  val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

  // Match containers to requests by host
  val remainingAfterHostMatches = new ArrayBuffer[Container]
  for (allocatedContainer <- allocatedContainers) {
    // Look for a request at this container's host. If one exists, remove it so it is not submitted again, and add the container to the list of containers to use.
    matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
      containersToUse, remainingAfterHostMatches)
  }

  // Match the remaining containers by rack
  val remainingAfterRackMatches = new ArrayBuffer[Container]
  if (remainingAfterHostMatches.nonEmpty) {
    var exception: Option[Throwable] = None
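    // Rack resolution runs in a separate thread because the YarnAllocator thread is interrupted when the SparkContext is stopped.
    // If the interrupt arrives at the wrong moment, you see errors like java.io.IOException: java.lang.InterruptedException ...
    // This means the YARN code being called (RackResolver) swallows the interrupt, so the Spark YarnAllocator thread never exits.
    // In that case the allocator keeps allocating executors: the application appears to hang, and many executors keep showing up even after the SparkContext has been stopped.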
    val thread = new Thread("spark-rack-resolver") {
      override def run(): Unit = {
        try {
          for (allocatedContainer <- remainingAfterHostMatches) {
            val rack = resolver.resolve(conf, allocatedContainer.getNodeId.getHost)
            matchContainerToRequest(allocatedContainer, rack, containersToUse,
              remainingAfterRackMatches)
          }
        } catch {
          case e: Throwable =>
            exception = Some(e)
        }
      }
    }
    thread.setDaemon(true)
    thread.start()

    try {
      thread.join()
    } catch {
      case e: InterruptedException =>
        thread.interrupt()
        throw e
    }

    if (exception.isDefined) {
      throw exception.get
    }
  }

  // Assign the remaining containers, which are neither node-local nor rack-local
  val remainingAfterOffRackMatches = new ArrayBuffer[Container]
  for (allocatedContainer <- remainingAfterRackMatches) {
    // ANY_HOST: match against any remaining request, regardless of location
    matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
      remainingAfterOffRackMatches)
  }
  // Release the containers we don't need
  if (!remainingAfterOffRackMatches.isEmpty) {
    logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
      s"allocated to us")
    for (container <- remainingAfterOffRackMatches) {
      internalReleaseContainer(container)
    }
  }

  // Launch executors on the allocated containers
  runAllocatedContainers(containersToUse)
  logInfo("Received %d containers from YARN, launching executors on %d of them."
    .format(allocatedContainers.size, containersToUse.size))
}
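
The three passes above (host, then rack, then ANY_HOST) can be modeled without YARN. In the hypothetical sketch below, MatchSketch, Req and Container are illustrative types, not the Hadoop classes. A request lists the hosts and racks it prefers, and empty sets mean no preference. Matching simply consumes the first fitting outstanding request, which simplifies what matchContainerToRequest() does.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of the three matching passes in handleAllocatedContainers().
// Req and Container are illustrative types, not the Hadoop/YARN classes.
object MatchSketch {
  // Empty hosts/racks means the request has no locality preference
  final case class Req(hosts: Set[String], racks: Set[String])
  final case class Container(id: String, host: String)

  /** Returns (containers to launch executors on, containers to release). */
  def handle(
      allocated: Seq[Container],
      pending: Seq[Req],
      rackOf: String => String): (Seq[Container], Seq[Container]) = {
    val outstanding = pending.toBuffer
    val toUse = ArrayBuffer.empty[Container]

    // One pass: each container consumes the first outstanding request that fits it;
    // containers that find no such request are passed on to the next pass.
    def pass(cs: Seq[Container])(fits: (Container, Req) => Boolean): Seq[Container] = {
      val remaining = ArrayBuffer.empty[Container]
      for (c <- cs) {
        val i = outstanding.indexWhere(r => fits(c, r))
        if (i >= 0) { outstanding.remove(i); toUse += c }
        else remaining += c
      }
      remaining.toSeq
    }

    val afterHost = pass(allocated)((c, r) => r.hosts.contains(c.host))
    val afterRack = pass(afterHost)((c, r) => r.racks.contains(rackOf(c.host)))
    // ANY_HOST: any outstanding request will do
    val afterAny = pass(afterRack)((_, _) => true)
    (toUse.toSeq, afterAny)
  }
}
```

Take two requests: one for host h1 (rack r1) and one with no preference. Container c2 on h1 matches the first request by host. Container c1 on h2 finds nothing by host or rack, so it takes the locality-free request in the ANY_HOST pass. Container c3 is left over and would be released.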

Executor Launch

Executors are also launched by YarnAllocator, in runAllocatedContainers(). This function iterates over all usable containers and, for each one, submits a Runnable to a cached thread pool.

private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
  for (container <- containersToUse) {
    executorIdCounter += 1
    val executorHostname = container.getNodeId.getHost
    val containerId = container.getId
    val executorId = executorIdCounter.toString
    assert(container.getResource.getMemory >= resource.getMemory)
    logInfo(s"Launching container $containerId on host $executorHostname " +
      s"for executor with ID $executorId")

    def updateInternalState(): Unit = synchronized {
      runningExecutors.add(executorId)
      numExecutorsStarting.decrementAndGet()
      executorIdToContainer(executorId) = container
      containerIdToExecutorId(container.getId) = executorId

      val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
        new HashSet[ContainerId])
      containerSet += containerId
      allocatedContainerToHostMap.put(containerId, executorHostname)
    }
    // Only launch if fewer executors are running than the target
    if (runningExecutors.size() < targetNumExecutors) {
      // One more executor is now starting
      numExecutorsStarting.incrementAndGet()
      if (launchContainers) {
        // launcherPool is simply a ThreadPoolExecutor
        launcherPool.execute(new Runnable {
          override def run(): Unit = {
            try {
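              // This is where the executor is actually launched.
              // ExecutorRunnable asks the NodeManager on that node to start the container, whose launch command runs a separate /bin/java process for the executor with the specified resources.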
              new ExecutorRunnable(
                Some(container),
                conf,
                sparkConf,
                driverUrl,
                executorId,
                executorHostname,
                executorMemory,
                executorCores,
                appAttemptId.getApplicationId.toString,
                securityMgr,
                localResources
              ).run()
              updateInternalState()
            } catch {
              case e: Throwable =>
                numExecutorsStarting.decrementAndGet()
                if (NonFatal(e)) {
                  logError(s"Failed to launch executor $executorId on container $containerId", e)
                  // Assigned container should be released immediately
                  // to avoid unnecessary resource occupation.
                  amClient.releaseAssignedContainer(containerId)
                } else {
                  throw e
                }
            }
          }
        })
      } else {
        // For test only
        updateInternalState()
      }
    } else {
      logInfo(("Skip launching executorRunnable as running executors count: %d " +
        "reached target executors count: %d.").format(
        runningExecutors.size, targetNumExecutors))
    }
  }
}
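
The bookkeeping around launcherPool can be sketched with plain java.util.concurrent classes. In the hypothetical LaunchSketch below, a container is just an id, and launch stands in for new ExecutorRunnable(...).run(). The failure branch only decrements the starting counter; Spark additionally releases the container through amClient. As in the code above, the target check compares only the running count with the target.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.util.control.NonFatal

// Hypothetical sketch of the counters around launcherPool in runAllocatedContainers().
// A container is just an id; `launch` stands in for new ExecutorRunnable(...).run().
object LaunchSketch {
  /** Returns the ids that ended up running and the final value of the "starting" counter. */
  def runAll(containers: Seq[String], target: Int, launch: String => Unit): (Set[String], Int) = {
    val launcherPool = Executors.newCachedThreadPool()
    val running = ConcurrentHashMap.newKeySet[String]()
    val starting = new AtomicInteger(0)

    for (id <- containers) {
      // Same check as the original: only running executors are compared with the target
      if (running.size() < target) {
        starting.incrementAndGet()
        launcherPool.execute(new Runnable {
          override def run(): Unit =
            try {
              launch(id)
              // updateInternalState(): the executor moves from "starting" to "running"
              running.add(id)
              starting.decrementAndGet()
            } catch {
              // Spark would also release the container here
              case NonFatal(_) => starting.decrementAndGet()
            }
        })
      }
    }
    launcherPool.shutdown()
    launcherPool.awaitTermination(10, TimeUnit.SECONDS)
    (running.toArray(Array.empty[String]).toSet, starting.get())
  }
}
```

The check runs before the launches complete. So the sketch, like the code it mirrors, does not count executors that are still starting against the target.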