Spark worker task source code analysis
Spark's tasks are mainly ShuffleMapTask instances. This class is created on the driver, then serialized and shipped to the CoarseGrainedExecutorBackend process on the worker, where it is executed. The relevant source looks like this:
// This object is serialized on the driver and sent to the worker process, where the actual work runs
/** A constructor used only in test suites. This does not require passing in an RDD. */
def this(partitionId: Int) {
  this(0, 0, null, new Partition { override def index: Int = 0 }, null, null)
}

@transient private val preferredLocs: Seq[TaskLocation] = {
  if (locs == null) Nil else locs.toSet.toSeq
}

override def runTask(context: TaskContext): MapStatus = {
  // Deserialize the RDD using the broadcast variable.
  // This is where the task's real work begins
  val deserializeStartTime = System.currentTimeMillis()
  val ser = SparkEnv.get.closureSerializer.newInstance()
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

  metrics = Some(context.taskMetrics)
  var writer: ShuffleWriter[Any, Any] = null
  try {
    // Get the shuffle manager
    val manager = SparkEnv.get.shuffleManager
    // Get the writer for this map partition's output
    writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
    // Write the records produced by computing the RDD for this partition through the writer
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
    writer.stop(success = true).get
  } catch {
    case e: Exception =>
      try {
        if (writer != null) {
          writer.stop(success = false)
        }
      } catch {
        case e: Exception =>
          log.debug("Could not stop writer", e)
      }
      throw e
  }
}
As the code above shows, runTask first deserializes the RDD and its ShuffleDependency from the broadcast task binary; only after that can the RDD actually be computed inside this task.
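To get a feel for that round trip, here is a minimal sketch that serializes a pair of plain objects with Spark's JavaSerializer and reads it back, standing in for the configured closureSerializer and the real (RDD, ShuffleDependency) pair; the FakeRdd/FakeDep case classes are made up for illustration.

import java.nio.ByteBuffer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

object TaskBinaryRoundTrip {
  // Hypothetical stand-ins for the (RDD, ShuffleDependency) pair carried by taskBinary
  case class FakeRdd(id: Int)
  case class FakeDep(shuffleId: Int)

  def main(args: Array[String]): Unit = {
    val ser = new JavaSerializer(new SparkConf()).newInstance()

    // "Driver side": serialize the pair into a byte buffer (what taskBinary broadcasts)
    val bytes: ByteBuffer = ser.serialize((FakeRdd(1), FakeDep(0)))

    // "Executor side": deserialize it back, as ShuffleMapTask.runTask does
    val (rdd, dep) = ser.deserialize[(FakeRdd, FakeDep)](
      bytes, Thread.currentThread.getContextClassLoader)
    println(s"got rdd=$rdd, dep=$dep")
  }
}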
Also note this line in the source:
writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
This call asks the shuffle manager for the writer that matches the current shuffleHandle, and the records produced by the RDD are then written through that writer.
Looking through the source, the classes that implement the ShuffleManager interface are mainly SortShuffleManager and HashShuffleManager.
Let's now look at how SortShuffleManager creates the writer:
/** Get a writer for a given partition. Called on executors by map tasks. */
override def getWriter[K, V](
    handle: ShuffleHandle,
    mapId: Int,
    context: TaskContext): ShuffleWriter[K, V] = {
  numMapsForShuffle.putIfAbsent(
    handle.shuffleId, handle.asInstanceOf[BaseShuffleHandle[_, _, _]].numMaps)
  val env = SparkEnv.get
  // Depending on the handle, a different writer is created: the serialized (unsafe) path,
  // the bypass-merge-sort path, or the regular sort-based path
  handle match {
    case unsafeShuffleHandle: SerializedShuffleHandle[K @unchecked, V @unchecked] =>
      new UnsafeShuffleWriter(
        env.blockManager,
        shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
        context.taskMemoryManager(),
        unsafeShuffleHandle,
        mapId,
        context,
        env.conf)
    case bypassMergeSortHandle: BypassMergeSortShuffleHandle[K @unchecked, V @unchecked] =>
      new BypassMergeSortShuffleWriter(
        env.blockManager,
        shuffleBlockResolver.asInstanceOf[IndexShuffleBlockResolver],
        bypassMergeSortHandle,
        mapId,
        context,
        env.conf)
    case other: BaseShuffleHandle[K @unchecked, V @unchecked, _] =>
      new SortShuffleWriter(shuffleBlockResolver, other, mapId, context)
  }
}
As this method shows, there are three writer implementations: UnsafeShuffleWriter, BypassMergeSortShuffleWriter and SortShuffleWriter.
As the names suggest, the writer can merge-sort or sort the records while writing them, which means the intermediate results are already merged and sorted on the map side as the shuffle output is produced.
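Which handle, and therefore which writer, gets used is decided earlier, when the shuffle is registered with the manager. The following is a simplified, self-contained sketch of the selection rules in SortShuffleManager.registerShuffle (bypass threshold, serializer relocation support, partition-count limit); the function and parameter names here are my own, not Spark's.

// Simplified, self-contained sketch of the handle-selection rules in
// SortShuffleManager.registerShuffle; names and defaults are approximations.
object ShuffleHandleChoice {
  sealed trait Choice
  case object Bypass     extends Choice // -> BypassMergeSortShuffleWriter
  case object Serialized extends Choice // -> UnsafeShuffleWriter
  case object BaseSort   extends Choice // -> SortShuffleWriter

  def choose(
      mapSideCombine: Boolean,
      hasAggregator: Boolean,
      serializerSupportsRelocation: Boolean,
      numPartitions: Int,
      bypassMergeThreshold: Int = 200): Choice = {  // spark.shuffle.sort.bypassMergeThreshold
    if (!mapSideCombine && numPartitions <= bypassMergeThreshold) {
      Bypass      // few partitions, no map-side combine: one file per reducer, then concatenate
    } else if (serializerSupportsRelocation && !hasAggregator && numPartitions <= (1 << 24)) {
      Serialized  // sort serialized bytes directly (the "tungsten" path)
    } else {
      BaseSort    // general path: ExternalSorter with optional map-side combine
    }
  }
}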
Let's now walk through the write path of the SortShuffleWriter class:
/** Write a bunch of records to this task's output */
override def write(records: Iterator[Product2[K, V]]): Unit = {
  sorter = if (dep.mapSideCombine) {
    require(dep.aggregator.isDefined, "Map-side combine without Aggregator specified!")
    // Map-side combine: the sorter aggregates records while it sorts
    new ExternalSorter[K, V, C](
      context, dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer)
  } else {
    // In this case we pass neither an aggregator nor an ordering to the sorter, because we don't
    // care whether the keys get sorted in each partition; that will be done on the reduce side
    // if the operation being run is sortByKey.
    new ExternalSorter[K, V, V](
      context, aggregator = None, Some(dep.partitioner), ordering = None, dep.serializer)
  }
  // Feed all records of this partition into the sorter
  sorter.insertAll(records)

  // Don't bother including the time to open the merged output file in the shuffle write time,
  // because it just opens a single file, so is typically too fast to measure accurately
  // (see SPARK-3570).
  val output = shuffleBlockResolver.getDataFile(dep.shuffleId, mapId)
  val tmp = Utils.tempFileWith(output)
  val blockId = ShuffleBlockId(dep.shuffleId, mapId, IndexShuffleBlockResolver.NOOP_REDUCE_ID)
  // Write the sorted, partitioned data into a single data file, then commit the index file
  val partitionLengths = sorter.writePartitionedFile(blockId, tmp)
  shuffleBlockResolver.writeIndexFileAndCommit(dep.shuffleId, mapId, partitionLengths, tmp)
  // shuffleServerId identifies the BlockManager that holds this output, so it can be fetched later
  mapStatus = MapStatus(blockManager.shuffleServerId, partitionLengths)
}
In this method an ExternalSorter is created, all of the task's records are fed into it, and the sorted output is written to a single data file. Finally a MapStatus is returned carrying blockManager.shuffleServerId, which is the BlockManagerId of the executor that now holds that file. The BlockManager is the component dedicated to transferring block data between processes, so other nodes can fetch these blocks from that address.
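To make the on-disk layout concrete: the data file stores every partition's bytes back to back, and the index file written by writeIndexFileAndCommit holds (numPartitions + 1) cumulative offsets, so a reducer can seek straight to its slice. Below is an illustrative reader for such an index file; the helper is mine and not Spark's actual IndexShuffleBlockResolver code.

import java.io.{DataInputStream, FileInputStream}

// Illustrative reader for an index file laid out as (numPartitions + 1) longs of
// cumulative byte offsets into the matching shuffle data file.
object ShuffleIndexReader {
  /** Returns (startOffset, endOffset) of reduceId's slice in the data file. */
  def partitionRange(indexFile: String, reduceId: Int): (Long, Long) = {
    val in = new DataInputStream(new FileInputStream(indexFile))
    try {
      in.skipBytes(reduceId * 8)  // each offset is an 8-byte long
      val start = in.readLong()
      val end = in.readLong()
      (start, end)
    } finally {
      in.close()
    }
  }
}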
Note that when the ExternalSorter is created, the dependency's aggregator is passed in; this is what performs the map-side aggregation during the shuffle write.
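For intuition, an Aggregator is just the three functions that an operator like reduceByKey supplies: createCombiner, mergeValue and mergeCombiners. Here is a small sketch that builds a word-count aggregator and applies the same "seen before? merge : create" update pattern that insertAll uses, with a plain mutable.HashMap standing in for Spark's PartitionedAppendOnlyMap.

import scala.collection.mutable
import org.apache.spark.Aggregator

object MapSideCombineSketch {
  def main(args: Array[String]): Unit = {
    // The same shape of aggregator that reduceByKey(_ + _) would create
    val agg = new Aggregator[String, Int, Int](
      createCombiner = v => v,
      mergeValue = (c, v) => c + v,
      mergeCombiners = (c1, c2) => c1 + c2)

    val records = Iterator("a" -> 1, "b" -> 1, "a" -> 1)
    val combined = mutable.HashMap.empty[String, Int]

    // Mirror of the update closure in ExternalSorter.insertAll
    records.foreach { case (k, v) =>
      combined(k) = combined.get(k) match {
        case Some(old) => agg.mergeValue(old, v) // key seen before: merge the new value in
        case None      => agg.createCombiner(v)  // first time we see this key
      }
    }
    println(combined) // a -> 2, b -> 1
  }
}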
Inside ExternalSorter, records are buffered and sorted in memory, and when they no longer fit, the in-memory collection is spilled to disk. Its insertAll method looks like this:
def insertAll(records: Iterator[Product2[K, V]]): Unit = {
  // TODO: stop combining if we find that the reduction factor isn't high
  val shouldCombine = aggregator.isDefined

  // Aggregate on the map side of the shuffle
  if (shouldCombine) {
    // Combine values in-memory first using our AppendOnlyMap
    // Merges a new value into an existing combiner
    val mergeValue = aggregator.get.mergeValue
    // Creates the initial combiner from the first value seen for a key
    val createCombiner = aggregator.get.createCombiner
    var kv: Product2[K, V] = null
    val update = (hadValue: Boolean, oldValue: C) => {
      if (hadValue) mergeValue(oldValue, kv._2) else createCombiner(kv._2)
    }
    while (records.hasNext) {
      addElementsRead()
      kv = records.next()
      // Insert into the map keyed by (partition, key), combining as we go
      map.changeValue((getPartition(kv._1), kv._1), update)
      // Spill the in-memory map to disk when it grows too large
      maybeSpillCollection(usingMap = true)
    }
  } else {
    // Stick values into our buffer
    while (records.hasNext) {
      addElementsRead()
      val kv = records.next()
      buffer.insert(getPartition(kv._1), kv._1, kv._2.asInstanceOf[C])
      maybeSpillCollection(usingMap = false)
    }
  }
}
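maybeSpillCollection estimates the size of the in-memory map or buffer and spills it to disk when it cannot grow any further. The sketch below is a simplified, stand-alone version of the decision made by Spark's Spillable.maybeSpill; in the real code the extra memory is requested from the TaskMemoryManager rather than taken from a fixed pool, and the class and field names here are my own.

// Simplified sketch of the spill decision (cf. Spillable.maybeSpill), assuming a
// fixed memory pool instead of the real TaskMemoryManager.
class SpillDecision(var memoryThreshold: Long, var poolRemaining: Long) {
  var elementsRead: Long = 0L

  /** Returns true if the caller should spill its in-memory collection now. */
  def maybeSpill(currentMemory: Long): Boolean = {
    elementsRead += 1
    var shouldSpill = false
    // Only re-check every 32 elements, and only once the current threshold is exceeded
    if (elementsRead % 32 == 0 && currentMemory >= memoryThreshold) {
      // Try to double our headroom; grant whatever the pool still has
      val requested = 2 * currentMemory - memoryThreshold
      val granted = math.min(requested, poolRemaining)
      poolRemaining -= granted
      memoryThreshold += granted
      // If we still don't have room for the collection, spill it to disk
      shouldSpill = currentMemory >= memoryThreshold
    }
    shouldSpill
  }
}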
Summary: a ShuffleMapTask executes its RDD partition and writes the result to a sorted data file, performing map-side aggregation along the way when an aggregator is defined. It then reports the shuffleServerId (the address of the executor's BlockManager) back to the driver; using that address, the reduce side can locate the machine that holds the blocks and pull the data through the BlockManager.