- 1. Preface
- 2. SlotPool
- 2.1. Introduction
- 2.2. Lifecycle interfaces
- 2.3. ResourceManager connection interfaces
- 2.4. Slot operations
- 3. The SlotPoolImpl implementation class
- 3.1. Preface
- 3.2. Attributes
- 3.3. Lifecycle interfaces
- 3.3.1. start
- 3.3.2. suspend
- 3.3.3. close
- 3.4. ResourceManager connection interfaces
- 3.4.1. connectToResourceManager
- 3.4.2. disconnectResourceManager
- 3.4.3. registerTaskManager
- 3.4.4. releaseTaskManager
- 3.5. Slot operations
- 3.5.1. offerSlots
- 3.5.2. failAllocation
- 3.5.3. getAvailableSlotsInformation
- 3.5.4. getAllocatedSlotsInformation
- 3.5.5. allocateAvailableSlot
- 3.5.6. requestNewAllocatedSlot
- 3.5.7. requestNewAllocatedBatchSlot
- 3.5.8. disableBatchSlotRequestTimeoutCheck
- 3.5.9. createAllocatedSlotReport
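1. Preface
SlotPool is the interface of a slot pool that manages slots.
2. SlotPool
2.1. Introduction
SlotPool is the pool that the JobMaster uses to manage slots. It is an interface that defines the slot management operations…
2.2. Lifecycle interfaces
Interface | Meaning |
start | Starts the slot pool |
suspend | Suspends the slot pool |
close | Closes the slot pool |
2.3. ResourceManager connection interfaces
Interface | Meaning |
connectToResourceManager | Connects to the ResourceManager |
disconnectResourceManager | Drops the connection to the ResourceManager |
registerTaskManager | Registers a TaskExecutor under the given ResourceID |
releaseTaskManager | Releases a TaskExecutor |
2.4. Slot operations
Interface | Meaning |
offerSlots | Offers slots to the pool; the offer comes from a TaskExecutor |
failAllocation | Fails the allocation identified by the given allocation id |
getAvailableSlotsInformation | Returns information about the currently available slots |
getAllocatedSlotsInformation | Returns information about all allocated slots |
allocateAvailableSlot | Allocates the available slot with the given allocation id under the given request id. If no slot with the given allocation id is available, the method returns an empty Optional. |
requestNewAllocatedSlot | Requests a new slot from the ResourceManager. This method does not return a slot that is already available in the pool; instead it adds a new slot to the pool, which is immediately allocated and returned. |
requestNewAllocatedBatchSlot | Requests a new batch slot from the ResourceManager. Unlike a normal slot request, a batch slot request times out only if the slot pool contains no suitable slot. In addition, it does not react to failure signals from the ResourceManager. |
disableBatchSlotRequestTimeoutCheck | Disables the batch slot request timeout check. Called when someone else takes over responsibility for the timeout check. |
createAllocatedSlotReport | Creates a report about the allocated slots that belong to the specified TaskManager. |
3. The SlotPoolImpl implementation class
3.1. Preface
SlotPoolImpl is the implementation of the SlotPool interface.
The slot pool serves the slot requests issued by the ExecutionGraph.
When it cannot serve a slot request, it tries to acquire new slots from the ResourceManager.
If no ResourceManager is currently available, or the ResourceManager rejects the request, or the request times out, the slot request fails.
The slot pool also holds all the slots that were offered to it and accepted, so it can still hand out registered free slots even if the ResourceManager is down.
Slots are released only when they are no longer useful, for example when the job is fully running but some slots are still available.
Every allocation and slot offer is identified by an AllocationID generated by the pool itself, which is used to eliminate ambiguities.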
3.2. Attributes
/**
* The interval (in milliseconds) in which the SlotPool writes its slot distribution on debug
* level.
*/
private static final long STATUS_LOG_INTERVAL_MS = 60_000;
// job ID
private final JobID jobId;
/**
* All registered TaskManagers, slots will be accepted and used only if the resource is registered.
*/
private final HashSet<ResourceID> registeredTaskManagers;
/** The book-keeping of all allocated slots. */
private final AllocatedSlots allocatedSlots;
/** The book-keeping of all available slots. */
private final AvailableSlots availableSlots;
/** All pending requests waiting for slots. */
private final DualKeyLinkedMap<SlotRequestId, AllocationID, PendingRequest> pendingRequests;
/** The requests that are waiting for the resource manager to be connected. */
private final LinkedHashMap<SlotRequestId, PendingRequest> waitingForResourceManager;
/** Timeout for external request calls (e.g. to the ResourceManager or the TaskExecutor). */
private final Time rpcTimeout;
/** Timeout for releasing idle slots. */
private final Time idleSlotTimeout;
/** Timeout for batch slot requests. */
private final Time batchSlotTimeout;
private final Clock clock;
/** the fencing token of the job manager. */
private JobMasterId jobMasterId;
/** The gateway to communicate with resource manager. */
@Nullable private ResourceManagerGateway resourceManagerGateway;
// address of the JobManager
private String jobManagerAddress;
// main thread executor of the component
private ComponentMainThreadExecutor componentMainThreadExecutor;
// whether the batch slot request timeout check is enabled
protected boolean batchSlotRequestTimeoutCheckEnabled;
- The constructor simply assigns the attributes
public SlotPoolImpl(
JobID jobId,
Clock clock,
Time rpcTimeout,
Time idleSlotTimeout,
Time batchSlotTimeout) {
this.jobId = checkNotNull(jobId);
this.clock = checkNotNull(clock);
this.rpcTimeout = checkNotNull(rpcTimeout);
this.idleSlotTimeout = checkNotNull(idleSlotTimeout);
this.batchSlotTimeout = checkNotNull(batchSlotTimeout);
this.registeredTaskManagers = new HashSet<>(16);
this.allocatedSlots = new AllocatedSlots();
this.availableSlots = new AvailableSlots();
this.pendingRequests = new DualKeyLinkedMap<>(16);
this.waitingForResourceManager = new LinkedHashMap<>(16);
this.jobMasterId = null;
this.resourceManagerGateway = null;
this.jobManagerAddress = null;
this.componentMainThreadExecutor = null;
this.batchSlotRequestTimeoutCheckEnabled = true;
}
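The pendingRequests attribute above is keyed by both SlotRequestId and AllocationID, which is what later lets failAllocation look a request up by its allocation id. The following is a minimal sketch of that idea only; DualKeyMapSketch and its method names are hypothetical and are not the API of Flink's DualKeyLinkedMap.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny map whose values can be reached through either of two keys.
class DualKeyMapSketch<A, B, V> {
    private final Map<A, V> valuesByA = new LinkedHashMap<>(); // keeps insertion order
    private final Map<A, B> aToB = new HashMap<>();
    private final Map<B, A> bToA = new HashMap<>();

    void put(A keyA, B keyB, V value) {
        removeByKeyA(keyA); // drop a stale pairing of keyA, if any
        A previousOwner = bToA.get(keyB);
        if (previousOwner != null) {
            removeByKeyA(previousOwner); // keyB may only point at one entry
        }
        valuesByA.put(keyA, value);
        aToB.put(keyA, keyB);
        bToA.put(keyB, keyA);
    }

    V getByKeyA(A keyA) {
        return valuesByA.get(keyA);
    }

    V getByKeyB(B keyB) {
        A keyA = bToA.get(keyB);
        return keyA == null ? null : valuesByA.get(keyA);
    }

    V removeByKeyA(A keyA) {
        B keyB = aToB.remove(keyA);
        if (keyB != null) {
            bToA.remove(keyB);
        }
        return valuesByA.remove(keyA);
    }

    int size() {
        return valuesByA.size();
    }
}
```

With such a structure, a request stored under ("request-1", "allocation-1") can be found from either side, and removing it by the request id also invalidates the lookup by allocation id.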
3.3. Lifecycle interfaces
Interface | Meaning |
start | Starts the slot pool |
suspend | Suspends the slot pool |
close | Closes the slot pool |
3.3.1. start
/**
* Start the slot pool to accept RPC calls.
*
* @param jobMasterId The necessary leader id for running the job.
* @param newJobManagerAddress for the slot requests which are sent to the resource manager
* @param componentMainThreadExecutor The main thread executor for the job master's main thread.
*/
@Override
public void start(
@Nonnull JobMasterId jobMasterId,
@Nonnull String newJobManagerAddress,
@Nonnull ComponentMainThreadExecutor componentMainThreadExecutor)
throws Exception {
this.jobMasterId = jobMasterId;
this.jobManagerAddress = newJobManagerAddress;
this.componentMainThreadExecutor = componentMainThreadExecutor;
// schedule the periodic timeout checks
scheduleRunAsync(this::checkIdleSlot, idleSlotTimeout);
scheduleRunAsync(this::checkBatchSlotTimeout, batchSlotTimeout);
if (log.isDebugEnabled()) {
scheduleRunAsync(
this::scheduledLogStatus, STATUS_LOG_INTERVAL_MS, TimeUnit.MILLISECONDS);
}
}
3.3.2. suspend
/**
* Suspends this pool, meaning it has lost its authority to accept and distribute slots.
*/
@Override
public void suspend() {
componentMainThreadExecutor.assertRunningInMainThread();
log.info("Suspending SlotPool.");
// cancel the pending slot requests at the ResourceManager
cancelPendingSlotRequests();
// do not accept any requests
jobMasterId = null;
resourceManagerGateway = null;
// Clear (but not release!) the available slots. The TaskManagers should re-register them
// at the new leader JobManager/SlotPool
clear();
}
3.3.3. close
@Override
public void close() {
log.info("Stopping SlotPool.");
// cancel the pending slot requests
cancelPendingSlotRequests();
// release all registered slots by releasing the corresponding TaskExecutors
for (ResourceID taskManagerResourceId : registeredTaskManagers) {
final FlinkException cause =
new FlinkException(
"Releasing TaskManager "
+ taskManagerResourceId
+ ", because of stopping of SlotPool");
releaseTaskManagerInternal(taskManagerResourceId, cause);
}
clear();
}
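3.4. ResourceManager connection interfaces
Interface | Meaning |
connectToResourceManager | Connects to the ResourceManager |
disconnectResourceManager | Drops the connection to the ResourceManager |
registerTaskManager | Registers a TaskExecutor under the given ResourceID |
releaseTaskManager | Releases a TaskExecutor |
3.4.1. connectToResourceManager
Connects to the ResourceManager and sends off the requests that were stashed while no ResourceManager was connected…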
@Override
public void connectToResourceManager(@Nonnull ResourceManagerGateway resourceManagerGateway) {
this.resourceManagerGateway = checkNotNull(resourceManagerGateway);
// work on all slots waiting for this connection
for (PendingRequest pendingRequest : waitingForResourceManager.values()) {
// send the stashed request to the ResourceManager
requestSlotFromResourceManager(resourceManagerGateway, pendingRequest);
}
// all sent off
waitingForResourceManager.clear();
}
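The stash-then-flush behaviour of connectToResourceManager can be sketched in isolation as follows. This is a simplified model under stated assumptions: StashingRequester, its Gateway interface and the string ids are hypothetical stand-ins, not Flink types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the pool's "waitingForResourceManager" handling; not Flink code.
class StashingRequester {
    /** Hypothetical gateway: anything that can receive a request id. */
    interface Gateway {
        void requestSlot(String requestId);
    }

    // requestId -> requested profile; a LinkedHashMap keeps the arrival order
    private final Map<String, String> waiting = new LinkedHashMap<>();
    private Gateway gateway; // null while no ResourceManager is connected

    void request(String requestId, String profile) {
        if (gateway == null) {
            waiting.put(requestId, profile); // stash until a gateway shows up
        } else {
            gateway.requestSlot(requestId);
        }
    }

    void connect(Gateway newGateway) {
        gateway = newGateway;
        for (String requestId : waiting.keySet()) {
            gateway.requestSlot(requestId); // flush in the order the requests arrived
        }
        waiting.clear(); // all sent off
    }

    void disconnect() {
        gateway = null;
    }

    int stashedCount() {
        return waiting.size();
    }
}
```

Requests made before connect are delivered once, in arrival order, as soon as a gateway is available; requests made afterwards go straight through.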
3.4.2. disconnectResourceManager
Drops the connection to the ResourceManager.
@Override
public void disconnectResourceManager() {
this.resourceManagerGateway = null;
}
3.4.3. registerTaskManager
/**
* Registers a TaskManager with this pool; only slots that come from a registered TaskManager are considered valid.
*
* It also provides a way for us to keep "dead" or "abnormal" TaskManagers out of this pool.
*
* @param resourceID The id of the TaskManager
*/
@Override
public boolean registerTaskManager(final ResourceID resourceID) {
componentMainThreadExecutor.assertRunningInMainThread();
// Register new TaskExecutor container_1615446205104_0025_01_000002(192.168.8.188:57958).
log.debug("Register new TaskExecutor {}.", resourceID.getStringWithMetadata());
return registeredTaskManagers.add(resourceID);
}
3.4.4. releaseTaskManager
/**
* Unregisters a TaskManager from this pool; all the related slots will be released and their tasks canceled.
*
* Called when we find that some TaskManager has become "dead" or "abnormal" and we decide not to use slots from it anymore.
*
* @param resourceId The id of the TaskManager
* @param cause for the releasing of the TaskManager
*/
@Override
public boolean releaseTaskManager(final ResourceID resourceId, final Exception cause) {
componentMainThreadExecutor.assertRunningInMainThread();
if (registeredTaskManagers.remove(resourceId)) {
releaseTaskManagerInternal(resourceId, cause);
return true;
} else {
return false;
}
}
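3.5. Slot operations
Interface | Meaning |
offerSlots | Offers slots to the pool; the offer comes from a TaskExecutor |
failAllocation | Fails the allocation identified by the given allocation id |
getAvailableSlotsInformation | Returns information about the currently available slots |
getAllocatedSlotsInformation | Returns information about all allocated slots |
allocateAvailableSlot | Allocates the available slot with the given allocation id under the given request id. If no slot with the given allocation id is available, the method returns an empty Optional. |
requestNewAllocatedSlot | Requests a new slot from the ResourceManager. This method does not return a slot that is already available in the pool; instead it adds a new slot to the pool, which is immediately allocated and returned. |
requestNewAllocatedBatchSlot | Requests a new batch slot from the ResourceManager. Unlike a normal slot request, a batch slot request times out only if the slot pool contains no suitable slot. In addition, it does not react to failure signals from the ResourceManager. |
disableBatchSlotRequestTimeoutCheck | Disables the batch slot request timeout check. Called when someone else takes over responsibility for the timeout check. |
createAllocatedSlotReport | Creates a report about the allocated slots that belong to the specified TaskManager. |
3.5.1. offerSlots
Handles slots offered by a TaskExecutor. The code below is the single-slot method offerSlot…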
/**
* Slot offering by TaskExecutor with AllocationID.
*
* The AllocationID is originally generated by this pool and transferred through the ResourceManager to the TaskManager.
*
* We use it to distinguish the different allocations we issued.
*
* A slot offer may be rejected if we find a mismatch or if there is actually no pending request waiting for this slot (it may have been fulfilled by some other returned slot).
*
* @param taskManagerLocation location from where the offer comes from
* @param taskManagerGateway TaskManager gateway
* @param slotOffer the offered slot
* @return True if we accept the offering
*/
boolean offerSlot(
final TaskManagerLocation taskManagerLocation,
final TaskManagerGateway taskManagerGateway,
final SlotOffer slotOffer) {
componentMainThreadExecutor.assertRunningInMainThread();
// check if this TaskManager is valid
final ResourceID resourceID = taskManagerLocation.getResourceID();
final AllocationID allocationID = slotOffer.getAllocationId();
// the offer must come from one of the registered TaskManagers
if (!registeredTaskManagers.contains(resourceID)) {
log.debug(
"Received outdated slot offering [{}] from unregistered TaskManager: {}",
slotOffer.getAllocationId(),
taskManagerLocation);
return false;
}
// check whether we are already using this slot
AllocatedSlot existingSlot;
if ((existingSlot = allocatedSlots.get(allocationID)) != null
|| (existingSlot = availableSlots.get(allocationID)) != null) {
// we need to figure out if this is a repeated offer for the exact same slot,
// or another offer that comes from a different TaskManager after the ResourceManager
// re-tried the request
// we write this in terms of comparing slot IDs, because the Slot IDs are the
// identifiers of the actual slots on the TaskManagers
// Note: The slotOffer should have the SlotID
// SlotID of the slot we already hold
final SlotID existingSlotId = existingSlot.getSlotId();
// SlotID of the newly offered slot
final SlotID newSlotId =
new SlotID(taskManagerLocation.getResourceID(), slotOffer.getSlotIndex());
if (existingSlotId.equals(newSlotId)) {
log.info("Received repeated offer for slot [{}]. Ignoring.", allocationID);
// same SlotID: this is a repeated offer
// return true here so that the sender will get a positive acknowledgement to the
// retry and mark the offering as a success
return true;
} else {
// the allocation has been fulfilled by another slot, reject the offer so the task
// executor will offer the slot to the resource manager
return false;
}
}
// reaching this point means the pool does not know this slot yet,
// so build the AllocatedSlot instance
final AllocatedSlot allocatedSlot =
new AllocatedSlot(
allocationID,
taskManagerLocation,
slotOffer.getSlotIndex(),
slotOffer.getResourceProfile(),
taskManagerGateway);
// use the slot to fulfill pending request, in requested order
tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);
// we accepted the request in any case.
// slot will be released after it idled for too long and timed out
return true;
}
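The accept/reject decision made by offerSlot can be condensed into a small model. This is a sketch under simplifying assumptions: OfferDecisionSketch is a hypothetical class, ids are plain strings, and a slot's identity is modeled as "taskManagerId/slotIndex" rather than Flink's SlotID.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of the accept/reject decision; all names are illustrative, not Flink's.
class OfferDecisionSketch {
    private final Set<String> registeredTaskManagers = new HashSet<>();
    // allocationId -> "taskManagerId/slotIndex" of the slot already held for that allocation
    private final Map<String, String> knownSlots = new HashMap<>();

    void register(String taskManagerId) {
        registeredTaskManagers.add(taskManagerId);
    }

    boolean offer(String taskManagerId, int slotIndex, String allocationId) {
        if (!registeredTaskManagers.contains(taskManagerId)) {
            return false; // outdated offer from an unregistered TaskManager
        }
        String offered = taskManagerId + "/" + slotIndex;
        String existing = knownSlots.get(allocationId);
        if (existing != null) {
            // same physical slot: repeated offer, acknowledge it again;
            // different slot: the allocation was already fulfilled, reject
            return existing.equals(offered);
        }
        knownSlots.put(allocationId, offered); // new slot, accept it
        return true;
    }
}
```

The point of the second branch is that a retried offer for the same slot is idempotent, while a competing offer for an allocation that is already fulfilled is turned away.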
/**
* Tries to fulfill a pending slot request with the given allocated slot, or adds the
* allocated slot to the set of available slots if no matching request is available.
*
* @param allocatedSlot which shall be returned
*/
private void tryFulfillSlotRequestOrMakeAvailable(AllocatedSlot allocatedSlot) {
Preconditions.checkState(!allocatedSlot.isUsed(), "Provided slot is still in use.");
// find a pending request that this slot can fulfill
final PendingRequest pendingRequest = findMatchingPendingRequest(allocatedSlot);
if (pendingRequest != null) {
// Fulfilling pending slot request [
// SlotRequestId{d3517a9282334314b63f9493850f55f0}
// ] with slot [
// 3755cb8f9962a9a7738db04f2a02084c
// ]
log.debug(
"Fulfilling pending slot request [{}] with slot [{}]",
pendingRequest.getSlotRequestId(),
allocatedSlot.getAllocationId());
// remove the request from the pending requests
removePendingRequest(pendingRequest.getSlotRequestId());
// add the slot to allocatedSlots, which marks it as in use
allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);
// complete the request's future with the allocated slot; the allocation is now done
pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot);
// this allocation may become orphan once its corresponding request is removed
final Optional<AllocationID> allocationIdOfRequest = pendingRequest.getAllocationId();
// the allocation id can be null if the request was fulfilled by a slot directly offered
// by a reconnected TaskExecutor before the ResourceManager is connected
if (allocationIdOfRequest.isPresent()) {
maybeRemapOrphanedAllocation(
allocationIdOfRequest.get(), allocatedSlot.getAllocationId());
}
} else {
// no matching pending request: put the slot into the available slots
log.debug("Adding slot [{}] to available slots", allocatedSlot.getAllocationId());
availableSlots.add(allocatedSlot, clock.relativeTimeMillis());
}
}
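The "fulfill a pending request or make the slot available" pattern boils down to completing a waiting future if there is one and storing the slot otherwise. A minimal sketch, assuming string slot ids and a hypothetical FulfillOrStoreSketch class; it skips whatever matching findMatchingPendingRequest performs and simply takes the oldest request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: pending requests are futures, free slots are kept in a list.
class FulfillOrStoreSketch {
    private final Deque<CompletableFuture<String>> pending = new ArrayDeque<>();
    private final List<String> available = new ArrayList<>();

    /** Registers a request and returns the future that will carry the slot id. */
    CompletableFuture<String> request() {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.addLast(future);
        return future;
    }

    /** Called when a new slot arrives. */
    void onSlot(String slotId) {
        CompletableFuture<String> oldest = pending.pollFirst();
        if (oldest != null) {
            oldest.complete(slotId); // fulfill requests in the order they were made
        } else {
            available.add(slotId); // nobody is waiting, keep the slot for later
        }
    }

    List<String> availableSlots() {
        return available;
    }
}
```

A slot that arrives while a request is waiting completes that request's future; a slot that arrives with no waiter ends up in the available list, from which the real pool later releases it if it stays idle too long.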
3.5.2. failAllocation
/**
* Fail the specified allocation and release the corresponding slot if we have one.
*
* This may be triggered by the JobManager when some slot allocation failed with rpcTimeout.
*
* Or this could be triggered by the TaskManager, when it finds out something went wrong with the slot and decides to take it back.
*
* @param allocationID Represents the allocation which should be failed
* @param cause The cause of the failure
* @return Optional task executor if it has no more slots registered
*/
@Override
public Optional<ResourceID> failAllocation(
final AllocationID allocationID, final Exception cause) {
componentMainThreadExecutor.assertRunningInMainThread();
// look up the pending request by its AllocationID
final PendingRequest pendingRequest = pendingRequests.getValueByKeyB(allocationID);
if (pendingRequest != null) {
if (isBatchRequestAndFailureCanBeIgnored(pendingRequest, cause)) {
log.debug(
"Ignoring allocation failure for batch slot request {}.",
pendingRequest.getSlotRequestId());
} else {
// request was still pending
removePendingRequest(pendingRequest.getSlotRequestId());
failPendingRequest(pendingRequest, cause);
}
return Optional.empty();
} else {
// no pending request for this allocation: try to fail an already allocated slot
return tryFailingAllocatedSlot(allocationID, cause);
}
// TODO: add some unit tests when the previous two are ready, the allocation may failed at
// any phase
}
- Failing a slot that has already been allocated
private Optional<ResourceID> tryFailingAllocatedSlot(
AllocationID allocationID, Exception cause) {
// look for the slot in the available slots first, then in the allocated slots
AllocatedSlot allocatedSlot = availableSlots.tryRemove(allocationID);
if (allocatedSlot == null) {
allocatedSlot = allocatedSlots.remove(allocationID);
}
if (allocatedSlot != null) {
log.debug("Failed allocated slot [{}]: {}", allocationID, cause.getMessage());
// notify TaskExecutor about the failure
allocatedSlot.getTaskManagerGateway().freeSlot(allocationID, cause, rpcTimeout);
// release the slot.
// since it is not in 'allocatedSlots' any more, it will be dropped on return
allocatedSlot.releasePayload(cause);
final ResourceID taskManagerId = allocatedSlot.getTaskManagerId();
if (!availableSlots.containsTaskManager(taskManagerId)
&& !allocatedSlots.containResource(taskManagerId)) {
return Optional.of(taskManagerId);
}
}
return Optional.empty();
}
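The return value of failAllocation is easy to misread: the Optional carries the TaskManager's id only when the failed slot was that TaskManager's last slot in the pool. A reduced sketch of this rule, assuming string ids and a hypothetical FailAllocationSketch class that keeps free and in-use slots in one map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: tracks which TaskManager owns each slot, free or in use.
class FailAllocationSketch {
    // allocationId -> owning taskManagerId
    private final Map<String, String> slotOwner = new HashMap<>();

    void addSlot(String allocationId, String taskManagerId) {
        slotOwner.put(allocationId, taskManagerId);
    }

    /** Returns the TaskManager id only if it has no slots left after the failure. */
    Optional<String> failAllocation(String allocationId) {
        String owner = slotOwner.remove(allocationId);
        if (owner == null || slotOwner.containsValue(owner)) {
            return Optional.empty(); // unknown allocation, or the TaskManager still has slots
        }
        return Optional.of(owner);
    }
}
```

A caller can use the returned id to decide whether the now empty TaskManager should be released as well.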
3.5.3. getAvailableSlotsInformation
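Returns information about the available slots, together with the utilization of the TaskExecutor that each slot belongs to.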
@Override
@Nonnull
public Collection<SlotInfoWithUtilization> getAvailableSlotsInformation() {
final Map<ResourceID, Set<AllocatedSlot>> availableSlotsByTaskManager = availableSlots.getSlotsByTaskManager();
final Map<ResourceID, Set<AllocatedSlot>> allocatedSlotsByTaskManager = allocatedSlots.getSlotsByTaskManager();
return availableSlotsByTaskManager.entrySet().stream()
.flatMap(
entry -> {
final int numberAllocatedSlots =
allocatedSlotsByTaskManager
.getOrDefault(entry.getKey(), Collections.emptySet())
.size();
final int numberAvailableSlots = entry.getValue().size();
final double taskExecutorUtilization =
(double) numberAllocatedSlots
/ (numberAllocatedSlots + numberAvailableSlots);
return entry.getValue().stream()
.map(
slot ->
SlotInfoWithUtilization.from(
slot, taskExecutorUtilization));
})
.collect(Collectors.toList());
}
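The utilization attached to each available slot is the share of a TaskExecutor's slots that are currently allocated: allocated / (allocated + available). A small worked sketch of that arithmetic follows; UtilizationSketch is a hypothetical helper, and the explicit zero check is an addition of the sketch, since the method above only computes the ratio for TaskExecutors that appear in the available-slots map.

```java
// Illustrative helper for the utilization formula; not part of Flink.
class UtilizationSketch {
    static double utilization(int allocatedSlots, int availableSlots) {
        int total = allocatedSlots + availableSlots;
        if (total == 0) {
            throw new IllegalArgumentException("a TaskExecutor without slots has no utilization");
        }
        // the cast forces floating-point division, as in the method above
        return (double) allocatedSlots / total;
    }

    public static void main(String[] args) {
        // 3 allocated slots and 1 available slot: 3 / (3 + 1) = 0.75
        System.out.println(utilization(3, 1));
    }
}
```

A TaskExecutor with three allocated slots and one free slot therefore reports a utilization of 0.75 for its free slot, and a completely idle TaskExecutor reports 0.0.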
3.5.4. getAllocatedSlotsInformation
Returns information about all allocated slots.
@Override
public Collection<SlotInfo> getAllocatedSlotsInformation() {
return allocatedSlots.listSlotInfo();
}
3.5.5. allocateAvailableSlot
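Allocates the available slot with the given allocation id under the given request id. Returns an empty Optional if no available slot has that allocation id.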
@Override
public Optional<PhysicalSlot> allocateAvailableSlot(
@Nonnull SlotRequestId slotRequestId, @Nonnull AllocationID allocationID) {
componentMainThreadExecutor.assertRunningInMainThread();
AllocatedSlot allocatedSlot = availableSlots.tryRemove(allocationID);
if (allocatedSlot != null) {
allocatedSlots.add(slotRequestId, allocatedSlot);
return Optional.of(allocatedSlot);
} else {
return Optional.empty();
}
}
3.5.6. requestNewAllocatedSlot
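Requests a new slot from the ResourceManager. This method does not return a slot that is already available in the pool; instead it adds a new slot to the pool, which is immediately allocated and returned.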
@Nonnull
@Override
public CompletableFuture<PhysicalSlot> requestNewAllocatedSlot(
@Nonnull SlotRequestId slotRequestId,
@Nonnull ResourceProfile resourceProfile,
@Nullable Time timeout) {
componentMainThreadExecutor.assertRunningInMainThread();
// build the PendingRequest
final PendingRequest pendingRequest =
PendingRequest.createStreamingRequest(slotRequestId, resourceProfile);
if (timeout != null) {
// register request timeout
FutureUtils.orTimeout(
pendingRequest.getAllocatedSlotFuture(),
timeout.toMilliseconds(),
TimeUnit.MILLISECONDS,
componentMainThreadExecutor)
.whenComplete(
(AllocatedSlot ignored, Throwable throwable) -> {
if (throwable instanceof TimeoutException) {
timeoutPendingSlotRequest(slotRequestId);
}
});
}
return requestNewAllocatedSlotInternal(pendingRequest).thenApply(Function.identity());
}
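The timeout handling in requestNewAllocatedSlot follows a common pattern: attach a timeout to the future and run a cleanup only when the failure really is a timeout. The sketch below shows that pattern with the JDK's own CompletableFuture.orTimeout (Java 9+) instead of Flink's FutureUtils.orTimeout; RequestTimeoutSketch and withTimeout are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: uses the JDK timeout support, not Flink's FutureUtils.
class RequestTimeoutSketch {
    /**
     * Fails slotFuture with a TimeoutException after timeoutMillis unless it completes
     * earlier, and runs onTimeout only in the timeout case.
     */
    static <T> CompletableFuture<T> withTimeout(
            CompletableFuture<T> slotFuture, long timeoutMillis, Runnable onTimeout) {
        return slotFuture
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .whenComplete(
                        (slot, error) -> {
                            if (error instanceof TimeoutException) {
                                onTimeout.run(); // e.g. drop the pending request
                            }
                        });
    }
}
```

If the future is completed with a slot before the deadline, the callback sees no error and the cleanup is skipped; other failures pass through without triggering the timeout cleanup.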
/**
* Requests a new slot from the ResourceManager. If there is currently no ResourceManager
* connected, then the request is stashed and sent once a new ResourceManager is connected.
*
* @param pendingRequest pending slot request
* @return An {@link AllocatedSlot} future which is completed once the slot is offered to the
* {@link SlotPool}
*/
@Nonnull
private CompletableFuture<AllocatedSlot> requestNewAllocatedSlotInternal(
PendingRequest pendingRequest) {
if (resourceManagerGateway == null) {
stashRequestWaitingForResourceManager(pendingRequest);
} else {
// request a new slot from the ResourceManager
requestSlotFromResourceManager(resourceManagerGateway, pendingRequest);
}
return pendingRequest.getAllocatedSlotFuture();
}
3.5.7. requestNewAllocatedBatchSlot
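Requests a new batch slot from the ResourceManager. Unlike a normal slot request, a batch slot request times out only if the slot pool contains no suitable slot. In addition, it does not react to failure signals from the ResourceManager.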
@Nonnull
@Override
public CompletableFuture<PhysicalSlot> requestNewAllocatedBatchSlot(
@Nonnull SlotRequestId slotRequestId, @Nonnull ResourceProfile resourceProfile) {
componentMainThreadExecutor.assertRunningInMainThread();
final PendingRequest pendingRequest =
PendingRequest.createBatchRequest(slotRequestId, resourceProfile);
return requestNewAllocatedSlotInternal(pendingRequest).thenApply(Function.identity());
}
3.5.8. disableBatchSlotRequestTimeoutCheck
Disables the batch slot request timeout check.
Called when someone else takes over responsibility for the timeout check.
@Override
public void disableBatchSlotRequestTimeoutCheck() {
batchSlotRequestTimeoutCheckEnabled = false;
}
3.5.9. createAllocatedSlotReport
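Creates a report about the allocated slots that belong to the specified TaskManager. As the code shows, the report covers both the available and the allocated slots of that TaskManager.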
@Override
public AllocatedSlotReport createAllocatedSlotReport(ResourceID taskManagerId) {
final Set<AllocatedSlot> availableSlotsForTaskManager =
availableSlots.getSlotsForTaskManager(taskManagerId);
final Set<AllocatedSlot> allocatedSlotsForTaskManager =
allocatedSlots.getSlotsForTaskManager(taskManagerId);
List<AllocatedSlotInfo> allocatedSlotInfos =
new ArrayList<>(
availableSlotsForTaskManager.size() + allocatedSlotsForTaskManager.size());
for (AllocatedSlot allocatedSlot :
Iterables.concat(availableSlotsForTaskManager, allocatedSlotsForTaskManager)) {
allocatedSlotInfos.add(
new AllocatedSlotInfo(
allocatedSlot.getPhysicalSlotNumber(),
allocatedSlot.getAllocationId()));
}
return new AllocatedSlotReport(jobId, allocatedSlotInfos);
}