
  • I. Preface
  • II. SlotPool
  • 2.1. Introduction
  • 2.2. Lifecycle methods
  • 2.3. ResourceManager connection methods
  • 2.4. Slot operations
  • III. The SlotPoolImpl implementation class
  • 3.1. Preface
  • 3.2. Fields
  • 3.3. Lifecycle methods
  • 3.3.1. start
  • 3.3.2. suspend
  • 3.3.3. close
  • 3.4. ResourceManager connection methods
  • 3.4.1. connectToResourceManager
  • 3.4.2. disconnectResourceManager
  • 3.4.3. registerTaskManager
  • 3.4.4. releaseTaskManager
  • 3.5. Slot operations
  • 3.5.1. offerSlots
  • 3.5.2. failAllocation
  • 3.5.3. getAvailableSlotsInformation
  • 3.5.4. getAllocatedSlotsInformation
  • 3.5.5. allocateAvailableSlot
  • 3.5.6. requestNewAllocatedSlot
  • 3.5.7. requestNewAllocatedBatchSlot
  • 3.5.8. disableBatchSlotRequestTimeoutCheck
  • 3.5.9. createAllocatedSlotReport


I. Preface

The Interface of a slot pool that manages slots.


II. SlotPool

2.1. Introduction

SlotPool is the pool that the JobMaster uses to manage slots. It is an interface that defines the slot management operations.


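2.2. Lifecycle methods

| Method | Description |
| --- | --- |
| start | Starts the slot pool |
| suspend | Suspends the slot pool |
| close | Closes the slot pool |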

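2.3. ResourceManager connection methods

| Method | Description |
| --- | --- |
| connectToResourceManager | Connects the pool to the ResourceManager |
| disconnectResourceManager | Disconnects the pool from the ResourceManager |
| registerTaskManager | Registers a TaskExecutor under the given ResourceID |
| releaseTaskManager | Releases a TaskExecutor |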

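2.4. Slot operations

| Method | Description |
| --- | --- |
| offerSlots | Offers slots from a TaskExecutor to the slot pool |
| failAllocation | Fails the slot with the given allocation ID |
| getAvailableSlotsInformation | Returns information about the currently available slots |
| getAllocatedSlotsInformation | Returns information about all allocated slots |
| allocateAvailableSlot | Allocates the available slot with the given allocation ID under the given request ID. Returns an empty Optional if no available slot has that allocation ID. |
| requestNewAllocatedSlot | Requests the allocation of a new slot from the ResourceManager. This method does not return one of the slots already available in the pool; instead, it adds a new slot to the pool, which is immediately allocated and returned. |
| requestNewAllocatedBatchSlot | Requests the allocation of a new batch slot from the ResourceManager. Unlike a normal slot, a batch slot request only times out if the slot pool does not contain a suitable slot. It also does not react to failure signals from the ResourceManager. |
| disableBatchSlotRequestTimeoutCheck | Disables the timeout check for batch slot requests. Invoked when another component takes over responsibility for the timeout check. |
| createAllocatedSlotReport | Creates a report about the allocated slots that belong to the specified TaskManager |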

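III. The SlotPoolImpl implementation class

3.1. Preface

SlotPoolImpl is the implementation of the SlotPool interface.

The slot pool serves the slot requests issued by the {@link ExecutionGraph}.
When it cannot serve a slot request, it tries to acquire new slots from the ResourceManager.
If no ResourceManager is currently available, or the ResourceManager rejects the request, or the request times out, the slot pool fails the slot request.
The slot pool also holds all the slots that were offered to it and accepted, so it can serve registered free slots even while the ResourceManager is down.
Slots are released only when they are no longer useful, for example when the job is fully running but some slots are still free.
Every allocation or slot offering is identified by a self-generated AllocationID, which is used to eliminate ambiguities.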


3.2. Fields

/**
     * The interval (in milliseconds) in which the SlotPool writes its slot distribution on debug
     * level.
     */
    private static final long STATUS_LOG_INTERVAL_MS = 60_000;

    // job ID
    private final JobID jobId;

    /**
     * All registered TaskManagers. Slots are accepted and used only if their TaskManager is
     * registered.
     */
    private final HashSet<ResourceID> registeredTaskManagers;

    /** The book-keeping of all allocated slots. */
    private final AllocatedSlots allocatedSlots;

    /** The book-keeping of all available slots. */
    private final AvailableSlots availableSlots;

    /** All pending requests waiting for slots. */
    private final DualKeyLinkedMap<SlotRequestId, AllocationID, PendingRequest> pendingRequests;

    /** The requests that are waiting for the resource manager to be connected. */
    private final LinkedHashMap<SlotRequestId, PendingRequest> waitingForResourceManager;

    /** Timeout for external request calls (e.g. to the ResourceManager or the TaskExecutor). */
    private final Time rpcTimeout;

    /** Timeout for releasing idle slots. */
    private final Time idleSlotTimeout;

    /** Timeout for batch slot requests. */
    private final Time batchSlotTimeout;

    private final Clock clock;

    /** the fencing token of the job manager. */
    private JobMasterId jobMasterId;

    /** The gateway to communicate with resource manager. */
    @Nullable private ResourceManagerGateway resourceManagerGateway;

    // address of the JobManager
    private String jobManagerAddress;

    // main thread executor of the component
    private ComponentMainThreadExecutor componentMainThreadExecutor;

    // whether the timeout check for batch slot requests is enabled
    protected boolean batchSlotRequestTimeoutCheckEnabled;
  • The constructor simply assigns the fields:
public SlotPoolImpl(
            JobID jobId,
            Clock clock,
            Time rpcTimeout,
            Time idleSlotTimeout,
            Time batchSlotTimeout) {

        this.jobId = checkNotNull(jobId);
        this.clock = checkNotNull(clock);
        this.rpcTimeout = checkNotNull(rpcTimeout);
        this.idleSlotTimeout = checkNotNull(idleSlotTimeout);
        this.batchSlotTimeout = checkNotNull(batchSlotTimeout);

        this.registeredTaskManagers = new HashSet<>(16);
        this.allocatedSlots = new AllocatedSlots();
        this.availableSlots = new AvailableSlots();
        this.pendingRequests = new DualKeyLinkedMap<>(16);
        this.waitingForResourceManager = new LinkedHashMap<>(16);

        this.jobMasterId = null;
        this.resourceManagerGateway = null;
        this.jobManagerAddress = null;

        this.componentMainThreadExecutor = null;

        this.batchSlotRequestTimeoutCheckEnabled = true;
    }

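3.3. Lifecycle methods

| Method | Description |
| --- | --- |
| start | Starts the slot pool |
| suspend | Suspends the slot pool |
| close | Closes the slot pool |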

3.3.1. start

/**
     * Start the slot pool to accept RPC calls.
     *
     * @param jobMasterId The necessary leader id for running the job.
     * @param newJobManagerAddress for the slot requests which are sent to the resource manager
     * @param componentMainThreadExecutor The main thread executor for the job master's main thread.
     */
    @Override
    public void start(
            @Nonnull JobMasterId jobMasterId,
            @Nonnull String newJobManagerAddress,
            @Nonnull ComponentMainThreadExecutor componentMainThreadExecutor)
            throws Exception {

        this.jobMasterId = jobMasterId;
        this.jobManagerAddress = newJobManagerAddress;
        this.componentMainThreadExecutor = componentMainThreadExecutor;

        // schedule the periodic idle-slot and batch-slot timeout checks
        scheduleRunAsync(this::checkIdleSlot, idleSlotTimeout);
        scheduleRunAsync(this::checkBatchSlotTimeout, batchSlotTimeout);

        if (log.isDebugEnabled()) {
            scheduleRunAsync(
                    this::scheduledLogStatus, STATUS_LOG_INTERVAL_MS, TimeUnit.MILLISECONDS);
        }
    }

3.3.2. suspend

/**
     * Suspends this pool, meaning it has lost its authority to accept and distribute slots.
     */
    @Override
    public void suspend() {

        componentMainThreadExecutor.assertRunningInMainThread();

        log.info("Suspending SlotPool.");

        // cancel all pending slot requests at the ResourceManager
        cancelPendingSlotRequests();

        // do not accept any requests
        jobMasterId = null;
        resourceManagerGateway = null;

        // Clear (but not release!) the available slots. The TaskManagers should re-register them
        // at the new leader JobManager/SlotPool
        clear();
    }

3.3.3. close

@Override
    public void close() {
        log.info("Stopping SlotPool.");

        // cancel all pending slot requests
        cancelPendingSlotRequests();

        // release all registered slots by releasing the corresponding TaskExecutors
        for (ResourceID taskManagerResourceId : registeredTaskManagers) {
            final FlinkException cause =
                    new FlinkException(
                            "Releasing TaskManager "
                                    + taskManagerResourceId
                                    + ", because of stopping of SlotPool");
            releaseTaskManagerInternal(taskManagerResourceId, cause);
        }

        clear();
    }

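3.4. ResourceManager connection methods

| Method | Description |
| --- | --- |
| connectToResourceManager | Connects the pool to the ResourceManager |
| disconnectResourceManager | Disconnects the pool from the ResourceManager |
| registerTaskManager | Registers a TaskExecutor under the given ResourceID |
| releaseTaskManager | Releases a TaskExecutor |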

3.4.1. connectToResourceManager

Connects to the ResourceManager and sends off the requests that were stashed while waiting for this connection.

@Override
    public void connectToResourceManager(@Nonnull ResourceManagerGateway resourceManagerGateway) {
        this.resourceManagerGateway = checkNotNull(resourceManagerGateway);

        // work on all slots waiting for this connection
        for (PendingRequest pendingRequest : waitingForResourceManager.values()) {
            // request the slot from the ResourceManager
            requestSlotFromResourceManager(resourceManagerGateway, pendingRequest);
        }

        // all sent off
        waitingForResourceManager.clear();
    }
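
The stash-and-replay behaviour described above can be illustrated outside Flink. The following is a minimal sketch, not the Flink implementation: `StashingRequester`, its nested `Gateway` interface, and the string-typed request IDs are hypothetical stand-ins for SlotPoolImpl, ResourceManagerGateway, and SlotRequestId. It only shows that requests made while disconnected are kept in insertion order and sent once a gateway is connected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the pool's "wait for the ResourceManager" logic. */
final class StashingRequester {

    /** Hypothetical stand-in for ResourceManagerGateway. */
    interface Gateway {
        void requestSlot(String requestId);
    }

    // Insertion-ordered, like waitingForResourceManager in the listing above.
    private final Map<String, String> waiting = new LinkedHashMap<>();

    // null while no ResourceManager is connected
    private Gateway gateway;

    void request(String requestId, String resourceProfile) {
        if (gateway == null) {
            // stash the request until a gateway becomes available
            waiting.put(requestId, resourceProfile);
        } else {
            // a gateway is connected: forward immediately
            gateway.requestSlot(requestId);
        }
    }

    void connect(Gateway newGateway) {
        gateway = newGateway;
        // replay the stashed requests in arrival order, then forget them
        for (String requestId : waiting.keySet()) {
            gateway.requestSlot(requestId);
        }
        waiting.clear();
    }

    void disconnect() {
        gateway = null;
    }

    int waitingCount() {
        return waiting.size();
    }
}
```

A LinkedHashMap is used here for the same reason the original field uses one: iteration follows insertion order, so older requests are sent first.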

3.4.2. disconnectResourceManager

Disconnects from the ResourceManager.

@Override
    public void disconnectResourceManager() {
        this.resourceManagerGateway = null;
    }

3.4.3. registerTaskManager

/**
     * Registers a TaskManager with this pool. Only slots that come from a registered TaskManager
     * are considered valid.
     *
     * This also provides a way to keep "dead" or "abnormal" TaskManagers out of this pool.
     *
     * @param resourceID The id of the TaskManager
     */
    @Override
    public boolean registerTaskManager(final ResourceID resourceID) {

        componentMainThreadExecutor.assertRunningInMainThread();

        // example log line:
        // Register new TaskExecutor container_1615446205104_0025_01_000002(192.168.8.188:57958).
        log.debug("Register new TaskExecutor {}.", resourceID.getStringWithMetadata());
        return registeredTaskManagers.add(resourceID);
    }

3.4.4. releaseTaskManager

/**
     * Unregisters a TaskManager from this pool. All related slots are released and their tasks
     * are canceled.
     *
     * Called when a TaskManager is found to be "dead" or "abnormal" and we decide not to use
     * its slots anymore.
     *
     * @param resourceId The id of the TaskManager
     * @param cause for the releasing of the TaskManager
     */
    @Override
    public boolean releaseTaskManager(final ResourceID resourceId, final Exception cause) {

        componentMainThreadExecutor.assertRunningInMainThread();

        if (registeredTaskManagers.remove(resourceId)) {
            releaseTaskManagerInternal(resourceId, cause);
            return true;
        } else {
            return false;
        }
    }
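
Both methods rely on the boolean returned by the set operations: `add` reports whether the TaskManager was newly registered, and `remove` reports whether it had been registered at all. The sketch below is a simplified illustration under assumptions: `TaskManagerRegistrySketch` and its string IDs are hypothetical, and dropping the slots from a map stands in for what `releaseTaskManagerInternal` does in the original (which is not shown in this article).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified registry of TaskManagers and the slots they own. */
final class TaskManagerRegistrySketch {

    private final Set<String> registered = new HashSet<>();

    // allocationId -> taskManagerId
    private final Map<String, String> slotOwners = new HashMap<>();

    /** Returns true only if the TaskManager was not registered before. */
    boolean register(String taskManagerId) {
        return registered.add(taskManagerId);
    }

    void addSlot(String allocationId, String taskManagerId) {
        slotOwners.put(allocationId, taskManagerId);
    }

    /** Returns true only if the TaskManager was registered; its slots are dropped. */
    boolean release(String taskManagerId) {
        if (!registered.remove(taskManagerId)) {
            return false;
        }
        // drop every slot that belongs to the released TaskManager
        slotOwners.values().removeIf(taskManagerId::equals);
        return true;
    }

    int slotCount() {
        return slotOwners.size();
    }
}
```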

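3.5. Slot operations

| Method | Description |
| --- | --- |
| offerSlots | Offers slots from a TaskExecutor to the slot pool |
| failAllocation | Fails the slot with the given allocation ID |
| getAvailableSlotsInformation | Returns information about the currently available slots |
| getAllocatedSlotsInformation | Returns information about all allocated slots |
| allocateAvailableSlot | Allocates the available slot with the given allocation ID under the given request ID. Returns an empty Optional if no available slot has that allocation ID. |
| requestNewAllocatedSlot | Requests the allocation of a new slot from the ResourceManager. This method does not return one of the slots already available in the pool; instead, it adds a new slot to the pool, which is immediately allocated and returned. |
| requestNewAllocatedBatchSlot | Requests the allocation of a new batch slot from the ResourceManager. Unlike a normal slot, a batch slot request only times out if the slot pool does not contain a suitable slot. It also does not react to failure signals from the ResourceManager. |
| disableBatchSlotRequestTimeoutCheck | Disables the timeout check for batch slot requests. Invoked when another component takes over responsibility for the timeout check. |
| createAllocatedSlotReport | Creates a report about the allocated slots that belong to the specified TaskManager |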

3.5.1. offerSlots

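Offers slots to the pool. The listing below shows the per-slot method offerSlot, which offerSlots invokes for each offered slot.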

/**
     * Slot offering by TaskExecutor with AllocationID.
     *
     * The AllocationID is originally generated by this pool and transferred through the
     * ResourceManager to the TaskManager.
     *
     * We use it to distinguish the different allocations we issued.
     *
     * A slot offering may be rejected if we find a mismatch, or if there is actually no pending
     * request waiting for this slot (maybe it was fulfilled by some other returned slot).
     *
     * @param taskManagerLocation location from where the offer comes from
     * @param taskManagerGateway TaskManager gateway
     * @param slotOffer the offered slot
     * @return True if we accept the offering
     */
    boolean offerSlot(
            final TaskManagerLocation taskManagerLocation,
            final TaskManagerGateway taskManagerGateway,
            final SlotOffer slotOffer) {

        componentMainThreadExecutor.assertRunningInMainThread();

        // check if this TaskManager is valid
        final ResourceID resourceID = taskManagerLocation.getResourceID();
        final AllocationID allocationID = slotOffer.getAllocationId();

        // the offer must come from a registered TaskManager
        if (!registeredTaskManagers.contains(resourceID)) {
            log.debug(
                    "Received outdated slot offering [{}] from unregistered TaskManager: {}",
                    slotOffer.getAllocationId(),
                    taskManagerLocation);
            return false;
        }

        // check whether we are already using this slot
        AllocatedSlot existingSlot;
        if ((existingSlot = allocatedSlots.get(allocationID)) != null
                || (existingSlot = availableSlots.get(allocationID)) != null) {

            // we need to figure out if this is a repeated offer for the exact same slot,
            // or another offer that comes from a different TaskManager after the ResourceManager
            // re-tried the request

            // we write this in terms of comparing slot IDs, because the Slot IDs are the
            // identifiers of the actual slots on the TaskManagers
            // Note: The slotOffer should have the SlotID

            // SlotID of the existing slot
            final SlotID existingSlotId = existingSlot.getSlotId();
            // SlotID of the offered slot
            final SlotID newSlotId =
                    new SlotID(taskManagerLocation.getResourceID(), slotOffer.getSlotIndex());

            if (existingSlotId.equals(newSlotId)) {
                log.info("Received repeated offer for slot [{}]. Ignoring.", allocationID);

                // same SlotID: this is a repeated offer
                // return true here so that the sender will get a positive acknowledgement to the
                // retry and mark the offering as a success
                return true;
            } else {
                // the allocation has been fulfilled by another slot, reject the offer so the task
                // executor will offer the slot to the resource manager
                return false;
            }
        }

        // at this point the pool does not know this slot yet
        // create the AllocatedSlot instance
        final AllocatedSlot allocatedSlot =
                new AllocatedSlot(
                        allocationID,
                        taskManagerLocation,
                        slotOffer.getSlotIndex(),
                        slotOffer.getResourceProfile(),
                        taskManagerGateway);

        // use the slot to fulfill pending request, in requested order
        tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);

        // we accepted the request in any case.
        // slot will be released after it idled for too long and timed out
        return true;
    }

    /**
     * Tries to fulfill a pending slot request with the given allocated slot, or adds the
     * allocated slot to the set of available slots if no matching request is available.
     *
     * @param allocatedSlot which shall be returned
     */
    private void tryFulfillSlotRequestOrMakeAvailable(AllocatedSlot allocatedSlot) {
        Preconditions.checkState(!allocatedSlot.isUsed(), "Provided slot is still in use.");

        // find a pending request that matches the slot
        final PendingRequest pendingRequest = findMatchingPendingRequest(allocatedSlot);

        if (pendingRequest != null) {

            // Fulfilling pending slot request [
            //      SlotRequestId{d3517a9282334314b63f9493850f55f0}
            // ] with slot [
            //      3755cb8f9962a9a7738db04f2a02084c
            // ]
            log.debug(
                    "Fulfilling pending slot request [{}] with slot [{}]",
                    pendingRequest.getSlotRequestId(),
                    allocatedSlot.getAllocationId());

            // remove the request from the pending requests
            removePendingRequest(pendingRequest.getSlotRequestId());

            // add the slot to allocatedSlots, marking it as in use
            allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);

            // complete the request's future with the allocated slot
            pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot);

            // this allocation may become orphan once its corresponding request is removed
            final Optional<AllocationID> allocationIdOfRequest = pendingRequest.getAllocationId();

            // the allocation id can be absent if the request was fulfilled by a slot directly
            // offered by a reconnected TaskExecutor before the ResourceManager is connected
            if (allocationIdOfRequest.isPresent()) {
                maybeRemapOrphanedAllocation(
                        allocationIdOfRequest.get(), allocatedSlot.getAllocationId());
            }
        } else {
            // no matching pending request: add the slot to the available slots
            log.debug("Adding slot [{}] to available slots", allocatedSlot.getAllocationId());
            availableSlots.add(allocatedSlot, clock.relativeTimeMillis());
        }
    }
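
The decision logic of offerSlot can be condensed into a small, self-contained sketch. This is an illustration under stated assumptions, not Flink code: `OfferSketch`, its `Result` enum, and the string IDs are hypothetical, a slot ID is modelled as `taskManagerId#slotIndex`, and any pending request is treated as matching (the original matches requests against the offered slot through findMatchingPendingRequest).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the accept/reject decision for a slot offer. */
final class OfferSketch {

    enum Result {
        REJECTED_UNREGISTERED, // offer came from an unknown TaskManager
        ACCEPTED_DUPLICATE,    // same allocation, same slot: repeated offer
        REJECTED_CONFLICT,     // same allocation, different slot
        FULFILLED_REQUEST,     // new slot handed to a pending request
        MADE_AVAILABLE         // new slot kept as a free slot
    }

    final Set<String> registeredTaskManagers = new HashSet<>();

    // allocationId -> slotId, covering both allocated and available slots
    final Map<String, String> knownSlots = new HashMap<>();

    // pending request IDs in request order
    final Deque<String> pendingRequests = new ArrayDeque<>();

    Result offer(String taskManagerId, int slotIndex, String allocationId) {
        if (!registeredTaskManagers.contains(taskManagerId)) {
            return Result.REJECTED_UNREGISTERED;
        }
        String offeredSlotId = taskManagerId + "#" + slotIndex;
        String existingSlotId = knownSlots.get(allocationId);
        if (existingSlotId != null) {
            return existingSlotId.equals(offeredSlotId)
                    ? Result.ACCEPTED_DUPLICATE
                    : Result.REJECTED_CONFLICT;
        }
        knownSlots.put(allocationId, offeredSlotId);
        // serve the oldest pending request first, otherwise keep the slot as available
        return pendingRequests.poll() != null
                ? Result.FULFILLED_REQUEST
                : Result.MADE_AVAILABLE;
    }

    /** Mirrors the boolean returned by offerSlot: true means the offer was accepted. */
    static boolean accepted(Result result) {
        return result == Result.ACCEPTED_DUPLICATE
                || result == Result.FULFILLED_REQUEST
                || result == Result.MADE_AVAILABLE;
    }
}
```

Note that a new slot is accepted even when no request is waiting for it; it simply ends up among the available slots and is subject to the idle-slot timeout.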

3.5.2. failAllocation

/**
     * Fail the specified allocation and release the corresponding slot if we have one.
     *
     * This may be triggered by the JobManager when a slot allocation failed with rpcTimeout.
     *
     * Or this could be triggered by the TaskManager, when it finds out that something went wrong
     * with the slot and decides to take it back.
     *
     * @param allocationID Represents the allocation which should be failed
     * @param cause The cause of the failure
     * @return Optional task executor if it has no more slots registered
     */
    @Override
    public Optional<ResourceID> failAllocation(
            final AllocationID allocationID, final Exception cause) {

        componentMainThreadExecutor.assertRunningInMainThread();

        // look up the pending request by its allocation ID
        final PendingRequest pendingRequest = pendingRequests.getValueByKeyB(allocationID);
        if (pendingRequest != null) {
            
            if (isBatchRequestAndFailureCanBeIgnored(pendingRequest, cause)) {
                log.debug(
                        "Ignoring allocation failure for batch slot request {}.",
                        pendingRequest.getSlotRequestId());
            } else {
                // request was still pending
                removePendingRequest(pendingRequest.getSlotRequestId());
                failPendingRequest(pendingRequest, cause);
            }
            return Optional.empty();
        } else {
            // no pending request for this allocation: fail the slot that was already allocated
            return tryFailingAllocatedSlot(allocationID, cause);
        }

        // TODO: add some unit tests when the previous two are ready, the allocation may failed at
        // any phase
    }
  • Failing a slot that has already been allocated
private Optional<ResourceID> tryFailingAllocatedSlot(
            AllocationID allocationID, Exception cause) {
        
        // remove the failed slot from the available slots, falling back to the allocated slots
        AllocatedSlot allocatedSlot = availableSlots.tryRemove(allocationID);

        if (allocatedSlot == null) {
            allocatedSlot = allocatedSlots.remove(allocationID);
        }

        if (allocatedSlot != null) {
            log.debug("Failed allocated slot [{}]: {}", allocationID, cause.getMessage());

            // notify TaskExecutor about the failure
            allocatedSlot.getTaskManagerGateway().freeSlot(allocationID, cause, rpcTimeout);
            // release the slot.
            // since it is not in 'allocatedSlots' any more, it will be dropped on return
            allocatedSlot.releasePayload(cause);

            final ResourceID taskManagerId = allocatedSlot.getTaskManagerId();

            if (!availableSlots.containsTaskManager(taskManagerId)
                    && !allocatedSlots.containResource(taskManagerId)) {
                return Optional.of(taskManagerId);
            }
        }

        return Optional.empty();
    }
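
The return value of tryFailingAllocatedSlot is easy to misread: the ResourceID is returned only when the failed slot was the last slot that its TaskManager had in the pool. The following sketch illustrates just that rule. It is a simplified model under assumptions: `FailAllocationSketch` and the two string maps are hypothetical replacements for AvailableSlots and AllocatedSlots, and the notification of the TaskExecutor is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model of failing a slot by allocation ID. */
final class FailAllocationSketch {

    // allocationId -> taskManagerId, for free and in-use slots respectively
    final Map<String, String> available = new HashMap<>();
    final Map<String, String> allocated = new HashMap<>();

    /** Returns the TaskManager ID if it has no slots left in the pool after the removal. */
    Optional<String> fail(String allocationId) {
        // look in the available slots first, then in the allocated slots
        String taskManagerId = available.remove(allocationId);
        if (taskManagerId == null) {
            taskManagerId = allocated.remove(allocationId);
        }
        if (taskManagerId != null
                && !available.containsValue(taskManagerId)
                && !allocated.containsValue(taskManagerId)) {
            return Optional.of(taskManagerId);
        }
        // unknown allocation, or the TaskManager still has other slots
        return Optional.empty();
    }
}
```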

3.5.3. getAvailableSlotsInformation

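Returns information about the currently available slots, each paired with the utilization of the TaskExecutor it belongs to.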

@Override
    @Nonnull
    public Collection<SlotInfoWithUtilization> getAvailableSlotsInformation() {
        final Map<ResourceID, Set<AllocatedSlot>> availableSlotsByTaskManager = availableSlots.getSlotsByTaskManager();
        final Map<ResourceID, Set<AllocatedSlot>> allocatedSlotsByTaskManager = allocatedSlots.getSlotsByTaskManager();

        return availableSlotsByTaskManager.entrySet().stream()
                .flatMap(
                        entry -> {
                            final int numberAllocatedSlots =
                                    allocatedSlotsByTaskManager
                                            .getOrDefault(entry.getKey(), Collections.emptySet())
                                            .size();
                            final int numberAvailableSlots = entry.getValue().size();
                            final double taskExecutorUtilization =
                                    (double) numberAllocatedSlots
                                            / (numberAllocatedSlots + numberAvailableSlots);

                            return entry.getValue().stream()
                                    .map(
                                            slot ->
                                                    SlotInfoWithUtilization.from(
                                                            slot, taskExecutorUtilization));
                        })
                .collect(Collectors.toList());
    }
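
The utilization attached to each available slot is the fraction of a TaskExecutor's pooled slots that are currently allocated: allocated / (allocated + available). For example, a TaskExecutor with 3 allocated and 1 available slot has a utilization of 3 / 4 = 0.75. The sketch below reproduces the arithmetic with plain maps; `UtilizationSketch` and the slot counts per TaskManager are hypothetical simplifications of the slot sets used in the original. As in the original, only TaskManagers that have at least one available slot are visited, so the denominator is never zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified version of the per-TaskExecutor utilization calculation. */
final class UtilizationSketch {

    /**
     * @param availableByTm number of available slots per TaskManager (each value >= 1)
     * @param allocatedByTm number of allocated slots per TaskManager
     * @return utilization for every TaskManager that has at least one available slot
     */
    static Map<String, Double> utilization(
            Map<String, Integer> availableByTm, Map<String, Integer> allocatedByTm) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> entry : availableByTm.entrySet()) {
            int available = entry.getValue();
            // a TaskManager without allocated slots counts as zero
            int allocated = allocatedByTm.getOrDefault(entry.getKey(), 0);
            result.put(entry.getKey(), (double) allocated / (allocated + available));
        }
        return result;
    }
}
```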

3.5.4. getAllocatedSlotsInformation

Returns information about all allocated slots.

@Override
    public Collection<SlotInfo> getAllocatedSlotsInformation() {
        return allocatedSlots.listSlotInfo();
    }

3.5.5. allocateAvailableSlot

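Allocates the available slot with the given allocation ID to the given slot request ID. Returns an empty Optional if no available slot has that allocation ID.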

@Override
    public Optional<PhysicalSlot> allocateAvailableSlot(
            @Nonnull SlotRequestId slotRequestId, @Nonnull AllocationID allocationID) {

        componentMainThreadExecutor.assertRunningInMainThread();

        AllocatedSlot allocatedSlot = availableSlots.tryRemove(allocationID);
        if (allocatedSlot != null) {
            allocatedSlots.add(slotRequestId, allocatedSlot);
            return Optional.of(allocatedSlot);
        } else {
            return Optional.empty();
        }
    }
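
In effect, the method moves one slot from the available set to the allocated set and keys it by the request that now owns it. A minimal sketch of that move, assuming hypothetical names (`AvailableToAllocatedSketch`, string IDs instead of SlotRequestId, AllocationID, and AllocatedSlot):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model of moving a slot from "available" to "allocated". */
final class AvailableToAllocatedSketch {

    // allocationId -> slot description
    private final Map<String, String> available = new HashMap<>();

    // slotRequestId -> slot description
    private final Map<String, String> allocatedByRequest = new HashMap<>();

    void addAvailable(String allocationId, String slot) {
        available.put(allocationId, slot);
    }

    Optional<String> allocate(String slotRequestId, String allocationId) {
        // removing the slot first guarantees it cannot be handed out twice
        String slot = available.remove(allocationId);
        if (slot == null) {
            return Optional.empty();
        }
        allocatedByRequest.put(slotRequestId, slot);
        return Optional.of(slot);
    }

    int allocatedCount() {
        return allocatedByRequest.size();
    }
}
```

A second call with the same allocation ID returns an empty Optional, because the slot is no longer in the available set.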

3.5.6. requestNewAllocatedSlot

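Requests the allocation of a new slot from the ResourceManager. This method does not return one of the slots already available in the pool; instead, it adds a new slot to the pool, which is immediately allocated and returned.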

@Nonnull
    @Override
    public CompletableFuture<PhysicalSlot> requestNewAllocatedSlot(
            @Nonnull SlotRequestId slotRequestId,
            @Nonnull ResourceProfile resourceProfile,
            @Nullable Time timeout) {

        componentMainThreadExecutor.assertRunningInMainThread();

        // create the PendingRequest
        final PendingRequest pendingRequest =
                PendingRequest.createStreamingRequest(slotRequestId, resourceProfile);

        if (timeout != null) {
            // register request timeout
            FutureUtils.orTimeout(
                            pendingRequest.getAllocatedSlotFuture(),
                            timeout.toMilliseconds(),
                            TimeUnit.MILLISECONDS,
                            componentMainThreadExecutor)
                    .whenComplete(
                            (AllocatedSlot ignored, Throwable throwable) -> {
                                if (throwable instanceof TimeoutException) {
                                    timeoutPendingSlotRequest(slotRequestId);
                                }
                            });
        }

        return requestNewAllocatedSlotInternal(pendingRequest).thenApply((Function.identity()));
    }




    /**
     * Requests a new slot from the ResourceManager. If there is currently no ResourceManager
     * connected, then the request is stashed and sent once a new ResourceManager is connected.
     *
     * @param pendingRequest pending slot request
     * @return An {@link AllocatedSlot} future which is completed once the slot is offered to the
     *     {@link SlotPool}
     */
    @Nonnull
    private CompletableFuture<AllocatedSlot> requestNewAllocatedSlotInternal(
            PendingRequest pendingRequest) {

        if (resourceManagerGateway == null) {
            stashRequestWaitingForResourceManager(pendingRequest);
        } else {
            // request a new slot from the ResourceManager
            requestSlotFromResourceManager(resourceManagerGateway, pendingRequest);
        }

        return pendingRequest.getAllocatedSlotFuture();
    }
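
The timeout registration in requestNewAllocatedSlot follows a common pattern: put a deadline on the future, and run a cleanup action only if the future failed because of that deadline. The original uses Flink's FutureUtils.orTimeout together with the main thread executor. The sketch below is a plain-JDK approximation under assumptions: it uses `CompletableFuture.orTimeout`, which requires Java 9 or later and runs the callback on a JDK-internal thread rather than a main-thread executor; `RequestTimeoutSketch`, `withTimeout`, and the `onTimeout` callback (standing in for timeoutPendingSlotRequest) are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical, plain-JDK sketch of registering a timeout on a pending slot future. */
final class RequestTimeoutSketch {

    /**
     * Fails slotFuture with a TimeoutException after timeoutMillis unless it completes earlier,
     * and runs onTimeout only in the timeout case.
     *
     * @return a stage that completes after the timeout callback (if any) has run
     */
    static CompletableFuture<String> withTimeout(
            CompletableFuture<String> slotFuture, long timeoutMillis, Runnable onTimeout) {
        return slotFuture
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS) // Java 9+
                .whenComplete(
                        (slot, failure) -> {
                            // other failures are left to the caller
                            if (failure instanceof TimeoutException) {
                                onTimeout.run();
                            }
                        });
    }
}
```

If the slot arrives first, for example through `slotFuture.complete(...)`, the returned stage completes with that value and the timeout callback never runs.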

3.5.7. requestNewAllocatedBatchSlot

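Requests the allocation of a new batch slot from the ResourceManager. Unlike a normal slot, a batch slot request only times out if the slot pool does not contain a suitable slot. It also does not react to failure signals from the ResourceManager.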

@Nonnull
    @Override
    public CompletableFuture<PhysicalSlot> requestNewAllocatedBatchSlot(
            @Nonnull SlotRequestId slotRequestId, @Nonnull ResourceProfile resourceProfile) {

        componentMainThreadExecutor.assertRunningInMainThread();

        final PendingRequest pendingRequest =
                PendingRequest.createBatchRequest(slotRequestId, resourceProfile);

        return requestNewAllocatedSlotInternal(pendingRequest).thenApply(Function.identity());
    }

3.5.8. disableBatchSlotRequestTimeoutCheck

Disables the timeout check for batch slot requests.
Invoked when another component takes over responsibility for the timeout check.

@Override
    public void disableBatchSlotRequestTimeoutCheck() {
        batchSlotRequestTimeoutCheckEnabled = false;
    }

3.5.9. createAllocatedSlotReport

Creates a report about the allocated slots that belong to the specified TaskManager.

@Override
    public AllocatedSlotReport createAllocatedSlotReport(ResourceID taskManagerId) {
        final Set<AllocatedSlot> availableSlotsForTaskManager =
                availableSlots.getSlotsForTaskManager(taskManagerId);
        final Set<AllocatedSlot> allocatedSlotsForTaskManager =
                allocatedSlots.getSlotsForTaskManager(taskManagerId);

        List<AllocatedSlotInfo> allocatedSlotInfos =
                new ArrayList<>(
                        availableSlotsForTaskManager.size() + allocatedSlotsForTaskManager.size());
        for (AllocatedSlot allocatedSlot :
                Iterables.concat(availableSlotsForTaskManager, allocatedSlotsForTaskManager)) {
            allocatedSlotInfos.add(
                    new AllocatedSlotInfo(
                            allocatedSlot.getPhysicalSlotNumber(),
                            allocatedSlot.getAllocationId()));
        }
        return new AllocatedSlotReport(jobId, allocatedSlotInfos);
    }
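
One detail of the listing is worth spelling out: the report covers both the slots that are in use and the slots that sit idle in the pool, since both kinds have been handed to this job. A minimal sketch of that merge, with the hypothetical `SlotReportSketch` class and plain allocation ID strings in place of AllocatedSlotInfo:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified version of building the allocated-slot report. */
final class SlotReportSketch {

    /** Lists the allocation IDs of the idle slots first, followed by the in-use slots. */
    static List<String> report(List<String> availableIds, List<String> allocatedIds) {
        // pre-size the list, as the original does
        List<String> all = new ArrayList<>(availableIds.size() + allocatedIds.size());
        all.addAll(availableIds);
        all.addAll(allocatedIds);
        return all;
    }
}
```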