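Based on Flink 1.8, this post analyzes how RPC is implemented between Flink's components:
- the main RPC-related interfaces
- how RPC nodes communicate with each other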
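In older versions of Flink, each component handled RPC by extending FlinkActor: it received Akka actor messages and ran different business logic depending on the message type. This couples the business flow to the concrete communication component and makes it hard to replace that component later (for example with Netty). Flink therefore introduced an RPC abstraction: components call each other through gateways, which hide the details of the communication layer and decouple the two.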
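A minimal plain-Java sketch of the two styles may make the contrast concrete. No Akka is involved and all class names (CancelJob, ActorStyleJobManager, MiniJobMasterGateway, MiniJobMaster) are hypothetical, not Flink classes: the old style dispatches on the runtime type of untyped messages, while the gateway style exposes a typed interface and hides how a call is transported.

```java
import java.util.concurrent.CompletableFuture;

// Old style (hypothetical): untyped messages, dispatched on their runtime type,
// so the business logic is tied to the message-passing mechanism.
final class CancelJob {}
final class RequestJobStatus {}

class ActorStyleJobManager {
    private String status = "RUNNING";

    Object onReceive(Object message) {
        if (message instanceof CancelJob) {
            status = "CANCELED";
            return "ack";
        } else if (message instanceof RequestJobStatus) {
            return status;
        }
        throw new IllegalArgumentException("Unknown message: " + message);
    }
}

// Gateway style (hypothetical): callers only see a typed interface; whether the
// call travels over Akka, Netty or stays in-process is an implementation detail.
interface MiniJobMasterGateway {
    CompletableFuture<String> cancel();
    CompletableFuture<String> requestJobStatus();
}

class MiniJobMaster implements MiniJobMasterGateway {
    private String status = "RUNNING";

    @Override
    public CompletableFuture<String> cancel() {
        status = "CANCELED";
        return CompletableFuture.completedFuture("ack");
    }

    @Override
    public CompletableFuture<String> requestJobStatus() {
        return CompletableFuture.completedFuture(status);
    }
}

class StyleComparison {
    public static void main(String[] args) {
        ActorStyleJobManager actor = new ActorStyleJobManager();
        actor.onReceive(new CancelJob());
        System.out.println(actor.onReceive(new RequestJobStatus())); // CANCELED

        MiniJobMasterGateway gateway = new MiniJobMaster();
        gateway.cancel().join();
        System.out.println(gateway.requestJobStatus().join()); // CANCELED
    }
}
```

With the gateway style, a compile-time typo in a method name fails the build, whereas a misspelled message type in the actor style only fails at runtime in the `throw` branch.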
Main RPC interfaces
- RpcEndpoint
- RpcService
- RpcGateway
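RpcEndpoint: the base class for remote procedure calls
RpcEndpoint is the base class of Flink's RPC endpoints. Every distributed component that provides remote procedure calls must extend RpcEndpoint, and the RPC functionality of an RpcEndpoint is backed by an RpcService.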
(Figure: RpcEndpoint subclasses)
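As the figure above shows, the concrete subclasses of RpcEndpoint fall into only four kinds of components: Dispatcher, JobMaster, ResourceManager and TaskExecutor. In other words, only these four components have RPC capability, because only they need it.
They are exactly Flink's four core components, and the communication between them relies on RPC (the underlying transport is still Akka).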
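RpcService: the RPC service provider
RpcService is a member variable of RpcEndpoint. It provides the RPC service for the endpoint and connects to remote servers. It has only one production implementation, AkkaRpcService, which shows that Flink still communicates through Akka.
RpcService is used to start an RpcEndpoint and to connect to one (RpcService#startServer and RpcService#connect). Connecting to an RPC server returns an RpcGateway, which can then be used to call remote procedures.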
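The start/connect flow can be sketched in plain Java. This is a hypothetical, in-process stand-in rather than Flink's API: SimpleRpcService, SimpleEndpoint, SimpleGateway, EchoGateway and EchoEndpoint are made-up names, and a `java.lang.reflect.Proxy` plays the role that, as far as we understand, Flink's Akka-based invocation handler plays when AkkaRpcService hands out a gateway. Calls here are forwarded locally instead of being sent over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for RpcGateway: every gateway can report its address.
interface SimpleGateway {
    String getAddress();
}

// Hypothetical stand-in for RpcEndpoint: the constructor receives the service
// and registers ("starts") the endpoint with it.
abstract class SimpleEndpoint implements SimpleGateway {
    private final String address;

    protected SimpleEndpoint(SimpleRpcService rpcService, String address) {
        this.address = address;
        rpcService.startServer(this);
    }

    @Override
    public String getAddress() {
        return address;
    }
}

// Hypothetical, in-process stand-in for RpcService.
class SimpleRpcService {
    private final Map<String, SimpleEndpoint> servers = new ConcurrentHashMap<>();

    void startServer(SimpleEndpoint endpoint) {
        servers.put(endpoint.getAddress(), endpoint);
    }

    // Returns a dynamic proxy implementing the requested gateway interface.
    // Calls are forwarded directly to the local endpoint; a networked
    // implementation would serialize them and send them to another process.
    <G extends SimpleGateway> CompletableFuture<G> connect(String address, Class<G> gatewayClass) {
        SimpleEndpoint target = servers.get(address);
        if (!gatewayClass.isInstance(target)) { // also covers target == null
            CompletableFuture<G> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                    new IllegalStateException("No " + gatewayClass.getSimpleName() + " at " + address));
            return failed;
        }
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        Object gatewayProxy = Proxy.newProxyInstance(
                gatewayClass.getClassLoader(), new Class<?>[] {gatewayClass}, handler);
        return CompletableFuture.completedFuture(gatewayClass.cast(gatewayProxy));
    }
}

// Hypothetical example gateway and endpoint.
interface EchoGateway extends SimpleGateway {
    CompletableFuture<String> echo(String message);
}

class EchoEndpoint extends SimpleEndpoint implements EchoGateway {
    EchoEndpoint(SimpleRpcService rpcService, String address) {
        super(rpcService, address);
    }

    @Override
    public CompletableFuture<String> echo(String message) {
        return CompletableFuture.completedFuture("echo: " + message);
    }
}

class ConnectDemo {
    public static void main(String[] args) {
        SimpleRpcService rpcService = new SimpleRpcService();
        new EchoEndpoint(rpcService, "echo-1");
        EchoGateway gateway = rpcService.connect("echo-1", EchoGateway.class).join();
        System.out.println(gateway.echo("hi").join()); // echo: hi
    }
}
```

The caller only ever holds an EchoGateway, so replacing the in-process forwarding with a networked transport would not change the calling code.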
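Flink's four core components, Dispatcher, JobMaster, ResourceManager and TaskExecutor, are all RpcEndpoint implementations, so an RpcService has to be supplied whenever one of them is constructed; the RpcEndpoint constructor then uses it to start the endpoint's RpcServer.
For example, the first parameter of the JobMaster constructor is the RpcService: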
```java
public JobMaster(
        RpcService rpcService,
        JobMasterConfiguration jobMasterConfiguration,
        ResourceID resourceId,
        JobGraph jobGraph,
        HighAvailabilityServices highAvailabilityService,
        SlotPoolFactory slotPoolFactory,
        SchedulerFactory schedulerFactory,
        JobManagerSharedServices jobManagerSharedServices,
        HeartbeatServices heartbeatServices,
        JobManagerJobMetricGroupFactory jobMetricGroupFactory,
        OnCompletionActions jobCompletionActions,
        FatalErrorHandler fatalErrorHandler,
        ClassLoader userCodeLoader) {
    // constructor body omitted
}
```
RpcService instances are normally created through the createRpcService method of the AkkaRpcServiceUtils utility class.
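RpcGateway: the gateway for RPC calls
The main sub-interfaces of RpcGateway are FencedRpcGateway and TaskExecutorGateway. DispatcherGateway, JobMasterGateway and ResourceManagerGateway extend FencedRpcGateway, and TaskExecutor implements TaskExecutorGateway, so Dispatcher, JobMaster, ResourceManager and TaskExecutor can each be invoked via RPC through their own gateway.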
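FencedRpcGateway is parameterized by a fencing token type (for JobMasterGateway it is JobMasterId). Our understanding is that fenced endpoints reject calls carrying a token different from their current one, so that requests addressed to a previous leader are not executed after a failover. The sketch below models only that check; FencedCounter and its methods are hypothetical and use a plain UUID as the token.

```java
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Hypothetical fenced endpoint: it only accepts calls tagged with its current token.
class FencedCounter {
    private UUID fencingToken;
    private int value;

    FencedCounter(UUID initialToken) {
        this.fencingToken = initialToken;
    }

    // Called when the component (re)gains leadership under a new token.
    void setFencingToken(UUID newToken) {
        this.fencingToken = newToken;
    }

    CompletableFuture<Integer> increment(UUID callerToken) {
        if (!Objects.equals(fencingToken, callerToken)) {
            // Stale or foreign token: reject without touching the state.
            CompletableFuture<Integer> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new IllegalStateException(
                    "Fencing token mismatch: expected " + fencingToken + " but got " + callerToken));
            return rejected;
        }
        return CompletableFuture.completedFuture(++value);
    }
}

class FencingDemo {
    public static void main(String[] args) {
        UUID oldLeader = UUID.randomUUID();
        FencedCounter counter = new FencedCounter(oldLeader);
        System.out.println(counter.increment(oldLeader).join()); // 1

        UUID newLeader = UUID.randomUUID();
        counter.setFencingToken(newLeader);
        // A caller still using the old token is rejected.
        System.out.println(counter.increment(oldLeader).isCompletedExceptionally()); // true
        System.out.println(counter.increment(newLeader).join()); // 2
    }
}
```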
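- RpcGateway is the RPC gateway interface shared by all RPC components; each component's gateway defines that component's RPC methods
- The most common implementations are the component gateways, such as JobMasterGateway, DispatcherGateway, ResourceManagerGateway and TaskExecutorGateway
- Each component keeps, as member variables, the gateways of the other components it needs to communicate with, which makes RPC calls convenient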
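A hypothetical sketch of that pattern: JobCoordinator and TaskGateway are made-up names, and lambdas stand in for the proxies an RpcService would hand out. The coordinator stores the gateways it needs as fields and calls them like local methods, combining the returned futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified TaskExecutor gateway.
interface TaskGateway {
    CompletableFuture<String> cancelTasks(String jobId);
}

// Hypothetical, simplified JobMaster: it keeps the gateways of the
// TaskExecutors it works with as member variables and calls them like
// ordinary methods; whether a call crosses the network is up to the gateway.
class JobCoordinator {
    private final List<TaskGateway> taskGateways = new ArrayList<>();

    void registerTaskExecutor(TaskGateway gateway) {
        taskGateways.add(gateway);
    }

    // Fans the cancel request out to every registered gateway and completes
    // with the number of replies once all of them have arrived.
    CompletableFuture<Integer> cancelJob(String jobId) {
        List<CompletableFuture<String>> replies = new ArrayList<>();
        for (TaskGateway gateway : taskGateways) {
            replies.add(gateway.cancelTasks(jobId));
        }
        return CompletableFuture.allOf(replies.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> replies.size());
    }
}

class GatewayFieldDemo {
    public static void main(String[] args) {
        JobCoordinator jobMaster = new JobCoordinator();
        // Lambdas stand in for the gateway proxies an RPC service would return.
        jobMaster.registerTaskExecutor(jobId -> CompletableFuture.completedFuture("tm-1 cancelled " + jobId));
        jobMaster.registerTaskExecutor(jobId -> CompletableFuture.completedFuture("tm-2 cancelled " + jobId));
        System.out.println(jobMaster.cancelJob("job-42").join()); // 2
    }
}
```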
Take JobMaster as an example: JobMaster implements the JobMasterGateway interface, which defines the following methods:
```java
public interface JobMasterGateway extends
        CheckpointCoordinatorGateway,
        FencedRpcGateway<JobMasterId>,
        KvStateLocationOracle,
        KvStateRegistryGateway {

    /**
     * Cancels the currently executing job (interacts with TaskExecutorGateway).
     */
    CompletableFuture<Acknowledge> cancel(@RpcTimeout Time timeout);

    /**
     * Stops the currently executing job (interacts with TaskExecutorGateway).
     */
    CompletableFuture<Acknowledge> stop(@RpcTimeout Time timeout);

    /**
     * Changes the parallelism of the running job (interacts with TaskExecutorGateway).
     */
    CompletableFuture<Acknowledge> rescaleJob(
            int newParallelism,
            RescalingBehaviour rescalingBehaviour,
            @RpcTimeout Time timeout);

    /**
     * Changes the parallelism of the given operators (interacts with TaskExecutorGateway).
     */
    CompletableFuture<Acknowledge> rescaleOperators(
            Collection<JobVertexID> operators,
            int newParallelism,
            RescalingBehaviour rescalingBehaviour,
            @RpcTimeout Time timeout);

    CompletableFuture<Acknowledge> updateTaskExecutionState(
            final TaskExecutionState taskExecutionState);

    CompletableFuture<SerializedInputSplit> requestNextInputSplit(
            final JobVertexID vertexID,
            final ExecutionAttemptID executionAttempt);

    CompletableFuture<ExecutionState> requestPartitionState(
            final IntermediateDataSetID intermediateResultId,
            final ResultPartitionID partitionId);

    CompletableFuture<Acknowledge> scheduleOrUpdateConsumers(
            final ResultPartitionID partitionID,
            @RpcTimeout final Time timeout);

    CompletableFuture<Acknowledge> disconnectTaskManager(ResourceID resourceID, Exception cause);

    /**
     * Disconnects from the ResourceManager (interacts with the ResourceManager).
     */
    void disconnectResourceManager(
            final ResourceManagerId resourceManagerId,
            final Exception cause);

    /**
     * Offers the given slots to the job manager. The response contains the set of accepted slots.
     *
     * @param taskManagerId identifying the task manager
     * @param slots to offer to the job manager
     * @param timeout for the rpc call
     * @return Future set of accepted slots.
     */
    CompletableFuture<Collection<SlotOffer>> offerSlots(
            final ResourceID taskManagerId,
            final Collection<SlotOffer> slots,
            @RpcTimeout final Time timeout);

    void failSlot(final ResourceID taskManagerId,
            final AllocationID allocationId,
            final Exception cause);

    CompletableFuture<RegistrationResponse> registerTaskManager(
            final String taskManagerRpcAddress,
            final TaskManagerLocation taskManagerLocation,
            @RpcTimeout final Time timeout);

    void heartbeatFromTaskManager(
            final ResourceID resourceID,
            final AccumulatorReport accumulatorReport);

    /**
     * Sends heartbeat request from the resource manager.
     *
     * @param resourceID unique id of the resource manager
     */
    void heartbeatFromResourceManager(final ResourceID resourceID);

    CompletableFuture<JobDetails> requestJobDetails(@RpcTimeout Time timeout);

    CompletableFuture<JobStatus> requestJobStatus(@RpcTimeout Time timeout);

    CompletableFuture<ArchivedExecutionGraph> requestJob(@RpcTimeout Time timeout);

    CompletableFuture<String> triggerSavepoint(
            @Nullable final String targetDirectory,
            final boolean cancelJob,
            @RpcTimeout final Time timeout);

    CompletableFuture<OperatorBackPressureStatsResponse> requestOperatorBackPressureStats(JobVertexID jobVertexId);

    void notifyAllocationFailure(AllocationID allocationID, Exception cause);

    CompletableFuture<Object> updateGlobalAggregate(String aggregateName, Object aggregand, byte[] serializedAggregationFunction);
}
```
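The methods defined in JobMasterGateway above have two kinds of return types, void and CompletableFuture:
- void: a one-way notification. Another component (for example a TaskExecutor calling heartbeatFromTaskManager, or the ResourceManager calling disconnectResourceManager) triggers an action on the JobMaster and does not wait for a result; these methods act as callbacks that other components invoke on the JobMaster.
- CompletableFuture: a request/response call. The caller immediately receives a future that completes when the JobMaster replies, and the JobMaster's implementation of such a method often calls other components' gateways in turn (for example, cancel goes through TaskExecutorGateway).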
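The practical difference between the two return types can be sketched with a dynamic proxy. This is a hypothetical plain-Java model, not Flink code: our understanding is that Flink's Akka invocation handler likewise inspects the declared return type, sending void methods as a fire-and-forget tell and CompletableFuture methods as an ask. Here a single-threaded executor stands in for the endpoint's main thread, and HeartbeatGateway, HeartbeatEndpoint and ReturnTypeDispatch are made-up names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical gateway with one one-way and one request/response method.
interface HeartbeatGateway {
    void heartbeat(String from);                 // one-way, like a "tell"
    CompletableFuture<Integer> heartbeatCount(); // request/response, like an "ask"
}

class HeartbeatEndpoint implements HeartbeatGateway {
    private int count; // only touched from the single main-thread executor

    @Override
    public void heartbeat(String from) {
        count++;
    }

    @Override
    public CompletableFuture<Integer> heartbeatCount() {
        return CompletableFuture.completedFuture(count);
    }
}

final class ReturnTypeDispatch {
    // Wraps the endpoint in a proxy that runs every gateway call on one
    // "main thread" executor and picks one-way or request/response
    // semantics from the method's declared return type.
    @SuppressWarnings("unchecked")
    static <G> G proxy(Class<G> gatewayClass, G target, ExecutorService mainThread) {
        return (G) Proxy.newProxyInstance(
                gatewayClass.getClassLoader(),
                new Class<?>[] {gatewayClass},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        return method.invoke(target, args); // toString, equals, hashCode
                    }
                    if (method.getReturnType() == void.class) {
                        // One-way: enqueue the call and return to the caller immediately.
                        mainThread.execute(() -> call(method, target, args));
                        return null;
                    }
                    if (method.getReturnType() == CompletableFuture.class) {
                        // Request/response: the caller receives a future for the reply.
                        return CompletableFuture
                                .supplyAsync(() -> call(method, target, args), mainThread)
                                .thenCompose(reply -> (CompletableFuture<Object>) reply);
                    }
                    throw new UnsupportedOperationException("Unsupported return type: " + method);
                });
    }

    private static Object call(Method method, Object target, Object[] args) {
        try {
            return method.invoke(target, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        HeartbeatGateway gateway =
                proxy(HeartbeatGateway.class, new HeartbeatEndpoint(), mainThread);
        gateway.heartbeat("tm-1"); // returns without waiting
        gateway.heartbeat("tm-2");
        // The request is queued behind both heartbeats on the same thread.
        System.out.println(gateway.heartbeatCount().join()); // 2
        mainThread.shutdown();
    }
}
```

Because every call runs on the same single thread, the endpoint's state needs no locking, and the request/response call observes all one-way calls that were sent before it.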
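Summary:
In earlier versions, cross-node communication was built directly on Akka. Flink 1.8 instead defines a gateway for each component according to its business needs, so components can use RPC directly, although the underlying transport is still Akka. The benefit is that the gateways keep Akka-specific code out of the components, separating business logic from the communication mechanism and making it easier to switch to another transport, such as Netty, later.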