0x06 RM Scheduling - MR Job Submission - Server-Side Analysis
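As mentioned earlier, the protocol Yarn clients use to talk to the RM is ApplicationClientProtocol, and we have already analyzed its client-side implementation, ApplicationClientProtocolPBClientImpl. This chapter starts from the server-side implementation of the same protocol, ClientRMService.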
6.1 Obtaining the JobID
6.1.1 ClientRMService
Let's see how this applicationId is actually generated:
@Override
public GetNewApplicationResponse getNewApplication(
GetNewApplicationRequest request) throws YarnException {
GetNewApplicationResponse response = recordFactory
.newRecordInstance(GetNewApplicationResponse.class);
// This is the key line: it calls getNewApplicationId()
response.setApplicationId(getNewApplicationId());
// As mentioned earlier, GetNewApplicationResponse also carries the cluster's maximum resource capability
response.setMaximumResourceCapability(scheduler
.getMaximumResourceCapability());
return response;
}
Now the getNewApplicationId method:
ApplicationId getNewApplicationId() {
// This is the ID-generating code: it passes in the cluster timestamp and the atomically incremented applicationCounter
ApplicationId applicationId = org.apache.hadoop.yarn.server.utils.BuilderUtils
.newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),
applicationCounter.incrementAndGet());
LOG.info("Allocated new applicationId: " + applicationId.getId());
return applicationId;
}
The application counter applicationCounter is defined as an atomic int:
final private AtomicInteger applicationCounter = new AtomicInteger(0);
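To make the ID scheme concrete, here is a minimal sketch under my own assumptions: AppIdSketch and its methods are hypothetical, not Hadoop's ApplicationId. It combines a fixed cluster timestamp with an AtomicInteger, as getNewApplicationId does, and renders the pair in the application_<clusterTimestamp>_<id> form that, as far as I know, ApplicationId.toString() produces (id zero-padded to at least four digits).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical mimic of the RM's ID scheme; NOT Hadoop's ApplicationId class.
final class AppIdSketch {
    private final long clusterTimestamp;                          // e.g. the RM start time
    private final AtomicInteger counter = new AtomicInteger(0);   // plays the role of applicationCounter

    AppIdSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    // incrementAndGet() is atomic, so concurrent callers never receive the same id.
    int nextId() {
        return counter.incrementAndGet();
    }

    // "application_<timestamp>_<id>", id zero-padded to at least four digits.
    String format(int id) {
        return String.format("application_%d_%04d", clusterTimestamp, id);
    }
}
```

The timestamp half is what keeps IDs from different RM runs apart: a new ClientRMService starts its counter again from 0, so the counter alone would repeat.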
6.1.2 BuilderUtils
The class comment is short: it is a helper for building various objects. Here is the newApplicationId method used above:
public static ApplicationId newApplicationId(RecordFactory recordFactory,
long clustertimestamp, CharSequence id) {
return ApplicationId.newInstance(clustertimestamp,
Integer.parseInt(id.toString()));
}
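Annoying, isn't it? The recordFactory parameter is never used, so why pass it in at all? Strictly speaking, applicationCounter.incrementAndGet() returns an int, and an int cannot be passed as a CharSequence, so the call in getNewApplicationId must actually resolve to an int-typed overload of newApplicationId rather than the one shown here; that overload likewise ignores recordFactory and delegates to ApplicationId.newInstance.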
6.1.3 ApplicationId
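The class comment says: ApplicationId represents the globally unique identifier of an application; its uniqueness is guaranteed by the cluster timestamp (e.g. the RM's start time) combined with a monotonically increasing application counter. Here is the newInstance method used above: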
public static ApplicationId newInstance(long clusterTimestamp, int id) {
ApplicationId appId = Records.newRecord(ApplicationId.class);
appId.setClusterTimestamp(clusterTimestamp);
appId.setId(id);
appId.build();
return appId;
}
Further down, the appId is built with Google protobuf. Since I haven't studied protobuf in depth yet, I can't go any deeper here; I'll fill this in once I have.
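That completes the ApplicationId generation flow. Note that generating an ApplicationId does not yet submit the application to the cluster for execution. Next we look at the job submission flow.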
6.2 Job Submission
6.2.1 ClientRMService
@Override
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException {
ApplicationSubmissionContext submissionContext = request
.getApplicationSubmissionContext();
ApplicationId applicationId = submissionContext.getApplicationId();
// ApplicationSubmissionContext needs to be validated for safety - only
// those fields that are independent of the RM's configuration will be
// checked here, those that are dependent on RM configuration are validated
// in RMAppManager.
String user = null;
try {
// Safety
user = UserGroupInformation.getCurrentUser().getShortUserName();
} catch (IOException ie) {
LOG.warn("Unable to get the current user.", ie);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
ie.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw RPCUtil.getRemoteException(ie);
}
// Check whether app has already been put into rmContext,
// If it is, simply return the response
if (rmContext.getRMApps().get(applicationId) != null) {
LOG.info("This is an earlier submitted application: " + applicationId);
return SubmitApplicationResponse.newInstance();
}
if (submissionContext.getQueue() == null) {
submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
if (submissionContext.getApplicationName() == null) {
submissionContext.setApplicationName(
YarnConfiguration.DEFAULT_APPLICATION_NAME);
}
if (submissionContext.getApplicationType() == null) {
submissionContext
.setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
} else {
if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
submissionContext.setApplicationType(submissionContext
.getApplicationType().substring(0,
YarnConfiguration.APPLICATION_TYPE_LENGTH));
}
}
try {
// Ask rmAppManager to start the Application; it was mentioned in the Yarn startup chapter.
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId);
} catch (YarnException e) {
LOG.info("Exception in submitting application with id " +
applicationId.getId(), e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw e;
}
SubmitApplicationResponse response = recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
return response;
}
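The defaulting and trimming that submitApplication applies before handing off to RMAppManager can be isolated as below. This is a hedged sketch: SubmissionDefaultsSketch is a hypothetical stand-in for ApplicationSubmissionContext, and the four constants are the values I recall for the corresponding YarnConfiguration fields, so verify them against your Hadoop version.

```java
// Hypothetical stand-in for ApplicationSubmissionContext, keeping only the
// fields ClientRMService.submitApplication fills in or trims.
final class SubmissionDefaultsSketch {
    // Assumed to mirror YarnConfiguration constants; check your version.
    static final String DEFAULT_QUEUE_NAME = "default";
    static final String DEFAULT_APPLICATION_NAME = "N/A";
    static final String DEFAULT_APPLICATION_TYPE = "YARN";
    static final int APPLICATION_TYPE_LENGTH = 20;

    String queue;
    String applicationName;
    String applicationType;

    // Same order of checks as the excerpt: fill nulls, then truncate an over-long type.
    void applyDefaults() {
        if (queue == null) {
            queue = DEFAULT_QUEUE_NAME;
        }
        if (applicationName == null) {
            applicationName = DEFAULT_APPLICATION_NAME;
        }
        if (applicationType == null) {
            applicationType = DEFAULT_APPLICATION_TYPE;
        } else if (applicationType.length() > APPLICATION_TYPE_LENGTH) {
            applicationType = applicationType.substring(0, APPLICATION_TYPE_LENGTH);
        }
    }
}
```

Note that only fields independent of RM configuration are handled here; as the upstream comment says, the rest is validated later in RMAppManager.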
6.2.2 RMAppManager
This is the RMAppManager we mentioned earlier, which manages application submission, completion, recovery and so on. Here is its submitApplication method:
protected void submitApplication(
ApplicationSubmissionContext submissionContext, long submitTime,
String user) throws YarnException {
// Get the appId from the submitted application context
ApplicationId applicationId = submissionContext.getApplicationId();
// Create an RMAppImpl, the class that represents an Application running in the RM
RMAppImpl application =
createAndPopulateNewRMApp(submissionContext, submitTime, user, false);
// Fetch the ApplicationId from the context again
ApplicationId appId = submissionContext.getApplicationId();
Credentials credentials = null;
try {
credentials = parseCredentials(submissionContext);
if (UserGroupInformation.isSecurityEnabled()) {
this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
credentials, submissionContext.getCancelTokensWhenComplete(),
application.getUser());
} else {
// Send an RMAppEventType.START event to the AsyncDispatcher
// The Dispatcher has not started yet at this point, so the START events enqueued here should be guaranteed to be processed first once it starts
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
} catch (Exception e) {
LOG.warn("Unable to parse credentials.", e);
// Sending APP_REJECTED is fine, since we assume that the
// RMApp is in NEW state and thus we haven't yet informed the
// scheduler about the existence of the application
assert application.getState() == RMAppState.NEW;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
throw RPCUtil.getRemoteException(e);
}
}
Now the createAndPopulateNewRMApp method used above:
private RMAppImpl createAndPopulateNewRMApp(
ApplicationSubmissionContext submissionContext, long submitTime,
String user, boolean isRecovery) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
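// Validate submissionContext and create the resource request
// A ResourceRequest is an app's request to the RM for a number of containers; it includes
// the priority, the desired host or rack name (* means any), the required resources, the number of containers, and relaxLocality (default true)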
ResourceRequest amReq =
validateAndCreateResourceRequest(submissionContext, isRecovery);
// Create the RMApp
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReq);
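// Note: this is where the application is put into the container held by activeServiceContext in rmContext,
// a ConcurrentMap keyed by appId with the RMApp as value.
// If apps with the same applicationId are submitted concurrently, only one succeeds; the others fail with an exception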
if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
null) {
String message = "Application with id " + applicationId
+ " is already present! Cannot add a duplicate!";
LOG.warn(message);
throw new YarnException(message);
}
// Inform the ACLs Manager
this.applicationACLsManager.addApplication(applicationId,
submissionContext.getAMContainerSpec().getApplicationACLs());
String appViewACLs = submissionContext.getAMContainerSpec()
.getApplicationACLs().get(ApplicationAccessType.VIEW_APP);
rmContext.getSystemMetricsPublisher().appACLsUpdated(
application, appViewACLs, System.currentTimeMillis());
return application;
}
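The duplicate check above relies on ConcurrentMap.putIfAbsent being atomic: among concurrent callers with the same key, exactly one sees null and wins. A minimal sketch with a hypothetical AppRegistrySketch, using String ids and values instead of ApplicationId/RMApp and IllegalStateException instead of YarnException:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry mimicking rmContext.getRMApps(): appId -> app.
final class AppRegistrySketch {
    private final ConcurrentMap<String, String> apps = new ConcurrentHashMap<>();

    // putIfAbsent returns the existing value when the key is already present,
    // so a non-null result means someone else registered this id first.
    void register(String appId, String app) {
        if (apps.putIfAbsent(appId, app) != null) {
            throw new IllegalStateException("Application with id " + appId
                + " is already present! Cannot add a duplicate!");
        }
    }

    String get(String appId) {
        return apps.get(appId);
    }
}
```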
6.2.3 AsyncDispatcher-RMAppEventType.START
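As mentioned earlier, this is an asynchronous event dispatcher. The event goes through GenericEventHandler.handle -> the thread created by createThread -> dispatch, which finally looks up the handler registered for the event type and calls its handle method.
When submitting the application we passed in an RMAppEvent, whose type is org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType. The handler registered for that type is org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher; let's see how it handles the event: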
6.2.4 ApplicationEventDispatcher
It is an inner class of ResourceManager.
@Override
public void handle(RMAppEvent event) {
ApplicationId appID = event.getApplicationId();
// Look up the app in the RM context
RMApp rmApp = this.rmContext.getRMApps().get(appID);
if (rmApp != null) {
try {
rmApp.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for application " + appID, t);
}
}
}
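The dispatch path described above (enqueue in GenericEventHandler.handle, consume on a separate thread, route by event type) can be sketched as follows. Everything here is hypothetical and simplified: MiniDispatcher, its enums and dispatchOne() are illustrative names, not Hadoop's. The one detail borrowed from AsyncDispatcher is that handlers are keyed by the enum class of the event type (getDeclaringClass()), which is why every RMAppEventType event lands in ApplicationEventDispatcher.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical miniature of the AsyncDispatcher idea; not Hadoop's class.
final class MiniDispatcher {
    // Stand-ins for RMAppEventType and RMStateStoreEventType.
    enum AppEventType { START, APP_NEW_SAVED }
    enum StoreEventType { STORE_APP }

    static final class Event {
        final Enum<?> type;
        final String appId;

        Event(Enum<?> type, String appId) {
            this.type = type;
            this.appId = appId;
        }
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<Class<?>, Consumer<Event>> handlers = new ConcurrentHashMap<>();

    // One handler per event-type enum class.
    void register(Class<? extends Enum<?>> typeClass, Consumer<Event> handler) {
        handlers.put(typeClass, handler);
    }

    // Producer side (like GenericEventHandler.handle): only enqueue.
    void handle(Event event) {
        queue.add(event);
    }

    // Route an event by the enum class of its type.
    private void dispatch(Event event) {
        Consumer<Event> handler = handlers.get(event.type.getDeclaringClass());
        if (handler == null) {
            throw new IllegalStateException("No handler for " + event.type.getDeclaringClass());
        }
        handler.accept(event);
    }

    // Synchronous helper: dispatch one queued event if there is one.
    boolean dispatchOne() {
        Event event = queue.poll();
        if (event == null) {
            return false;
        }
        dispatch(event);
        return true;
    }

    // Consumer side: a daemon thread blocking on take(), like the one from createThread().
    Thread start() {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    dispatch(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly on interrupt
            }
        }, "mini-dispatcher");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Because handle() only enqueues, the caller (for example RMAppManager.submitApplication) returns immediately and never runs the state-machine code on its own thread.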
6.2.5 RMAppImpl
@Override
public void handle(RMAppEvent event) {
// Acquire the write lock
this.writeLock.lock();
try {
ApplicationId appID = event.getApplicationId();
LOG.debug("Processing event for " + appID + " of type "
+ event.getType());
final RMAppState oldState = getState();
try {
// Keep the master in sync with the state machine; here the event type is RMAppEventType.START and the event is the RMAppEvent
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
/* TODO fail the application on the failed transition */
}
// If the state changed, log it
if (oldState != getState()) {
LOG.info(appID + " State change from " + oldState + " to "
+ getState());
}
} finally {
// Finally release the write lock
this.writeLock.unlock();
}
}
6.2.6 StateMachineFactory
The name says it all: a state machine factory. Here the input event is START and the current state is RMAppState.NEW:
@Override
public synchronized STATE doTransition(EVENTTYPE eventType, EVENT event)
throws InvalidStateTransitonException {
// What we pass in is EventType START:
// operand is the RMAppImpl, currentState is RMAppState.NEW, eventType is START, event is the RMAppEvent
currentState = StateMachineFactory.this.doTransition
(operand, currentState, eventType, event);
// After StateMachineFactory.this.doTransition, currentState is RMAppState.NEW_SAVING
return currentState;
}
A quick look at RMAppState:
public enum RMAppState {
NEW,
NEW_SAVING,
SUBMITTED,
ACCEPTED,
RUNNING,
FINAL_SAVING,
FINISHING,
FINISHED,
FAILED,
KILLING,
KILLED
}
Next, StateMachineFactory.this.doTransition(operand, currentState, eventType, event):
private STATE doTransition
(OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event)
throws InvalidStateTransitonException {
// Understanding this step is essential: stateMachineTable maps each current state to a map from event type to the Transition to run
Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitionMap
= stateMachineTable.get(oldState);
if (transitionMap != null) {
Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition
= transitionMap.get(eventType);
if (transition != null) {
// Reaching here means a transition is registered for this event in the current state
return transition.doTransition(operand, oldState, event, eventType);
}
}
// Reaching here means no transition is registered for this event in the current state, so throw
throw new InvalidStateTransitonException(oldState, eventType);
}
Next comes transition.doTransition; here it is the inner class SingleInternalArc:
@Override
public STATE doTransition(OPERAND operand, STATE oldState,
EVENT event, EVENTTYPE eventType) {
if (hook != null) {
hook.transition(operand, event);
}
return postState;
}
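The two-level lookup in stateMachineTable (state -> event type -> transition) plus the hook-then-postState behaviour of SingleInternalArc is the core of StateMachineFactory. Below is a hypothetical miniature wiring up two RMApp transitions: NEW + START -> NEW_SAVING (the hook stands in for RMAppNewlySavingTransition) and NEW_SAVING + APP_NEW_SAVED -> SUBMITTED, which is, to my knowledge, how RMAppImpl registers the transition that runs AddApplicationToSchedulerTransition. The real factory also supports multi-arc transitions and throws InvalidStateTransitonException; this sketch uses IllegalStateException instead.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical miniature of StateMachineFactory's lookup table; not Hadoop's API.
final class MiniStateMachine {
    enum AppState { NEW, NEW_SAVING, SUBMITTED }
    enum AppEventType { START, APP_NEW_SAVED }

    // Like SingleInternalArc: run an optional hook, then move to a fixed post-state.
    private static final class Arc {
        final AppState postState;
        final Runnable hook;

        Arc(AppState postState, Runnable hook) {
            this.postState = postState;
            this.hook = hook;
        }
    }

    // current state -> (event type -> transition)
    private final Map<AppState, Map<AppEventType, Arc>> table = new EnumMap<>(AppState.class);
    private AppState current = AppState.NEW;

    MiniStateMachine addTransition(AppState pre, AppState post, AppEventType type, Runnable hook) {
        table.computeIfAbsent(pre, s -> new EnumMap<AppEventType, Arc>(AppEventType.class))
             .put(type, new Arc(post, hook));
        return this;
    }

    // Two lookups; a missing entry at either level means an invalid transition.
    AppState doTransition(AppEventType type) {
        Map<AppEventType, Arc> byEvent = table.get(current);
        Arc arc = (byEvent == null) ? null : byEvent.get(type);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + type + " at " + current);
        }
        if (arc.hook != null) {
            arc.hook.run();
        }
        current = arc.postState;
        return current;
    }

    AppState current() {
        return current;
    }
}
```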
We still need to follow hook.transition:
6.2.7 RMAppImpl$RMAppNewlySavingTransition
Here we are back in RMAppImpl, calling its static inner class RMAppNewlySavingTransition:
private static final class RMAppNewlySavingTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
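// If recovery is enabled, store the app information with a non-blocking call, so the RM is sure to have stored what it needs
// to restart the AM after an RM restart without any further client communication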
LOG.info("Storing application with id " + app.applicationId);
app.rmContext.getStateStore().storeNewApplication(app);
}
}
6.2.8 RMStateStore
We have mentioned this class before: it manages the RM's state information. Here is the method:
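/**
* Non-blocking API
* ResourceManager services use this to store the application's state
* It does not block the calling thread (the dispatcher)
* RMAppStoredEvent will be sent on completion to notify the RMApp
*/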
@SuppressWarnings("unchecked")
public void storeNewApplication(RMApp app) {
ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
assert context instanceof ApplicationSubmissionContextPBImpl;
// Build the app state data instance, which holds all of an app's state that must be persisted
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(), context, app.getUser());
// The eventHandler here is AsyncDispatcher.GenericEventHandler; it puts the STORE_APP event into eventQueue
dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}
A quick look at RMStateStoreAppEvent:
public class RMStateStoreAppEvent extends RMStateStoreEvent {
private final ApplicationStateData appState;
public RMStateStoreAppEvent(ApplicationStateData appState) {
// The RMStateStoreEventType is set to STORE_APP
super(RMStateStoreEventType.STORE_APP);
this.appState = appState;
}
public ApplicationStateData getAppState() {
return appState;
}
}
As you can see, the event type is STORE_APP.
6.2.9 AsyncDispatcher-RMStateStoreEventType.STORE_APP
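As mentioned above, the STORE_APP event was put into eventQueue, so it is consumed and processed by the thread created in createThread. The actual handling is done by the handle method of RMStateStore's inner class ForwardingEventHandler: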
// This class hides the handling of store events from the store's public interface
private final class ForwardingEventHandler implements EventHandler<RMStateStoreEvent> {
@Override
public void handle(RMStateStoreEvent event) {
handleStoreEvent(event);
}
}
Here is handleStoreEvent:
protected void handleStoreEvent(RMStateStoreEvent event) {
try {
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
}
}
6.2.10 RMStateStore.StoreAppTransition
After a few more calls we arrive at StoreAppTransition, an inner class of RMStateStore. Here is its transition method:
private static class StoreAppTransition
implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
@Override
public void transition(RMStateStore store, RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreAppEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return;
}
ApplicationStateData appState =
((RMStateStoreAppEvent) event).getAppState();
ApplicationId appId =
appState.getApplicationSubmissionContext().getApplicationId();
LOG.info("Storing info for app: " + appId);
try {
// This is where appState is actually persisted
store.storeApplicationStateInternal(appId, appState);
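// Build an RMAppEventType.APP_NEW_SAVED event
// notifyApplication tells the application that the new app has been persisted (stored or updated)
// It sends an RMAppEvent of type RMAppEventType.APP_NEW_SAVED to the handle method of AsyncDispatcher.GenericEventHandler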
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing app: " + appId, e);
store.notifyStoreOperationFailed(e);
}
};
}
6.2.11 ZKRMStateStore
Our production environment is configured with ZKRMStateStore, so let's look at its storeApplicationStateInternal method:
@Override
public synchronized void storeApplicationStateInternal(ApplicationId appId,
ApplicationStateData appStateDataPB) throws Exception {
// Build the target ZooKeeper path
String nodeCreatePath = getNodePath(rmAppRoot, appId.toString());
if (LOG.isDebugEnabled()) {
LOG.debug("Storing info for app: " + appId + " at: " + nodeCreatePath);
}
// Write the data to ZooKeeper, with retries
byte[] appStateData = appStateDataPB.getProto().toByteArray();
createWithRetries(nodeCreatePath, appStateData, zkAcl,
CreateMode.PERSISTENT);
}
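createWithRetries retries the ZooKeeper write when it fails. The generic shape of such a retry loop is sketched below; RetrySketch.withRetries is hypothetical and retries on any exception with a fixed sleep, whereas ZKRMStateStore decides for itself which ZooKeeper errors are retriable and how long to wait, so treat this only as an illustration of the pattern.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper illustrating the "create with retries" pattern.
final class RetrySketch {
    private RetrySketch() {}

    // Calls op up to maxAttempts times, sleeping between failures,
    // and rethrows the last failure if every attempt fails.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // fixed back-off between attempts
                }
            }
        }
        throw last;
    }
}
```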
We won't dig any deeper here; back to job submission.
6.2.12 RMAppImpl.AddApplicationToSchedulerTransition
The store.notifyApplication(new RMAppEvent(appId, RMAppEventType.APP_NEW_SAVED)) call from 6.2.10 passes through several layers and finally reaches the transition method of the inner class RMAppImpl.AddApplicationToSchedulerTransition:
private static final class AddApplicationToSchedulerTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
// Submit an AppAddedSchedulerEvent wrapping the app info to GenericEventHandler
// The event type is SchedulerEventType.APP_ADDED
app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user,
app.submissionContext.getReservationID()));
}
}
6.2.13 ResourceManager.SchedulerEventDispatcher
After GenericEventHandler processes it, the APP_ADDED event is finally handed to SchedulerEventDispatcher, an inner class of ResourceManager. A brief look:
// It extends AbstractService, which we already know well,
// and implements EventHandler, so it is also an event handler with a handle method
public static class SchedulerEventDispatcher extends AbstractService
implements EventHandler<SchedulerEvent> {
// The scheduler
private final ResourceScheduler scheduler;
// It has its own blocking event queue too
private final BlockingQueue<SchedulerEvent> eventQueue =
new LinkedBlockingQueue<SchedulerEvent>();
private volatile int lastEventQueueSizeLogged = 0;
// The thread that processes events
private final Thread eventProcessor;
// Flag telling the thread whether to stop
private volatile boolean stopped = false;
// Whether an exception while handling an event should make the process exit
private boolean shouldExitOnError = false;
// Constructor; as mentioned before, called by the RM in serviceInit
public SchedulerEventDispatcher(ResourceScheduler scheduler) {
super(SchedulerEventDispatcher.class.getName());
this.scheduler = scheduler;
this.eventProcessor = new Thread(new EventProcessor());
this.eventProcessor.setName("ResourceManager Event Processor");
}
@Override
protected void serviceInit(Configuration conf) throws Exception {
this.shouldExitOnError =
conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY,
Dispatcher.DEFAULT_DISPATCHER_EXIT_ON_ERROR);
super.serviceInit(conf);
}
@Override
protected void serviceStart() throws Exception {
this.eventProcessor.start();
super.serviceStart();
}
// The event-processing thread, similar to the one from createThread
private final class EventProcessor implements Runnable {
@Override
public void run() {
SchedulerEvent event;
while (!stopped && !Thread.currentThread().isInterrupted()) {
try {
event = eventQueue.take();
} catch (InterruptedException e) {
LOG.error("Returning, interrupted : " + e);
return; // TODO: Kill RM.
}
try {
// Note: the event is handed straight to our protagonist, the scheduler!
scheduler.handle(event);
} catch (Throwable t) {
// An error occurred, but we are shutting down anyway.
// If it was an InterruptedException, the very act of
// shutdown could have caused it and is probably harmless.
if (stopped) {
LOG.warn("Exception during shutdown: ", t);
break;
}
LOG.fatal("Error in handling event type " + event.getType()
+ " to the scheduler", t);
if (shouldExitOnError
&& !ShutdownHookManager.get().isShutdownInProgress()) {
LOG.info("Exiting, bbye..");
System.exit(-1);
}
}
}
}
}
@Override
protected void serviceStop() throws Exception {
this.stopped = true;
this.eventProcessor.interrupt();
try {
this.eventProcessor.join();
} catch (InterruptedException e) {
throw new YarnRuntimeException(e);
}
super.serviceStop();
}
@Override
public void handle(SchedulerEvent event) {
try {
int qSize = eventQueue.size();
if (qSize != 0 && qSize % 1000 == 0
&& lastEventQueueSizeLogged != qSize) {
lastEventQueueSizeLogged = qSize;
LOG.info("Size of scheduler event-queue is " + qSize);
}
int remCapacity = eventQueue.remainingCapacity();
if (remCapacity < 1000) {
LOG.info("Very low remaining capacity on scheduler event queue: "
+ remCapacity);
}
// Handling an event just means putting it into its own blocking queue for the processing thread
this.eventQueue.put(event);
} catch (InterruptedException e) {
LOG.info("Interrupted. Trying to exit gracefully.");
}
}
}
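SchedulerEventDispatcher.handle logs the queue size only when it is a non-zero multiple of 1000 that differs from the last size logged, and warns when the remaining capacity drops below 1000. That throttle, pulled out into a hypothetical helper (QueueSizeLogThrottle is not a Hadoop class):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical extraction of the logging conditions in SchedulerEventDispatcher.handle.
final class QueueSizeLogThrottle {
    private int lastLogged = 0; // plays the role of lastEventQueueSizeLogged

    // Log only non-zero multiples of 1000, and never the same size twice in a row.
    boolean shouldLogSize(int qSize) {
        if (qSize != 0 && qSize % 1000 == 0 && lastLogged != qSize) {
            lastLogged = qSize;
            return true;
        }
        return false;
    }

    // Warn when fewer than 1000 free slots remain.
    static boolean lowCapacity(int remainingCapacity) {
        return remainingCapacity < 1000;
    }

    // A LinkedBlockingQueue built with the no-arg constructor, like eventQueue, is unbounded.
    static int remainingOnFreshUnboundedQueue() {
        return new LinkedBlockingQueue<Object>().remainingCapacity();
    }
}
```

One consequence: eventQueue is created with new LinkedBlockingQueue<SchedulerEvent>(), whose capacity is Integer.MAX_VALUE, so remainingCapacity() stays enormous and the low-capacity warning will practically never fire; a growing backlog shows up only through the size log.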
6.2.14 FairScheduler
After a long detour, our protagonist FairScheduler finally takes the stage.
First, the handle method triggered by the code above: since the event is APP_ADDED, it takes this branch:
case APP_ADDED:
if (!(event instanceof AppAddedSchedulerEvent)) {
throw new RuntimeException("Unexpected event type: " + event);
}
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
addApplication(appAddedEvent.getApplicationId(),
appAddedEvent.getQueue(), appAddedEvent.getUser(),
appAddedEvent.getIsAppRecovering());
break;
After checking the event type, this branch calls addApplication:
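// Add an application with the given appId, queue name and user to the scheduler
// It is accepted even if the user or queue is over its configured limits; the app just won't be marked runnable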
protected synchronized void addApplication(ApplicationId applicationId,
String queueName, String user, boolean isAppRecovering) {
if (queueName == null || queueName.isEmpty()) {
String message = "Reject application " + applicationId +
" submitted by user " + user + " with an empty queue name.";
LOG.info(message);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, message));
return;
}
// The queue name must not start or end with a period
if (queueName.startsWith(".") || queueName.endsWith(".")) {
String message = "Reject application " + applicationId
+ " submitted by user " + user + " with an illegal queue name "
+ queueName + ". "
+ "The queue name cannot start/end with period.";
LOG.info(message);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, message));
return;
}
RMApp rmApp = rmContext.getRMApps().get(applicationId);
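// Try to assign the app to the queue; on success it goes into the queue container managed by QueueManager
// If the app is rejected, this method calls the appropriate event handler
// I was debugging with a test case that passed queue=default and user=chengc, and got root.chengc here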
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
if (queue == null) {
return;
}
// Enforce ACLs
UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
&& !queue.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
String msg = "User " + userUgi.getUserName() +
" cannot submit applications to queue " + queue.getName();
LOG.info(msg);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, msg));
return;
}
SchedulerApplication<FSAppAttempt> application =
new SchedulerApplication<FSAppAttempt>(queue, user);
// Put it into applications, the ConcurrentMap<ApplicationId, SchedulerApplication<T>>
// owned by FairScheduler's parent class AbstractYarnScheduler
applications.put(applicationId, application);
// Update the submitted-app metrics of this queue and its parent queues
queue.getMetrics().submitApp(user);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName + ", currently num of applications: "
+ applications.size());
if (isAppRecovering) {
if (LOG.isDebugEnabled()) {
LOG.debug(applicationId + " is recovering. Skip notifying APP_ACCEPTED");
}
} else {
// When done, send an RMAppEventType.APP_ACCEPTED event to the AsyncDispatcher
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
}
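The two early rejections in addApplication (empty queue name, name starting or ending with a period) are pure string checks. A hypothetical helper that returns the rejection reason, or null when the name passes; the real method sends an RMAppRejectedEvent instead of returning a value:

```java
// Hypothetical helper reproducing the two early checks in FairScheduler.addApplication.
final class QueueNameCheck {
    private QueueNameCheck() {}

    // Returns null if the queue name passes both checks, otherwise the rejection reason.
    static String rejectReason(String queueName) {
        if (queueName == null || queueName.isEmpty()) {
            return "empty queue name";
        }
        if (queueName.startsWith(".") || queueName.endsWith(".")) {
            return "The queue name cannot start/end with period.";
        }
        return null;
    }
}
```

Only after these checks does assignToQueue apply the placement policy, which is how a submitted "default" can end up as a per-user queue such as root.chengc.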
6.2.15 RMAppImpl-createAndStartNewAttempt
After passing through AsyncDispatcher and ResourceManager.ApplicationEventDispatcher, the event reaches the inner class RMAppImpl.StartAppAttemptTransition:
// As the name suggests, this class is responsible for starting an app attempt
private static final class StartAppAttemptTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.createAndStartNewAttempt(false);
};
}
Now RMAppImpl.createAndStartNewAttempt:
private void createAndStartNewAttempt(boolean transferStateFromPreviousAttempt) {
// Create a new app attempt
createNewAttempt();
// Send an RMAppStartAttemptEvent of type RMAppAttemptEventType.START to the AsyncDispatcher
handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(),
transferStateFromPreviousAttempt));
}
And createNewAttempt:
private void createNewAttempt() {
// Derive an attemptId from the appId
ApplicationAttemptId appAttemptId =
ApplicationAttemptId.newInstance(applicationId, attempts.size() + 1);
BlacklistManager currentAMBlacklist;
// Node-level blacklist for the AM container
if (currentAttempt != null) {
currentAMBlacklist = currentAttempt.getAMBlacklist();
} else {
if (amBlacklistingEnabled) {
currentAMBlacklist = new SimpleBlacklistManager(
scheduler.getNumClusterNodes(), blacklistDisableThreshold);
} else {
currentAMBlacklist = new DisabledBlacklistManager();
}
}
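// The newly created attempt might be the last one if (the number of previously failed attempts,
// not counting preemption, hardware errors and NM resync, + 1) equals the max attempt limit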
RMAppAttempt attempt =
new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
submissionContext, conf,
maxAppAttempts == (getNumFailedAppAttempts() + 1), amReq,
currentAMBlacklist);
attempts.put(appAttemptId, attempt);
currentAttempt = attempt;
}
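The numbering and "last attempt" arithmetic in createNewAttempt is small enough to spell out. The helpers below are hypothetical; the string form appattempt_<clusterTimestamp>_<appId>_<attempt>, with four- and six-digit zero padding, is what I understand ApplicationAttemptId.toString() to produce, so treat it as an assumption.

```java
// Hypothetical helpers mirroring createNewAttempt's arithmetic; not Hadoop's API.
final class AttemptSketch {
    private AttemptSketch() {}

    // New attempts are numbered attempts.size() + 1, so the first attempt is 1.
    static int nextAttemptNumber(int existingAttempts) {
        return existingAttempts + 1;
    }

    // Mirrors maxAppAttempts == (getNumFailedAppAttempts() + 1); the failure count
    // excludes preemption, hardware errors and NM resync.
    static boolean isLastAttempt(int maxAppAttempts, int countedFailures) {
        return maxAppAttempts == countedFailures + 1;
    }

    // appattempt_<clusterTimestamp>_<appId padded to 4>_<attempt padded to 6>
    static String format(long clusterTimestamp, int appId, int attempt) {
        return String.format("appattempt_%d_%04d_%06d", clusterTimestamp, appId, attempt);
    }
}
```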
After a further series of steps, FairScheduler's handle method is triggered with an APP_ATTEMPT_ADDED event, which calls addApplicationAttempt:
// Add an app attempt to the scheduler (FairScheduler)
protected synchronized void addApplicationAttempt(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt,
boolean isAttemptRecovering) {
// Note: the app was already added when FairScheduler received the APP_ADDED event earlier
SchedulerApplication<FSAppAttempt> application =
applications.get(applicationAttemptId.getApplicationId());
String user = application.getUser();
FSLeafQueue queue = (FSLeafQueue) application.getQueue();
// FSAppAttempt represents an app attempt from FairScheduler's point of view
FSAppAttempt attempt =
new FSAppAttempt(this, applicationAttemptId, user,
queue, new ActiveUsersManager(getRootQueueMetrics()),
rmContext);
if (transferStateFromPreviousAttempt) {
attempt.transferStateFromPreviousAttempt(application
.getCurrentAppAttempt());
}
application.setCurrentAppAttempt(attempt);
// Check whether the app can run without exceeding the max-running-apps limits of its queues and user
boolean runnable = maxRunningEnforcer.canAppBeRunnable(queue, user);
// Depending on runnable, put it into the FSLeafQueue's runnableApps or nonRunnableApps
queue.addApp(attempt, runnable);
if (runnable) {
// Increment runnableApps of the app's parent queues and the app count of the submitting user
// This is how the max-running-apps limits are enforced
maxRunningEnforcer.trackRunnableApp(attempt);
} else {
// Non-runnable apps are tracked too, so they can become runnable once the max-running-apps limit allows it
maxRunningEnforcer.trackNonRunnableApp(attempt);
}
// Record queue and user metrics
queue.getMetrics().submitAppAttempt(user);
LOG.info("Added Application Attempt " + applicationAttemptId
+ " to scheduler from user: " + user);
if (isAttemptRecovering) {
if (LOG.isDebugEnabled()) {
LOG.debug(applicationAttemptId
+ " is recovering. Skipping notifying ATTEMPT_ADDED");
}
} else {
// A familiar line: send an RMAppAttemptEventType.ATTEMPT_ADDED event to AsyncDispatcher's GenericEventHandler
// Don't confuse it with the SchedulerEventType.APP_ATTEMPT_ADDED event sent earlier by RMAppAttempt
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
}
}
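The runnable/non-runnable decision above can be illustrated with a heavily simplified, hypothetical stand-in for MaxRunningAppsEnforcer: one leaf-queue limit and one per-user limit. The real class also walks all parent queues and promotes non-runnable apps when slots free up; neither is modelled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for MaxRunningAppsEnforcer.
final class MaxRunningSketch {
    private final int queueMaxRunning;
    private final int userMaxRunning;
    private int queueRunning = 0;
    private final Map<String, Integer> runningByUser = new HashMap<>();

    MaxRunningSketch(int queueMaxRunning, int userMaxRunning) {
        this.queueMaxRunning = queueMaxRunning;
        this.userMaxRunning = userMaxRunning;
    }

    // Like canAppBeRunnable: both the queue and the user must be under their limits.
    boolean canAppBeRunnable(String user) {
        return queueRunning < queueMaxRunning
            && runningByUser.getOrDefault(user, 0) < userMaxRunning;
    }

    // Like trackRunnableApp: count the new runnable app against both limits.
    void trackRunnableApp(String user) {
        queueRunning++;
        runningByUser.merge(user, 1, Integer::sum);
    }
}
```

This is why addApplication can accept an app that is over its limits: acceptance and runnability are separate decisions, and the second one is made here per attempt.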
6.2.16 ResourceManager.ApplicationAttemptEventDispatcher
After processing, the event reaches the handle method of the inner class ApplicationAttemptEventDispatcher:
public static final class ApplicationAttemptEventDispatcher implements
EventHandler<RMAppAttemptEvent> {
private final RMContext rmContext;
public ApplicationAttemptEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
}
@Override
public void handle(RMAppAttemptEvent event) {
ApplicationAttemptId appAttemptID = event.getApplicationAttemptId();
ApplicationId appAttemptId = appAttemptID.getApplicationId();
RMApp rmApp = this.rmContext.getRMApps().get(appAttemptId);
if (rmApp != null) {
RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptID);
if (rmAppAttempt != null) {
try {
// Let RMAppAttemptImpl handle the ATTEMPT_ADDED event
rmAppAttempt.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for applicationAttempt " + appAttemptId, t);
}
}
}
}
}
6.2.17 RMAppAttemptImpl
When RMAppAttemptImpl receives the event it calls stateMachine.doTransition with event type RMAppAttemptEventType.ATTEMPT_ADDED while the attempt is in state RMAppAttemptState.SUBMITTED. The transition ends up running the following code:
public static final class ScheduleTransition implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
// Resources are allocated and the AM launched this way only when the AM is managed by the RM (not an unmanaged AM)
if (!subCtx.getUnmanagedAM()) {
// Need reset #containers before create new attempt, because this request
// will be passed to scheduler, and scheduler will deduct the number after
// AM container allocated
// Note: in this version the fields below are hard-coded; later versions allow changing them
// Set the number of containers required
appAttempt.amReq.setNumContainers(1);
// Set the priority
appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
// Set the resource name
appAttempt.amReq.setResourceName(ResourceRequest.ANY);
// RelaxLocality is explained in Appendix 1.1
appAttempt.amReq.setRelaxLocality(true);
appAttempt.getAMBlacklist().refreshNodeHostCount(
appAttempt.scheduler.getNumClusterNodes());
// The node blacklist held by the app
BlacklistUpdates amBlacklist = appAttempt.getAMBlacklist()
.getBlacklistUpdates();
if (LOG.isDebugEnabled()) {
LOG.debug("Using blacklist for AM: additions(" +
amBlacklist.getAdditions() + ") and removals(" +
amBlacklist.getRemovals() + ")");
}
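// The AM resource has already been checked, so we can submit the request directly
// This is a key step: it makes the scheduler start allocating resources.
// The AM updates its resource demand and may release containers it doesn't need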
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(
appAttempt.applicationAttemptId,
Collections.singletonList(appAttempt.amReq),
EMPTY_CONTAINER_RELEASE_LIST,
amBlacklist.getAdditions(),
amBlacklist.getRemovals());
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
// The resource request has been registered; return the SCHEDULED state
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
}
}
After this runs, the state is RMAppAttemptState.SCHEDULED.
For more on the RelaxLocality setting used in the code above, see Appendix 1.1.
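The four fields ScheduleTransition overwrites on amReq can be mirrored in plain Java to show what an AM request looks like once reset. AmRequestSketch is hypothetical; ANY is the value of ResourceRequest.ANY ("*", meaning any host or rack), while AM_PRIORITY = 0 is my assumption about the value of AM_CONTAINER_PRIORITY.

```java
// Hypothetical plain-Java mirror of the AM ResourceRequest fields reset by ScheduleTransition.
final class AmRequestSketch {
    static final String ANY = "*";     // ResourceRequest.ANY
    static final int AM_PRIORITY = 0;  // assumed value of AM_CONTAINER_PRIORITY

    int numContainers;
    int priority;
    String resourceName;
    boolean relaxLocality;
    int memoryMb; // the capability the client asked for; the reset leaves it alone

    // Same four fields, same values as set on appAttempt.amReq before each attempt.
    void resetForAmAttempt() {
        numContainers = 1;     // exactly one AM container
        priority = AM_PRIORITY;
        resourceName = ANY;    // any host or rack
        relaxLocality = true;  // locality may be relaxed
    }
}
```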
6.2.18 FairScheduler-allocate
@Override
public Allocation allocate(ApplicationAttemptId appAttemptId,
List<ResourceRequest> ask, List<ContainerId> release,
List<String> blacklistAdditions, List<String> blacklistRemovals) {
// Make sure the app exists
FSAppAttempt application = getSchedulerApp(appAttemptId);
if (application == null) {
LOG.info("Calling allocate on removed " +
"or non existant application " + appAttemptId);
return EMPTY_ALLOCATION;
}
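// Normalize the resource requests: round up to the allocation increment and clamp between the minimum and maximum allocation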
SchedulerUtils.normalizeRequests(ask, DOMINANT_RESOURCE_CALCULATOR,
clusterResource, minimumAllocation, getMaximumResourceCapability(),
incrAllocation);
// Record container allocation start time
application.recordContainerRequestTime(getClock().getTime());
// Release containers
releaseContainers(release, application);
synchronized (application) {
// ask holds the requested resources; check whether it is empty
if (!ask.isEmpty()) {
if (LOG.isDebugEnabled()) {
LOG.debug("allocate: pre-update" +
" applicationAttemptId=" + appAttemptId +
" application=" + application.getApplicationId());
}
// Print the request details (at debug level)
application.showRequests();
// Update the app's outstanding container requests in AppSchedulingInfo
application.updateResourceRequests(ask);
application.showRequests();
}
if (LOG.isDebugEnabled()) {
LOG.debug("allocate: post-update" +
" applicationAttemptId=" + appAttemptId +
" #ask=" + ask.size() +
" reservation= " + application.getCurrentReservation());
LOG.debug("Preempting " + application.getPreemptionContainers().size()
+ " container(s)");
}
Set<ContainerId> preemptionContainerIds = new HashSet<ContainerId>();
for (RMContainer container : application.getPreemptionContainers()) {
preemptionContainerIds.add(container.getContainerId());
}
// Check whether the app is still waiting for its AM container
if (application.isWaitingForAMContainer(application.getApplicationId())) {
// If so, we are allocating the AM's container, so update the AM container blacklist
application.updateAMBlacklist(
blacklistAdditions, blacklistRemovals);
} else {
// Otherwise, update the blacklist for non-AM containers
application.updateBlacklist(blacklistAdditions, blacklistRemovals);
}
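// Generate tokens for the app's newly allocated containers and for the NMs they are on
// The containerToken issued by the RM is verified by the NM when it starts the container; it is transparent to the app and managed by the framework
// The NMToken authenticates communication with the NM.
// The RM generates NMTokens when the AM asks it for resources, and they are verified on the NM side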
ContainersAndNMTokensAllocation allocation =
application.pullNewlyAllocatedContainersAndNMTokens();
// Record the container allocation time
if (!(allocation.getContainerList().isEmpty())) {
application.recordContainerAllocationTime(getClock().getTime());
}
// Finally return an Allocation
return new Allocation(allocation.getContainerList(),
application.getHeadroom(), preemptionContainerIds, null, null,
allocation.getNMTokenList());
}
}
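What normalizeRequests does to each requested capability can be sketched in one dimension. This is a hypothetical memory-only version following what I understand DefaultResourceCalculator.normalize to do: raise to the minimum, round up to a multiple of the increment, cap at the maximum. The call above passes DOMINANT_RESOURCE_CALCULATOR, which, as I understand it, applies the same idea to memory and vcores separately.

```java
// Hypothetical one-dimensional (memory-only) request normalization.
final class NormalizeSketch {
    private NormalizeSketch() {}

    static int normalize(int requested, int minimum, int maximum, int increment) {
        int atLeastMin = Math.max(requested, minimum);
        // Integer ceiling division, then back to a multiple of the increment.
        int roundedUp = ((atLeastMin + increment - 1) / increment) * increment;
        return Math.min(roundedUp, maximum);
    }
}
```

For example, with a 1024 MB minimum and increment and an 8192 MB maximum, a 1500 MB request becomes 2048 MB, a 100 MB request becomes 1024 MB, and a 10000 MB request is capped at 8192 MB.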
6.3 Summary
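This chapter analyzed how the server side processes an MR job submission. It is actually fairly simple and follows the same fixed pattern throughout: an event is put onto a dispatcher's queue, the matching handler drives a state-machine transition, and that transition emits the next event. With that, our analysis of the job submission process is complete.
In the next chapter we'll look at ShutdownHookManager, which has come up several times.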