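Our initial suspicion, then, is that either the AM's heartbeat reporting went wrong, or the RM hung because it was busy and some mechanism in the AM made it wait 10 minutes without reporting a heartbeat. So let us first understand how the AM reports heartbeats to the RM.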

In MRAppMaster, ContainerAllocatorRouter is responsible for requesting resources from the RM (that is, sending heartbeats).


RMContainerAllocator ultimately inherits from RMCommunicator, which implements the RMHeartbeatHandler interface:

public interface RMHeartbeatHandler {
  long getLastHeartbeatTime(); // time of the last heartbeat
  void runOnNextHeartbeat(Runnable callback); // queue a callback to run after the next heartbeat
}


Each time a heartbeat returns, the callbacks registered in heartbeatCallbacks are executed once:


allocatorThread = new Thread(new Runnable() {
  @Override
  public void run() {
    while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
      ......
      heartbeat();
      lastHeartbeatTime = context.getClock().getTime(); // record the time of this heartbeat
      executeHeartbeatCallbacks(); // run the queued callbacks
      ....
    }
  }
});


In the RMCommunicator class:

private void executeHeartbeatCallbacks() {
  Runnable callback = null;
  // drain the queue: each callback runs once and is then discarded
  while ((callback = heartbeatCallbacks.poll()) != null) {
    callback.run();
  }
}

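To make the callback mechanism concrete, here is a minimal, self-contained sketch of the pattern rather than the Hadoop source. The class name HeartbeatLoopSketch, the runOnce method, and the heartbeatCount field are hypothetical, and the heartbeat body is only a placeholder for the real RPC to the RM.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for RMCommunicator; NOT the Hadoop class.
class HeartbeatLoopSketch {
    // Callbacks waiting for the next heartbeat to complete.
    private final ConcurrentLinkedQueue<Runnable> heartbeatCallbacks =
            new ConcurrentLinkedQueue<>();
    private volatile long lastHeartbeatTime;
    int heartbeatCount = 0; // illustration only: counts simulated heartbeats

    long getLastHeartbeatTime() {
        return lastHeartbeatTime;
    }

    void runOnNextHeartbeat(Runnable callback) {
        heartbeatCallbacks.add(callback);
    }

    // Placeholder for the real heartbeat (an allocate RPC to the RM).
    void heartbeat() {
        heartbeatCount++;
    }

    // Drain the queue; poll() removes each callback before it runs.
    void executeHeartbeatCallbacks() {
        Runnable callback;
        while ((callback = heartbeatCallbacks.poll()) != null) {
            callback.run();
        }
    }

    // One iteration of the allocator thread's loop body.
    void runOnce() {
        heartbeat();
        lastHeartbeatTime = System.currentTimeMillis();
        executeHeartbeatCallbacks();
    }
}
```

Because poll() removes each callback as it runs, a callback registered through runOnNextHeartbeat fires exactly once, after the next heartbeat completes. If heartbeat() blocks or throws, neither lastHeartbeatTime nor the callbacks advance, which is the kind of stall being investigated here.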

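The AM first registers with the RM, telling it its host and port. It then starts a thread (startAllocatorThread) that periodically calls the heartbeat method implemented in RMContainerAllocator, which requests resources from the RM, reports status, and tells the RM that the AM is still alive.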

RMCommunicator is initialized as part of AM initialization:


protected void serviceStart() throws Exception {
  scheduler = createSchedulerProxy(); // obtain the proxy to the RM
  register(); // register with the RM
  startAllocatorThread(); // start the heartbeat thread
  ....
}



The event-handling flow of the AM's ContainerAllocatorRouter is shown in the figure below:


Registration flow:

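RMCommunicator makes a remote call to the registerApplicationMaster method of ApplicationMasterService, which sets up and maintains a responseId and then adds the AM to AMLivelinessMonitor. The monitor records the time in a map and uses it to detect whether the AM has timed out by going too long without a heartbeat. If the AM's heartbeat is not refreshed for a long time, the RM notifies the NodeManager to remove the AM.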
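The timeout bookkeeping described above can be sketched as follows. This is an illustrative model only, not the AMLivelinessMonitor source: the class LivelinessMonitorSketch and its methods register, receivedPing, and expire are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow. The 600000 ms value in the usage note assumes the default of yarn.am.liveness-monitor.expiry-interval-ms, which would match the 10-minute wait mentioned at the start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a liveliness monitor; NOT the Hadoop class.
class LivelinessMonitorSketch {
    // attempt id -> time (ms) of the last registration or heartbeat
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long expiryIntervalMs;

    LivelinessMonitorSketch(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Called at registration time: start tracking the attempt.
    void register(String attemptId, long nowMs) {
        lastSeen.put(attemptId, nowMs);
    }

    // Called on every heartbeat: refresh the timestamp if still tracked.
    void receivedPing(String attemptId, long nowMs) {
        lastSeen.computeIfPresent(attemptId, (id, old) -> nowMs);
    }

    // Periodic check: return and stop tracking attempts that went silent
    // for longer than the expiry interval.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > expiryIntervalMs) {
                expired.add(e.getKey());
            }
        }
        lastSeen.keySet().removeAll(expired);
        return expired;
    }
}
```

With an interval of 600000 ms, an attempt registered at time 0 that never pings again is reported by expire(700000), while one that pinged at 500000 is not.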

Heartbeat thread:

Sending a heartbeat is also how the AM obtains resources:


@Override
protected synchronized void heartbeat() throws Exception {
  scheduleStats.updateAndLogIfChanged("Before Scheduling: ");
  List<Container> allocatedContainers = getResources(); // the key method
  if (allocatedContainers.size() > 0) {
    scheduledRequests.assign(allocatedContainers);
  }
  ......
}



The resource-acquisition process:


private List<Container> getResources() throws Exception {
  ...
  response = makeRemoteRequest(); // talk to the RM
  ...
  // handle commands sent by the RM first
  if (response.getAMCommand() != null) {
    switch (response.getAMCommand()) {
      case AM_RESYNC:
      case AM_SHUTDOWN:
        eventHandler.handle(new JobEvent(this.getJob().getID(),
            JobEventType.JOB_AM_REBOOT));
        throw new YarnRuntimeException("Resource Manager doesn't recognize AttemptId: " +
            this.getContext().getApplicationID());
      default:
        ....
    }
    // followed by a series of further processing steps
  }
}


Building the request:


protected AllocateResponse makeRemoteRequest() throws IOException {
  AllocateRequest allocateRequest =
      AllocateRequest.newInstance(lastResponseID,
          super.getApplicationProgress(), new ArrayList<ResourceRequest>(ask),
          new ArrayList<ContainerId>(release), blacklistRequest);
  AllocateResponse allocateResponse;
  allocateResponse = scheduler.allocate(allocateRequest); // RPC to ApplicationMasterService.allocate
  .....
}



Every heartbeat call refreshes the AM's timestamp in AMLivelinessMonitor, indicating that the AM is still alive.

The code also shows that the resource requests are packaged as ask, which is sent as an ArrayList of ResourceRequest. For example:

priority:20 host:host9 capability:<memory:2048, vCores:1>
priority:20 host:host2 capability:<memory:2048, vCores:1>
priority:20 host:host10 capability:<memory:2048, vCores:1>
priority:20 host:/rack/rack3203 capability:<memory:2048, vCores:1>
priority:20 host:/rack/rack3202 capability:<memory:2048, vCores:1>
priority:20 host:* capability:<memory:2048, vCores:1>


So how is ask constructed?

The addMap, addReduce, and assign methods in RMContainerAllocator modify the contents of ask through the following call chain:


addContainerReq --> addResourceRequest --> addResourceRequestToAsk;



Log statements that I added to the code show that resources are requested at three levels: local (host), rack, and any.

These end up as a single ask list that is sent to the RM:

ask Capability:<memory:2048, vCores:1> ResourceName:* NumContainers:384 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:/rack/rack3201 NumContainers:227 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:/rack/rack3202 NumContainers:231 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:/rack/rack3203 NumContainers:152 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:/rack/rack3204 NumContainers:158 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:host1 NumContainers:46 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:host5 NumContainers:52 Priority:20 RelaxLocality:true
ask Capability:<memory:2048, vCores:1> ResourceName:host6 NumContainers:38 Priority:20 RelaxLocality:true

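As a rough illustration of how one container request fans out into host, rack, and any entries, here is a simplified sketch. It is not the RMContainerAllocator code: the class AskTableSketch, the rackOf host-to-rack map, the "/default-rack" fallback, and the snapshot method are all hypothetical, and priority and capability are left out so that only the per-resource-name counting is shown.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of an ask table; NOT the Hadoop class.
class AskTableSketch {
    static final String ANY = "*";

    // resource name (host, rack, or "*") -> number of containers requested
    private final Map<String, Integer> numContainers = new LinkedHashMap<>();

    // One container request that prefers the given hosts.
    // rackOf is an assumed host -> rack lookup supplied by the caller.
    void addContainerReq(List<String> hosts, Map<String, String> rackOf) {
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            addResourceRequest(host); // local level
            racks.add(rackOf.getOrDefault(host, "/default-rack"));
        }
        for (String rack : racks) {
            addResourceRequest(rack); // rack level, once per distinct rack
        }
        addResourceRequest(ANY); // any level, once per request
    }

    // Increment the count for one resource name.
    private void addResourceRequest(String resourceName) {
        numContainers.merge(resourceName, 1, Integer::sum);
    }

    Map<String, Integer> snapshot() {
        return numContainers;
    }
}
```

Under this model the count for * equals the number of container requests, while a request whose preferred hosts span several racks is counted once in each of them. That would be consistent with the log above, where the rack counts add up to more than the 384 requested under *.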

The corresponding log line looks like this:

getResources() for application_1438330253091_0004: ask=29 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=24