When you write a MapReduce program and create a Job, it is usually launched with the following call:

System.exit(job.waitForCompletion(true) ? 0 : 1);
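For context, this call usually sits at the end of a MapReduce driver, roughly like the sketch below (a minimal example; WordCountMapper, WordCountReducer and the argument paths are placeholders, not part of the code analyzed here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);   // placeholder mapper
    job.setReducerClass(WordCountReducer.class); // placeholder reducer
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion(true) submits the job and blocks until it finishes,
    // printing progress because verbose == true
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}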

Today let's walk through how a Job actually gets executed.

The code in the waitForCompletion method that actually submits the job is as follows:

/**
   * Submit the job to the cluster and wait for it to finish.
   * @param verbose whether to print job progress logs to the user
   * @return true if the job succeeded
   * @throws IOException thrown if the communication with the 
   *         <code>JobTracker</code> is lost
   */
  public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
    if (state == JobState.DEFINE) {
      submit();
    }
    if (verbose) {
      // monitor the job in real time and print progress logs
      monitorAndPrintJob();
    } else {
      // get the completion poll interval from the client-side configuration
      int completionPollIntervalMillis = 
        Job.getCompletionPollInterval(cluster.getConf());
      while (!isComplete()) {
        try {
          Thread.sleep(completionPollIntervalMillis);
        } catch (InterruptedException ie) {
        }
      }
    }
    return isSuccessful();
  }
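In the non-verbose branch above, the client simply polls the job status at a fixed interval, which comes from Job.getCompletionPollInterval(cluster.getConf()). If you want a different polling frequency, a minimal sketch is to set the corresponding property on the job configuration before submitting (the property name mapreduce.client.completion.pollinterval is the one read by Job.getCompletionPollInterval; the value is in milliseconds):

Configuration conf = new Configuration();
// poll the job status every 10 seconds instead of the default
conf.setInt("mapreduce.client.completion.pollinterval", 10000);
Job job = Job.getInstance(conf, "my job");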

The key call here is the submit method, which does the actual submission; its implementation is as follows:

/**
     * Submit the job to the cluster and return immediately.
     * @throws IOException
     */
    public void submit()
            throws IOException, InterruptedException, ClassNotFoundException {
        // make sure the job is still in the DEFINE state
        ensureState(JobState.DEFINE);
        // default to the new API unless it is explicitly disabled or old-style mapper/reducer properties are configured
        setUseNewAPI();
        // connect to the cluster (connect() is synchronized)
        connect();
        // create a JobSubmitter instance
        final JobSubmitter submitter =
                getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
        status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
            public JobStatus run() throws IOException, InterruptedException,
                    ClassNotFoundException {
                // submit the job via JobSubmitter.submitJobInternal (doAs runs this synchronously as the submitting user)
                return submitter.submitJobInternal(Job.this, cluster);
            }
        });
        // mark the job state as RUNNING
        state = JobState.RUNNING;
        LOG.info("The url to track the job: " + getTrackingURL());
    }

The submit method checks the job state, connects to the cluster, creates a JobSubmitter instance, and then invokes JobSubmitter.submitJobInternal inside ugi.doAs, so the submission runs with the submitting user's credentials. Note that despite the callback-style PrivilegedExceptionAction, doAs is a synchronous call, not an asynchronous one.
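Because submit() returns as soon as the job has been handed to the cluster, a driver can also submit and then poll by itself instead of calling waitForCompletion. A minimal sketch using only public Job methods (assuming it runs inside a main that declares throws Exception):

job.submit();                                        // returns once the job is submitted
System.out.println("Tracking URL: " + job.getTrackingURL());
System.out.println("Job ID:       " + job.getJobID());
while (!job.isComplete()) {                          // the same polling waitForCompletion does internally
  Thread.sleep(5000);
}
System.out.println("Succeeded: " + job.isSuccessful());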

The code of JobSubmitter.submitJobInternal is as follows:

/**
 * Internal method for submitting jobs to the system.
 * 内部方法实现提交任务给系统
 * 
 * <p>The job submission process involves:
 * <ol>
 *   <li>
 *   Checking the input and output specifications of the job.
 *   </li>
 *   <li>
 *   Computing the {@link InputSplit}s for the job.
 *   </li>
 *   <li>
 *   Setup the requisite accounting information for the
 *   {@link DistributedCache} of the job, if necessary.
 *   </li>
 *   <li>
 *   Copying the job's jar and configuration to the map-reduce system
 *   directory on the distributed file-system.
 *   </li>
 *   <li>
 *   Submitting the job to the <code>JobTracker</code> and optionally
 *   monitoring its status.
 *   </li>
 * </ol></p>
 * @param job the configuration to submit
 * @param cluster the handle to the Cluster
 * @throws ClassNotFoundException
 * @throws InterruptedException
 * @throws IOException
 */
JobStatus submitJobInternal(Job job, Cluster cluster)
        throws ClassNotFoundException, InterruptedException, IOException {

    // validate the job's output specification,
    // mainly ensuring that the output directory does not already exist
    checkSpecs(job);

    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);
    // set up the job staging directory (checking the required ownership and permissions) and return its path;
    // it holds the files used while the job is submitted and run
    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    // record the submitting host name and address
    InetAddress ip = InetAddress.getLocalHost();
    if (ip != null) {
        submitHostAddress = ip.getHostAddress();
        submitHostName = ip.getHostName();
        conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
        conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
    }
    // get a new JobID; this requires an RPC call to the cluster
    JobID jobId = submitClient.getNewJobID();
    job.setJobID(jobId);
    // build the submit directory under the staging area: /tmp/.....
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
        conf.set(MRJobConfig.USER_NAME,
                UserGroupInformation.getCurrentUser().getShortUserName());
        conf.set("hadoop.http.filter.initializers",
                "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
        conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
        LOG.debug("Configuring job " + jobId + " with " + submitJobDir
                + " as the submit dir");
        // get delegation token for the dir
        TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                new Path[]{submitJobDir}, conf);

        populateTokenCache(conf, job.getCredentials());

        // generate a secret to authenticate shuffle transfers
        if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
            KeyGenerator keyGen;
            try {

                int keyLen = CryptoUtils.isShuffleEncrypted(conf)
                        ? conf.getInt(MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS,
                        MRJobConfig.DEFAULT_MR_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS)
                        : SHUFFLE_KEY_LENGTH;
                keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
                keyGen.init(keyLen);
            } catch (NoSuchAlgorithmException e) {
                throw new IOException("Error generating shuffle secret key", e);
            }
            SecretKey shuffleKey = keyGen.generateKey();
            TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
                    job.getCredentials());
        }

        // copy the job jar and other required files to the submit directory on the cluster
        copyAndConfigureFiles(job, submitJobDir);
        Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

        // create the input splits for this job
        LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
        int maps = writeSplits(job, submitJobDir);
        conf.setInt(MRJobConfig.NUM_MAPS, maps);
        LOG.info("number of splits:" + maps);

        // set the queue name and record the queue's admin ACL
        String queue = conf.get(MRJobConfig.QUEUE_NAME,
                JobConf.DEFAULT_QUEUE_NAME);
        AccessControlList acl = submitClient.getQueueAdmins(queue);
        conf.set(toFullPropertyName(queue,
                QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

        // removing jobtoken referrals before copying the jobconf to HDFS
        // as the tasks don't need this setting, actually they may break
        // because of it if present as the referral will point to a
        // different job.
        TokenCache.cleanUpTokenReferral(conf);

        if (conf.getBoolean(
                MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
                MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
            // Add HDFS tracking ids
            ArrayList<String> trackingIds = new ArrayList<String>();
            for (Token<? extends TokenIdentifier> t :
                    job.getCredentials().getAllTokens()) {
                trackingIds.add(t.decodeIdentifier().getTrackingId());
            }
            conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
                    trackingIds.toArray(new String[trackingIds.size()]));
        }

        // Set reservation info if it exists
        ReservationId reservationId = job.getReservationId();
        if (reservationId != null) {
            conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
        }

        // Write job file to submit dir
        writeConf(conf, submitJobFile);

        //
        // Now, actually submit the job (using the submit name)
        // this is where the job is actually submitted
        printTokens(jobId, job.getCredentials());
        status = submitClient.submitJob(
                jobId, submitJobDir.toString(), job.getCredentials());
        if (status != null) {
            return status;
        } else {
            throw new IOException("Could not launch job");
        }
    } finally {
        if (status == null) {
            LOG.info("Cleaning up the staging area " + submitJobDir);
            if (jtFs != null && submitJobDir != null)
                jtFs.delete(submitJobDir, true);

        }
    }
}

At this point the job has been handed off to the cluster and submission is complete.
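For reference, after submitJobInternal has run, the submit directory (<staging dir>/<jobId>) holds everything the cluster needs: the job configuration written by writeConf, the split data and metadata written by writeSplits, and the files copied by copyAndConfigureFiles such as the job jar. A small sketch to inspect that directory from a client, assuming you pass it as args[0] (the file names mentioned in the comment are the ones JobSubmissionFiles conventionally uses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListSubmitDir {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path submitJobDir = new Path(args[0]);   // e.g. the submit dir logged at submission time
    for (FileStatus st : fs.listStatus(submitJobDir)) {
      // typically job.xml, job.split, job.splitmetainfo and job.jar
      System.out.println(st.getPath().getName() + "\t" + st.getLen());
    }
  }
}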

Next let's look at how the input is split. The writeSplits method is implemented as follows:

private int writeSplits(org.apache.hadoop.mapreduce.JobContext job,  
    Path jobSubmitDir) throws IOException,  
    InterruptedException, ClassNotFoundException {  
  JobConf jConf = (JobConf)job.getConfiguration();  
  int maps;  
  if (jConf.getUseNewMapper()) {  
    maps = writeNewSplits(job, jobSubmitDir);  
  } else {  
    maps = writeOldSplits(jConf, jobSubmitDir);  
  }  
  return maps;  
}

Since the new mapreduce API is in use, writeNewSplits is the branch that ends up being called. Its implementation is as follows:

@SuppressWarnings("unchecked")
private <T extends InputSplit>
int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
        InterruptedException, ClassNotFoundException {
    Configuration conf = job.getConfiguration();
    InputFormat<?, ?> input =
            ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
    // compute the splits with the InputFormat's getSplits method
    List<InputSplit> splits = input.getSplits(job);
    T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]);

    // sort the splits so that the largest ones are scheduled first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
            jobSubmitDir.getFileSystem(conf), array);
    return array.length;
}

Within writeNewSplits, the code that really determines the number of map tasks is the InputFormat's getSplits method (tip: getSplits is an abstract method declared on InputFormat, so from this call site you can jump into the various InputFormat implementations and see how each of them splits its input). With the default TextInputFormat, the getSplits that actually runs is the one inherited from FileInputFormat, and it is implemented as follows:

/** 
   * Generate the list of files and make them into FileSplits.
   * @param job the job context
   * @throws IOException
   */
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    Stopwatch sw = new Stopwatch().start();
    long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
    long maxSize = getMaxSplitSize(job);

    // generate the splits
    List<InputSplit> splits = new ArrayList<InputSplit>();
    List<FileStatus> files = listStatus(job);
    for (FileStatus file: files) {
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          FileSystem fs = path.getFileSystem(job.getConfiguration());
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(job, path)) {
          long blockSize = file.getBlockSize();
          long splitSize = computeSplitSize(blockSize, minSize, maxSize);

          long bytesRemaining = length;
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                       blkLocations[blkIndex].getHosts(),
                       blkLocations[blkIndex].getCachedHosts()));
          }
        } else { // not splitable
          splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                      blkLocations[0].getCachedHosts()));
        }
      } else { 
        //Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
    // Save the number of input files for metrics/loadgen
    job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
    sw.stop();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total # of splits generated by getSplits: " + splits.size()
          + ", TimeTaken: " + sw.elapsedMillis());
    }
    return splits;
  }

getFormatMinSplitSize always returns 1, and getMinSplitSize returns the value of the mapreduce.input.fileinputformat.split.minsize parameter (default 1), so minSize is the larger of mapreduce.input.fileinputformat.split.minsize and 1.
getMaxSplitSize returns the value of the mapreduce.input.fileinputformat.split.maxsize parameter (default Long.MAX_VALUE), so maxSize is simply that parameter's value.
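In other words, the split size can be tuned per job through these two parameters. A minimal sketch using the setter methods on the new-API FileInputFormat, which simply write the two properties above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeTuning {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split tuning");
    // sets mapreduce.input.fileinputformat.split.minsize
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
    // sets mapreduce.input.fileinputformat.split.maxsize
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);  // 256 MB
  }
}

As computeSplitSize below will show, raising minSize above the block size produces larger (and fewer) splits, while lowering maxSize below the block size produces smaller (and more) splits.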

If there are multiple input files, the list returned by List<FileStatus> files = listStatus(job) contains one FileStatus per input file, so its size equals the number of input files.

While iterating over the files list, the block size of each file is read, and computeSplitSize is called to work out the split size for that file, which in turn determines how many map tasks the file contributes. computeSplitSize is implemented as follows:

protected long computeSplitSize(long blockSize, long minSize,  
                                long maxSize) {  
  return Math.max(minSize, Math.min(maxSize, blockSize));  
}

So the split size used for each input file is:
splitSize = max(minSize, min(maxSize, blockSize))
bytesRemaining is the number of bytes of a single input file that have not yet been assigned to a split.
From getSplits we can see that a file keeps being cut into splits of splitSize bytes as long as the remaining bytes exceed SPLIT_SLOP (1.1) times splitSize, and any leftover bytes then form one final split. The total number of map tasks is therefore the sum of the splits produced by all input files, not simply the number of files multiplied by a fixed per-file count.
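A quick worked example of this formula: with the default minSize = 1 and maxSize = Long.MAX_VALUE, a file stored with a 128 MB block size gets splitSize = 128 MB, so a single 300 MB input file produces three map tasks (128 MB + 128 MB + 44 MB), because the loop keeps cutting splits while the remaining bytes are more than 1.1 times splitSize and the 44 MB remainder becomes one final split. The sketch below reproduces that calculation outside Hadoop (the numbers are illustrative only):

public class SplitCountDemo {
  private static final double SPLIT_SLOP = 1.1;   // same constant FileInputFormat uses

  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  // mirrors the split loop in FileInputFormat.getSplits for a single file
  static int countSplits(long length, long blockSize, long minSize, long maxSize) {
    long splitSize = computeSplitSize(blockSize, minSize, maxSize);
    int splits = 0;
    long bytesRemaining = length;
    while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
      splits++;
      bytesRemaining -= splitSize;
    }
    if (bytesRemaining != 0) {
      splits++;   // the leftover bytes form one final split
    }
    return splits;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    // 300 MB file, 128 MB blocks, default minSize/maxSize -> prints 3
    System.out.println(countSplits(300 * mb, 128 * mb, 1, Long.MAX_VALUE));
  }
}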