Hadoop implementation principles and source code: Hadoop source code analysis

Reposted

mob64ca14089531 2024-04-19 11:57:51

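Hadoop Source Code Analysis: distributedshell

1. Overview

This article introduces distributedshell, a very simple application programming example that ships with YARN. It can be regarded as the "hello world" of YARN programming. Its main function is to execute user-supplied shell commands or shell scripts in parallel. The article focuses on how distributedshell is implemented.

The version used is hadoop-2.5.2.

The distributedshell source code is in the directory

hadoop-2.5.2-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell.

distributedshell is written in exactly the same way as any ordinary YARN application.

2. Client analysis

The entry point of the DistributedShell Client, its main function, is as follows: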

public static void main(String[] args) {
    boolean result = false;
    try {
      Client client = new Client();
      LOG.info("Initializing Client");
      try {
        boolean doRun = client.init(args);
        if (!doRun) {
          System.exit(0);
        }
      } catch (IllegalArgumentException e) {
        System.err.println(e.getLocalizedMessage());
        client.printUsage();
        System.exit(-1);
      }
      result = client.run();
    } catch (Throwable t) {
      LOG.fatal("Error running CLient", t);
      System.exit(1);
    }
    …
  }

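2.1 Constructing the YARN client object yarnClient

When the Client is created, the AM it will use is specified and yarnClient is created. YARN abstracts the interaction between a client and the RM into the programming library YarnClient, which covers application submission, status queries, and control, and so simplifies application code.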

public Client(Configuration conf) throws Exception {
    this(  // specify the AM
        "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster",
        conf);
  }

The YarnClient class is used to create yarnClient, a client that can interact directly with the ResourceManager.

Client(String appMasterMainClass, Configuration conf) {
    …
    yarnClient = YarnClient.createYarnClient();   // create yarnClient
    yarnClient.init(conf);
    opts = new Options();
    opts.addOption("appname", true, "Application Name. Default value - DistributedShell");
    opts.addOption("priority", true, "Application Priority. Default 0");
  }

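2.2 Initialization

init parses the arguments passed on the command line, such as the jar to use, the memory size, and the number of CPUs. The code uses GnuParser for this: all accepted options are defined in opts (which can be thought of as a template), and opts is passed to the parser together with the actual args to obtain a CommandLine object. Later option lookups work directly on that CommandLine object, for example cliParser.hasOption("help") and cliParser.getOptionValue("jar").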

public boolean init(String[] args) throws ParseException {
    CommandLine cliParser = new GnuParser().parse(opts, args);
    amMemory = Integer.parseInt(cliParser.getOptionValue("master_memory", "10"));
    amVCores = Integer.parseInt(cliParser.getOptionValue("master_vcores", "1"));
    …
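
The define-then-query pattern described above can be illustrated without Commons CLI. What follows is a minimal, dependency-free sketch and not the DistributedShell code: the class SimpleOptions and its methods are hypothetical names chosen to mirror the calls in the listing, and the parsing rule (every option is written as "-name value") is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Options + CommandLine; not part of Hadoop or Commons CLI.
final class SimpleOptions {
    private final Set<String> declared = new HashSet<>();       // the option "template"
    private final Map<String, String> values = new HashMap<>(); // values found in args

    // Declare an option name that parse() will accept.
    SimpleOptions addOption(String name) {
        declared.add(name);
        return this;
    }

    // Parse "-name value" pairs; reject undeclared options and missing values.
    SimpleOptions parse(String[] args) {
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("Expected an option, got: " + args[i]);
            }
            String name = args[i].substring(1);
            if (!declared.contains(name) || i + 1 >= args.length) {
                throw new IllegalArgumentException("Bad option: " + args[i]);
            }
            values.put(name, args[i + 1]);
        }
        return this;
    }

    boolean hasOption(String name) {
        return values.containsKey(name);
    }

    // Return the parsed value, or the default when the option was not supplied.
    String getOptionValue(String name, String defaultValue) {
        return values.getOrDefault(name, defaultValue);
    }

    public static void main(String[] args) {
        SimpleOptions cli = new SimpleOptions()
                .addOption("master_memory")
                .addOption("master_vcores")
                .parse(new String[] {"-master_memory", "256"});
        // master_memory was supplied; master_vcores falls back to its default.
        int amMemory = Integer.parseInt(cli.getOptionValue("master_memory", "10"));
        int amVCores = Integer.parseInt(cli.getOptionValue("master_vcores", "1"));
        System.out.println("amMemory=" + amMemory + ", amVCores=" + amVCores);
    }
}
```

Declaring the options separately from the lookup keeps the defaults next to the fields they initialize, which is how the init listing above reads.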

2.3 Running

The run method starts the client.

The most important function in the DistributedShell Client is run(), which is implemented as follows:
public boolean run() throws IOException, YarnException {
    …
    // Start yarnClient first. This sets up the RPC connection to the RM, after which calls
    // look like local method calls. yarnClient is then used to query the number of NMs and
    // per-NM details (ID, address, number of containers, and so on).
    yarnClient.start();
    YarnClusterMetrics clusterMetrics = yarnClient.getYarnClusterMetrics();

    // Get information about the running nodes from the ASM via yarnClient:
    List<NodeReport> clusterNodeReports = yarnClient.getNodeReports(NodeState.RUNNING);

    // Collect the information needed to submit the AM
    YarnClientApplication app = yarnClient.createApplication();  // create the app
    GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
    …
    // Build the ApplicationSubmissionContext used to submit the ApplicationMaster.
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    ApplicationId appId = appContext.getApplicationId();

    // Build the launch context for the AM container: local resources, environment
    // variables, and the actual command.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);

    // Local resources the AM needs, such as the jar and the log4j properties file
    Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
    FileSystem fs = FileSystem.get(conf);
    addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(), localResources, null);
    // Set the log4j properties if needed
    if (!log4jPropFile.isEmpty()) {
      addToLocalResources(fs, log4jPropFile, log4jPath, appId.toString(), localResources, null);
    }
    // Attach the local resources to amContainer.
    amContainer.setLocalResources(localResources);

    // Set the environment variables
    Map<String, String> env = new HashMap<String, String>();
    env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation);
    env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp));
    env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen));
    // Attach env to amContainer.
    amContainer.setEnvironment(env);

    // Attach the command line to amContainer
    List<String> commands = new ArrayList<String>();
    commands.add(command.toString());
    amContainer.setCommands(commands);

    // Attach the security tokens to amContainer
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
    amContainer.setTokens(fsTokens);

    // Attach amContainer to appContext
    appContext.setAMContainerSpec(amContainer);
    // Set the priority
    appContext.setPriority(pri);
    // Set the queue
    appContext.setQueue(amQueue);

    // Finally, submit the application (and with it the AM) through yarnClient
    yarnClient.submitApplication(appContext);

    // Start monitoring. The Client only cares whether the AM it submitted to the RM is
    // running properly; the tasks inside the AM are managed by the AM. If the Client needs
    // task-level information, it has to implement its own interaction with the AM.
    return monitorApplication(appId);
  }

In short, the Client does fairly little: it connects to the RM, submits the AM, and monitors the AM's running state.
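
The monitoring step boils down to polling the application's state until it reaches a terminal state. The following is a minimal sketch of such a loop using only the JDK. It is not the monitorApplication source: the enums AppState and FinalStatus, the Report class, and the maxPolls limit are hypothetical stand-ins (a real client would read an application report from yarnClient and would typically sleep between polls).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical model of a client-side monitoring loop; the names are illustrative only.
final class MonitorSketch {
    enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
    enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED }

    // Stand-in for an application report: the YARN-level state plus the AM's own verdict.
    static final class Report {
        final AppState state;
        final FinalStatus finalStatus;

        Report(AppState state, FinalStatus finalStatus) {
            this.state = state;
            this.finalStatus = finalStatus;
        }
    }

    // Poll until a terminal state is seen; true only if the app finished successfully.
    static boolean monitor(Supplier<Report> reports, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            Report r = reports.get();
            switch (r.state) {
                case FINISHED:
                    // FINISHED only means the AM exited cleanly; its verdict decides success.
                    return r.finalStatus == FinalStatus.SUCCEEDED;
                case FAILED:
                case KILLED:
                    return false;
                default:
                    break; // not terminal yet: poll again
            }
        }
        return false; // gave up before a terminal state was reached
    }

    public static void main(String[] args) {
        Iterator<Report> reports = Arrays.asList(
                new Report(AppState.ACCEPTED, FinalStatus.UNDEFINED),
                new Report(AppState.RUNNING, FinalStatus.UNDEFINED),
                new Report(AppState.FINISHED, FinalStatus.SUCCEEDED)).iterator();
        System.out.println("succeeded=" + monitor(reports::next, 10));
    }
}
```

The sketch keeps two separate notions of outcome because the state the RM sees and the result the AM reports are different things: an application can finish cleanly and still report failure.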

3. ApplicationMaster analysis

A simplified outline of the AM is as follows:

public static void main(String[] args) {
    boolean result = false;
    ApplicationMaster appMaster = new ApplicationMaster();
    boolean doRun = appMaster.init(args);
    if (!doRun) {
      System.exit(0);
    }
    appMaster.run();
    result = appMaster.finish();
  }

YARN provides two programming libraries, AMRMClient and NMClient (usable by both the AM and the RM), to simplify AM programming.

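3.1 Setting up asynchronous handlers for RM and NM messages

    // Set up and start RMCallbackHandler, the handler class for RM messages
    AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
    amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
    amRMClient.init(conf);
    amRMClient.start();

    // Set up and start NMCallbackHandler, the handler class for NM messages
    containerListener = …;  // an NMCallbackHandler instance (its construction is cut off in the source listing)
    nmClientAsync = new NMClientAsyncImpl(containerListener);
    nmClientAsync.init(conf);
    nmClientAsync.start();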

3.2 Registering with the RM

    RegisterApplicationMasterResponse response = amRMClient
        .registerApplicationMaster(appMasterHostname, appMasterRpcPort, appMasterTrackingUrl);

3.3 Working out the required containers and requesting them from the RM

    for (int i = 0; i < numTotalContainersToRequest; ++i) {
      ContainerRequest containerAsk = setupContainerAskForRM();
      amRMClient.addContainerRequest(containerAsk);
    }

  private ContainerRequest setupContainerAskForRM() {
    Priority pri = Records.newRecord(Priority.class);
    Resource capability = Records.newRecord(Resource.class);
    // Specify the required memory and CPU capability
    capability.setMemory(containerMemory);
    capability.setVirtualCores(containerVirtualCores);
    ContainerRequest request = new ContainerRequest(capability, null, null, pri);
    return request;
  }

3.4 The RM allocates containers to the AM, and the AM starts the tasks

Responses to RM messages are handled by RMCallbackHandler. The example mainly handles the first two kinds of message.

private class RMCallbackHandler implements AMRMClientAsync.CallbackHandler {
    // Handles the message "containers have completed", carried in the RM's heartbeat response.
    // If a heartbeat response contains both completed and newly allocated containers,
    // the completed ones are handled first.
    public void onContainersCompleted(List<ContainerStatus> completedContainers) {}
    ...
    // Handles the message "the RM has allocated new containers", carried in the RM's heartbeat response
    public void onContainersAllocated(List<Container> allocatedContainers) {}

    public void onShutdownRequest() { done = true; }
    // Node status changes
    public void onNodesUpdated(List<NodeReport> updatedNodes) {}
    public float getProgress() {}
}
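After onContainersAllocated receives the allocated containers, it submits the tasks to the NMs.

public void onContainersAllocated(List<Container> allocatedContainers) {
  for (Container allocatedContainer : allocatedContainers) {
    // Create the runnable that launches this container
    LaunchContainerRunnable runnableLaunchContainer =
        new LaunchContainerRunnable(allocatedContainer, containerListener);
    // Create a new thread for it
    Thread launchThread = new Thread(runnableLaunchContainer);
    // launch and start the container on a separate thread to keep
    // the main thread unblocked
    // as all containers may not be allocated at one go.
    launchThreads.add(launchThread);
    // The container is submitted to the NM inside the thread, so the main flow is not affected
    launchThread.start();
  }
}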
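
The point of one thread per container is that the callback returns quickly even if an individual launch is slow. Below is a minimal sketch of the same pattern using only the JDK. The names LaunchThreadsSketch, launchAll, and joinAll are hypothetical, plain strings stand in for Container objects, and incrementing a counter stands in for the real launch work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "one launch thread per allocated container".
final class LaunchThreadsSketch {

    // Start one thread per allocated item and return the threads so they can be joined later.
    static List<Thread> launchAll(List<String> allocated, AtomicInteger launched) {
        List<Thread> launchThreads = new ArrayList<>();
        for (String containerName : allocated) {
            Runnable launch = () -> {
                // Placeholder for the slow part: building the launch context and calling the NM.
                launched.incrementAndGet();
            };
            Thread launchThread = new Thread(launch, "launch-" + containerName);
            launchThreads.add(launchThread);
            launchThread.start(); // the caller is not blocked by the launch itself
        }
        return launchThreads;
    }

    // Wait for each launch thread, with an upper bound per thread.
    static void joinAll(List<Thread> threads, long timeoutMillis) {
        for (Thread t : threads) {
            try {
                t.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger launched = new AtomicInteger();
        List<Thread> threads = launchAll(Arrays.asList("c1", "c2", "c3"), launched);
        joinAll(threads, 10_000);
        System.out.println("launched=" + launched.get());
    }
}
```

Keeping the threads in a list, as the listing above does with launchThreads, lets the caller wait for outstanding launches before shutting down.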
 
A brief look at LaunchContainerRunnable: the class implements Runnable, and its run method prepares the task command.

private class LaunchContainerRunnable implements Runnable {
    public LaunchContainerRunnable(
        Container lcontainer, NMCallbackHandler containerListener) {
      this.container = lcontainer;          // remember the container to use
      this.containerListener = containerListener;
    }

    public void run() {
      // Build the container launch context from the command, environment variables, local resources, and so on
      ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
      ctx.setEnvironment(shellEnv);
      ctx.setLocalResources(localResources);
      ctx.setCommands(commands);
      ctx.setTokens(allTokens.duplicate());
      containerListener.addContainer(container.getId(), container);
      // Start the container asynchronously
      nmClientAsync.startContainerAsync(container, ctx);
    }
}

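onContainersCompleted is fairly simple: on receiving the message that containers have finished, it checks their exit status and, where a container has to be run again, issues a new request, until all containers have completed.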
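
The bookkeeping behind this can be sketched with a few counters. The following is a simplified model using only the JDK, not the AM source. The class CompletionTracker and its methods are hypothetical, and the policy is an assumption to be checked against the actual code: a container lost by the framework (modelled as exit status -100, assumed to mirror ContainerExitStatus.ABORTED) is requested again, while a container whose command exits non-zero counts as completed and failed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the completed/failed/re-request bookkeeping in an AM.
final class CompletionTracker {
    static final int ABORTED = -100; // assumed to mirror ContainerExitStatus.ABORTED

    private final int numTotalContainers;
    private final AtomicInteger numRequested;
    private final AtomicInteger numCompleted = new AtomicInteger();
    private final AtomicInteger numFailed = new AtomicInteger();
    private volatile boolean done;

    CompletionTracker(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
        // Assume the initial requests for all containers have already been made.
        this.numRequested = new AtomicInteger(numTotalContainers);
    }

    // Process a batch of exit statuses; returns how many replacement containers to request.
    int onContainersCompleted(List<Integer> exitStatuses) {
        for (int exitStatus : exitStatuses) {
            if (exitStatus == 0) {
                numCompleted.incrementAndGet();      // ran to completion successfully
            } else if (exitStatus == ABORTED) {
                numRequested.decrementAndGet();      // lost container: needs a new request
            } else {
                numCompleted.incrementAndGet();      // command failed: completed, but failed
                numFailed.incrementAndGet();
            }
        }
        int askCount = numTotalContainers - numRequested.get();
        numRequested.addAndGet(askCount);
        if (numCompleted.get() == numTotalContainers) {
            done = true;
        }
        return askCount;
    }

    boolean isDone() { return done; }

    int failedCount() { return numFailed.get(); }

    public static void main(String[] args) {
        CompletionTracker tracker = new CompletionTracker(3);
        int ask = tracker.onContainersCompleted(Arrays.asList(0, 1, ABORTED));
        System.out.println("re-request=" + ask + ", done=" + tracker.isDone());
        ask = tracker.onContainersCompleted(Arrays.asList(0));
        System.out.println("re-request=" + ask + ", done=" + tracker.isDone());
    }
}
```

In a real AM the returned count would drive new calls to amRMClient.addContainerRequest, in the same way as the initial request loop in section 3.3.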

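Responses to NM messages are handled by NMCallbackHandler.

In the example, the callback handler deals with the events reported by the NM quite simply: it only updates the counts of completed and failed containers maintained by the AM. That way, once a container has finished, a new request can be issued. Failure handling is similar to the handling of the container-completed message above, so failed work loops back into a new request.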

static class NMCallbackHandler
    implements NMClientAsync.CallbackHandler {
    @Override
    public void onContainerStopped(ContainerId containerId) { ... }
    @Override
    public void onContainerStatusReceived(ContainerId containerId,
        ContainerStatus containerStatus) { ... }
    @Override
    public void onContainerStarted(ContainerId containerId,
        Map<String, ByteBuffer> allServiceResponse) { ... }

    ...
}

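In short, the AM registers callback handlers for the RM and the NMs and then requests containers. Once it has containers, it submits tasks to them and tracks their execution, resubmitting on failure, until all tasks have completed.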

Reference:

http://www.datastart.cn/tech/2015/05/05/yarn-dist-shell.html
