An Introduction to Hadoop YARN

Hadoop YARN (Yet Another Resource Negotiator) is a core component of the Apache Hadoop ecosystem, introduced in Hadoop 2 as a new resource-management layer. YARN was designed to improve Hadoop's resource utilization and scalability: by decoupling cluster resource management from the MapReduce engine, it lets a single Hadoop cluster host many kinds of applications.

The YARN Architecture

YARN's core daemons are the ResourceManager and the NodeManager. The ResourceManager is a cluster-wide service that arbitrates and allocates resources; a NodeManager runs on each worker node and manages that node's resources, launching and monitoring the containers in which tasks execute. Internally, the ResourceManager is composed of a Scheduler (which allocates containers to applications) and an ApplicationsManager (which accepts job submissions and launches each application's ApplicationMaster).
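On a real cluster, these two daemons are wired together through yarn-site.xml. A minimal fragment might look like the following (the hostname resourcemanager is a placeholder for your actual ResourceManager host):

```xml
<!-- yarn-site.xml: minimal sketch; "resourcemanager" is a placeholder hostname -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>
  </property>
  <property>
    <!-- required so NodeManagers can serve MapReduce shuffle data -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```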

Below is a simple Mermaid class diagram sketching this architecture:

classDiagram
    ResourceManager *-- Scheduler
    ResourceManager *-- ApplicationsManager
    ResourceManager *-- ResourceTrackerService
    NodeManager *-- ContainerManager

The YARN Workflow

  1. A client submits an application to the ResourceManager.
  2. The ResourceManager's ApplicationsManager accepts the submission and negotiates a first container, in which the application's ApplicationMaster is launched.
  3. The ApplicationMaster registers with the ResourceManager and requests further containers from its Scheduler.
  4. The Scheduler grants containers on specific nodes, and the ApplicationMaster asks the corresponding NodeManagers to launch them.
  5. Each NodeManager starts its containers, runs the application's tasks inside them, and reports status back.
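The negotiation in steps 3–5 can be modeled with a toy sketch in plain Java. This is illustrative only, not the real YARN API: the class names mirror the YARN roles, but the real Scheduler grants containers asynchronously via heartbeats rather than in a single call.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the YARN container-negotiation flow; NOT the real YARN API.
public class YarnFlowSketch {

    // Stands in for the ResourceManager's Scheduler: grants containers up to capacity.
    static class Scheduler {
        private int freeContainers;
        Scheduler(int capacity) { this.freeContainers = capacity; }
        int allocate(int requested) {                // step 3: AM asks for resources
            int granted = Math.min(requested, freeContainers);
            freeContainers -= granted;
            return granted;
        }
    }

    // Stands in for a per-application ApplicationMaster.
    static class ApplicationMaster {
        List<String> launchTasks(Scheduler scheduler, int tasks) {
            int granted = scheduler.allocate(tasks); // steps 3-4: request and receive containers
            List<String> running = new ArrayList<>();
            for (int i = 0; i < granted; i++) {
                running.add("container-" + i);       // step 5: a NodeManager would launch each one
            }
            return running;
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler(3);          // cluster with 3 free containers
        ApplicationMaster am = new ApplicationMaster();  // step 2: RM has launched an AM
        List<String> running = am.launchTasks(scheduler, 5); // the app wants 5 tasks
        System.out.println(running.size());              // only 3 containers granted
    }
}
```

The key point the sketch captures: the ApplicationMaster never takes resources directly; it always receives what the Scheduler is willing to grant, which may be less than it asked for.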

Example Code

The example below shows how to submit a MapReduce job to YARN. Assume we have a WordCount program that we want to run on a Hadoop cluster.

First, we write the MapReduce job (this is essentially the classic WordCount example that ships with Hadoop):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
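Independent of Hadoop, the map/reduce logic above boils down to "tokenize, then sum per word." A plain-Java equivalent (illustrative only, using the same StringTokenizer as TokenizerMapper) makes the computation easy to check without a cluster:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java equivalent of the WordCount job's logic, for illustration only:
// the "map" phase tokenizes, the "reduce" phase sums counts per word.
public class WordCountCheck {

    static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer as TokenizerMapper
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum); // the reducer's sum, done in memory
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("hello yarn hello hadoop"));
        // {hadoop=1, hello=2, yarn=1}
    }
}
```

The MapReduce version exists, of course, because the input may be far too large for one machine's memory; YARN's job is to spread exactly this computation across the cluster's containers.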

The job can then be submitted to the YARN cluster with code like the following (this belongs inside a main method; the hostnames namenode and resourcemanager, and the ports, are placeholders for your cluster's actual addresses):

// These settings usually come from core-site.xml / mapred-site.xml / yarn-site.xml
// on the client's classpath; they are set programmatically here only to make the
// example self-contained.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode:9000");                           // HDFS NameNode
conf.set("mapreduce.framework.name", "yarn");                               // run MapReduce on YARN
conf.set("yarn.resourcemanager.address", "resourcemanager:8032");           // RM client RPC
conf.set("yarn.resourcemanager.scheduler.address", "resourcemanager:8030"); // RM scheduler RPC

Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCount.TokenizerMapper.class);
job.setCombinerClass(WordCount.IntSumReducer.class);
job.setReducerClass(WordCount.IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
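In practice, the conf.set calls above would normally live in the client's site configuration files rather than in code, so that the same jar runs unchanged on any cluster. For example, running MapReduce on YARN is enabled by this fragment of mapred-site.xml:

```xml
<!-- mapred-site.xml: tells clients to execute MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```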

Conclusion

This article introduced Hadoop YARN's basic architecture and workflow, and demonstrated how to submit a MapReduce job to a YARN cluster. By separating cluster resource management from application logic, YARN greatly improved Hadoop's resource utilization and scalability, allowing Hadoop to support large-scale data processing workloads well beyond MapReduce alone. I hope you found this helpful; thanks for reading!