ES Hadoop on a Single Host

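mob64ca12f63d4f 2025-01-03 06:41:49

Tags: Hadoop, Elastic, hadoop. Category: Hadoop, Big Data

© Copyright belongs to the author. This is an original work by 51CTO blog author mob64ca12f63d4f. Contact the author for permission to repost; unauthorized reposting may lead to legal action.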

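An Introduction to ES and Hadoop on a Single Host

With the growth of big data technology in recent years, Hadoop and Elasticsearch (ES) have become important tools for data processing and analysis. Hadoop is mainly used for distributed storage and batch processing of large data sets, while Elasticsearch is a search and analytics engine that can query large volumes of data in near real time. Running both on a single host gives you a compact environment for storing, processing, and searching data, which is well suited to learning and experimentation. This article shows how to use Hadoop and Elasticsearch together on one host, with code examples, a flowchart, and a Gantt chart to aid understanding.

Environment setup

First, you need a reasonably powerful machine with Java installed. Hadoop's scripts also expect JAVA_HOME to point at that installation. Next, download and install Hadoop and Elasticsearch. The installation steps in brief: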

# Download and unpack Hadoop, then add it to the PATH
wget 
tar -xzvf hadoop-3.3.1.tar.gz
export HADOOP_HOME=~/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin

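# Download, unpack, and start Elasticsearch
# (the darwin-x86_64 archive is the macOS build; on Linux use the linux-x86_64 archive)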
wget 
tar -xzvf elasticsearch-7.15.0-darwin-x86_64.tar.gz
cd elasticsearch-7.15.0
./bin/elasticsearch

Workflow

The basic workflow for integrating Hadoop with Elasticsearch is:

flowchart TD
    A[Start Hadoop] --> B[Start Elasticsearch]
    B --> C[Upload data to Hadoop]
    C --> D[Process data with MapReduce]
    D --> E[Import results into Elasticsearch]
    E --> F[Run search queries]

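1. Start Hadoop

Start Hadoop in pseudo-distributed mode with the commands below. This assumes that fs.defaultFS and dfs.replication are already set in core-site.xml and hdfs-site.xml, that passwordless SSH to localhost works, and that the NameNode has been formatted once with hdfs namenode -format.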

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

2. Start Elasticsearch

Make sure Elasticsearch is running. If it is not, start it with:

cd elasticsearch-7.15.0
./bin/elasticsearch
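
Before moving on, it can help to confirm that the node actually answers HTTP requests. The following is a minimal sketch, assuming Java 11 or later (for java.net.http), the default address localhost:9200, and plain HTTP without security enabled. The class name EsPing and its methods are hypothetical helpers, not part of Elasticsearch or Hadoop.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: checks whether an Elasticsearch node answers on its HTTP port.
final class EsPing {
    // Builds the root URL of a node, for example http://localhost:9200/
    static URI rootUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/");
    }

    // Returns true if a GET on the given URI comes back with HTTP 200.
    static boolean isUp(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Elasticsearch listens on its default address.
        URI uri = rootUri("localhost", 9200);
        System.out.println(uri + " reachable: " + isUp(uri, Duration.ofSeconds(2)));
    }
}
```

With Java 11 or later this can be run directly as java EsPing.java. Elasticsearch can take a while to start, so a false result right after launch does not necessarily mean the installation is broken.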

3. Upload data to Hadoop

Upload your data set to HDFS (the Hadoop Distributed File System):

hdfs dfs -mkdir /data
hdfs dfs -put localdata.txt /data/

4. Process the data with MapReduce

Write a simple MapReduce program to process the uploaded data. The following is a WordCount example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Emits (token, 1) for every whitespace-separated token in an input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // split() yields an empty first token when the line starts with whitespace
                }
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(2);
        }
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

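Run the MapReduce job from the command line. The output directory must not exist yet, otherwise the job fails at submission: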

hadoop jar wordcount.jar WordCount /data/localdata.txt /data/output
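
To see what the job computes without a cluster, the map and reduce logic can be imitated in plain Java. The following is a minimal sketch, not the Hadoop job itself: the class LocalWordCount is a hypothetical name, the input lines are made up for illustration, and empty tokens are skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordCount job, using only the Java standard library.
final class LocalWordCount {
    // Splits each line on whitespace (the map step) and sums one count
    // per token (the reduce step) in a single pass.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps the words in sorted order
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, standing in for the lines of localdata.txt.
        List<String> lines = List.of("hello hadoop", "hello elasticsearch");
        // Print one "word<TAB>count" pair per line, the layout of Hadoop's default text output.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the sample input this prints elasticsearch, hadoop, and hello with counts 1, 1, and 2. The real job writes pairs in the same word, tab, count layout to part files under the output directory, which you can inspect with hdfs dfs -cat.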

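5. Import the results into Elasticsearch

Use the Elasticsearch for Apache Hadoop (ES-Hadoop) library to write the MapReduce output to Elasticsearch. The connector is driven by the following settings:

es.nodes: "localhost"
es.port: 9200
es.resource: "wordcount/result"
es.input.json: true

These are Hadoop job properties rather than a standalone file. Set them on the job's Configuration (for example conf.set("es.nodes", "localhost")) or as property entries in a Hadoop XML configuration file, put the ES-Hadoop jar on the job's classpath, and use the connector's EsOutputFormat as the job's output format. Note that es.input.json: true tells the connector that each record is already a JSON document, so the job must emit JSON strings instead of the tab-separated WordCount output. Mapping types are deprecated in Elasticsearch 7.x, so the type part of es.resource (result here) can probably be omitted, leaving just the index name wordcount.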
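
Because es.input.json expects records that are already JSON, the tab-separated WordCount output has to be turned into JSON documents first. The following is a minimal sketch of that conversion in plain Java. The class WordCountJson and the field names word and count are assumptions chosen for illustration; ES-Hadoop does not prescribe them.

```java
// Hypothetical helper: turns one line of WordCount output into a JSON document.
final class WordCountJson {
    // Escapes the characters that must be escaped inside a JSON string.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Converts "word<TAB>count" into {"word":"...","count":N}.
    static String toJson(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected word<TAB>count: " + line);
        }
        String word = line.substring(0, tab);
        long count = Long.parseLong(line.substring(tab + 1).trim());
        return "{\"word\":\"" + escape(word) + "\",\"count\":" + count + "}";
    }

    public static void main(String[] args) {
        // Prints {"word":"hadoop","count":3}
        System.out.println(toJson("hadoop\t3"));
    }
}
```

In a real job this conversion would more likely live in the reducer, which would emit the JSON string as a Text value. A JSON library would be a safer choice than hand-written escaping once documents have more than a couple of fields.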

6. Run a search query

Run a query in Elasticsearch to confirm that the data was imported successfully:

curl -X GET "localhost:9200/wordcount/_search?pretty"
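
The same check can be narrowed to a single word from Java. The following is a minimal sketch, assuming Java 11 or later, a node at http://localhost:9200 without security enabled, and documents that store the word in a field named word. That field name is hypothetical and depends on how the documents were indexed, and the class WordSearch is likewise illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: searches an index for one word with a match query.
final class WordSearch {
    // Builds a POST to /<index>/_search with a match query on the assumed "word" field.
    // The word is inserted as-is, so it must not contain quotes or backslashes.
    static HttpRequest searchRequest(String baseUrl, String index, String word) {
        String body = "{\"query\":{\"match\":{\"word\":\"" + word + "\"}}}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search?pretty"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = searchRequest("http://localhost:9200", "wordcount", "hadoop");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If the import worked, the hits section of the response should contain the document for the requested word. An empty hits list with a 200 status usually means the index exists but holds no matching document.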

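Project Gantt chart

The Gantt chart below shows the execution schedule for this project:

gantt
    title Hadoop and Elasticsearch Integration
    dateFormat  YYYY-MM-DD
    section Environment setup
    Install Hadoop            :a1, 2023-10-01, 5d
    Install Elasticsearch     :a2, after a1, 5d
    section Data processing
    Upload data to Hadoop     :2023-10-06, 2d
    Run MapReduce job         :2023-10-08, 3d
    Import into Elasticsearch :after a2, 2d
    section Query testing
    Run search query          :2023-10-12, 1d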

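Conclusion

Combining Hadoop and Elasticsearch on a single host lets you use Hadoop for storage and batch processing and Elasticsearch for fast search over the results. As data volumes keep growing, this combination can noticeably improve how data is processed and analyzed. I hope this article gives readers interested in Hadoop and Elasticsearch a clear starting point and helps you apply these technologies in real projects.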
