flume集中式架构

转载

架构领航博主 2025-01-06 19:02:14

文章标签 flume集中式架构 kafka zookeeper apache 文章分类 架构后端开发

Kafka文档

一、Kafka简介

Kafka是一个分布式的消息队列系统(Message Queue)。

官网：https://kafka.apache.org/

kafka集群有多个Broker服务器组成，每个类型的消息被定义为topic。

同一topic内部的消息按照一定的key和算法被分区(partition)存储在不同的Broker上。

消息生产者producer和消费者consumer可以在多个Broker上生产/消费topic

概念理解：

Topics and Logs：

Topic即为每条发布到Kafka集群的消息都有一个类别，topic在Kafka中可以由多个消费者订阅、消费。

每个topic包含一个或多个partition（分区），partition数量可以在创建topic时指定，每个分区日志中记录了该分区的数据以及索引信息。如下图：

Kafka只保证一个分区内的消息有序，不能保证一个主题的不同分区之间的消息有序。如果你想要保证所有的消息都绝对有序可以只为一个主题分配一个分区。

分区会给每个消息记录分配一个顺序ID号（偏移量），能够唯一地标识该分区中的每个记录。Kafka集群保留所有发布的记录，不管这个记录有没有被消费过，Kafka提供相应策略通过配置从而对旧数据处理。

实际上，每个消费者唯一保存的元数据信息就是消费者当前消费日志的位移位置。位移位置是由消费者控制，即、消费者可以通过修改偏移量读取任何位置的数据。

Distribution -- 分布式
Producers -- 生产者

指定topic来发送消息到Kafka Broker

Consumers -- 消费者

根据topic消费相应的消息

二、Kafka集群部署

集群规划：

Zookeeper集群共三台服务器，分别为：node06、node07、node08。

Kafka集群共三台服务器，分别为：node06、node07、node08。

1、Zookeeper集群准备

kafka是一个分布式消息队列，需要依赖ZooKeeper，请先安装好zk集群。

Zookeeper集群安装步骤略。

2、安装Kafka

下载压缩包（官网地址：http://kafka.apache.org/downloads.html）

解压：

tar zxvf kafka_2.10-0.9.0.1.tgz -C /opt/

mv kafka_2.10-0.9.0.1/ kafka

修改配置文件：config/server.properties

核心配置参数说明：

broker.id: broker集群中唯一标识id，0、1、2、3依次增长（broker即Kafka集群中的一台服务器）

注：

当前Kafka集群共三台节点，分别为：node1、node2、node3。对应的broker.id分别为0、1、2。

zookeeper.connect: zk集群地址列表

将当前node1服务器上的Kafka目录同步到其他node2、node3服务器上：

scp -r /opt/kafka/ node2:/opt

scp -r /opt/kafka/ node3:/opt

修改node2、node3上Kafka配置文件中的broker.id（分别在node2、3服务器上执行以下命令修改broker.id）

sed -i -e 's/broker.id=.*/broker.id=1/' /opt/kafka/config/server.properties

sed -i -e 's/broker.id=.*/broker.id=2/' /opt/kafka/config/server.properties

3、启动Kafka集群

A、启动Zookeeper集群。

B、启动Kafka集群。

分别在三台服务器上执行以下命令启动：

bin/kafka-server-start.sh config/server.properties

4、测试

创建话题

（kafka-topics.sh --help查看帮助手册）

创建topic：

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 3 --topic test

（参数说明：

--replication-factor：指定每个分区的复制因子个数，默认1个

--partitions：指定当前创建的kafka分区数量，默认为1个

--topic：指定新建topic的名称）

查看topic列表：

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --list

查看“test”topic描述：

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --describe --topic test

创建生产者：

bin/kafka-console-producer.sh --broker-list node06:9092,node07:9092,node08:9092 --topic test

创建消费者：

bin/kafka-console-consumer.sh --zookeeper node06:2181,node07:2181,node08:2181 --from-beginning --topic test

注：

查看帮助手册：

bin/kafka-console-consumer.sh help

三、Flume & Kafka

1、Flume安装

Flume安装流程：

解压jar包

mv conf/flume-env.sh.template flume-env.sh

vi flume-env.sh java环境变量

./bin flume-ng version

/conf/下创建配置文件fk.conf内容如下：

`a1.sources = r1 a1.sinks = k1 a1.channels = c1 # Describe/configure the source a1.sources.r1.type = avro a1.sources.r1.bind = node06 a1.sources.r1.port = 41414 # Describe the sink a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink a1.sinks.k1.topic = testflume a1.sinks.k1.brokerList = node06:9092,node07:9092,node08:9092 a1.sinks.k1.requiredAcks = 1 a1.sinks.k1.batchSize = 20 a1.sinks.k1.channel = c1 # Use a channel which buffers events in memory`

a1.sources    = r1
a1.sinks    = k1
a1.channels    = c1
 
#    Describe/configure the source
a1.sources.r1.type    = avro
a1.sources.r1.bind    = node06
a1.sources.r1.port    = 41414
 
#    Describe the sink
a1.sinks.k1.type    = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic    = testflume
a1.sinks.k1.brokerList    = node06:9092,node07:9092,node08:9092
a1.sinks.k1.requiredAcks    = 1
a1.sinks.k1.batchSize    = 20
a1.sinks.k1.channel    = c1
 
#    Use a channel which buffers events in memory

`a1.channels.c1.type = memory a1.channels.c1.capacity = 1000000 a1.channels.c1.transactionCapacity = 10000 # Bind the source and sink to the channel a1.sources.r1.channels = c1 a1.sinks.k1.channel = c1`

a1.channels.c1.type    = memory
a1.channels.c1.capacity    = 1000000
a1.channels.c1.transactionCapacity    = 10000
 
#    Bind the source and sink to the channel
a1.sources.r1.channels    = c1
a1.sinks.k1.channel    = c1

2、Flume + Kafka

启动zk集群

A、启动Kafka集群。

bin/kafka-server-start.sh config/server.properties

B、配置Flume集群，并启动Flume集群。

bin/flume-ng agent -n a1 -c conf -f conf/fk.conf -Dflume.root.logger=DEBUG,console

3、测试

分别启动Zookeeper、Kafka、Flume集群。

zkServer.sh start

bin/kafka-server-start.sh config/server.properties

bin/flume-ng agent -n a1 -c conf -f conf/fk.conf -Dflume.root.logger=DEBUG,console

创建topic：（不用）

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 3 --topic testflume

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 1 --topic LogError

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --list

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 1 --topic LogError

bin/kafka-console-consumer.sh --zookeeper node06:2181,node07:2181,node08:2181 --from-beginning --topic LogError

启动消费者：

bin/kafka-console-consumer.sh --zookeeper node06:2181,node07:2181,node08:2181 --from-beginning --topic testflume

启动生产者

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 1 --topic mylog_cmcc

查看topic列表：

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --list

启动消费者

bin/kafka-console-consumer.sh --zookeeper node06:2181,node07:2181,node08:2181 --from-beginning --topic mylog_cmcc

运行“RpcClientDemo”代码，通过rpc请求发送数据到Flume集群。

Flume中source类型为AVRO类型，此时通过Java发送rpc请求，测试数据是否传入Kafka。

其中，Java发送Rpc请求Flume代码示例如下：

（参考Flume官方文档：http://flume.apache.org/FlumeDeveloperGuide.html）

import org.apache.flume.Event; import org.apache.flume.EventDeliveryException; import org.apache.flume.api.RpcClient; import org.apache.flume.api.RpcClientFactory; import org.apache.flume.event.EventBuilder; import java.nio.charset.Charset; /** Flume官网案例 * http://flume.apache.org/FlumeDeveloperGuide.html * @author root */ publicclass RpcClientDemo { publicstaticvoid main(String[] args) { MyRpcClientFacade client = new MyRpcClientFacade(); Flume client.init("node1", 41414); Flume // configured to listen with an AvroSource. String sampleData = "Hello Flume!"; for (int i = 0; i < 10; i++) { client.sendDataToFlume(sampleData); System.out.println("发送数据：" + sampleData); } client.cleanUp(); } } class MyRpcClientFacade { private RpcClient client; private String hostname; privateint port; publicvoid init(String hostname, int port) { // Setup the RPC connection this.hostname = hostname; this.port = port; this.client = RpcClientFactory.getDefaultInstance(hostname, port); // Use the following method to create a thrift client (instead of the // above line): hostname, port); } publicvoid sendDataToFlume(String data) { Flume Event event = EventBuilder.withBody(data, Charset.forName("UTF-8")); // Send the event try { client.append(event); } catch (EventDeliveryException e) { // clean up and recreate the client client.close(); client = null; client = RpcClientFactory.getDefaultInstance(hostname, port); // Use the following method to create a thrift client (instead of // the above line): hostname, port); } } publicvoid cleanUp() { // Close the RPC connection client.close(); } }

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.Charset;
 
/**
Flume官网案例
 *    http://flume.apache.org/FlumeDeveloperGuide.html 
 * @author root
 */
publicclass RpcClientDemo {
    
    publicstaticvoid    main(String[] args) {
       MyRpcClientFacade client = new    MyRpcClientFacade();
Flume
       client.init("node1", 41414);
 
Flume
       // configured to listen with an AvroSource.
       String sampleData = "Hello Flume!";
       for (int i = 0; i < 10; i++) {
           client.sendDataToFlume(sampleData);
           System.out.println("发送数据：" + sampleData);
       }
 
       client.cleanUp();
    }
}
 
class MyRpcClientFacade {
    private RpcClient client;
    private String hostname;
    privateint port;
 
    publicvoid    init(String hostname, int port) {
       // Setup the RPC connection
       this.hostname = hostname;
       this.port = port;
       this.client = RpcClientFactory.getDefaultInstance(hostname, port);
       // Use the following method to create a thrift client (instead of the
       // above line):
hostname,    port);
    }
 
    publicvoid    sendDataToFlume(String data) {
Flume
       Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
 
       // Send the event
       try {
           client.append(event);
       } catch (EventDeliveryException e) {
           // clean up and recreate the client
           client.close();
           client = null;
           client = RpcClientFactory.getDefaultInstance(hostname, port);
           // Use the following method to create a thrift client (instead of
           // the above line):
hostname,    port);
       }
    }
 
    publicvoid    cleanUp() {
       // Close the RPC connection
       client.close();
    }
}

四、Storm & Kafka

官网地址：

http://storm.apache.org/about/integrates.html

五、flume+kafka+spout

bin/kafka-topics.sh --zookeeper node06:2181,node07:2181,node08:2181 --create --replication-factor 2 --partitions 1 --topic LogError

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：java main 入库

下一篇：Unirest 跳过SSL认证

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯