I use a two-node setup:
1. When both nodes are alive: the two nodes share the traffic (load balancing).
2. When one node goes down: the surviving node takes over the traffic previously split across both nodes (high availability).
3. A Kafka channel handles delivery into Kafka, giving both good performance and durability.
4. Resumable collection (picking up where it left off after an interruption).

Connecting the channel directly to Kafka (no separate sink tier) saves resources.

The configuration is as follows (one copy per node):

# name of the source
tier1.sources = source1
# name of the channel
tier1.channels = kafka-mobile-channel

tier1.sources.source1.type = avro
tier1.sources.source1.bind = 0.0.0.0
tier1.sources.source1.port = 44444
tier1.sources.source1.channels = kafka-mobile-channel
tier1.sources.source1.selector.type = multiplexing
tier1.sources.source1.selector.header = topic
tier1.sources.source1.selector.mapping.mobile = kafka-mobile-channel

tier1.channels.kafka-mobile-channel.type = org.apache.flume.channel.kafka.KafkaChannel
# whether downstream consumers must parse Flume-event framing; false = raw event body only
tier1.channels.kafka-mobile-channel.parseAsFlumeEvent = false
tier1.channels.kafka-mobile-channel.kafka.topic = tomcat-mobile
tier1.channels.kafka-mobile-channel.kafka.consumer.group.id = flume-tomcat-mobile
tier1.channels.kafka-mobile-channel.kafka.consumer.auto.offset.reset = earliest
tier1.channels.kafka-mobile-channel.kafka.bootstrap.servers = ZW0804-hadoop-89:9092,ZW0804-hadoop-90:9092,ZW0804-hadoop-91:9092
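Because parseAsFlumeEvent = false, each Kafka record carries only the raw event body. A quick way to verify events are landing in the topic is a throwaway consumer; a minimal sketch assuming the kafka-python package (not part of the original setup):

from kafka import KafkaConsumer

# Connect to the same brokers the Kafka channel writes to.
consumer = KafkaConsumer(
    'tomcat-mobile',
    bootstrap_servers=[
        'ZW0804-hadoop-89:9092',
        'ZW0804-hadoop-90:9092',
        'ZW0804-hadoop-91:9092',
    ],
    auto_offset_reset='earliest',
    consumer_timeout_ms=10000,  # stop iterating after 10 s of silence
)

for msg in consumer:
    # With parseAsFlumeEvent=false the value is the raw log line.
    print(msg.key, msg.value[:120])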
Its upstream (collection-side) configuration is:

# agent
collector.sources = taildir-source
collector.channels = file-channel
collector.sinks = avro-forward-sink-node2 avro-forward-sink-node3

# source
collector.sources.taildir-source.type = TAILDIR
collector.sources.taildir-source.channels = file-channel
collector.sources.taildir-source.positionFile = /var/log/flume-ng/taildir_position.json
collector.sources.taildir-source.filegroups = f1
collector.sources.taildir-source.filegroups.f1 = /tmp/nginx/.+.log
collector.sources.taildir-source.fileHeader = true
collector.sources.taildir-source.interceptors = topic UUID
collector.sources.taildir-source.interceptors.topic.type = static
collector.sources.taildir-source.interceptors.topic.key = topic
collector.sources.taildir-source.interceptors.topic.value = we-user
collector.sources.taildir-source.interceptors.topic.preserveExisting = false
collector.sources.taildir-source.interceptors.UUID.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
collector.sources.taildir-source.interceptors.UUID.headerName = key
collector.sources.taildir-source.interceptors.UUID.prefix = NODE_
collector.sources.taildir-source.interceptors.UUID.preserveExisting = false
collector.sources.taildir-source.skipToEnd = true

# channel
collector.channels.file-channel.type = file
# checkpoint directory for the file channel
collector.channels.file-channel.checkpointDir = /var/log/flume-ng/file-channel/checkpoint
# where the file channel stores its data
collector.channels.file-channel.dataDirs = /var/log/flume-ng/file-channel/data

# sinks: fan events out to the two downstream nodes
collector.sinks.avro-forward-sink-node2.type = avro
collector.sinks.avro-forward-sink-node2.channel = file-channel
# address of the first load-balanced node
collector.sinks.avro-forward-sink-node2.hostname = node2
collector.sinks.avro-forward-sink-node2.port = 44444

collector.sinks.avro-forward-sink-node3.type = avro
collector.sinks.avro-forward-sink-node3.channel = file-channel
# address of the second load-balanced node
collector.sinks.avro-forward-sink-node3.hostname = node3
collector.sinks.avro-forward-sink-node3.port = 44444

# load balancing
collector.sinkgroups = g1
collector.sinkgroups.g1.sinks = avro-forward-sink-node2 avro-forward-sink-node3
collector.sinkgroups.g1.processor.type = load_balance
collector.sinkgroups.g1.processor.backoff = true
Resumable collection
Flume achieves this with the TAILDIR source.
Offsets are stored in /var/log/flume-ng/taildir_position.json, for example:
[{"inode":52299335,"pos":13,"file":"/tmp/nginx/aa.log"},{"inode":52299428,"pos":81,"file":"/tmp/nginx/test.log"}]


Here inode identifies the file itself: it does not change even if the file is renamed. pos records the read offset within the file (in bytes), and file is the absolute path.
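To see how far each file has been read, the position file can be inspected directly; a minimal sketch (the path comes from the config above, the script itself is illustrative):

import json

# Path configured as collector.sources.taildir-source.positionFile
POSITION_FILE = '/var/log/flume-ng/taildir_position.json'

with open(POSITION_FILE) as f:
    entries = json.load(f)

# One entry per tailed file: inode survives renames, pos is the read offset.
for e in entries:
    print(f"inode={e['inode']}  pos={e['pos']}  file={e['file']}")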

Test procedure: stop Kafka, then produce data under the monitored path (/tmp/nginx/.+.log).
The pos values in the position file keep advancing (while Kafka is stalled).
The files under the collector channel's data and checkpoint directories stay roughly the same size, which indicates the collection-side Flume read the data and successfully delivered it to the downstream Flume tier.
After Kafka is restarted 15 minutes later, the data produced during the outage arrives in Kafka: resumable delivery works, and a Kafka outage is tolerated.
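The test data above can be produced with anything that appends lines to a file matching /tmp/nginx/.+.log; a minimal sketch (file name and message format are arbitrary choices, not from the original test):

import time

# Any file matching the monitored pattern /tmp/nginx/.+.log will do.
LOG_FILE = '/tmp/nginx/test.log'

with open(LOG_FILE, 'a') as f:
    for i in range(100):
        f.write(f'test line {i}\n')
        f.flush()        # make the line visible to the TAILDIR source immediately
        time.sleep(0.1)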

Second test: with the collection-side Flume unable to deliver to the downstream Flume cluster, watching the file channel's data directory (tail -f on the files under /var/log/flume-ng/file-channel/data) shows events being written to the data files.
After the collection-side configuration was corrected, the previously undelivered data was re-sent and ended up in the Kafka message queue.

Sometimes we need to parse information out of the Flume event headers.
1. A common requirement: when parsing business logs, there is often no field that uniquely identifies a single log line, so we generally add one ourselves to make later deduplication possible.
Example:
At the collection side, attach a UUID to every log event:
collector.sources.taildir-source.interceptors.UUID.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
collector.sources.taildir-source.interceptors.UUID.headerName = key

Then parse the UUID out in the consuming code; a sketch follows below.


Finally, deduplicate by putting a MySQL unique key on that UUID.
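A sketch of the consuming side, assuming the kafka-python and PyMySQL packages and a table logs with a unique key on its uuid column (table name and schema are my assumptions). It also assumes the UUID arrives as the Kafka message key, since the interceptor writes it to the header named key; if your Flume version does not map that header to the message key, ship the UUID in the body instead:

from kafka import KafkaConsumer
import pymysql

consumer = KafkaConsumer(
    'tomcat-mobile',
    bootstrap_servers=['ZW0804-hadoop-89:9092'],
    group_id='dedup-loader',
    auto_offset_reset='earliest',
)

conn = pymysql.connect(host='localhost', user='etl', password='***',
                       database='logs_db', autocommit=True)

with conn.cursor() as cur:
    for msg in consumer:
        # e.g. NODE_<uuid> produced by the UUID interceptor (assumed to be the message key)
        uuid = msg.key.decode() if msg.key else None
        line = msg.value.decode('utf-8', 'replace')
        # INSERT IGNORE skips rows whose uuid already exists (unique key on uuid),
        # so events replayed after a failure do not create duplicates.
        cur.execute(
            'INSERT IGNORE INTO logs (uuid, line) VALUES (%s, %s)',
            (uuid, line),
        )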