1. Flink on YARN: sending logs directly to Kafka with the log4j (log4j2) KafkaAppender (without Kerberos authentication)
Before Flink 1.11.0, Flink used Log4j for logging; from 1.11.0 onwards it uses Log4j2. The configuration differs slightly between the two:
log4j configuration (before Flink 1.11.0)
# KafkaLog4jAppender may require adding the corresponding jar to the classpath
log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.kafka.brokerList=master:9092,storm1:9092,storm2:9092
log4j.appender.kafka.topic=flink_log_test
log4j.appender.kafka.compressionType=none
log4j.appender.kafka.requiredNumAcks=0
log4j.appender.kafka.syncSend=true
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
# Custom log format
log4j.appender.kafka.layout.ConversionPattern={"log_level":"%p",\
"log_timestamp":"%d{ISO8601}",\
"log_package":"%C",\
"log_thread":"%t",\
"log_file":"%F",\
"log_line":"%L",\
"log_message":"%m",\
"log_path":"%X{log_path}",\
"flink_job_name":"${sys:flink_job_name}"}
log4j.appender.kafka.level=INFO
# log level for loggers under the kafka package
log4j.logger.kafka=INFO
# default log level for the Kafka client itself
log4j.logger.org.apache.kafka=WARN
# Alternative log layout (logstash JSON event layout)
#log4j.appender.kafka.layout=net.logstash.log4j.JSONEventLayoutV1
## add custom fields as k:v pairs, comma-separated if there are several
#log4j.appender.kafka.layout.UserFields=flink_job_name:${sys:flink_job_name},yarnContainerId:${sys:yarnContainerId}
To simplify downstream processing, we want the logs formatted as JSON. There are two options: build the JSON string yourself in the ConversionPattern, or use net.logstash.log4j.JSONEventLayoutV1 to do the formatting. If neither option meets your needs, you can write your own appender by extending AppenderSkeleton (a minimal sketch follows below). There is another question: how do we tell the logs of different jobs apart? When several Flink applications are running, multiple containers may end up on the same machine, and there would then be no way to tell which job produced a given log line. So we use UserFields to add two custom fields, flink_job_name and yarnContainerId, which makes the logs unambiguous and lets us search by flink_job_name later. This also requires setting a system property yarnContainerId so that log4j can resolve the container id from the environment; Flink does not set this property by default, so we have to add it ourselves.
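If you do go the custom-appender route, a minimal log4j 1.x appender sketch could look like the following. The class name, the hand-built JSON, and the omitted Kafka producer wiring are placeholders for illustration, not part of the original setup:

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Minimal sketch of a custom log4j 1.x appender: format the event yourself
// and hand it to whatever sink you need (e.g. a Kafka producer, omitted here).
public class CustomJsonKafkaAppender extends AppenderSkeleton {

    @Override
    protected void append(LoggingEvent event) {
        // Build whatever JSON structure downstream consumers expect.
        String json = "{\"level\":\"" + event.getLevel()
                + "\",\"message\":\"" + event.getRenderedMessage() + "\"}";
        // producer.send(new ProducerRecord<>(topic, json));  // producer wiring omitted
    }

    @Override
    public void close() {
        // Release the Kafka producer and any other resources here.
    }

    @Override
    public boolean requiresLayout() {
        // The appender builds its own output, so no Layout is needed.
        return false;
    }
}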
flink-conf.yaml configuration
Just add the following two lines so that the container id is available as a system property:
env.java.opts.taskmanager: -DyarnContainerId=$CONTAINER_ID
env.java.opts.jobmanager: -DyarnContainerId=$CONTAINER_ID
log4j2 configuration (Flink 1.11.0 and later)
# kafka appender config
rootLogger.appenderRef.kafka.ref = Kafka
appender.kafka.type=Kafka
appender.kafka.name=Kafka
appender.kafka.syncSend=true
appender.kafka.ignoreExceptions=false
appender.kafka.topic=flink_log_test
appender.kafka.property.type=Property
appender.kafka.property.name=bootstrap.servers
appender.kafka.property.value=master:9092,storm1:9092,storm2:9092
appender.kafka.layout.type=JSONLayout
appender.kafka.layout.compact=true
appender.kafka.layout.complete=false
appender.kafka.layout.additionalField1.type=KeyValuePair
appender.kafka.layout.additionalField1.key=logdir
appender.kafka.layout.additionalField1.value=${sys:log.file}
appender.kafka.layout.additionalField2.type=KeyValuePair
appender.kafka.layout.additionalField2.key=flink_job_name
appender.kafka.layout.additionalField2.value=${sys:flink_job_name}
appender.kafka.layout.additionalField3.type=KeyValuePair
appender.kafka.layout.additionalField3.key=yarnContainerId
appender.kafka.layout.additionalField3.value=${sys:yarnContainerId}
# Custom layout format (alternative to JSONLayout)
#appender.kafka.layout.type=PatternLayout
#appender.kafka.layout.pattern={"log_level":"%p","log_timestamp":"%d{ISO8601}","log_thread":"%t","log_file":"%F", "log_line":"%L","log_message":"'%m'","log_path":"%X{log_path}","job_name":"${sys:flink_job_name}"}%n
With log4j2 you can likewise build a custom JSON string yourself or let the built-in JSONLayout format the event. Adding extra fields works differently from log4j: they are declared through appender.kafka.layout.additionalField1, in the following format:
appender.kafka.layout.additionalField1.type=KeyValuePair
appender.kafka.layout.additionalField1.key=logdir
appender.kafka.layout.additionalField1.value=${sys:log.file}
Submitting the jobs
# first job
flink run -d -m yarn-cluster \
-Dyarn.application.name=test \
-Dyarn.application.queue=flink \
-Dmetrics.reporter.promgateway.groupingKey="jobname=test" \
-Dmetrics.reporter.promgateway.jobName=test \
-c flink.streaming.FlinkStreamingDemo \
-Denv.java.opts="-Dflink_job_name=test" \
/home/jason/bigdata/flink/flink-1.13.2/flink-1.13.0-1.0-SNAPSHOT.jar
# second job
flink run -d -m yarn-cluster \
-Dyarn.application.name=test1 \
-Dyarn.application.queue=spark \
-Dmetrics.reporter.promgateway.groupingKey="jobname=test1" \
-Dmetrics.reporter.promgateway.jobName=test1 \
-c flink.streaming.FlinkStreamingDemo \
-Denv.java.opts="-Dflink_job_name=test1" \
/home/jason/bigdata/flink/flink-1.13.2/flink-1.13.0-1.0-SNAPSHOT.jar
Note that flink_job_name also has to be set, via -Denv.java.opts="-Dflink_job_name=test". Then consume the flink_log_test topic; the log records look like this:
{
  "thread":"Checkpoint Timer",
  "level":"INFO",
  "loggerName":"org.apache.flink.runtime.checkpoint.CheckpointCoordinator",
  "message":"Triggering checkpoint 7 (type=CHECKPOINT) @ 1629016409942 for job dbb2fb501566711e3ba3a0feca2bcd59.",
  "endOfBatch":false,
  "loggerFqcn":"org.apache.logging.slf4j.Log4jLogger",
  "instant":{
    "epochSecond":1629016409,
    "nanoOfSecond":948000000
  },
  "threadId":70,
  "threadPriority":5,
  "logdir":"/home/jason/bigdata/hadoop/hadoop-2.9.0/logs/userlogs/application_1629044405912_0003/container_1629044405912_0003_01_000001/jobmanager.log",
  "flink_job_name":"test",
  "yarnContainerId":"container_1629044405912_0003_01_000001"
}
{
  "thread":"jobmanager-future-thread-1",
  "level":"INFO",
  "loggerName":"org.apache.flink.runtime.checkpoint.CheckpointCoordinator",
  "message":"Completed checkpoint 5 for job a1b2a78965da9340168ff964a92729a0 (50960 bytes in 57 ms).",
  "endOfBatch":false,
  "loggerFqcn":"org.apache.logging.slf4j.Log4jLogger",
  "instant":{
    "epochSecond":1629016456,
    "nanoOfSecond":304000000
  },
  "threadId":52,
  "threadPriority":5,
  "logdir":"/home/jason/bigdata/hadoop/hadoop-2.9.0/logs/userlogs/application_1629044405912_0004/container_1629044405912_0004_01_000001/jobmanager.log",
  "flink_job_name":"test1",
  "yarnContainerId":"container_1629044405912_0004_01_000001"
}
As you can see, the three fields we added all show up correctly. At this point, all of our application logs end up in Kafka.
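If you want to inspect the topic outside of Flink, any plain Kafka consumer will do. Below is a minimal Java sketch; the consumer group, offset reset, and class name are arbitrary choices for illustration, not taken from the original setup:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FlinkLogTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "master:9092,storm1:9092,storm2:9092");
        props.setProperty("group.id", "flink-log-check");
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("flink_log_test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // every record value is one JSON-formatted log event
                    System.out.println(record.value());
                }
            }
        }
    }
}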
2. Flink on YARN: sending logs directly to Kafka with the log4j (log4j2) KafkaAppender (with Kerberos authentication)
When accessing a Kerberos-secured Kafka from application code in the usual way, the configuration looks like this:
String jaas_linux = "......./kafka_***_jaas.conf";
String krb5_linux = "......./krb5.conf";
System.setProperty("java.security.auth.login.config", jaas_linux);
System.setProperty("java.security.krb5.conf", krb5_linux);
Properties properties = new Properties();
properties.setProperty("bootstrap.servers","localhost:9092");
properties.setProperty("group.id", "flink-group");
properties.setProperty("security.protocol","SASL_PLAINTEXT");
properties.setProperty("sasl.mechanism","GSSAPI");
properties.setProperty("sasl.kerberos.service.name","kafka");
With a newer Flink version (Flink 1.13), the KafkaAppender can also log to a Kerberos-secured Kafka. The Kerberos system properties still have to be set before the first logger is obtained:
// Kerberos authentication for the environment
String krb5 = "src/main/resources/krb5.conf";
String jaas = "src/main/resources/kafka_server_jaas.conf";
System.setProperty("java.security.krb5.conf",krb5);
System.setProperty("java.security.auth.login.config",jaas);
// the logger must only be created after the authentication above
Logger logger = LoggerFactory.getLogger(FlinkKafkaConsumerTest.class);
The KafkaAppender itself is then configured in the log4j2 properties, with the SASL/Kerberos settings passed through as producer properties:
rootLogger.level = INFO
rootLogger.appenderRef.kafka.ref = Kafka
rootLogger.appenderRef.console.ref = ConsoleAppender
appender.kafka.type = Kafka
appender.kafka.name = Kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
appender.kafka.property.type = Property
appender.kafka.property.name= bootstrap.servers
appender.kafka.property.value= localhost:9092
appender.kafka.sasl1.type = Property
appender.kafka.sasl1.name = sasl.mechanism
appender.kafka.sasl1.value = GSSAPI
appender.kafka.security.type = Property
appender.kafka.security.name = security.protocol
appender.kafka.security.value = SASL_PLAINTEXT
appender.kafka.sasl2.type = Property
appender.kafka.sasl2.name = sasl.kerberos.service.name
appender.kafka.sasl2.value = kafka
appender.kafka.topic = test
appender.console.name = ConsoleAppender
appender.console.type = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n