A while back I set up Kafka on a server and installed it under /usr/local/kafka:
[root@lucas kafka]# pwd
/usr/local/kafka
[root@lucas kafka]# ll
total 56
drwxr-xr-x 3 root root 4096 Jan 16 18:22 bin
drwxr-xr-x 2 root root 4096 Jan 16 18:49 config
drwxr-xr-x 2 root root 4096 Dec 29 18:20 libs
-rw-r--r-- 1 root root 29975 Jul 29 02:16 LICENSE
drwxr-xr-x 2 root root 4096 Jan 16 18:22 logs
-rw-r--r-- 1 root root 337 Jul 29 02:16 NOTICE
drwxr-xr-x 2 root root 4096 Jul 29 02:20 site-docs
The server is started with:
./bin/kafka-server-start.sh -daemon config/server.properties
Out of curiosity I opened bin/kafka-server-start.sh and found it's pretty short, so why not read it? A long file with complicated logic would be beyond me anyway, right?
#!/bin/bash
if [ $# -lt 1 ];
then
  echo "USAGE: $0 [-daemon] server.properties [--override property=value]*"
  exit 1
fi
base_dir=$(dirname $0)
if [ "x$KAFKA_LOG4J_OPTS" = "x" ]; then
  export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:$base_dir/../config/log4j.properties"
fi
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
  export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi
EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
COMMAND=$1
case $COMMAND in
  -daemon)
    EXTRA_ARGS="-daemon "$EXTRA_ARGS
    shift
    ;;
  *)
    ;;
esac
exec $base_dir/kafka-run-class.sh $EXTRA_ARGS kafka.Kafka "$@"
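Short, right?! It boils down to these steps:
- Check the arguments: if there are fewer than 1, print the usage and exit;
- Store the directory the script lives in in the variable base_dir;
- If the log config variable KAFKA_LOG4J_OPTS is empty, point it at config/log4j.properties;
- If the heap variable KAFKA_HEAP_OPTS is empty, default it to "-Xmx1G -Xms1G";
- If EXTRA_ARGS is unset, default it to '-name kafkaServer -loggc';
- If the first argument is -daemon, prepend it to EXTRA_ARGS and shift it off the argument list (nothing is daemonized here yet, despite what it looks like);
- exec kafka-run-class.sh with EXTRA_ARGS, the main class kafka.Kafka, and the remaining arguments.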
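The steps above can be sketched as a small function that prints the argument list kafka-server-start.sh would hand to kafka-run-class.sh instead of exec'ing it. This is a simplified stand-in, not the real script: the name build_run_class_args is hypothetical, and the log4j/heap defaults are left out because they only set environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the argument handling in kafka-server-start.sh.
# It prints what would follow "kafka-run-class.sh" instead of exec'ing it.
build_run_class_args() {
  if [ $# -lt 1 ]; then
    echo "USAGE: $0 [-daemon] server.properties [--override property=value]*" >&2
    return 1
  fi
  local extra_args
  # ${VAR-default}: the default is used only when EXTRA_ARGS is unset;
  # an EXTRA_ARGS that is set but empty stays empty.
  extra_args=${EXTRA_ARGS-'-name kafkaServer -loggc'}
  if [ "$1" = "-daemon" ]; then
    # -daemon is not acted on here: it is only prepended to the extra
    # args and removed from the positional parameters.
    extra_args="-daemon $extra_args"
    shift
  fi
  printf '%s\n' "$extra_args kafka.Kafka $*"
}
```

Calling build_run_class_args -daemon config/server.properties (with EXTRA_ARGS unset) prints -daemon -name kafkaServer -loggc kafka.Kafka config/server.properties. Because the real script only fills in KAFKA_HEAP_OPTS when it is empty, the heap can be overridden from the environment, e.g. KAFKA_HEAP_OPTS="-Xmx2G -Xms2G" ./bin/kafka-server-start.sh -daemon config/server.properties.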
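At first I thought that was it, since it's so short. But the last step calls yet another file, kafka-run-class.sh, so of course it wasn't that simple! Having read half of it already, I figured I'd keep going.
kafka-run-class.sh is nowhere near as compact: 319 lines. It is Kafka's shared launcher: nearly every command-line tool in bin/ ends up calling it, only with different arguments. Here's where it is called from in the bin directory: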
[root@lucas bin]# pwd
/usr/local/kafka/bin
[root@lucas bin]# grep -rn "kafka-run-class.sh" ./*
./connect-distributed.sh:45:exec $(dirname $0)/kafka-run-class.sh $EXTRA_ARGS org.apache.kafka.connect.cli.ConnectDistributed "$@"
./connect-mirror-maker.sh:45:exec $(dirname $0)/kafka-run-class.sh $EXTRA_ARGS org.apache.kafka.connect.mirror.MirrorMaker "$@"
./connect-standalone.sh:45:exec $(dirname $0)/kafka-run-class.sh $EXTRA_ARGS org.apache.kafka.connect.cli.ConnectStandalone "$@"
./kafka-acls.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.AclCommand "$@"
./kafka-broker-api-versions.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.BrokerApiVersionsCommand "$@"
./kafka-configs.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.ConfigCommand "$@"
./kafka-console-consumer.sh:21:exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleConsumer "$@"
./kafka-console-producer.sh:20:exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleProducer "$@"
./kafka-consumer-groups.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.ConsumerGroupCommand "$@"
./kafka-consumer-perf-test.sh:20:exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsumerPerformance "$@"
./kafka-delegation-tokens.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.DelegationTokenCommand "$@"
./kafka-delete-records.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.DeleteRecordsCommand "$@"
./kafka-dump-log.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.tools.DumpLogSegments "$@"
./kafka-leader-election.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.LeaderElectionCommand "$@"
./kafka-log-dirs.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.LogDirsCommand "$@"
./kafka-mirror-maker.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.tools.MirrorMaker "$@"
./kafka-preferred-replica-election.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.PreferredReplicaLeaderElectionCommand "$@"
./kafka-producer-perf-test.sh:20:exec $(dirname $0)/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance "$@"
./kafka-reassign-partitions.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.ReassignPartitionsCommand "$@"
./kafka-replica-verification.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.tools.ReplicaVerificationTool "$@"
./kafka-server-start.sh:44:exec $base_dir/kafka-run-class.sh $EXTRA_ARGS kafka.Kafka "$@"
./kafka-streams-application-reset.sh:21:exec $(dirname $0)/kafka-run-class.sh kafka.tools.StreamsResetter "$@"
./kafka-topics.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.TopicCommand "$@"
./kafka-verifiable-consumer.sh:20:exec $(dirname $0)/kafka-run-class.sh org.apache.kafka.tools.VerifiableConsumer "$@"
./kafka-verifiable-producer.sh:20:exec $(dirname $0)/kafka-run-class.sh org.apache.kafka.tools.VerifiableProducer "$@"
./trogdor.sh:50:exec $(dirname $0)/kafka-run-class.sh "${CLASS}" "$@"
./zookeeper-security-migration.sh:17:exec $(dirname $0)/kafka-run-class.sh kafka.admin.ZkSecurityMigrator "$@"
./zookeeper-server-start.sh:44:exec $base_dir/kafka-run-class.sh $EXTRA_ARGS org.apache.zookeeper.server.quorum.QuorumPeerMain "$@"
./zookeeper-shell.sh:23:exec $(dirname $0)/kafka-run-class.sh org.apache.zookeeper.ZooKeeperMainWithTlsSupportForKafka -server "$@"
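If you only want a compact "script → main class" map instead of the raw grep output, a small helper can pull the class name out of each wrapper's exec line. This is a sketch under the assumption that every exec line has the shape seen above (exec <dir>/kafka-run-class.sh [$EXTRA_ARGS] <MainClass> "$@"); list_main_classes is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each *.sh in DIR that execs kafka-run-class.sh,
# print "<script> <main class>". Scripts without such a line are skipped.
list_main_classes() {
  local dir=$1 f cls
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Take the first exec line mentioning the runner, drop everything up to
    # and including "kafka-run-class.sh", drop a leading $EXTRA_ARGS,
    # then keep the next word (the main class).
    cls=$(grep -m1 -E '^exec .*kafka-run-class\.sh' "$f" |
      sed -E 's/^exec .*kafka-run-class\.sh +//; s/^\$EXTRA_ARGS +//' |
      awk '{print $1}')
    if [ -n "$cls" ]; then
      printf '%s %s\n' "$(basename "$f")" "$cls"
    fi
  done
}
```

Usage: list_main_classes /usr/local/kafka/bin. Running it also makes it easy to spot which scripts in bin/ do not go through kafka-run-class.sh at all, since they simply don't show up.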
Back to the script itself. It's too long to paste in full, so let's start from the last part:
# Launch mode
if [ "x$DAEMON_MODE" = "xtrue" ]; then
  nohup "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@" > "$CONSOLE_OUTPUT_FILE" 2>&1 < /dev/null &
else
  exec "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@"
fi
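To see what the two Launch mode branches actually do without starting a JVM, here is a sketch that launches an arbitrary command the same way. The helper name launch is hypothetical; DAEMON_MODE and CONSOLE_OUTPUT_FILE just reuse the variable names from the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the two branches of "Launch mode".
launch() {
  if [ "x$DAEMON_MODE" = "xtrue" ]; then
    # Daemon: ignore SIGHUP (nohup), redirect stdout+stderr to a file,
    # detach stdin, and run in the background.
    nohup "$@" > "$CONSOLE_OUTPUT_FILE" 2>&1 < /dev/null &
    echo $!   # PID of the background process
  else
    # Foreground: replace the current shell with the command (same PID).
    exec "$@"
  fi
}
```

For example, DAEMON_MODE=true CONSOLE_OUTPUT_FILE=/tmp/console.out launch echo hello returns right away and the output lands in /tmp/console.out. In the foreground branch, since kafka-server-start.sh and kafka-run-class.sh both use exec, the JVM ends up with the PID of the shell you started the script from.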
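Seeing this, it's clear what the 300-odd lines before it are for: they just assign values to the variables used in this last part! Here's what that final command looks like when printed out: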
/usr/local/java/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent
-XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/usr/local/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/usr/local/kafka/bin/../logs -Dlog4j.configuration=file:./bin/../config/log4j.properties
-cp /usr/local/kafka/libs/*.jar kafka.Kafka config/server.properties
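That's roughly it. In reality $CLASSPATH is very, very long; I've abbreviated it here with *. The trailing kafka.Kafka config/server.properties is what's left of the arguments kafka-server-start.sh passed to kafka-run-class.sh, referenced here as "$@": the -daemon, -name and -loggc options have already been consumed by kafka-run-class.sh's own option loop. When Kafka's other tools run this script, this is the part that differs. Looks a bit familiar, doesn't it? Check the running Kafka process: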
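Here is a simplified sketch of the option loop at the top of kafka-run-class.sh, to show why only the main class and its arguments survive in "$@". The function name parse_run_class_args and the REMAINING_ARGS array are hypothetical; only the three options kafka-server-start.sh sends are handled.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified version of kafka-run-class.sh's option loop.
# Results are left in global variables so the caller can inspect them.
parse_run_class_args() {
  DAEMON_NAME="" GC_LOG_ENABLED="" DAEMON_MODE=""
  while [ $# -gt 0 ]; do
    case $1 in
      -name)
        DAEMON_NAME=$2
        shift 2 || shift   # guard against a trailing -name with no value
        ;;
      -loggc)
        # Like the real script: only enable when no GC options were given.
        if [ -z "$KAFKA_GC_LOG_OPTS" ]; then GC_LOG_ENABLED="true"; fi
        shift
        ;;
      -daemon)
        DAEMON_MODE="true"
        shift
        ;;
      *)
        break   # first non-option: main class followed by its arguments
        ;;
    esac
  done
  REMAINING_ARGS=("$@")   # what ends up as "$@" on the java command line
}
```

After parse_run_class_args -daemon -name kafkaServer -loggc kafka.Kafka config/server.properties, DAEMON_MODE is true, DAEMON_NAME is kafkaServer, and REMAINING_ARGS holds just kafka.Kafka and config/server.properties.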
[root@lucas kafka]# ps -ef | grep kafka
root 10096 1 1 19:22 pts/2 00:00:05 /usr/local/java/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
-XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/usr/local/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/usr/local/kafka/bin/../logs
-Dlog4j.configuration=file:./bin/../config/log4j.properties -cp /usr/local/kafka/bin/../libs/*.jar kafka.Kafka config/server.properties
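I've cut the long classpath out of the process's command line too. Isn't that exactly the launch command the script runs at the end? Back in kafka-run-class.sh, if you want to know where a given variable's value comes from, just look at where it gets assigned earlier in the file. For example, $KAFKA_GC_LOG_OPTS comes from this part: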
while [ $# -gt 0 ]; do
  COMMAND=$1
  case $COMMAND in
    ...
    -loggc)
      if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
        GC_LOG_ENABLED="true"
      fi
      shift
      ;;
    ...
  esac
done
# GC options
GC_FILE_SUFFIX='-gc.log'
GC_LOG_FILE_NAME=''
if [ "x$GC_LOG_ENABLED" = "xtrue" ]; then
  GC_LOG_FILE_NAME=$DAEMON_NAME$GC_FILE_SUFFIX
  # ...
  JAVA_MAJOR_VERSION=$("$JAVA" -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p')
  if [[ "$JAVA_MAJOR_VERSION" -ge "9" ]] ; then
    KAFKA_GC_LOG_OPTS="-Xlog:gc*:file=$LOG_DIR/$GC_LOG_FILE_NAME:time,tags:filecount=10,filesize=102400"
  else
    KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
  fi
fi
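One detail worth pinning down is how the Java version picks between the two flag sets. The sketch below reuses the same sed expression on sample first lines of java -version output instead of a real JVM; the helper names are hypothetical. Java 8 reports its version as "1.8.0_…", so the extracted major version is 1, which is below 9 and selects the legacy -Xloggc flags. That matches the process above, which suggests this server runs Java 8.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the JAVA_MAJOR_VERSION check in kafka-run-class.sh.
java_major_version() {
  # Same sed expression as the script: keep the leading digits after 'version "'.
  printf '%s\n' "$1" | sed -E -n 's/.* version "([0-9]*).*$/\1/p'
}

gc_flag_style() {
  local major
  major=$(java_major_version "$1")
  if [[ "$major" -ge "9" ]]; then
    echo "unified (-Xlog:gc*)"
  else
    echo "legacy (-Xloggc ...)"
  fi
}
```

For example, java_major_version 'java version "1.8.0_152"' prints 1, while java_major_version 'openjdk version "11.0.2" 2019-01-15' prints 11.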
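As for what each flag means, you'll have to look them up one by one. Not being a Java developer, I can't tell what all those -XX options (HotSpot JVM options) do either.
A bit of a tangent: our stack uses Maxwell, Logstash, Elasticsearch and Kafka, and whenever I looked at their processes I was baffled by the huge command lines and couldn't see why a Java process needs to look so complicated. Only after reading Kafka's server start script did I understand where they come from. I then looked at the start scripts of the other middleware, and sure enough, they all work the same way!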
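When you do need to read one of these long Java command lines, it helps to print it one argument per line. A minimal sketch, assuming Linux, where /proc/<pid>/cmdline holds the arguments separated by NUL bytes; show_cmdline is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print a process's full command line, one argument per line (Linux only).
show_cmdline() {
  local pid=$1
  if [ ! -r "/proc/$pid/cmdline" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  # Arguments in /proc/<pid>/cmdline are NUL-separated; turn NULs into newlines.
  tr '\0' '\n' < "/proc/$pid/cmdline"
}
```

Assuming a single broker runs on the host, show_cmdline "$(pgrep -f kafka.Kafka)" lists every JVM option, the full -cp value and the trailing kafka.Kafka config/server.properties on separate lines.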