Consumer-side configuration parameters

| No. | Parameter | Default | Importance | Description |
| --- | --- | --- | --- | --- |
| 1 | bootstrap.servers | "" | high | List of broker addresses used to bootstrap the connection to the Kafka cluster. |
| 2 | client.dns.lookup | default | medium | One of default, use_all_dns_ips, resolve_canonical_bootstrap_servers_only. |
| 3 | group.id | "" | high | Name of the consumer group this consumer belongs to. If set to an empty string, an exception is thrown: Exception in thread "main" org.apache.kafka.common.errors.InvalidGroupIdException: The configured groupId is invalid. |
| 4 | group.instance.id | | medium | |
| 5 | session.timeout.ms | 10000 | high | |
| 6 | heartbeat.interval.ms | 3000 | high | Expected time between heartbeats to the consumer coordinator when Kafka's group management facility is used. Heartbeats keep the consumer's session alive and allow timely rebalancing when consumers join or leave the group. Must be lower than session.timeout.ms, typically no higher than 1/3 of it. |
| 7 | partition.assignment.strategy | RangeAssignor | medium | Partition assignment strategy. |
| 8 | metadata.max.age.ms | 300000 (5 min) | low | Metadata expiry time. If the metadata has not been refreshed within this window, it is forcibly refreshed even if no partition changes or new brokers have been observed. |
| 9 | enable.auto.commit | true | medium | |
| 10 | auto.commit.interval.ms | 5000 | low | Interval at which offsets are auto-committed. |
| 11 | client.id | | low | |
| 12 | client.rack | | low | |
| 13 | max.partition.fetch.bytes | 1048576 (1 MB) | high | Maximum amount of data returned to the consumer per partition. Like fetch.max.bytes, this is not an absolute limit: a record larger than it can still be consumed. |
| 14 | send.buffer.bytes | 131072 (128 KB) | medium | Size of the socket send buffer. |
| 15 | receive.buffer.bytes | 65536 (64 KB) | medium | Size of the socket receive buffer; -1 means use the OS default. |
| 16 | fetch.min.bytes | 1 | high | Minimum amount of data the consumer will pull from Kafka in a single fetch request. |
| 17 | fetch.max.bytes | 52428800 (50 MB) | medium | Maximum amount of data per fetch. Not an absolute limit: if the first message in the first non-empty partition is larger than this value, it is still returned so the consumer can make progress. The largest message Kafka itself accepts is governed by the broker-side message.max.bytes (topic-side max.message.bytes). |
| 18 | fetch.max.wait.ms | 500 | low | How long the broker waits before answering a fetch when fetch.min.bytes is not yet satisfied. |
| 19 | reconnect.backoff.ms | 50 | low | Wait time before attempting to reconnect to a given host, to avoid reconnecting in a tight loop. |
| 20 | reconnect.backoff.max.ms | 1000 | low | |
| 21 | retry.backoff.ms | 100 | low | Wait time before retrying a failed request to a given topic partition, to avoid retrying in a tight loop. |
| 22 | auto.offset.reset | latest | medium | One of earliest, latest, none. |
| 23 | check.crcs | true | low | |
| 24 | metrics.sample.window.ms | 30000 | low | |
| 25 | metrics.num.samples | 2 | low | |
| 26 | metrics.recording.level | INFO | low | One of INFO, DEBUG. |
| 27 | metric.reporters | | low | |
| 28 | key.deserializer | | high | Deserializer class for record keys. |
| 29 | value.deserializer | | high | Deserializer class for record values. |
| 30 | request.timeout.ms | 30000 | medium | Maximum time the consumer waits for the response to a request. |
| 31 | default.api.timeout.ms | 60000 | medium | |
| 32 | connections.max.idle.ms | 540000 (9 min) | medium | How long before an idle connection is closed. |
| 33 | interceptor.classes | | low | Consumer interceptors. |
| 34 | max.poll.records | 500 | medium | Maximum number of records returned in a single poll. |
| 35 | max.poll.interval.ms | 300000 | medium | When group management is used, the maximum idle time of the polling thread. If poll is not called within this interval, the consumer is considered to have left the group and a rebalance is triggered. |
| 36 | exclude.internal.topics | true | medium | Whether Kafka's internal topics are exposed to the consumer. If true, internal topics can only be subscribed to via subscribe(Collection), not via subscribe(Pattern). |
| 37 | isolation.level | read_uncommitted | medium | Transaction isolation level; valid values are "read_uncommitted" and "read_committed", which control how far the consumer can read. With read_committed, messages from uncommitted transactions are ignored and the consumer reads only up to the LSO (LastStableOffset). The default read_uncommitted reads up to the HW (high watermark). |
| 38 | allow.auto.create.topics | true | medium | |
| 39 | security.providers | | low | |
| 40 | security.protocol | PLAINTEXT | medium | |
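Most of these parameters are ordinary key-value entries on the consumer's Properties. Below is a minimal sketch, assuming a local broker, that tunes a few of the parameters from the table; the values are illustrative, not recommendations.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ConsumerTuning {
    public static Properties props() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        // the broker answers a fetch once 64 KB are available or 500 ms have passed
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        // cap a single poll() at 200 records
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);
        // read only committed transactional messages, i.e. up to the LSO
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        return props;
    }
}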
Parameters related to group rebalancing (a configuration sketch follows this list)
- session.timeout.ms
- max.poll.interval.ms
- heartbeat.interval.ms
- group.id
- group.instance.id
- retry.backoff.ms
- internal.leave.group.on.close
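These settings interact: heartbeat.interval.ms must stay well below session.timeout.ms (at most 1/3 of it, per the table), and max.poll.interval.ms bounds the processing time between two polls. A minimal configuration sketch; the values are illustrative, and the group.instance.id line (static membership) is an assumption about the deployment:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class RebalanceTuning {
    public static Properties props() {
        Properties props = new Properties();
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        // heartbeat at ~1/3 of the session timeout
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);
        // up to 5 minutes of processing between poll() calls before a rebalance
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
        // static membership: a restart within session.timeout.ms avoids a rebalance
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "consumer-1");
        return props;
    }
}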
Parameters related to ConsumerMetadata
- retry.backoff.ms
- metadata.max.age.ms
- exclude.internal.topics
- allow.auto.create.topics
Consumer startup example
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class MyConsume {
    public static void main(String[] args) {
        Consumer<String, String> consumer = new KafkaConsumer<>(MyConsume.getConsumeProp());
        consumer.subscribe(Collections.singletonList("test_2"));
        System.out.println("start -------------------------------");
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            if (!records.isEmpty()) {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                try {
                    for (TopicPartition partition : records.partitions()) {
                        List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                        for (ConsumerRecord<String, String> record : partitionRecords) {
                            System.out.println("-----------------------------------" + record.value());
                        }
                        // commit position = offset of the last consumed record + 1
                        long lastConsumedOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                        offsets.put(partition, new OffsetAndMetadata(lastConsumedOffset + 1));
                    }
                    // commit the per-partition offsets collected above
                    consumer.commitAsync(offsets, null);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static Properties getConsumeProp() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "mykafka-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }
}
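One caveat in the example above: asynchronous commits are not retried, so a consumer that exits right after commitAsync may lose its final positions. A common pattern, shown here as a minimal sketch rather than part of the original example (the AtomicBoolean stop flag is assumed wiring), is to finish with one synchronous commit before closing:

import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GracefulShutdown {
    // Poll with async commits in the loop, then one synchronous commit before close.
    static void runUntilStopped(KafkaConsumer<String, String> consumer, AtomicBoolean running) {
        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                records.forEach(r -> System.out.println(r.value()));
                consumer.commitAsync(); // non-blocking; a later commit supersedes a failed one
            }
        } finally {
            try {
                consumer.commitSync(); // blocking: ensure the final positions are committed
            } finally {
                consumer.close();
            }
        }
    }
}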
Source code analysis: client initialization
org.apache.kafka.clients.consumer.KafkaConsumer#KafkaConsumer(java.util.Properties)
private KafkaConsumer(ConsumerConfig config, Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
try {
// parameters related to group rebalancing
GroupRebalanceConfig groupRebalanceConfig = new GroupRebalanceConfig(config,
GroupRebalanceConfig.ProtocolType.CONSUMER);
// initialize groupId and clientId
this.groupId = Optional.ofNullable(groupRebalanceConfig.groupId);
this.clientId = buildClientId(config.getString(CommonClientConfigs.CLIENT_ID_CONFIG), groupRebalanceConfig);
LogContext logContext;
// If group.instance.id is set, we will append it to the log context.
if (groupRebalanceConfig.groupInstanceId.isPresent()) {
logContext = new LogContext("[Consumer instanceId=" + groupRebalanceConfig.groupInstanceId.get() +
", clientId=" + clientId + ", groupId=" + groupId.orElse("null") + "] ");
} else {
logContext = new LogContext("[Consumer clientId=" + clientId + ", groupId=" + groupId.orElse("null") + "] ");
}
this.log = logContext.logger(getClass());
boolean enableAutoCommit = config.getBoolean(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG);
if (!groupId.isPresent()) { // overwrite in case of default group id where the config is not explicitly provided
if (!config.originals().containsKey(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG)) {
enableAutoCommit = false;
} else if (enableAutoCommit) {
throw new InvalidConfigurationException(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG + " cannot be set to true when default group id (null) is used.");
}
} else if (groupId.get().isEmpty()) {
log.warn("Support for using the empty group id by consumers is deprecated and will be removed in the next major release.");
}
log.debug("Initializing the Kafka consumer");
this.requestTimeoutMs = config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG);
this.defaultApiTimeoutMs = config.getInt(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG);
this.time = Time.SYSTEM;
// client metrics
this.metrics = buildMetrics(config, time, clientId);
this.retryBackoffMs = config.getLong(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG);
// load user-configured consumer interceptors
Map<String, Object> userProvidedConfigs = config.originals();
userProvidedConfigs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
List<ConsumerInterceptor<K, V>> interceptorList = (List) (new ConsumerConfig(userProvidedConfigs, false)).getConfiguredInstances(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG,
ConsumerInterceptor.class);
this.interceptors = new ConsumerInterceptors<>(interceptorList);
// key and value deserializers
if (keyDeserializer == null) {
this.keyDeserializer = config.getConfiguredInstance(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, Deserializer.class);
this.keyDeserializer.configure(config.originals(), true);
} else {
config.ignore(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG);
this.keyDeserializer = keyDeserializer;
}
if (valueDeserializer == null) {
this.valueDeserializer = config.getConfiguredInstance(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, Deserializer.class);
this.valueDeserializer.configure(config.originals(), false);
} else {
config.ignore(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG);
this.valueDeserializer = valueDeserializer;
}
// offset reset strategy
OffsetResetStrategy offsetResetStrategy = OffsetResetStrategy.valueOf(config.getString(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG).toUpperCase(Locale.ROOT));
// initialize SubscriptionState
this.subscriptions = new SubscriptionState(logContext, offsetResetStrategy);
ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keyDeserializer,
valueDeserializer, metrics.reporters(), interceptorList);
// initialize ConsumerMetadata
this.metadata = new ConsumerMetadata(retryBackoffMs,
config.getLong(ConsumerConfig.METADATA_MAX_AGE_CONFIG),
!config.getBoolean(ConsumerConfig.EXCLUDE_INTERNAL_TOPICS_CONFIG),
config.getBoolean(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG),
subscriptions, logContext, clusterResourceListeners);
List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(
config.getList(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG), config.getString(ConsumerConfig.CLIENT_DNS_LOOKUP_CONFIG));
this.metadata.bootstrap(addresses);
String metricGrpPrefix = "consumer";
FetcherMetricsRegistry metricsRegistry = new FetcherMetricsRegistry(Collections.singleton(CLIENT_ID_METRIC_TAG), metricGrpPrefix);
ChannelBuilder channelBuilder = ClientUtils.createChannelBuilder(config, time, logContext);
IsolationLevel isolationLevel = IsolationLevel.valueOf(
config.getString(ConsumerConfig.ISOLATION_LEVEL_CONFIG).toUpperCase(Locale.ROOT));
Sensor throttleTimeSensor = Fetcher.throttleTimeSensor(metrics, metricsRegistry);
int heartbeatIntervalMs = config.getInt(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG);
ApiVersions apiVersions = new ApiVersions();
NetworkClient netClient = new NetworkClient(
new Selector(config.getLong(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG), metrics, time, metricGrpPrefix, channelBuilder, logContext),
this.metadata,
clientId,
100, // a fixed large enough value will suffice for max in-flight requests
config.getLong(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG),
config.getLong(ConsumerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
config.getInt(ConsumerConfig.SEND_BUFFER_CONFIG),
config.getInt(ConsumerConfig.RECEIVE_BUFFER_CONFIG),
config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
ClientDnsLookup.forConfig(config.getString(ConsumerConfig.CLIENT_DNS_LOOKUP_CONFIG)),
time,
true,
apiVersions,
throttleTimeSensor,
logContext);
this.client = new ConsumerNetworkClient(
logContext,
netClient,
metadata,
time,
retryBackoffMs,
config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
heartbeatIntervalMs); //Will avoid blocking an extended period of time to prevent heartbeat thread starvation
// load the partition assignment strategies
this.assignors = getAssignorInstances(config.getList(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG), config.originals());
// no coordinator will be constructed for the default (null) group id
// initialize ConsumerCoordinator (which builds the ConsumerGroupMetadata)
this.coordinator = !groupId.isPresent() ? null :
new ConsumerCoordinator(groupRebalanceConfig,
logContext,
this.client,
assignors,
this.metadata,
this.subscriptions,
metrics,
metricGrpPrefix,
this.time,
enableAutoCommit,
config.getInt(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG),
this.interceptors);
this.fetcher = new Fetcher<>(
logContext,
this.client,
config.getInt(ConsumerConfig.FETCH_MIN_BYTES_CONFIG),
config.getInt(ConsumerConfig.FETCH_MAX_BYTES_CONFIG),
config.getInt(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG),
config.getInt(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG),
config.getInt(ConsumerConfig.MAX_POLL_RECORDS_CONFIG),
config.getBoolean(ConsumerConfig.CHECK_CRCS_CONFIG),
config.getString(ConsumerConfig.CLIENT_RACK_CONFIG),
this.keyDeserializer,
this.valueDeserializer,
this.metadata,
this.subscriptions,
metrics,
metricsRegistry,
this.time,
this.retryBackoffMs,
this.requestTimeoutMs,
isolationLevel,
apiVersions);
// consumer client metrics
this.kafkaConsumerMetrics = new KafkaConsumerMetrics(metrics, metricGrpPrefix);
config.logUnused();
AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics, time.milliseconds());
log.debug("Kafka consumer initialized");
} catch (Throwable t) {
// call close methods if internal objects are already constructed; this is to prevent resource leak. see KAFKA-2121
// we do not need to call `close` at all when `log` is null, which means no internal objects were initialized.
if (this.log != null) {
close(0, true);
}
// now propagate the exception
throw new KafkaException("Failed to construct kafka consumer", t);
}
}
Summary
Client initialization mainly does the following:
1. Read configuration values
2. Initialize the SubscriptionState
3. Initialize client metrics such as fetch-throttle-time-avg and fetch-throttle-time-max
4. Initialize the client metadata (ConsumerMetadata)
5. Load the partition assignment strategies
6. Initialize the consumer coordinator
- Initialize the consumer group metadata (ConsumerGroupMetadata)
- Initialize group-related metrics
7. Initialize consumer-related metrics
A sketch of passing deserializer instances directly into the constructor follows.
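The keyDeserializer == null branches explain the constructor overloads: deserializers come either from the key/value.deserializer config strings or from instances passed in directly, in which case the config entries are ignored (config.ignore(...)). A minimal sketch of the direct-instance path; broker address and group name are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExplicitDeserializers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");

        // Deserializer instances are passed directly, so the constructor takes the
        // `keyDeserializer != null` branch; key/value.deserializer configs are ignored.
        KafkaConsumer<String, String> consumer =
                new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
        consumer.close();
    }
}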
Source code analysis: topic subscription
org.apache.kafka.clients.consumer.KafkaConsumer#subscribe(java.util.Collection<java.lang.String>, org.apache.kafka.clients.consumer.ConsumerRebalanceListener)
public void subscribe(Collection<String> topics, ConsumerRebalanceListener listener) {
acquireAndEnsureOpen();
try {
// validate the groupId
maybeThrowInvalidGroupIdException();
if (topics == null)
throw new IllegalArgumentException("Topic collection to subscribe to cannot be null");
if (topics.isEmpty()) {
// treat subscribing to empty topic list as the same as unsubscribing
this.unsubscribe();
} else {
for (String topic : topics) {
if (topic == null || topic.trim().isEmpty())
throw new IllegalArgumentException("Topic collection to subscribe to cannot contain null or empty topic");
}
// throw if no partition assignors are configured, then clear buffered data for topics that are no longer assigned
throwIfNoAssignorsConfigured();
fetcher.clearBufferedDataForUnassignedTopics(topics);
log.info("Subscribed to topic(s): {}", Utils.join(topics, ", "));
// request a metadata update if the subscription changed
if (this.subscriptions.subscribe(new HashSet<>(topics), listener))
metadata.requestUpdateForNewTopics();
}
} finally {
release();
}
}
Summary
1. Validate the groupId
2. If the given topic collection is empty, unsubscribe instead
3. Flag the client metadata (ConsumerMetadata) for immediate update
Subscription supports dynamic refresh: whenever the set of subscribed topics changes, the subscribe path requests an immediate metadata update. A rebalance-listener sketch follows.
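The listener argument of subscribe(Collection, ConsumerRebalanceListener) is the hook applications get into the rebalance cycle. A minimal sketch; committing in onPartitionsRevoked is a common pattern, not something this walkthrough prescribes:

import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SubscribeWithListener {
    static void subscribe(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("test_2"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // commit processed positions before the partitions move to another member
                consumer.commitSync();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("assigned: " + partitions);
            }
        });
    }
}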
What the client's poll does
org.apache.kafka.clients.consumer.KafkaConsumer#poll(org.apache.kafka.common.utils.Timer, boolean)
private ConsumerRecords<K, V> poll(final Timer timer, final boolean includeMetadataInTimeout) {
// acquire a light-weight lock to prevent multi-threaded use of the KafkaConsumer
acquireAndEnsureOpen();
try {
// record poll start on the KafkaConsumer metrics initialized earlier
this.kafkaConsumerMetrics.recordPollStart(timer.currentTimeMs());
if (this.subscriptions.hasNoSubscriptionOrUserAssignment()) {
throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");
}
do {
// check whether a wakeup should fire: if the thread is not executing an uninterruptible request and wakeup was requested, throw WakeupException
client.maybeTriggerWakeup();
if (includeMetadataInTimeout) {
// this method does three things:
// 1. update the cluster metadata
// 2. join the consumer group (non-blocking here)
// 3. fetch committed offsets and initialize fetch positions
updateAssignmentMetadataIfNeeded(timer, false);
} else {
while (!updateAssignmentMetadataIfNeeded(time.timer(Long.MAX_VALUE), true)) {
log.warn("Still waiting for metadata");
}
}
// fetch data
final Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollForFetches(timer);
if (!records.isEmpty()) {
// before returning the fetched records, we can send off the next round of fetches
// and avoid block waiting for their responses to enable pipelining while the user
// is handling the fetched records.
//
// NOTE: since the consumed position has already been updated, we must not allow
// wakeups or any other errors to be triggered prior to returning the fetched records.
if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
client.transmitSends();
}
// run the records through the user-configured interceptors
return this.interceptors.onConsume(new ConsumerRecords<>(records));
}
// if the timer has not expired, keep polling for data
} while (timer.notExpired());
return ConsumerRecords.empty();
} finally {
release();
this.kafkaConsumerMetrics.recordPollEnd(timer.currentTimeMs());
}
}
Summary
1. Talk to the least-loaded node to find the group coordinator
2. Join the group (rebalance)
3. Sync the rebalance result with the other group members
4. Fetch data
5. Process the records
6. Commit offsets back to the group coordinator
A sketch of interrupting poll with wakeup() follows.
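Since poll blocks up to its timeout and client.maybeTriggerWakeup() is checked on every iteration, another thread can break a consumer out of poll with KafkaConsumer.wakeup(). A minimal sketch; the shutdown-hook wiring is an assumption:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class WakeupDemo {
    static void run(KafkaConsumer<String, String> consumer) {
        // wakeup() is the only KafkaConsumer method safe to call from another thread
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.println(r.value()));
            }
        } catch (WakeupException e) {
            // expected on shutdown: maybeTriggerWakeup() threw inside poll()
        } finally {
            consumer.close();
        }
    }
}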
Simplified flow diagram of the client poll path