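Contents
1. Multithreaded producer
2. Multithreaded consumer
2.1 Why the Consumer needs multithreading
2.2 Multithreaded Kafka Consumer models
2.2.1 Model 1: multiple Consumers, each with its own thread
2.2.2 Model 2: a single Consumer with multiple worker threads
1. Multithreaded producer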
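Since version 0.9, Kafka's clients have been implemented in Java. The producer, KafkaProducer, is a thread-safe object, so we recommend treating it as a singleton and sharing one instance among multiple threads.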
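As a rough illustration of the singleton recommendation, the sketch below uses the initialization-on-demand holder idiom so that every thread sees the same instance. It runs on the JDK alone: `FakeProducer` is a hypothetical stand-in for KafkaProducer, and `SharedProducerSketch`, `producer()` and `distinctInstancesSeen` are names invented for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedProducerSketch {

    /** Hypothetical stand-in for KafkaProducer; it only counts calls. */
    static final class FakeProducer {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final AtomicInteger sent = new AtomicInteger();

        FakeProducer() {
            CREATED.incrementAndGet();
        }

        void send(String value) {
            sent.incrementAndGet();
        }

        int sentCount() {
            return sent.get();
        }
    }

    /** The JVM initializes Holder lazily and exactly once, even under contention. */
    private static final class Holder {
        static final FakeProducer INSTANCE = new FakeProducer();
    }

    static FakeProducer producer() {
        return Holder.INSTANCE;
    }

    /** Starts the given number of threads and returns how many distinct instances they saw. */
    static int distinctInstancesSeen(int threads) {
        Set<FakeProducer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                FakeProducer p = producer();
                p.send("value");
                seen.add(p);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + distinctInstancesSeen(8));
        System.out.println("constructed: " + FakeProducer.CREATED.get());
        System.out.println("sends: " + producer().sentCount());
    }
}
```

With a real KafkaProducer the holder would build the instance from the producer configuration instead. The sharing pattern stays the same.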
Code:
- ProducerThread
package com.qibo.base.controller.kafkaThread;

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.log4j.Logger;
import com.qibo.base.controller.kafka.MQDict;

public class ProducerThread implements Runnable {
    static Logger log = Logger.getLogger(ProducerThread.class);
    // A single KafkaProducer shared by every ProducerThread (it is thread-safe).
    private static KafkaProducer<String, String> producer = null;

    /*
     * Initialize the producer.
     */
    static {
        Properties configs = initConfig();
        producer = new KafkaProducer<String, String>(configs);
    }

    /*
     * Initialize the configuration.
     */
    private static Properties initConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
        props.put("acks", "1");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        return props;
    }

    @Override
    public void run() {
        System.out.println("Sending thread id: " + Thread.currentThread().getId() + " ");
        int j = 0;
        while (true) {
            j++;
            // The message entity
            ProducerRecord<String, String> record = null;
            for (int i = 0; i < 10; i++) {
                record = new ProducerRecord<String, String>(MQDict.PRODUCER_TOPIC, "value" + i);
                // Send the message; the callback runs on the producer's I/O thread,
                // so the thread id printed below is not the sending thread's id.
                producer.send(record, new Callback() {
                    @Override
                    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                        if (null != e) {
                            log.error("send error: " + e.getMessage(), e);
                        } else {
                            System.out.println("Thread id: " + Thread.currentThread().getId() + " "
                                    + String.format("Message sent---offset:%s,partition:%s",
                                            recordMetadata.offset(), recordMetadata.partition()));
                        }
                    }
                });
            }
            // Do not close the producer here: the other threads are still using it.
            // producer.close();
            try {
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop sending.
                Thread.currentThread().interrupt();
                return;
            }
            if (j > 5)
                break;
        }
    }
}
- Invocation
@RequestMapping("/sendThread")
public void sendThread() {
    ExecutorService runnableService = Executors.newFixedThreadPool(3);
    runnableService.submit(new ProducerThread());
    runnableService.submit(new ProducerThread());
    runnableService.submit(new ProducerThread());
    runnableService.shutdown();
}
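Result:
Three threads run concurrently, and because KafkaProducer is thread-safe they can share the single instance safely.
If the topic has multiple partitions, the messages are distributed across different partitions.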
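2. Multithreaded consumer
2.1 Why the Consumer needs multithreading
Suppose we are developing a notification module that lets users subscribe to notifications/messages sent by other users. The module is built on Apache Kafka, so the overall architecture is as follows: message publishers write messages to the Kafka cluster by calling the Producer API, and subscribers read them through a Consumer. At the beginning, the system architecture looks like this:
However, as the number of users grows, the volume of notification data grows with it. Eventually a threshold is reached at which the Producer generates more messages than the Consumer can consume, and the number of unconsumed messages on the Broker keeps increasing. Even though Kafka persists unconsumed messages very efficiently, its retention limits (time, partition size, message key) mean that messages that are never consumed are eventually deleted for good. From a business standpoint, moreover, a notification system with high latency is practically useless. A multithreaded Consumer model is therefore very much needed.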
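Cleanup policy:
Delete messages older than the specified time:
log.retention.hours=16
Delete the oldest messages once a partition's log exceeds the specified size:
log.retention.bytes=1073741824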
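To get a feel for these two limits, the following back-of-the-envelope sketch estimates how long it takes for a slow Consumer to start losing messages. It assumes constant rates and a Consumer that starts with no backlog, and it ignores the fact that Kafka deletes whole log segments, so treat the numbers as rough estimates. The class and method names, and the example rates, are invented for illustration.

```java
final class RetentionSketch {

    /** Seconds a partition needs to accumulate retentionBytes at a steady write rate. */
    static double secondsToReachSizeLimit(long retentionBytes, double bytesPerSecond) {
        return retentionBytes / bytesPerSecond;
    }

    /**
     * Hours until a lagging consumer reaches messages older than the retention window.
     * After t hours the consumer is reading a message produced at t * c / p,
     * whose age is t * (p - c) / p; setting that age to the retention R gives
     * t = R * p / (p - c).
     */
    static double hoursUntilMessagesExpireUnread(double producedPerSec,
                                                 double consumedPerSec,
                                                 double retentionHours) {
        if (consumedPerSec >= producedPerSec) {
            return Double.POSITIVE_INFINITY; // the consumer keeps up, nothing expires unread
        }
        return retentionHours * producedPerSec / (producedPerSec - consumedPerSec);
    }

    public static void main(String[] args) {
        // log.retention.bytes=1073741824 (1 GiB) written at an assumed 1 MiB/s
        System.out.println(secondsToReachSizeLimit(1073741824L, 1048576.0) + " s to reach the size limit");
        // log.retention.hours=16 with an assumed 1000 msg/s produced and 800 msg/s consumed
        System.out.println(hoursUntilMessagesExpireUnread(1000, 800, 16) + " h until unread messages expire");
    }
}
```

Under these assumed rates the size limit is hit after about 17 minutes of writes, and the time limit starts removing unread messages after roughly 80 hours of sustained lag.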
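2.2 Multithreaded Kafka Consumer models
There are two types of multithreaded models built on the Consumer:
Model 1: multiple Consumers, each with its own thread. The corresponding architecture diagram is as follows:
Model 2: a single Consumer with multiple worker threads
The advantages and disadvantages of the two approaches compare as follows:
The default number of partitions for newly created topics can be set with num.partitions in $KAFKA_HOME/config/server.properties. It can also be specified with a parameter when the topic is created, and changed after creation with the tools that ship with Kafka (the partition count can only be increased, not decreased).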
2.2.1 Model 1: multiple Consumers, each with its own thread
- ConsumerThread
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Logger;
import com.qibo.base.controller.kafka.MQDict;

public class ConsumerThread implements Runnable {
    static Logger log = Logger.getLogger(ConsumerThread.class);
    // KafkaConsumer is NOT thread-safe, so each ConsumerThread owns its own instance
    // instead of sharing a static one.
    private final KafkaConsumer<String, String> consumer;

    /**
     * Initialize the consumer.
     */
    public ConsumerThread() {
        Properties configs = initConfig();
        consumer = new KafkaConsumer<String, String>(configs);
        consumer.subscribe(Arrays.asList(MQDict.CONSUMER_TOPIC));
    }

    /**
     * Initialize the configuration.
     */
    private static Properties initConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
        props.put("group.id", MQDict.CONSUMER_GROUP_ID);
        props.put("enable.auto.commit", MQDict.CONSUMER_ENABLE_AUTO_COMMIT);
        props.put("auto.commit.interval.ms", MQDict.CONSUMER_AUTO_COMMIT_INTERVAL_MS);
        props.put("session.timeout.ms", MQDict.CONSUMER_SESSION_TIMEOUT_MS);
        props.put("max.poll.records", MQDict.CONSUMER_MAX_POLL_RECORDS);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }

    @Override
    public void run() {
        System.out.println("Consumer thread id: " + Thread.currentThread().getId() + " ");
        // int i = 1 ;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(MQDict.CONSUMER_POLL_TIME_OUT);
            records.forEach((ConsumerRecord<String, String> record) -> {
                log.info("Thread id: " + Thread.currentThread().getId() + " partition:" + record.partition()
                        + " received message: key ===" + record.key() + " value ====" + record.value()
                        + " topic ===" + record.topic());
            });
            // i++;
            // // Each poll fetches 10 records: CONSUMER_MAX_POLL_RECORDS = 10;
            // if (i >5 ){
            // consumer.commitSync();
            //
            // break;
            // }
        }
        // consumer.close();
    }
}
Controller:
@RequestMapping("/receiveThread")
public void receiveThread() {
    ExecutorService runnableService = Executors.newFixedThreadPool(3);
    runnableService.submit(new ConsumerThread());
    runnableService.submit(new ConsumerThread());
    runnableService.submit(new ConsumerThread());
    runnableService.shutdown();
}
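Result:
If the topic has multiple partitions, each partition is assigned to exactly one consumer in the group on the consuming side. If there are more consumers than partitions, the extra consumers are not assigned any partition and receive no messages.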
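The effect described above can be reproduced with a toy assignment function. The sketch below hands out partitions to consumers in simple round-robin order. It is a simplified stand-in for illustration only, not Kafka's actual assignor implementation, and `AssignmentSketch` and `assign` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

final class AssignmentSketch {

    /**
     * Distributes partitions 0..partitions-1 over the given number of consumers.
     * Element i of the result lists the partitions owned by consumer i.
     */
    static List<List<Integer>> assign(int partitions, int consumers) {
        if (consumers <= 0) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Every partition has exactly one owner within the group.
            result.get(p % consumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(2, 3)); // prints [[0], [1], []]
        System.out.println(assign(6, 3)); // prints [[0, 3], [1, 4], [2, 5]]
    }
}
```

With two partitions and three consumers the third consumer ends up with an empty list, which is why starting more consumer threads than the topic has partitions adds no throughput.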
2.2.2 Model 2: a single Consumer with multiple worker threads
The producer is the same as above.
ConsumerThreadHandler:
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class ConsumerThreadHandler implements Runnable {
    private final ConsumerRecord<String, String> consumerRecord;

    public ConsumerThreadHandler(ConsumerRecord<String, String> consumerRecord) {
        this.consumerRecord = consumerRecord;
    }

    @Override
    public void run() {
        // Replace with your own business logic
        System.out.println("Consumer Message:" + consumerRecord.value() + ",Partition:" + consumerRecord.partition()
                + ",Offset:" + consumerRecord.offset());
    }
}
ConsumerThread2:
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Logger;
import com.qibo.base.controller.kafka.MQDict;

public class ConsumerThread2 implements Runnable {
    static Logger log = Logger.getLogger(ConsumerThread2.class);
    // Only the polling thread touches the consumer; worker threads only see records.
    private final KafkaConsumer<String, String> consumer;
    private ExecutorService executor;

    /**
     * Initialize the consumer.
     */
    public ConsumerThread2() {
        Properties configs = initConfig();
        consumer = new KafkaConsumer<String, String>(configs);
        consumer.subscribe(Arrays.asList(MQDict.CONSUMER_TOPIC));
    }

    /**
     * Initialize the configuration.
     */
    private static Properties initConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
        props.put("group.id", MQDict.CONSUMER_GROUP_ID);
        props.put("enable.auto.commit", MQDict.CONSUMER_ENABLE_AUTO_COMMIT);
        props.put("auto.commit.interval.ms", MQDict.CONSUMER_AUTO_COMMIT_INTERVAL_MS);
        props.put("session.timeout.ms", MQDict.CONSUMER_SESSION_TIMEOUT_MS);
        props.put("max.poll.records", MQDict.CONSUMER_MAX_POLL_RECORDS);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }

    @Override
    public void run() {
        System.out.println("Polling thread id: " + Thread.currentThread().getId() + " ");
        // 3 worker threads and a queue of 4; when both are full, CallerRunsPolicy makes
        // the polling thread process the record itself, which slows down polling.
        executor = new ThreadPoolExecutor(3, 3, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(4), new ThreadPoolExecutor.CallerRunsPolicy());
        while (true) {
            // Poll in a loop; 100 is the poll timeout in milliseconds, not a record count
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);
            for (ConsumerRecord<String, String> item : consumerRecords) {
                executor.submit(new ConsumerThreadHandler(item));
            }
        }
    }
}
Controller:
@RequestMapping("/receiveThread")
public void receiveThread() {
    // Model 2 needs only one consumer thread; the parallelism comes from the
    // worker pool inside ConsumerThread2.
    ExecutorService runnableService = Executors.newSingleThreadExecutor();
    runnableService.submit(new ConsumerThread2());
    runnableService.shutdown();
}
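With the implementation above, the consumer polls with a 100 ms timeout and hands each record it receives (at most max.poll.records per poll) to the worker thread pool for business processing. Because the records are processed asynchronously, with auto-commit enabled an offset may be committed before the corresponding record has actually been processed.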
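The pool configuration used in ConsumerThread2 (three threads, a queue of four, CallerRunsPolicy) also acts as a simple form of back-pressure. The sketch below uses only the JDK, with no Kafka classes, to show what happens when tasks arrive faster than the workers finish: the first three occupy the worker threads, the next four wait in the queue, and every further task runs on the submitting thread. `CallerRunsSketch` and `countCallerRuns` are invented names, and the latch exists only to keep the workers busy for the demonstration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CallerRunsSketch {

    /** Submits the given number of tasks and returns how many ran on the submitting thread. */
    static int countCallerRuns(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(3, 3, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(4), new ThreadPoolExecutor.CallerRunsPolicy());
        Thread caller = Thread.currentThread();
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger ranInCaller = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    // Rejected by the pool and executed by the submitter instead.
                    ranInCaller.incrementAndGet();
                    return;
                }
                try {
                    release.await(); // keep the worker busy until all tasks are submitted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ranInCaller.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks run by the caller: " + countCallerRuns(10));
    }
}
```

In ConsumerThread2 the submitting thread is the polling thread, so a saturated pool delays the next poll. If processing is slow enough that polls fall too far apart, the group coordinator may consider the consumer dead and rebalance, so the pool size and max.poll.records should be tuned together.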