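Contents

1. Multithreaded producers

2. Multithreaded consumers

2.1. Why a consumer needs multiple threads

2.2. Multithreaded Kafka consumer models

2.2.1. Model 1: multiple consumers, each with its own thread

2.2.2. Model 2: one consumer with multiple worker threads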


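1. Multithreaded producers

From version 0.9 onward, Kafka ships its clients as Java implementations. The producer, KafkaProducer, is a thread-safe object, so we recommend creating it as a singleton and sharing that one instance among multiple threads.

Code:

  • ProducerThread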
package com.qibo.base.controller.kafkaThread;

import java.util.Properties;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.log4j.Logger;

import com.qibo.base.controller.kafka.MQDict;

public class ProducerThread implements Runnable {
	static Logger log = Logger.getLogger(ProducerThread.class);

	private static KafkaProducer<String, String> producer = null;

	/*
	 * Initialize the producer once; every ProducerThread shares this instance
	 */
	static {
		Properties configs = initConfig();
		producer = new KafkaProducer<String, String>(configs);
	}

	/*
	 * Build the producer configuration
	 */
	private static Properties initConfig() {
		Properties props = new Properties();
		props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
		props.put("acks", "1");
		props.put("retries", 0);
		props.put("batch.size", 16384);
		props.put("key.serializer", StringSerializer.class.getName());
		props.put("value.serializer", StringSerializer.class.getName());
		return props;
	}

	@Override
	public void run() {
		System.out.println("Thread id: " + Thread.currentThread().getId());
		int j = 0;
		while (true) {
			j++;
			// Message record
			ProducerRecord<String, String> record = null;
			for (int i = 0; i < 10; i++) {
				record = new ProducerRecord<String, String>(MQDict.PRODUCER_TOPIC, "value" + i);
				// Send asynchronously; the callback runs on the producer's I/O thread
				producer.send(record, new Callback() {
					@Override
					public void onCompletion(RecordMetadata recordMetadata, Exception e) {
						if (null != e) {
							log.error("send error: " + e.getMessage(), e);
						} else {
							System.out.println("Thread id: " + Thread.currentThread().getId() + " " + String.format("sent message --- offset:%s, partition:%s", recordMetadata.offset(),
									recordMetadata.partition()));
						}
					}
				});
			}
			// producer.close();
			try {
				
				Thread.sleep(3000);
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
			if (j > 5)
				break;
		}

	}

}
  • Caller
@RequestMapping("/sendThread")
	public void sendThread() {
		ExecutorService runnableService = Executors.newFixedThreadPool(3);
		runnableService.submit(new ProducerThread());
		runnableService.submit(new ProducerThread());
		runnableService.submit(new ProducerThread());
		runnableService.shutdown();
	}

Result:

[Screenshot: run output]

Three threads run here, and they can all share one KafkaProducer because it is thread-safe.
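The caller above calls shutdown() on the pool but never waits for the tasks, and the producer.close() call in ProducerThread is commented out. One possible ordering is sketched below with JDK classes only: wait for the sending threads to finish, then close the shared object. SharedSender is a hypothetical stand-in for the shared KafkaProducer (it only counts sends), and the 30-second timeout is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shared KafkaProducer; it only counts sends.
final class SharedSender implements AutoCloseable {
    private final AtomicInteger sent = new AtomicInteger();
    private volatile boolean closed;

    void send(String value) {
        if (closed) {
            throw new IllegalStateException("sender already closed");
        }
        sent.incrementAndGet();
    }

    int sentCount() { return sent.get(); }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

final class ShutdownOrderSketch {
    // Runs `threads` tasks that each send `perThread` messages through one shared
    // sender, waits for all of them, and only then closes the sender.
    static int runAndClose(SharedSender sender, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    sender.send("value" + i);
                }
            });
        }
        pool.shutdown(); // stop accepting new tasks
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // give up on tasks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        sender.close(); // close only after the sending threads are done
        return sender.sentCount();
    }

    public static void main(String[] args) {
        SharedSender sender = new SharedSender();
        int total = runAndClose(sender, 3, 10);
        System.out.println("sent=" + total + ", closed=" + sender.isClosed());
        // prints sent=30, closed=true
    }
}
```

With a real producer the same ordering would apply: close it once, after every thread that uses it has finished, rather than inside each thread.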

If the topic has multiple partitions, the messages are spread across different partitions:

[Screenshot: run output]
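The producer code above sends records without a key, which is why they spread over the partitions. Records that carry a key are routed by a hash of the key, so equal keys always land in the same partition. The sketch below illustrates that rule with JDK classes only. It is a simplification under stated assumptions: the class name PartitionChoiceSketch is hypothetical, the hash is Arrays.hashCode rather than the murmur2 hash used by Kafka's default partitioner, and keyless records are simply rotated one by one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: not Kafka's partitioner, just the same idea in miniature.
final class PartitionChoiceSketch {
    private final AtomicInteger counter = new AtomicInteger();
    private final int numPartitions;

    PartitionChoiceSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    int partitionFor(String key) {
        if (key == null) {
            // Keyless record: rotate over the partitions.
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Keyed record: hash the key bytes, clear the sign bit, take the remainder.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChoiceSketch chooser = new PartitionChoiceSketch(3);
        // Keyless records rotate: 0, 1, 2, 0
        for (int i = 0; i < 4; i++) {
            System.out.println("keyless -> partition " + chooser.partitionFor(null));
        }
        // The same key maps to the same partition on every call.
        System.out.println("user-42 -> partition " + chooser.partitionFor("user-42"));
        System.out.println("user-42 -> partition " + chooser.partitionFor("user-42"));
    }
}
```

If per-key ordering matters to the business logic, giving each ProducerRecord a key is the usual way to keep related messages in one partition.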

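2. Multithreaded consumers

2.1. Why a consumer needs multiple threads

Suppose we are building a notification module that lets users subscribe to notifications and messages sent by other users. The module is built on Apache Kafka: publishers write messages to the Kafka cluster through the Producer API, and subscribers read them through a Consumer. Initially, the system architecture looks like this:

[Figure: initial system architecture]

As the number of users grows, however, so does the volume of notifications. Eventually a threshold is reached where the producers write messages faster than the consumer can process them, and the backlog of unconsumed messages on the brokers keeps growing. Kafka persists unconsumed messages durably, but its retention policies (by time, by partition size, or by message key through log compaction) mean that messages which are never consumed are eventually deleted for good. From a business standpoint, a notification system with high latency is close to useless. A multithreaded consumer model is therefore necessary.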
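To make the backlog argument concrete, here is a small worked calculation in Java. The rates are made-up numbers chosen for illustration (1000 messages per second produced, 400 per second consumed), not measurements from the system described here, and the class name BacklogSketch is hypothetical.

```java
// Back-of-the-envelope arithmetic for consumer lag; all rates are assumptions.
final class BacklogSketch {
    // Messages left unconsumed after `seconds`, given constant rates in messages per second.
    static long backlogAfter(long producePerSec, long consumePerSec, long seconds) {
        long surplus = Math.max(0, producePerSec - consumePerSec);
        return surplus * seconds;
    }

    // Smallest number of equally fast consumers that keeps up with the producers.
    static long consumersNeeded(long producePerSec, long consumePerSec) {
        if (consumePerSec <= 0) {
            throw new IllegalArgumentException("consumePerSec must be positive");
        }
        return (producePerSec + consumePerSec - 1) / consumePerSec; // ceiling division
    }

    public static void main(String[] args) {
        // 600 surplus messages per second for 3600 seconds
        System.out.println("backlog after one hour: " + backlogAfter(1000, 400, 3600));
        // prints backlog after one hour: 2160000
        System.out.println("consumers needed: " + consumersNeeded(1000, 400));
        // prints consumers needed: 3
    }
}
```

Under these assumed rates a single consumer falls more than two million messages behind every hour, while three consumers of the same speed would keep up.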

Retention settings:

Delete messages older than the given time:
log.retention.hours=16
Delete the oldest messages once a partition exceeds the given size:
log.retention.bytes=1073741824

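2.2. Multithreaded Kafka consumer models

There are two multithreading models for consumers:

Model 1: multiple consumers, each with its own thread. The corresponding architecture is shown below:

[Figure: architecture of model 1]

Model 2: one consumer with multiple worker threads

[Figure: architecture of model 2]

The advantages and disadvantages of the two approaches compare as follows:

[Table image: pros and cons of the two models]

The default number of partitions for newly created topics can be set with num.partitions in $KAFKA_HOME/config/server.properties. It can also be passed as a parameter when a topic is created, and changed after creation with the tools that Kafka provides (the partition count of an existing topic can only be increased).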

2.2.1. Model 1: multiple consumers, each with its own thread

  • ConsumerThread
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import com.qibo.base.controller.kafka.Consumer;
import com.qibo.base.controller.kafka.MQDict;

public class ConsumerThread implements Runnable {

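	static Logger log = Logger.getLogger(ConsumerThread.class);

	// KafkaConsumer is not thread-safe, so every ConsumerThread owns its own instance
	private final KafkaConsumer<String, String> consumer;

	/**
	 * Create and subscribe this thread's consumer
	 */
	public ConsumerThread() {
		Properties configs = initConfig();
		consumer = new KafkaConsumer<String, String>(configs);
		consumer.subscribe(Arrays.asList(MQDict.CONSUMER_TOPIC));
	}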

	/**
	 * Build the consumer configuration
	 */
	private static Properties initConfig() {
		Properties props = new Properties();
		props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
		props.put("group.id", MQDict.CONSUMER_GROUP_ID);
		props.put("enable.auto.commit", MQDict.CONSUMER_ENABLE_AUTO_COMMIT);
		props.put("auto.commit.interval.ms", MQDict.CONSUMER_AUTO_COMMIT_INTERVAL_MS);
		props.put("session.timeout.ms", MQDict.CONSUMER_SESSION_TIMEOUT_MS);
		props.put("max.poll.records", MQDict.CONSUMER_MAX_POLL_RECORDS);
		props.put("auto.offset.reset", "earliest");
		props.put("key.deserializer", StringDeserializer.class.getName());
		props.put("value.deserializer", StringDeserializer.class.getName());
		return props;
	}

	@Override
	public void run() {
		System.out.println("Thread id: " + Thread.currentThread().getId());

//		int i = 1 ;
		while (true) {
			ConsumerRecords<String, String> records = consumer.poll(MQDict.CONSUMER_POLL_TIME_OUT);
			records.forEach((ConsumerRecord<String, String> record) -> {

				log.info("Thread id: " + Thread.currentThread().getId() + " partition:" + record.partition() + " received message: key ===" + record.key() + " value ====" + record.value() + " topic ==="
						+ record.topic());
			});
//			 i++;
//	            // each poll returns at most 10 records: CONSUMER_MAX_POLL_RECORDS = 10
//	            if (i >5 ){
//	                consumer.commitSync();
//	              
//	                break;
//	            }
		}
//		  consumer.close();
	}

}

controller:

@RequestMapping("/receiveThread")
	public void receiveThread() {
		ExecutorService runnableService = Executors.newFixedThreadPool(3);
		runnableService.submit(new ConsumerThread());
		runnableService.submit(new ConsumerThread());
		runnableService.submit(new ConsumerThread());
		runnableService.shutdown();
	}

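Result:

[Screenshot: run output]

In the setup above, when the topic has multiple partitions, each partition is consumed by exactly one consumer in the group. Any consumer beyond the number of partitions is assigned no partition and receives nothing.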
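The effect of having more consumers than partitions can be shown with a small simulation. This is a simplified sketch that deals partitions to consumers in turn; it is not the assignor code that Kafka actually runs, and the class name AssignmentSketch and the consumer names c1 to c3 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of partition assignment within one consumer group.
final class AssignmentSketch {
    // Deals partitions 0..numPartitions-1 to the consumers in turn.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p); // each partition gets exactly one owner
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions, three consumers: the third consumer stays idle.
        System.out.println(assign(2, Arrays.asList("c1", "c2", "c3")));
        // prints {c1=[0], c2=[1], c3=[]}

        // Four partitions, two consumers: each consumer owns two partitions.
        System.out.println(assign(4, Arrays.asList("c1", "c2")));
        // prints {c1=[0, 2], c2=[1, 3]}
    }
}
```

The practical consequence for model 1 is that the partition count caps the useful number of consumer threads in a group.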

2.2.2. Model 2: one consumer with multiple worker threads

The producer is the same as in section 1.


ConsumerThreadHandler:

import org.apache.kafka.clients.consumer.ConsumerRecord;

public class ConsumerThreadHandler implements Runnable {

	private final ConsumerRecord<String, String> consumerRecord;

	public ConsumerThreadHandler(ConsumerRecord<String, String> consumerRecord) {
		this.consumerRecord = consumerRecord;
	}

	@Override
	public void run() {
		// Put your own business logic here
		System.out.println("Consumer Message:" + consumerRecord.value() + ",Partition:" + consumerRecord.partition()
				+ ",Offset:" + consumerRecord.offset());
	}

}

ConsumerThread2:

import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Logger;

import com.qibo.base.controller.kafka.Consumer;
import com.qibo.base.controller.kafka.MQDict;

public class ConsumerThread2 implements Runnable {

	static Logger log = Logger.getLogger(ConsumerThread2.class);

	// The single consumer of this model; only the thread running run() polls it
	private final KafkaConsumer<String, String> consumer;

	private ExecutorService executor;

	/**
	 * Create and subscribe the consumer
	 */
	public ConsumerThread2() {
		Properties configs = initConfig();
		consumer = new KafkaConsumer<String, String>(configs);
		consumer.subscribe(Arrays.asList(MQDict.CONSUMER_TOPIC));
	}

	/**
	 * Build the consumer configuration
	 */
	private static Properties initConfig() {
		Properties props = new Properties();
		props.put("bootstrap.servers", MQDict.MQ_ADDRESS_COLLECTION);
		props.put("group.id", MQDict.CONSUMER_GROUP_ID);
		props.put("enable.auto.commit", MQDict.CONSUMER_ENABLE_AUTO_COMMIT);
		props.put("auto.commit.interval.ms", MQDict.CONSUMER_AUTO_COMMIT_INTERVAL_MS);
		props.put("session.timeout.ms", MQDict.CONSUMER_SESSION_TIMEOUT_MS);
		props.put("max.poll.records", MQDict.CONSUMER_MAX_POLL_RECORDS);
		props.put("auto.offset.reset", "earliest");
		props.put("key.deserializer", StringDeserializer.class.getName());
		props.put("value.deserializer", StringDeserializer.class.getName());
		return props;
	}

	@Override
	public void run() {
		System.out.println("Thread id: " + Thread.currentThread().getId());
		// 3 workers and a queue of 4; when both are full, CallerRunsPolicy makes the polling thread run the handler itself
		executor = new ThreadPoolExecutor(3, 3, 0L, TimeUnit.MILLISECONDS,
				new ArrayBlockingQueue<Runnable>(4), new ThreadPoolExecutor.CallerRunsPolicy());
		while (true) {
			// Poll in a loop, waiting up to 100 ms per call for new records
			ConsumerRecords<String, String> consumerRecords = consumer.poll(100);
			for (ConsumerRecord<String, String> item : consumerRecords) {
				executor.submit(new ConsumerThreadHandler(item));
			}
		}
	}

}

controller:

@RequestMapping("/receiveThread2")
	public void receiveThread2() {
		// Model 2 runs a single polling thread; the worker pool lives inside ConsumerThread2
		ExecutorService runnableService = Executors.newSingleThreadExecutor();
		runnableService.submit(new ConsumerThread2());
		runnableService.shutdown();
	}

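With the implementation above, the consumer polls with a 100 ms timeout, and every batch it gets back (at most max.poll.records records) is handed to the thread pool for business processing.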
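The pool in ConsumerThread2 has 3 workers, a queue of 4 and CallerRunsPolicy, so once 7 handlers are pending the polling thread starts running handlers itself, which slows polling down instead of dropping records. The sketch below reproduces only that behaviour with JDK classes. The class name CallerRunsSketch is hypothetical, integers stand in for Kafka records, and the 20 ms sleep is an assumed processing time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Shows how CallerRunsPolicy throttles the submitting (polling) thread.
final class CallerRunsSketch {
    // Submits `records` slow tasks and returns {tasks processed, tasks run on the submitting thread}.
    static int[] run(int records) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(3, 3, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(4), new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger processed = new AtomicInteger();
        AtomicInteger onCaller = new AtomicInteger();
        Thread caller = Thread.currentThread();

        for (int i = 0; i < records; i++) {
            pool.submit(() -> {
                if (Thread.currentThread() == caller) {
                    onCaller.incrementAndGet(); // pool and queue were full at submit time
                }
                try {
                    Thread.sleep(20); // assumed per-record processing time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                processed.incrementAndGet();
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { processed.get(), onCaller.get() };
    }

    public static void main(String[] args) {
        int[] result = run(20);
        System.out.println("processed=" + result[0] + ", ranOnCaller=" + result[1]);
    }
}
```

One caveat of this model, stated as a general observation rather than something the code above handles: with enable.auto.commit turned on, offsets can be committed before the worker threads have finished the corresponding records, so a crash may lose messages that were handed to the pool but not yet processed.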