1. Kafka architecture
Features:
- Supports high-throughput applications
- Scale out: brokers can be added without downtime
- Persistence: data is persisted to disk and replicated to prevent loss
- Supports both online and offline scenarios
Message hierarchy:
- Topic: a category of messages; for example, page-view logs and click logs can each live in their own topic, and one Kafka cluster can serve many topics at once
- Partition: a physical grouping of a topic. A topic is split into one or more partitions; each partition is an ordered queue, and every message in a partition is assigned a sequential id called its offset
- Message: the smallest unit of subscription
Producer: publishes messages
Broker: a message-handling node; one Kafka node is one broker
Consumer: subscribes to messages
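The hierarchy above can be sketched with a minimal in-memory model (purely illustrative, not Kafka's actual storage layer): each partition is an append-only ordered list, and a message's offset is simply its index in that list. The class name `TopicModel` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model: a topic is a set of partitions, each partition an
// append-only ordered queue; a message's offset is its position in the queue.
public class TopicModel {
    private final Map<Integer, List<String>> partitions = new HashMap<>();

    public TopicModel(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.put(i, new ArrayList<>());
        }
    }

    // Append a message to a partition and return the offset it was assigned.
    public long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);
        return log.size() - 1; // offset = index in the ordered queue
    }

    // Read the message stored at a given offset.
    public String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    public static void main(String[] args) {
        TopicModel pageViews = new TopicModel(2);
        long first = pageViews.append(0, "view:/index");
        long second = pageViews.append(0, "view:/about");
        System.out.println(first + " " + second + " " + pageViews.read(0, second));
    }
}
```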
Message flow:
- The producer picks a partition of the target topic, using a configurable strategy (round-robin, hash, etc.), and publishes the message to it
- The Kafka cluster persists each received message to disk and retains it for a configurable period, regardless of whether it has been consumed
- Consumers pull data from the cluster and control their own offsets
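The partition-selection step in the flow above can be sketched as follows (a hypothetical helper class, not Kafka's own partitioner): messages with a key hash to a fixed partition, preserving per-key ordering, while keyless messages are spread round-robin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the two common partition-selection strategies:
// hash on the message key, or round-robin when there is no key.
public class PartitionChooser {
    private final int numPartitions;
    private final AtomicInteger counter = new AtomicInteger(0);

    public PartitionChooser(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Hash: the same key always lands in the same partition.
    public int forKey(Object key) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Round-robin: spread keyless messages evenly across partitions.
    public int roundRobin() {
        return (counter.getAndIncrement() & Integer.MAX_VALUE) % numPartitions;
    }
}
```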
2. Kafka design considerations
Throughput
High throughput is one of Kafka's core goals, and several design choices serve it:
- Disk persistence: messages are not cached in memory but written straight to disk, exploiting the disk's fast sequential read/write performance
- Zero-copy: cuts out intermediate I/O copy steps
- Batched sends
- Data compression
- Each topic is divided into multiple partitions, increasing parallelism
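With the 0.8-era producer used later in this article, the batching and compression points above are enabled through producer properties. A minimal sketch, assuming the 0.8 config names (`producer.type`, `batch.num.messages`, `compression.codec`); the values are illustrative and should be tuned per workload:

```java
import java.util.Properties;

// Illustrative 0.8-era producer settings for batching and compression.
public class ThroughputProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put("producer.type", "async");    // buffer messages and send in batches
        props.put("batch.num.messages", "200"); // messages per async batch
        props.put("compression.codec", "gzip"); // compress message sets on the wire
        return props;
    }
}
```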
Load balancing / high availability
- The producer sends each message to a partition chosen by a user-specified algorithm
- A topic has multiple partitions; each partition has its own replicas, and the replicas are spread across different broker nodes
- Each partition's replicas elect a lead partition (leader); the leader handles all reads and writes, and ZooKeeper drives failover
- ZooKeeper also manages brokers and consumers dynamically joining and leaving
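The leader/failover idea above can be illustrated with a toy model (this is not Kafka's actual controller logic, which is coordinated through ZooKeeper): the first live replica acts as leader, and if the broker holding it dies, leadership moves to the next live replica. The class name `LeaderDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of leader failover for a single partition.
public class LeaderDemo {
    private final List<Integer> liveReplicas; // broker ids holding a replica, in order

    public LeaderDemo(Integer... brokerIds) {
        this.liveReplicas = new ArrayList<>(Arrays.asList(brokerIds));
    }

    // The leader serves all reads and writes for the partition.
    public int leader() {
        return liveReplicas.get(0);
    }

    // Simulate a broker failure: if the leader dies, the next replica takes over.
    public void brokerDown(int brokerId) {
        liveReplicas.remove(Integer.valueOf(brokerId));
    }
}
```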
Well suited to pull
Because a Kafka broker persists data to disk, it is under no memory pressure, so consumers are well suited to pulling data. This brings several benefits:
- It simplifies Kafka's design
- A consumer controls its own pull rate to match its consumption capacity
- A consumer chooses its own consumption mode, e.g. batching, re-consuming, or starting from the tail
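The pull benefits listed above can be shown with a toy broker/consumer pair (purely illustrative): the broker just holds the log, while the consumer owns its offset and therefore decides batch size, can rewind to re-consume, or can jump to the tail.

```java
import java.util.Arrays;
import java.util.List;

// Toy pull model: the broker holds the log; the consumer holds the offset.
public class PullDemo {
    static List<String> brokerLog = Arrays.asList("m0", "m1", "m2", "m3", "m4");

    // Pull up to max messages starting at the consumer-held offset.
    static List<String> pull(long offset, int max) {
        int from = (int) Math.min(offset, brokerLog.size());
        int to = Math.min(from + max, brokerLog.size());
        return brokerLog.subList(from, to);
    }

    public static void main(String[] args) {
        long offset = 0;
        List<String> batch = pull(offset, 2);               // batch consumption
        offset += batch.size();                             // consumer advances its own offset
        List<String> again = pull(0, 2);                    // rewind and re-consume
        List<String> tail = pull(brokerLog.size() - 1, 10); // start near the tail
        System.out.println(batch + " " + again + " " + tail);
    }
}
```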
Horizontal scaling
When a broker node is added, it registers itself with ZooKeeper; producers and consumers observe the change through watchers registered on ZooKeeper and adjust promptly.
3. Environment setup
Start ZooKeeper
Prepare to start a Kafka cluster of three broker nodes
# config/server0.properties
broker.id=0
port=9092
log.dirs=/home/zsh/tmp/kafka/kafka-logs-0
# config/server1.properties
broker.id=1
port=9093
log.dirs=/home/zsh/tmp/kafka/kafka-logs-1
# config/server2.properties
broker.id=2
port=9094
log.dirs=/home/zsh/tmp/kafka/kafka-logs-2
Parameters:
- broker.id: unique id of the broker node
- port: port the broker listens on
- log.dirs: directory where the message logs are stored
Start the three brokers
bin/kafka-server-start.sh config/server0.properties &
bin/kafka-server-start.sh config/server1.properties &
bin/kafka-server-start.sh config/server2.properties &
4. Local testing
Create a topic
# Create a topic named test with one replica and one partition
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List topics
kafka-topics.sh --list --zookeeper localhost:2181
Start a producer
# Start a producer against the broker on port 9092, then type messages to send
kafka-console-producer.sh --broker-list localhost:9092 --topic test
test
hello boy
Start a consumer
# Once the consumer starts, the messages sent by the producer appear in its console
kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
5. Java development
Do not configure the Kafka server side as localhost, or remote clients will be unable to connect.
Configuration parameters
public interface KafkaProperties {
    final static String ubuntu_ip = "192.168.188.129";
    final static String zkConnect = ubuntu_ip + ":2181";
    final static String groupId = "group1";
    final static String topic = "zsh-topic";
    final static String kafkaServerURL = ubuntu_ip;
    final static int kafkaServerPort = 9092;
    final static int kafkaProducerBufferSize = 64 * 1024;
    final static int connectionTimeOut = 20000;
    final static int reconnectInterval = 10000;
    final static String topic2 = "topic2";
    final static String topic3 = "topic3";
    final static String clientId = "SimpleConsumerDemoClient";
}
producer
import java.util.Properties;

import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducer extends Thread {
    private final kafka.javaapi.producer.Producer<Integer, String> producer;
    private final String topic;
    private final Properties props = new Properties();

    public KafkaProducer(String topic) {
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("metadata.broker.list", KafkaProperties.ubuntu_ip + ":9092");
        producer = new kafka.javaapi.producer.Producer<Integer, String>(new ProducerConfig(props));
        this.topic = topic;
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (true) {
            String messageStr = "Message_" + messageNo;
            System.out.println("Send:" + messageStr);
            producer.send(new KeyedMessage<Integer, String>(topic, messageStr));
            messageNo++;
            try {
                sleep(3000); // send one message every 3 seconds
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        KafkaProducer producerThread = new KafkaProducer(KafkaProperties.topic);
        producerThread.start();
    }
}
consumer
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class KafkaConsumer extends Thread {
    private final ConsumerConnector consumer;
    private final String topic;

    public KafkaConsumer(String topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(createConsumerConfig());
        this.topic = topic;
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", KafkaProperties.zkConnect);
        props.put("group.id", KafkaProperties.groupId);
        props.put("zookeeper.session.timeout.ms", "40000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        return new ConsumerConfig(props);
    }

    @Override
    public void run() {
        // Request one stream (thread) for the topic.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = consumerMap.get(topic).get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            System.out.println("receive:" + new String(it.next().message()));
            try {
                sleep(3000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        KafkaConsumer consumerThread = new KafkaConsumer(KafkaProperties.topic);
        consumerThread.start();
    }
}