4-1 Course outline
Distributed message queue: Kafka
Kafka overview / Kafka architecture and core concepts / Kafka deployment and usage
Kafka fault-tolerance testing / Kafka API programming / Kafka in practice
4-2 Kafka overview
An analogy with a messaging system:
Message middleware sits between producers and consumers.
Steamed-bun shop: the producer
You: the consumer
Steamed buns: the data stream
Normal case: one bun is produced, one bun is consumed.
Other cases:
1. Production keeps going, but while eating a bun you get stuck (a machine failure), so buns are lost.
2. Production keeps going and buns are made faster than you can eat them, so buns are lost as well.
Solution:
Get a bowl/basket: finished buns go into the basket first, and you take one out of the basket whenever you want to eat.
The basket/bowl: that is Kafka.
What if the basket is full and no more buns fit?
Prepare a few more baskets === scaling Kafka out.
4-3 Kafka architecture and core concepts
Kafka architecture
producer: the producer, i.e. whoever makes the buns
consumer: the consumer, i.e. whoever eats the buns
broker: the basket
topic: a label on the buns; buns labeled topica are for you, buns labeled topicb are for your younger brother
First a few concepts:
- Kafka is run as a cluster on one or more servers that can span multiple datacenters.
- The Kafka cluster stores streams of records in categories called topics.
- Each record consists of a key, a value, and a timestamp.
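To see the record structure from the command line, here is a sketch using console-tool options available in recent Kafka distributions (parse.key/print.key are assumptions about your tool version; the timestamp is attached automatically when a record is produced):
# Produce keyed records: type lines such as  k1:v1
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic \
  --property parse.key=true --property key.separator=:
# Read them back with each record's key printed before its value
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic hello_topic \
  --from-beginning --property print.key=true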
4-4 Single-node single-broker deployment: installing ZooKeeper
Kafka deployment and usage
Single-node single-broker deployment and usage
Single-node multi-broker deployment and usage
Multi-node multi-broker deployment and usage
Reference: http://kafka.apache.org/quickstart
Step 1: Download the code
Download the 1.1.0 release and un-tar it.
tar -xzf kafka_2.11-1.1.0.tgz
cd kafka_2.11-1.1.0
Step 2: Start the server
Kafka uses ZooKeeper so you need to first start a ZooKeeper server
Install ZooKeeper
tar -zxvf zookeeper-3.4.5-cdh5.7.0.tar.gz -C ~/app
Configure the ZooKeeper environment variables
vi ~/.bash_profile
export ZK_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.7.0
export PATH=$ZK_HOME/bin:$PATH
source ~/.bash_profile
Create conf/zoo.cfg from the template (cp conf/zoo_sample.cfg conf/zoo.cfg) and edit it
Configure the ZooKeeper data directory
dataDir=/home/hadoop/app/tmp/zk
Start ZooKeeper
cd $ZK_HOME/bin
./zkServer.sh start
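To confirm ZooKeeper is up, a quick check (QuorumPeerMain is the standalone server's process name):
./zkServer.sh status   # should report Mode: standalone
jps                    # should list a QuorumPeerMain process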
4-5 Single-node single-broker deployment and usage
See the official quickstart:
http://kafka.apache.org/quickstart
Step 1: download Kafka
wget https://archive.apache.org/dist/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
Step 2: unpack Kafka
tar -zxvf kafka_2.11-0.9.0.0.tgz -C ~/app
Step 3: configure the environment variables: vi ~/.bash_profile
export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.9.0.0
export PATH=$KAFKA_HOME/bin:$PATH
source ~/.bash_profile
Step 4: edit the Kafka configuration ($KAFKA_HOME/config/server.properties); values below match the single-node setup used in this course
broker.id=0
listeners=PLAINTEXT://:9092
host.name=hadoop
log.dirs=/home/hadoop/app/tmp/kafka.logs
zookeeper.connect=hadoop:2181
Start Kafka:
bin/kafka-server-start.sh config/server.properties
kafka-server-start.sh $KAFKA_HOME/config/server.properties
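To keep the broker running after the terminal closes, a common variant (the -daemon flag of kafka-server-start.sh; nohup ... & also works):
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties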
Step 5: verify the broker is running
jps --> a Kafka process should appear
Step 6: create a topic
bin/kafka-topics.sh --create --zookeeper hadoop:2181 --replication-factor 1 --partitions 1 --topic hello_topic
Step 7: list all topics (topic metadata lives in ZK)
bin/kafka-topics.sh --list --zookeeper localhost:2181
Step 8: send messages (to the broker)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic hello_topic
Step 9: consume messages (the old consumer connects via ZK)
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic hello_topic --from-beginning
4-6 Single-node multi-broker deployment and usage
Reference: http://kafka.apache.org/quickstart
Step 6: Setting up a multi-broker cluster
First we make a config file for each of the brokers (on Windows use the copy command instead):
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
Now edit these new files and set the following properties:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
The broker.id property is the unique and permanent name of each node in the cluster.
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &
Now create a new topic with a replication factor of three:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Okay but now that we have a cluster how can we know which broker is doing what? To see that run the "describe topics" command:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
The first line summarizes all partitions; each following line describes one partition. "Leader" is the node responsible for all reads and writes for the partition, "Replicas" is the list of nodes that replicate its log, and "Isr" is the subset of replicas that are currently alive and caught up to the leader.
We can run the same command on the original topic we created to see where it is:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Let's publish a few messages to our new topic:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
...
my test message 1
my test message 2
^C
Now let's consume these messages:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
4-7 Kafka fault-tolerance testing
With a replication factor of three, you can kill two of the three brokers and one of the surviving replicas is elected leader, so the topic stays fully usable: fault tolerance is very good.
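A sketch of the quickstart's fault-tolerance test against my-replicated-topic from section 4-6 (the PID and grep pattern depend on your environment):
ps aux | grep server-1.properties   # find the PID of broker 1, the current leader
kill -9 <pid>                       # kill the leader
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
# Leadership fails over to a surviving replica, and the messages are still readable:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic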
4-8 Setting up the development environment with IDEA + Maven
Kafka API programming
Setting up the development environment with IDEA + Maven
Producer API usage
Consumer API usage
Source code:
https://gitee.com/sag888/big_data/tree/master/Spark%20Streaming%E5%AE%9E%E6%97%B6%E6%B5%81%E5%A4%84%E7%90%86%E9%A1%B9%E7%9B%AE%E5%AE%9E%E6%88%98/project/l2118i/sparktrain
1. Add the dependency
<!-- Kafka dependency -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>${kafka.version}</version>
</dependency>
Define kafka.version in the POM's <properties> section, e.g. <kafka.version>0.9.0.0</kafka.version>, to match the installed broker.
2. Create the project directory structure (you can follow the reference project)
4-9 Kafka Producer Java API programming
Code:
https://gitee.com/sag888/big_data/tree/master/Spark%20Streaming%E5%AE%9E%E6%97%B6%E6%B5%81%E5%A4%84%E7%90%86%E9%A1%B9%E7%9B%AE%E5%AE%9E%E6%88%98/project/l2118i/sparktrain/src/main/java/com/imooc/spark/kafka
Source:
Step 1: the configuration class
package com.imooc.spark.kafka;
/**
 * Common Kafka configuration constants
 */
public class KafkaProperties {
    public static final String ZK = "192.168.199.111:2181";          // ZooKeeper address
    public static final String TOPIC = "hello_topic";
    public static final String BROKER_LIST = "192.168.199.111:9092"; // broker list
    public static final String GROUP_ID = "test_group1";             // consumer group id
}
Step 2: the Kafka producer
package com.imooc.spark.kafka;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Properties;
/**
 * Kafka producer (old Scala client API, matching the 0.9 broker)
 */
public class KafkaProducer extends Thread {
    private String topic;
    private Producer<Integer, String> producer;

    public KafkaProducer(String topic) {
        this.topic = topic;
        Properties properties = new Properties();
        properties.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("request.required.acks", "1"); // wait for the leader's ack
        producer = new Producer<Integer, String>(new ProducerConfig(properties));
    }

    @Override
    public void run() {
        int messageNo = 1;
        while (true) {
            String message = "message_" + messageNo;
            producer.send(new KeyedMessage<Integer, String>(topic, message));
            System.out.println("Sent: " + message);
            messageNo++;
            try {
                Thread.sleep(2000); // send one message every 2 seconds
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Step 3: the test class
package com.imooc.spark.kafka;
/**
 * Kafka Java API test
 */
public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPIC).start();
    }
}
Step 4: verify from a console client
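With KafkaClientApp running, the sent messages can be checked from the console consumer used earlier (addresses as in KafkaProperties):
kafka-console-consumer.sh --zookeeper 192.168.199.111:2181 --topic hello_topic --from-beginning
# Expected: one new line roughly every 2 seconds
# message_1
# message_2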
4-10 Kafka Consumer Java API programming
Code:
https://gitee.com/sag888/big_data/tree/master/Spark%20Streaming%E5%AE%9E%E6%97%B6%E6%B5%81%E5%A4%84%E7%90%86%E9%A1%B9%E7%9B%AE%E5%AE%9E%E6%88%98/project/l2118i/sparktrain/src/main/java/com/imooc/spark/kafka
Step 1: the Kafka consumer
package com.imooc.spark.kafka;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
 * Kafka consumer (old high-level consumer API)
 */
public class KafkaConsumer extends Thread {
    private String topic;

    public KafkaConsumer(String topic) {
        this.topic = topic;
    }

    private ConsumerConnector createConnector() {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", KafkaProperties.ZK);
        properties.put("group.id", KafkaProperties.GROUP_ID);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
    }

    @Override
    public void run() {
        ConsumerConnector consumer = createConnector();
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1); // one stream (thread) for this topic
        // topicCountMap.put(topic2, 1);
        // topicCountMap.put(topic3, 1);
        // Key: topic name; value: the list of streams for that topic
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStream = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = messageStream.get(topic).get(0); // the stream we receive data on
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        while (iterator.hasNext()) {
            String message = new String(iterator.next().message());
            System.out.println("rec: " + message);
        }
    }
}
Step 2: test
package com.imooc.spark.kafka;
/**
 * Kafka Java API test
 */
public class KafkaClientApp {
    public static void main(String[] args) {
        new KafkaProducer(KafkaProperties.TOPIC).start();
        new KafkaConsumer(KafkaProperties.TOPIC).start();
    }
}
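With both threads started, the producer's and consumer's output interleaves, e.g.:
Sent: message_1
rec: message_1
Sent: message_2
rec: message_2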
4-11 Kafka in practice: integrating Flume and Kafka for real-time data collection