// Kafka installation and configuration, and consuming it with Spark Streaming
# by coco
# 2015-07-06
Prerequisites: ZooKeeper runs on the following machines:
192.168.8.94
192.168.8.95
192.168.8.96
Kafka is installed in cluster mode on the following machines:
192.168.8.98
192.168.8.97
1. Install the ZooKeeper cluster. (omitted)
2. Install Kafka
wget http://apache.dataguru.cn/kafka/0.8.2.1/kafka_2.10-0.8.2.1.tgz
or:
curl -L -O http://mirrors.cnnic.cn/apache/kafka/0.9.0.0/kafka_2.10-0.9.0.0.tgz
Extract: tar -xvf kafka_2.10-0.8.2.1.tgz -C /usr/local/
Edit the configuration file: vim ./config/server.properties
log.dirs=/data/kafka-logs
zookeeper.connect=192.168.8.94:2181
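On a multi-broker cluster each broker also needs a unique broker.id, and zookeeper.connect can list all three ZooKeeper nodes. A minimal sketch of server.properties for one broker (the broker.id and port values below are assumptions, adjust per machine):
# broker.id must differ on every broker, e.g. 0 on 192.168.8.98 and 1 on 192.168.8.97
broker.id=1
port=9092
log.dirs=/data/kafka-logs
zookeeper.connect=192.168.8.94:2181,192.168.8.95:2181,192.168.8.96:2181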
Start: /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties &
Test sending messages (they are sent to the topic on the .97 server):
[root@bogon config]# /usr/local/kafka/bin/kafka-console-producer.sh --broker-list 192.168.8.97:9092 --topic test
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
eeeee
dddddd
ttttt
Test the consumer side:
[root@bogon ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.8.94:2181 --topic test --from-beginning
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
eeeee
dddddd
ttttt
The same messages come back on the consumer, which shows the service is working normally.
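For reference, the same check can be done from Python with the kafka-python package (an assumption on my part, it is not part of this setup; pip install kafka-python); a minimal sketch:
from kafka import KafkaConsumer

# Read the "test" topic from the beginning, like the console consumer above.
consumer = KafkaConsumer("test",
                         bootstrap_servers="192.168.8.97:9092",
                         auto_offset_reset="earliest")
for msg in consumer:
    print(msg.value)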
II. High-availability test
Stop the Kafka service on the .97 server: messages can still be sent and consumed, because .97 and .98 run as a distributed cluster. However, during testing we found that when the service is killed abnormally, the occasional message is lost (the "cccc" message went missing).
Below is a script that pushes no fewer than 20 messages per second into the Kafka queue:
# gcutils is an in-house helper library; KafkaMgr wraps a Kafka producer.
from gcutils.queue import KafkaMgr
import time

mgr = KafkaMgr("192.168.8.98:9092")
while True:
    # send the message "aaaaaa" to the "test" topic, roughly 100 times per second
    mgr.send_message("test", "aaaaaa")
    time.sleep(0.01)
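If gcutils is not available, a roughly equivalent load generator can be sketched with the kafka-python package (an assumption, not part of the original setup); acks="all" plus retries also reduces the kind of message loss seen in the failover test above:
from kafka import KafkaProducer
import time

# acks="all" waits for the in-sync replicas and retries resends on transient
# errors, which helps avoid losing messages when a broker goes down.
producer = KafkaProducer(bootstrap_servers="192.168.8.98:9092",
                         acks="all", retries=5)
while True:
    producer.send("test", b"aaaaaa")
    time.sleep(0.01)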
Next, test Spark Streaming consuming the messages from the Kafka queue:
####### Install the Spark service:
Download Spark:
spark -> /usr/local/spark-2.0.2-bin-hadoop2.6/   (the actual package is on the 192.168.8.98 server)
Three configuration files were modified:
[root@hadoop98 conf]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.8.94:8020</value>
  </property>
</configuration>
Configuration for the Hive integration:
[root@hadoop98 conf]# cat hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.8.94:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>gc895316</value>
  </property>
</configuration>
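A quick optional check that the Hive integration works (assuming the MySQL JDBC driver jar is on Spark's classpath) is to list the Hive databases from the pyspark shell:
# run inside the pyspark shell, where 'spark' is the SparkSession it creates
spark.sql("show databases").show()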
The default Spark configuration; everything here is commented out, so nothing is enabled:
[root@hadoop98 conf]# cat spark-defaults.conf
# Example:
#spark.master spark://172.17.17.105:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
#spark.driver.memory 2g
#spark.executor.cores 1
#spark.executor.memory 2g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
Start Spark:
[root@hadoop98 kafkatest]# /usr/local/spark/bin/pyspark --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.2.jar    # launching with the Kafka assembly jar attached; a plain pyspark also starts the shell
Test Spark:
[root@hadoop98 kafkatest]# /usr/local/spark/bin/pyspark --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.2.jar
Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/04/07 12:02:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/07 12:02:52 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.0.2
/_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkSession available as 'spark'.
>>>
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.kafka import KafkaUtils
>>> ssc = StreamingContext(sc, 2)
>>> kvs = KafkaUtils.createStream(ssc, "192.168.8.94:2181", "spark-streaming-consumer", {"test": 2})
>>> count = kvs.count()
>>> count.pprint()
>>> ssc.start()
>>> 17/04/07 12:03:23 WARN AppInfo$: Can't read Kafka version from MANIFEST.MF. Possible cause: java.lang.NullPointerException
17/04/07 12:03:23 WARN RangeAssignor: No broker partitions consumed by consumer thread spark-streaming-consumer_hadoop98-1491537803168-745cc09a-0 for topic test
17/04/07 12:03:23 WARN RangeAssignor: No broker partitions consumed by consumer thread spark-streaming-consumer_hadoop98-1491537803168-745cc09a-1 for topic test
-------------------------------------------
Time: 2017-04-07 12:03:22
-------------------------------------------
-------------------------------------------
Time: 2017-04-07 12:03:24
-------------------------------------------
Spark Streaming is successfully consuming the messages from the Kafka queue.
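For reference, the same consumer can be packaged as a standalone script instead of being typed into the shell; a minimal sketch (the file name and app name are arbitrary):
# kafka_count.py - count the messages arriving on the "test" topic every 2 seconds
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaCount")
ssc = StreamingContext(sc, 2)

# createStream is the 0.8 receiver-based API and connects through ZooKeeper;
# {"test": 2} asks for two consumer threads on the "test" topic.
kvs = KafkaUtils.createStream(ssc, "192.168.8.94:2181", "spark-streaming-consumer", {"test": 2})
kvs.count().pprint()

ssc.start()
ssc.awaitTermination()
Submit it with the same assembly jar, e.g. /usr/local/spark/bin/spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.2.jar kafka_count.py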