Chapter 1: Adding the Kafka Service in CDH
- 1.1 Adding the service in the cluster (Add Service)
Chapter 2: Deploying Spark2
Chapter 1: Adding the Kafka Service in CDH
1. Click Add Service.
2. When adding the Kafka service, Cloudera Manager displays this warning:
- Before adding this service, ensure that either the Kafka parcel is activated or the Kafka package is installed.
3. So where do we confirm that? Go back to the home page and click Hosts --> Parcels. Whether a remote download install is feasible depends on your network environment.
- CDK is Cloudera's name for its Kafka distribution.
Click Configuration in the upper-right corner:
- These are the remote installation repository addresses. We do not need all of them, but they tell us where Kafka parcel files can be downloaded.
- If the cluster's network access is good, you can download remotely; in practice, most companies deploy offline.
- Click Check for New Parcels to test network connectivity:
- Next, go to the Cloudera documentation site for Kafka: https://docs.cloudera.com/documentation/index.html
- Click Apache Kafka.
- Our CentOS is 7.x and CDH is 5.16.1. We did not use CDH 6.0; J总 has not recommended it for the past year or two, since CDH 6.x still has plenty of bugs for you to stumble over.
- The version numbering here is Cloudera's own naming for its Kafka release.
- Cloudera takes Apache Kafka and applies its own patches: the underlying Apache Kafka version is 2.1.0, "Kafka 4.0" refers to Cloudera's own release number, and 17 is the patch number.
- Go to the following URL to download the parcel files: http://archive.cloudera.com/kafka/parcels/4.0.0/
- We need three files: KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel, KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1, and manifest.json. After downloading, rename the .sha1 file to .sha (as shown in the listing below) and upload all three to the cloud host.
[root@hadoop001 kafka_parcels]# ll
total 83904
-rw-r--r-- 1 root root 85897902 Oct 22 22:53 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel
-rw-r--r-- 1 root root 41 Oct 22 22:52 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha
-rw-r--r-- 1 root root 5212 Oct 22 22:56 manifest.json
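For reference, a minimal sketch of how these three files can be fetched and renamed directly on the cloud host (assuming wget is installed and archive.cloudera.com is reachable; otherwise download locally and upload):
[root@hadoop001 ~]# mkdir -p /root/kafka_parcels && cd /root/kafka_parcels
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/manifest.json
# CM expects a .sha file for a locally hosted parcel, so rename the .sha1:
[root@hadoop001 kafka_parcels]# mv KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha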
2. Click Install and Upgrade: https://docs.cloudera.com/documentation/kafka/4-0-x/topics/kafka_install.html
- Click this link.
- It takes you to the official installation steps:
https://docs.cloudera.com/documentation/kafka/4-0-x/topics/kafka_installing.html#concept_ngx_4l4_4r
- The official guide misses one step: configuring an offline repository. Install the HTTP server first: yum install httpd
[root@hadoop001 kafka_parcels]# cd /var/www/html
[root@hadoop001 html]# ll
total 0
[root@hadoop001 html]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@hadoop001 html]# ps -ef|grep httpd
root 16306 1 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16307 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16308 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16309 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16310 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16311 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
root 16323 15264 0 23:02 pts/1 00:00:00 grep --color=auto httpd
[root@hadoop001 html]# netstat -nlp|grep 16306
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 16306/httpd
Note: open port 80 in the Alibaba Cloud security group rules.
- Visit hadoop001's public IP on port 80; if the page loads, the httpd service was installed successfully.
- Step 2: move the downloaded Kafka offline repository into this directory, so that the path layout mirrors the official repository:
[root@hadoop001 html]# mv /root/kafka_parcels /var/www/html/
Food for thought: when deploying, it is better to access the repository by hostname (internal IP). Copy our repository URL; the parcels are served over the http service.
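Assuming hadoop001 resolves inside the cluster (hostname access, as suggested above), the Remote Parcel Repository URL to paste into CM's Parcel Settings would look like this (the public-IP form also works from outside):
- http://hadoop001/kafka_parcels/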
- The flow is Download --> Distribute --> Activate. After activation, CM prompts you to restart the cluster; if a restart is not actually needed, just close and ignore the prompt.
- Steps: Download --> Activate (both manual). Once activated, the status shows Distributed, Activated.
1.1 Adding the service in the cluster (Add Service)
1. Click Add Service.
2. Select the Kafka service, click Continue, and select all three machines as Kafka Brokers.
Kafka normally starts without problems; if it fails, the likely cause is too little memory.
- Go back to the main page and try starting it manually.
Error (most likely a memory issue: the first machine carries too many roles):
Note:
- The log says "starting", but jps no longer shows the process: the Linux OOM killer killed it.
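To confirm the OOM kill, check the kernel log; a quick sketch (the exact message wording varies by kernel version):
[root@hadoop001 ~]# dmesg | grep -i -E 'kill|oom'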
[root@hadoop001 kafka]# pwd
/var/log/kafka
[root@hadoop001 kafka]# ll
total 52
-rw-r--r-- 1 kafka kafka 45805 Oct 22 23:33 kafka-broker-hadoop001.log
drwxr-xr-x 2 kafka kafka 4096 Oct 22 23:33 stacks
- Fix: increase the Java heap memory (the Java Heap Size of Broker setting in CM's Kafka configuration), then restart.
- We deployed from the offline parcel repository, so the files live under /opt/cloudera/parcels:
[root@hadoop001 kafka]# cd /opt/cloudera/parcels
[root@hadoop001 parcels]# pwd
/opt/cloudera/parcels
[root@hadoop001 parcels]# ll
total 8
lrwxrwxrwx 1 root root 27 Oct 22 21:44 CDH -> CDH-5.16.1-1.cdh5.16.1.p0.3
drwxr-xr-x 11 root root 4096 Nov 22 2018 CDH-5.16.1-1.cdh5.16.1.p0.3
lrwxrwxrwx 1 root root 24 Oct 22 23:19 KAFKA -> KAFKA-4.0.0-1.4.0.0.p0.1
drwxr-xr-x 6 root root 4096 Apr 3 2019 KAFKA-4.0.0-1.4.0.0.p0.1
[root@hadoop001 parcels]# cd KAFKA
[root@hadoop001 KAFKA]# ll
total 16
drwxr-xr-x 2 root root 4096 Apr 3 2019 bin
drwxr-xr-x 5 root root 4096 Apr 3 2019 etc
drwxr-xr-x 3 root root 4096 Apr 3 2019 lib
drwxr-xr-x 2 root root 4096 Apr 3 2019 meta
- Kafka's deployment home directory:
[root@hadoop001 KAFKA]# cd lib
[root@hadoop001 lib]# ll
total 4
drwxr-xr-x 6 root root 4096 Apr 3 2019 kafka
[root@hadoop001 lib]# cd kafka/
[root@hadoop001 kafka]# ll
total 60
drwxr-xr-x 2 root root 4096 Apr 3 2019 bin
drwxr-xr-x 2 root root 4096 Apr 3 2019 cloudera
lrwxrwxrwx 1 root root 15 Apr 3 2019 config -> /etc/kafka/conf
drwxr-xr-x 2 root root 12288 Apr 3 2019 libs
-rwxr-xr-x 1 root root 32216 Apr 3 2019 LICENSE
-rwxr-xr-x 1 root root 336 Apr 3 2019 NOTICE
drwxr-xr-x 2 root root 4096 Apr 3 2019 site-docs
[root@hadoop001 kafka]# ll cloudera/
total 4
-rwxr-xr-x 1 root root 532 Apr 3 2019 cdh_version.properties
[root@hadoop001 kafka]# pwd
/opt/cloudera/parcels/KAFKA/lib/kafka
Chapter 2: Deploying Spark2
https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
1. Check that the software environment meets the requirements:
- Check that all the software prerequisites are satisfied. If not, you might need to upgrade or install other software components first. See CDS Powered by Apache Spark Requirements for details.
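One concrete check can be made from the artifact names themselves: the parcel below is built against the CDH 5.x line (its name contains cdh5.13.3), which our CDH 5.16.1 cluster satisfies; the requirements page also lists the supported CM and JDK versions, so verify those there.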
2.1 Downloading the JAR and the matching parcel files
1. Download the JAR and the matching parcel files: http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/
- Install the CDS Powered by Apache Spark service descriptor into Cloudera Manager.
Step 1: download.
- To download the CDS Powered by Apache Spark service descriptor, in the Version Information table in CDS Versions Available for Download, click the service descriptor link for the version you want to install.
1. Working directory:
[root@hadoop001 spark2_parcels]# pwd
/root/spark2_parcels
2. Rename the .sha1 file to .sha, marking the parcel as manually downloaded:
[root@hadoop001 spark2_parcels]# mv SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha1 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha
3. After uploading, the directory layout looks like this:
[root@hadoop001 spark2_parcels]# ll
total 194296
-rw-r--r-- 1 root root 5212 Oct 22 22:56 manifest.json
-rw-r--r-- 1 root root 198924405 Oct 23 00:01 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
-rw-r--r-- 1 root root 41 Oct 23 00:00 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha
-rw-r--r-- 1 root root 19066 Oct 23 00:01 SPARK2_ON_YARN-2.4.0.cloudera2.jar
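Optionally, verify the manually downloaded parcel against its hash file before serving it; a sketch (the .sha file holds a bare SHA-1 hash, which should match the computed one):
[root@hadoop001 spark2_parcels]# sha1sum SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
[root@hadoop001 spark2_parcels]# cat SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha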
Step 2:
2. Log on to the Cloudera Manager Server host, and copy the CDS Powered by Apache Spark service descriptor into the location configured for service descriptor files.
- Log on to the CM Server machine and copy SPARK2_ON_YARN-2.4.0.cloudera2.jar to /opt/cloudera/csd; if that path does not exist, create it with mkdir -p.
The steps to find the configured location are as follows:
1. Select Administration > Settings.
2. Click the Custom Service Descriptors category.
- If the directory does not exist on Linux, create it yourself: mkdir -p /opt/cloudera/csd
3. Edit the Local Descriptor Repository Path property.
4. Enter a Reason for change, and then click Save Changes to commit the changes.
5. Restart Cloudera Manager Server.
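A minimal sketch of the copy for our case (assuming the JAR is still under /root/spark2_parcels from the upload step):
[root@hadoop001 ~]# mkdir -p /opt/cloudera/csd
[root@hadoop001 ~]# cp /root/spark2_parcels/SPARK2_ON_YARN-2.4.0.cloudera2.jar /opt/cloudera/csd/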
Step 3:
3. Set the file ownership of the service descriptor to cloudera-scm:cloudera-scm with permission 644.
- That is, set the file's owner and group to cloudera-scm:cloudera-scm, and set the file permissions to 644.
[root@hadoop001 csd]# chown -R cloudera-scm:cloudera-scm SPARK2_ON_YARN-2.4.0.cloudera2.jar
[root@hadoop001 csd]# chmod 644 SPARK2_ON_YARN-2.4.0.cloudera2.jar
Step 4:
4. Restart the Cloudera Manager Server with the following command:
- systemctl restart cloudera-scm-server
Since our CM lives under /opt/cloudera-manager, we restart it via the init.d script instead; as a rule of thumb, wait about one minute afterwards:
[root@hadoop001 init.d]# ./cloudera-scm-server restart
Stopping cloudera-scm-server: [ OK ]
Starting cloudera-scm-server: [ OK ]
[root@hadoop001 init.d]# pwd
/opt/cloudera-manager/cm-5.16.1/etc/init.d
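To confirm the server is back up before moving on, check that the web UI port is listening again (a sketch; 7180 is CM's default web UI port):
[root@hadoop001 init.d]# netstat -nlp | grep 7180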
Step 5:
5. In the Cloudera Manager Admin Console, add the CDS Powered by Apache Spark parcel repository to the Remote Parcel Repository URLs in Parcel Settings, as described in Parcel Configuration Settings.
- In the CDH Admin Console, add the Spark2 repository. Since we already downloaded the parcel files, we only need to move them under the httpd service directory:
[root@hadoop001 ~]# mv spark2_parcels /var/www/html/
After the move, the repository can be accessed at the public IP + /spark2_parcels:
- http://47.102.150.69/spark2_parcels/
Note: the httpd service on hadoop001 must be running.
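From here the flow mirrors the Kafka parcel above: in Parcel Settings, add the repository URL (the in-cluster hostname form, e.g. http://hadoop001/spark2_parcels/, is preferred), then Download --> Distribute --> Activate the SPARK2 parcel, and finally use Add Service to add Spark2 on YARN.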