Chapter 1: Adding the Kafka Service in CDH
- 1.1 Adding the service in the cluster (Add Service)
Chapter 2: Deploying Spark2
Chapter 1: Adding the Kafka Service in CDH
1. Click Add Service.
2. When adding the Kafka service, Cloudera Manager displays this warning:
- Before adding this service, ensure that either the Kafka parcel is activated or the Kafka package is installed.
3. So where do we confirm that? Go back to the home page and click Hosts --> Parcels. Whether a remote download install is feasible depends on your network environment.
- CDK is Cloudera's name for its Kafka distribution.
Click Configuration in the upper-right corner:
- These are the remote installation repository addresses. We do not need all of them, but they tell us where Kafka parcel files can be downloaded.
- If the cluster's network access is good, you can download remotely; in practice, most companies deploy offline.
- Click Check for New Parcels to test network connectivity:
- Next, go to the Cloudera documentation site for Kafka: https://docs.cloudera.com/documentation/index.html
- Click Apache Kafka.
- Our CentOS is 7.x and CDH is 5.16.1. We did not use CDH 6.0; J总 has not recommended it for the past year or two, since CDH 6.x still has plenty of bugs for you to stumble over.
- The version numbering here is Cloudera's own naming for its Kafka release.
- Cloudera takes Apache Kafka and applies its own patches: the underlying Apache Kafka version is 2.1.0, "Kafka 4.0" refers to Cloudera's own release number, and 17 is the patch number.
- Go to the following URL to download the parcel files: http://archive.cloudera.com/kafka/parcels/4.0.0/
- We need three files: KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel, KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1, and manifest.json. After downloading, rename the .sha1 file to .sha (as shown in the listing below) and upload all three to the cloud host.
[root@hadoop001 kafka_parcels]# ll
total 83904
-rw-r--r-- 1 root root 85897902 Oct 22 22:53 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel
-rw-r--r-- 1 root root 41 Oct 22 22:52 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha
-rw-r--r-- 1 root root 5212 Oct 22 22:56 manifest.json
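For reference, a minimal sketch of how these three files can be fetched and renamed directly on the cloud host (assuming wget is installed and archive.cloudera.com is reachable; otherwise download locally and upload):
[root@hadoop001 ~]# mkdir -p /root/kafka_parcels && cd /root/kafka_parcels
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1
[root@hadoop001 kafka_parcels]# wget http://archive.cloudera.com/kafka/parcels/4.0.0/manifest.json
# CM expects a .sha file for a locally hosted parcel, so rename the .sha1:
[root@hadoop001 kafka_parcels]# mv KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha1 KAFKA-4.0.0-1.4.0.0.p0.1-el7.parcel.sha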
2. Click Install and Upgrade: https://docs.cloudera.com/documentation/kafka/4-0-x/topics/kafka_install.html
- Click this link.
- It takes you to the official installation steps:
https://docs.cloudera.com/documentation/kafka/4-0-x/topics/kafka_installing.html#concept_ngx_4l4_4r
- The official guide misses one step: configuring an offline repository. Install the HTTP server first: yum install httpd
[root@hadoop001 kafka_parcels]# cd /var/www/html
[root@hadoop001 html]# ll
total 0
[root@hadoop001 html]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@hadoop001 html]# ps -ef|grep httpd
root 16306 1 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16307 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16308 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16309 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16310 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 16311 16306 0 23:02 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
root 16323 15264 0 23:02 pts/1 00:00:00 grep --color=auto httpd
[root@hadoop001 html]# netstat -nlp|grep 16306
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 16306/httpd
Note: open port 80 in the Alibaba Cloud security group rules.
- Visit hadoop001's public IP on port 80; if the page loads, the httpd service was installed successfully.
- Step 2: move the downloaded Kafka offline repository into this directory, so that the path layout mirrors the official repository:
[root@hadoop001 html]# mv /root/kafka_parcels /var/www/html/
Food for thought: when deploying, it is better to access the repository by hostname (internal IP). Copy our repository URL; the parcels are served over the http service.
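Assuming hadoop001 resolves inside the cluster (hostname access, as suggested above), the Remote Parcel Repository URL to paste into CM's Parcel Settings would look like this (the public-IP form also works from outside):
- http://hadoop001/kafka_parcels/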
- The flow is Download --> Distribute --> Activate. After activation, CM prompts you to restart the cluster; if a restart is not actually needed, just close and ignore the prompt.
- Steps: Download --> Activate (both manual). Once activated, the status shows Distributed, Activated.
1.1 Adding the service in the cluster (Add Service)
1. Click Add Service.
2. Select the Kafka service, click Continue, and select all three machines as Kafka Brokers.
Kafka normally starts without problems; if it fails, the likely cause is too little memory.
- Go back to the main page and try starting it manually.
Error (most likely a memory issue: the first machine carries too many roles):
Note:
- The log says "starting", but jps no longer shows the process: the Linux OOM killer killed it.
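To confirm the OOM kill, check the kernel log; a quick sketch (the exact message wording varies by kernel version):
[root@hadoop001 ~]# dmesg | grep -i -E 'kill|oom'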
[root@hadoop001 kafka]# pwd
/var/log/kafka
[root@hadoop001 kafka]# ll
total 52
-rw-r--r-- 1 kafka kafka 45805 Oct 22 23:33 kafka-broker-hadoop001.log
drwxr-xr-x 2 kafka kafka 4096 Oct 22 23:33 stacks
- Fix: increase the Java heap memory (the Java Heap Size of Broker setting in CM's Kafka configuration), then restart.
- We deployed from the offline parcel repository, so the files live under /opt/cloudera/parcels:
[root@hadoop001 kafka]# cd /opt/cloudera/parcels
[root@hadoop001 parcels]# pwd
/opt/cloudera/parcels
[root@hadoop001 parcels]# ll
total 8
lrwxrwxrwx 1 root root 27 Oct 22 21:44 CDH -> CDH-5.16.1-1.cdh5.16.1.p0.3
drwxr-xr-x 11 root root 4096 Nov 22 2018 CDH-5.16.1-1.cdh5.16.1.p0.3
lrwxrwxrwx 1 root root 24 Oct 22 23:19 KAFKA -> KAFKA-4.0.0-1.4.0.0.p0.1
drwxr-xr-x 6 root root 4096 Apr 3 2019 KAFKA-4.0.0-1.4.0.0.p0.1
[root@hadoop001 parcels]# cd KAFKA
[root@hadoop001 KAFKA]# ll
total 16
drwxr-xr-x 2 root root 4096 Apr 3 2019 bin
drwxr-xr-x 5 root root 4096 Apr 3 2019 etc
drwxr-xr-x 3 root root 4096 Apr 3 2019 lib
drwxr-xr-x 2 root root 4096 Apr 3 2019 meta
- Kafka's deployment home directory:
[root@hadoop001 KAFKA]# cd lib
[root@hadoop001 lib]# ll
total 4
drwxr-xr-x 6 root root 4096 Apr 3 2019 kafka
[root@hadoop001 lib]# cd kafka/
[root@hadoop001 kafka]# ll
total 60
drwxr-xr-x 2 root root 4096 Apr 3 2019 bin
drwxr-xr-x 2 root root 4096 Apr 3 2019 cloudera
lrwxrwxrwx 1 root root 15 Apr 3 2019 config -> /etc/kafka/conf
drwxr-xr-x 2 root root 12288 Apr 3 2019 libs
-rwxr-xr-x 1 root root 32216 Apr 3 2019 LICENSE
-rwxr-xr-x 1 root root 336 Apr 3 2019 NOTICE
drwxr-xr-x 2 root root 4096 Apr 3 2019 site-docs
[root@hadoop001 kafka]# ll cloudera/
total 4
-rwxr-xr-x 1 root root 532 Apr 3 2019 cdh_version.properties
[root@hadoop001 kafka]# pwd
/opt/cloudera/parcels/KAFKA/lib/kafka
Chapter 2: Deploying Spark2
https://docs.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
1. Check that the software environment meets the requirements:
- Check that all the software prerequisites are satisfied. If not, you might need to upgrade or install other software components first. See CDS Powered by Apache Spark Requirements for details.
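One concrete check can be made from the artifact names themselves: the parcel below is built against the CDH 5.x line (its name contains cdh5.13.3), which our CDH 5.16.1 cluster satisfies; the requirements page also lists the supported CM and JDK versions, so verify those there.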
2.1 Downloading the JAR and the matching parcel files
1. Download the JAR and the matching parcel files: http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/
- Install the CDS Powered by Apache Spark service descriptor into Cloudera Manager.
Step 1: download.
- To download the CDS Powered by Apache Spark service descriptor, in the Version Information table in CDS Versions Available for Download, click the service descriptor link for the version you want to install.
1. Working directory:
[root@hadoop001 spark2_parcels]# pwd
/root/spark2_parcels
2. Rename the .sha1 file to .sha, marking the parcel as manually downloaded:
[root@hadoop001 spark2_parcels]# mv SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha1 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha
3. After uploading, the directory layout looks like this:
[root@hadoop001 spark2_parcels]# ll
total 194296
-rw-r--r-- 1 root root 5212 Oct 22 22:56 manifest.json
-rw-r--r-- 1 root root 198924405 Oct 23 00:01 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
-rw-r--r-- 1 root root 41 Oct 23 00:00 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha
-rw-r--r-- 1 root root 19066 Oct 23 00:01 SPARK2_ON_YARN-2.4.0.cloudera2.jar
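Optionally, verify the manually downloaded parcel against its hash file before serving it; a sketch (the .sha file holds a bare SHA-1 hash, which should match the computed one):
[root@hadoop001 spark2_parcels]# sha1sum SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
[root@hadoop001 spark2_parcels]# cat SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha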
Step 2:
2. Log on to the Cloudera Manager Server host, and copy the CDS Powered by Apache Spark service descriptor into the location configured for service descriptor files.
- Log on to the CM Server machine and copy SPARK2_ON_YARN-2.4.0.cloudera2.jar to /opt/cloudera/csd; if that path does not exist, create it with mkdir -p.
The steps to find the configured location are as follows:
1. Select Administration > Settings.
2. Click the Custom Service Descriptors category.
- If the directory does not exist on Linux, create it yourself: mkdir -p /opt/cloudera/csd
3. Edit the Local Descriptor Repository Path property.
4. Enter a Reason for change, and then click Save Changes to commit the changes.
5. Restart Cloudera Manager Server.
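A minimal sketch of the copy for our case (assuming the JAR is still under /root/spark2_parcels from the upload step):
[root@hadoop001 ~]# mkdir -p /opt/cloudera/csd
[root@hadoop001 ~]# cp /root/spark2_parcels/SPARK2_ON_YARN-2.4.0.cloudera2.jar /opt/cloudera/csd/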
Step 3:
3. Set the file ownership of the service descriptor to cloudera-scm:cloudera-scm with permission 644.
- That is, set the file's owner and group to cloudera-scm:cloudera-scm, and set the file permissions to 644.
[root@hadoop001 csd]# chown -R cloudera-scm:cloudera-scm SPARK2_ON_YARN-2.4.0.cloudera2.jar
[root@hadoop001 csd]# chmod 644 SPARK2_ON_YARN-2.4.0.cloudera2.jar
Step 4:
4. Restart the Cloudera Manager Server with the following command:
- systemctl restart cloudera-scm-server
Since our CM lives under /opt/cloudera-manager, we restart it via the init.d script instead; as a rule of thumb, wait about one minute afterwards:
[root@hadoop001 init.d]# ./cloudera-scm-server restart
Stopping cloudera-scm-server: [ OK ]
Starting cloudera-scm-server: [ OK ]
[root@hadoop001 init.d]# pwd
/opt/cloudera-manager/cm-5.16.1/etc/init.d
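To confirm the server is back up before moving on, check that the web UI port is listening again (a sketch; 7180 is CM's default web UI port):
[root@hadoop001 init.d]# netstat -nlp | grep 7180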
Step 5:
5. In the Cloudera Manager Admin Console, add the CDS Powered by Apache Spark parcel repository to the Remote Parcel Repository URLs in Parcel Settings, as described in Parcel Configuration Settings.
- In the CDH Admin Console, add the Spark2 repository. Since we already downloaded the parcel files, we only need to move them under the httpd service directory:
[root@hadoop001 ~]# mv spark2_parcels /var/www/html/
After the move, the repository can be accessed at the public IP + /spark2_parcels:
- http://47.102.150.69/spark2_parcels/
Note: the httpd service on hadoop001 must be running.
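From here the flow mirrors the Kafka parcel above: in Parcel Settings, add the repository URL (the in-cluster hostname form, e.g. http://hadoop001/spark2_parcels/, is preferred), then Download --> Distribute --> Activate the SPARK2 parcel, and finally use Add Service to add Spark2 on YARN.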