Hadoop Installation and Deployment
- 1. Introduction to Hadoop
- 2. Installing Hadoop
- 3. Deploying Pseudo-Distributed Hadoop
- 4. Deploying Distributed Hadoop
- 5. Deploying the Distributed Resource Management Framework YARN
1. Introduction to Hadoop
HDFS is a highly fault-tolerant distributed file system that can be widely deployed on inexpensive commodity hardware. It accesses application data in a streaming fashion, which greatly increases the overall data throughput of the system and makes it well suited to applications with very large data sets.
HDFS uses a master/slave architecture. A typical HDFS cluster consists of one NameNode and multiple DataNodes. The NameNode keeps and manages the metadata of the entire HDFS file system, and usually only one machine in the cluster runs a NameNode instance; the DataNodes store the actual file data, with each of the other machines running one DataNode instance. DataNodes communicate with the NameNode periodically through a heartbeat mechanism.
Hadoop consists of three core components:
- HDFS (Hadoop Distributed File System): a distributed storage system providing a highly reliable, highly scalable, high-throughput data storage service.
- MapReduce: a distributed computation framework (moving computation to the data) that is easy to program, fault tolerant, and scalable.
- YARN (Yet Another Resource Negotiator): a distributed resource management framework responsible for cluster resource management and scheduling.
2. Installing Hadoop
Prepare the Hadoop software tarball and the supporting JDK package:
[root@server1 ~]# ls
anaconda-ks.cfg hadoop-3.2.1.tar.gz jdk-8u181-linux-x64.rpm
Create the hadoop user and move the archives into the hadoop user's home directory:
[root@server1 ~]# useradd hadoop
[root@server1 ~]# mv * /home/hadoop/
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
anaconda-ks.cfg hadoop-3.2.1.tar.gz jdk-8u181-linux-x64.rpm
Extract the archives and create the java and hadoop symbolic links:
[hadoop@server1 ~]$ ls
anaconda-ks.cfg hadoop-3.2.1.tar.gz jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ls
anaconda-ks.cfg hadoop-3.2.1.tar.gz jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ ls
anaconda-ks.cfg java jdk-8u181-linux-x64.tar.gz
hadoop-3.2.1.tar.gz jdk1.8.0_181
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz
[hadoop@server1 ~]$ ls
anaconda-ks.cfg hadoop-3.2.1.tar.gz jdk1.8.0_181
hadoop-3.2.1 java jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-3.2.1 hadoop
[hadoop@server1 ~]$ ls
anaconda-ks.cfg hadoop-3.2.1 java jdk-8u181-linux-x64.tar.gz
hadoop hadoop-3.2.1.tar.gz jdk1.8.0_181
Enter the hadoop directory and configure the Hadoop environment, pointing it at the Hadoop and Java installation paths:
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ cd etc/hadoop/
[hadoop@server1 hadoop]$ vim hadoop-env.sh
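The edit itself is not shown above; given the symlinks created earlier, a typical change to hadoop-env.sh would be to set the Java (and optionally Hadoop) install paths, for example:

export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop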
Create an input directory and copy the etc/hadoop/*.xml files into it.
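The commands are not reproduced in the original; they would look roughly like this:

[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input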
Run a test job with the hadoop-mapreduce-examples jar: grep the files in the input directory for strings matching dfs[a-z.]+ and save the result to the output directory:
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
View the contents of the output directory.
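At this point the job runs in local (standalone) mode, so output is an ordinary local directory and can be inspected directly, for example:

[hadoop@server1 hadoop]$ cat output/*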
3. Deploying Pseudo-Distributed Hadoop
Set the worker to localhost:
[hadoop@server1 hadoop]$ cat workers
localhost
Edit core-site.xml so that HDFS is accessed at localhost:9000:
[hadoop@server1 hadoop]$ cd ~/hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
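The file content is not shown; the standard pseudo-distributed setting (as in the Apache documentation) is:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>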
Edit hdfs-site.xml to set the replication factor to 1:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
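A minimal hdfs-site.xml for a single DataNode sets the replication factor to 1:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>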
Generate an SSH key pair:
[hadoop@server1 hadoop]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:ROXTsvFYZX+cgN+n5imW+Jtc75WXm0FIjDPZCuaZGkU hadoop@server1
The key's randomart image is:
+---[RSA 2048]----+
| .E. ..o |
| ... o=o.o.|
| .+==++..+|
| .+ +O=...o|
| .S+o... o.|
| o + o|
| . . +.=o|
| ..+oo.*|
| o=o +o|
+----[SHA256]-----+
Set a password for the hadoop user (westos here) so that ssh-copy-id can authenticate:
[root@server1 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
Set up passwordless SSH login to localhost:
[hadoop@server1 ~]$ ssh-copy-id localhost
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:Qwz6cDDE7GvLYqOWEwNiW4Wf8PBLrLVAYYuHmU8d9Ds.
ECDSA key fingerprint is MD5:f7:84:ee:41:4e:97:1b:f3:28:d7:f5:63:71:d0:6b:06.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.
Test the passwordless login:
[hadoop@server1 ~]$ ssh localhost
Last login: Sat Aug 14 23:13:02 2021
[hadoop@server1 ~]$ logout
Connection to localhost closed.
Initialize Hadoop by formatting the NameNode:
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
WARNING: /home/hadoop/hadoop/logs does not exist. Creating.
2021-08-14 23:14:41,913 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server1/172.25.3.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.1
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server1/172.25.3.1
************************************************************/
Run the HDFS start script:
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [server1]
server1: Warning: Permanently added 'server1,172.25.3.1' (ECDSA) to the list of known hosts.
Add the directory containing the jps command to the PATH, then run jps to inspect the Hadoop daemons:
[hadoop@server1 ~]$ vim .bash_profile
[hadoop@server1 ~]$ source .bash_profile
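The .bash_profile edit is not shown; since jps ships with the JDK, a typical addition (assuming the ~/java symlink created earlier) is:

export PATH=$PATH:$HOME/java/bin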
Browse to 172.25.3.1:9870 to view the NameNode web UI and node information.
Test: create the HDFS user directory /user/hadoop and upload the input directory:
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ id
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input/
2021-08-14 23:24:16,978 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
List the uploaded directory:
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls input/
Found 9 items
-rw-r--r-- 1 hadoop supergroup 8260 2021-08-14 23:24 input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 774 2021-08-14 23:24 input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 11392 2021-08-14 23:24 input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 775 2021-08-14 23:24 input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2021-08-14 23:24 input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2021-08-14 23:24 input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 682 2021-08-14 23:24 input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 758 2021-08-14 23:24 input/mapred-site.xml
-rw-r--r-- 1 hadoop supergroup 690 2021-08-14 23:24 input/yarn-site.xml
Run the wordcount test job and check the output directory in HDFS:
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2021-08-14 23:29 output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 9351 2021-08-14 23:29 output/part-r-00000
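The aggregated word counts can then be read back from HDFS, for example:

[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*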
4. Deploying Distributed Hadoop
Run the stop script to shut HDFS down before reconfiguring:
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Install the NFS utilities on server1, server2, and server3:
[root@server1 ~]# yum install -y nfs-utils
Create the hadoop user on server2 and server3.
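The commands are not shown in the original; on each node they mirror the server4 step later on, and the resulting uid/gid should be 1000 so they match the anonuid/anongid in the NFS export below:

[root@server2 ~]# useradd hadoop
[root@server3 ~]# useradd hadoop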
On server1, export /home/hadoop over NFS:
[root@server1 ~]# vim /etc/exports
[root@server1 ~]# cat /etc/exports
/home/hadoop *(rw,anonuid=1000,anongid=1000)
[root@server1 ~]# systemctl enable --now nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
Mount 172.25.3.1:/home/hadoop/ onto the hadoop user's home directory:
[root@server2 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
Once server2 and server3 have mounted the NFS share they see the same home directory (including the SSH keys), so passwordless login already works. Test it:
[hadoop@server1 hadoop]$ ssh server2
The authenticity of host 'server2 (172.25.3.2)' can't be established.
ECDSA key fingerprint is SHA256:Qwz6cDDE7GvLYqOWEwNiW4Wf8PBLrLVAYYuHmU8d9Ds.
ECDSA key fingerprint is MD5:f7:84:ee:41:4e:97:1b:f3:28:d7:f5:63:71:d0:6b:06.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server2,172.25.3.2' (ECDSA) to the list of known hosts.
Last login: Sat Aug 14 23:58:51 2021
[hadoop@server2 ~]$ logout
Connection to server2 closed.
[hadoop@server1 hadoop]$ ssh server3
The authenticity of host 'server3 (172.25.3.3)' can't be established.
ECDSA key fingerprint is SHA256:Qwz6cDDE7GvLYqOWEwNiW4Wf8PBLrLVAYYuHmU8d9Ds.
ECDSA key fingerprint is MD5:f7:84:ee:41:4e:97:1b:f3:28:d7:f5:63:71:d0:6b:06.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server3,172.25.3.3' (ECDSA) to the list of known hosts.
Last login: Sat Aug 14 23:59:08 2021
[hadoop@server3 ~]$ logout
Connection to server3 closed.
Edit the Hadoop configuration files. In core-site.xml, point the HDFS master at 172.25.3.1:9000:
[hadoop@server1 hadoop]$ vim core-site.xml
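The resulting core-site.xml (a sketch based on the address above) points fs.defaultFS at server1:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.3.1:9000</value>
    </property>
</configuration>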
In hdfs-site.xml, set the replication factor to 2:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
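With two DataNodes, dfs.replication is raised to 2:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>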
In the workers file, list server2 and server3:
[hadoop@server1 hadoop]$ cat etc/hadoop/workers
server2
server3
With the configuration in place, reformat the NameNode:
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
2021-08-15 00:01:51,404 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server1/172.25.3.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.1
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server1/172.25.3.1
************************************************************/
Run the Hadoop start script:
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
Check server1: it runs the NameNode and SecondaryNameNode:
[hadoop@server1 hadoop]$ jps
5155 SecondaryNameNode
4959 NameNode
5295 Jps
Check server2 and server3: each runs a DataNode:
[hadoop@server2 ~]$ jps
3856 Jps
3793 DataNode
The node information can also be viewed in a Firefox browser via the NameNode web UI.
Scaling out: add server4 to the cluster. Create the hadoop user and install the NFS utilities:
[root@server4 ~]# useradd hadoop
[root@server4 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server4 ~]# yum install -y nfs-utils.x86_64
Mount the NFS share:
[root@server4 ~]# mount 172.25.3.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel-root 28289540 1162220 27127320 5% /
devtmpfs 495424 0 495424 0% /dev
tmpfs 507448 0 507448 0% /dev/shm
tmpfs 507448 6968 500480 2% /run
tmpfs 507448 0 507448 0% /sys/fs/cgroup
/dev/vda1 1038336 135088 903248 14% /boot
tmpfs 101492 0 101492 0% /run/user/0
172.25.3.1:/home/hadoop 28289792 2997248 25292544 11% /home/hadoop
On server1, add server4 to the workers file:
[hadoop@server1 hadoop]$ cat etc/hadoop/workers
server2
server3
server4
On server4, start the DataNode daemon to join the cluster:
[hadoop@server4 hadoop]$ bin/hdfs --daemon start datanode
[hadoop@server4 hadoop]$ jps
4220 Jps
4190 DataNode
server4 has been added successfully.
5. Deploying the Distributed Resource Management Framework YARN
Edit yarn-site.xml and add the YARN configuration:
[hadoop@server1 hadoop]$ vim etc/hadoop/yarn-site.xml
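The added content is not shown; a typical minimal yarn-site.xml for this cluster would enable the MapReduce shuffle service and name the ResourceManager host (the hostname value is an assumption based on server1 running the ResourceManager later):

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>server1</value>
    </property>
</configuration>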
Edit mapred-site.xml:
[hadoop@server1 hadoop]$ vim etc/hadoop/mapred-site.xml
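For MapReduce jobs to run on YARN, mapred-site.xml normally declares YARN as the execution framework:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>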
Add the required environment variables to hadoop-env.sh:
[hadoop@server1 hadoop]$ vim etc/hadoop/hadoop-env.sh
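The exact variables are not listed in the original; a common Hadoop 3.x addition so that the MapReduce examples find their classpath under YARN (an assumption here) is:

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop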
Run the YARN start script:
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
server4: Warning: Permanently added 'server4,172.25.3.4' (ECDSA) to the list of known hosts.
Check server1: it runs the ResourceManager:
[hadoop@server1 hadoop]$ jps
15667 Jps
4695 SecondaryNameNode
4506 NameNode
15563 ResourceManager
Check server2/3/4: each runs a NodeManager:
[hadoop@server3 ~]$ jps
4323 NodeManager
4595 Jps
Browse to 172.25.3.1:8088 to view the YARN management page.