1. Linux environment preparation

  1.1 Disable the firewall (run on all three VMs)

firewall-cmd --state   # check the firewall status

systemctl start firewalld.service   # start the firewall

systemctl stop firewalld.service     # stop the firewall

systemctl disable firewalld.service  # keep the firewall from starting at boot

  1.2 Configure a static IP address (run on all three VMs)

Reference: configuring a static IP on a Linux VM

[root@node01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33

Full contents of the file:

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="a5a7540d-fafb-47c8-bd59-70f1f349462e"
DEVICE="ens33"
ONBOOT="yes"

IPADDR="192.168.24.137"
GATEWAY="192.168.24.2"
NETMASK="255.255.255.0"
DNS1="8.8.8.8"

Note:

     Set ONBOOT to yes and change BOOTPROTO from dhcp to static, then configure the static IP, gateway, netmask, and DNS. Everything else is identical on the three VMs; IPADDR is assigned from 137 to 139 in order.
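
After editing the file, the usual sequence to apply and verify the change is the sketch below (the IP shown is node01's; adjust per node):

systemctl restart network      # reload the new network configuration
ip addr show ens33             # confirm 192.168.24.137 is now bound to ens33
ping -c 3 8.8.8.8              # confirm outbound connectivity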

Problem:

     At first I set the gateway to 192.168.24.1. Rebooting the VM and restarting the network service did not help; I still could not ping 8.8.8.8 or Baidu.

Solution:

   Open Edit -> Virtual Network Editor in VMware, select your virtual network adapter, and check whether the subnet address is in the same network segment as the IP you configured. Then click NAT Settings.


 In the NAT settings dialog, the subnet mask is 255.255.255.0 and the gateway IP is 192.168.24.2, not 192.168.24.1 as I had assumed. So I edited /etc/sysconfig/network-scripts/ifcfg-ens33 again, changed the gateway to 192.168.24.2, and restarted the network service.

[root@node01 yum.repos.d]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.24.2    0.0.0.0         UG        0 0          0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ens33
192.168.24.0    0.0.0.0         255.255.255.0   U         0 0          0 ens33
[root@node01 yum.repos.d]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@node01 yum.repos.d]# systemctl restart network
[root@node01 yum.repos.d]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=128 time=32.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=128 time=32.9 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=128 time=31.7 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=128 time=31.7 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=128 time=31.7 ms

  1.3 Change the hostname (on all three VMs)

Note: editing /etc/sysconfig/network and setting HOSTNAME=node01 no longer seems to work on CentOS 7.5. Instead, use hostnamectl set-hostname node01, or edit /etc/hostname directly.

After the change, reboot for it to take effect; the reboot command works.

  1.4 Map IPs to hostnames (add the following on all three VMs)

 Note: edit the file with: vim /etc/hosts

192.168.24.137 node01 node01.hadoop.com
192.168.24.138 node02 node02.hadoop.com
192.168.24.139 node03 node03.hadoop.com
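
A quick way to confirm the mapping works (for example from node01):

ping -c 1 node02
ping -c 1 node03.hadoop.com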

  1.5 Passwordless SSH login between the three machines (configure on all three VMs)

Why passwordless login is needed
  - A Hadoop cluster has many nodes, and the worker daemons are usually started from the master node, so scripts on the master must log in to the workers automatically. Without passwordless SSH you would have to type a password every time, which is very tedious.
- How passwordless SSH login works
  1. Node A's public key is first installed on node B
  2. Node A asks node B to log in
  3. Node B encrypts a piece of random text with node A's public key
  4. Node A decrypts it with its private key and sends it back to node B
  5. Node B verifies that the text is correct

 Step 1: generate a public/private key pair on each of the three machines

Run the following command on all three machines to generate the key pair: ssh-keygen -t rsa


 Step 2: copy the public keys to the same machine

   Copy each machine's public key to the first machine by running this command on all three machines:

ssh-copy-id node01

[root@node02 ~]# ssh-copy-id node01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'node01 (192.168.24.137)' can't be established.
ECDSA key fingerprint is SHA256:GzI3JXtwr1thv7B0pdcvYQSpd98Nj1PkjHnvABgHFKI.
ECDSA key fingerprint is MD5:00:00:7b:46:99:5e:ff:f2:54:84:19:25:2c:63:0a:9e.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node01's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'node01'"
and check to make sure that only the key(s) you wanted were added.
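
For reference, ssh-copy-id essentially appends the local public key to ~/.ssh/authorized_keys on the target machine; a rough manual equivalent, in case the command is unavailable, is:

cat ~/.ssh/id_rsa.pub | ssh root@node01 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'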

 Step 3: copy the first machine's authorized_keys file to the other machines

Run the following commands on the first machine:

scp /root/.ssh/authorized_keys node02:/root/.ssh

scp /root/.ssh/authorized_keys node03:/root/.ssh

[root@node01 ~]# scp /root/.ssh/authorized_keys node02:/root/.ssh
The authenticity of host 'node02 (192.168.24.138)' can't be established.
ECDSA key fingerprint is SHA256:GzI3JXtwr1thv7B0pdcvYQSpd98Nj1PkjHnvABgHFKI.
ECDSA key fingerprint is MD5:00:00:7b:46:99:5e:ff:f2:54:84:19:25:2c:63:0a:9e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node02,192.168.24.138' (ECDSA) to the list of known hosts.
root@node02's password:
authorized_keys                                                       100%  786   719.4KB/s   00:00
[root@node01 ~]# scp /root/.ssh/authorized_keys node03:/root/.ssh
The authenticity of host 'node03 (192.168.24.139)' can't be established.
ECDSA key fingerprint is SHA256:TyZdob+Hr1ZX7WRSeep1saPljafCrfto9UgRWNoN+20.
ECDSA key fingerprint is MD5:53:64:22:86:20:19:da:51:06:f9:a1:a9:a8:96:4f:af.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node03,192.168.24.139' (ECDSA) to the list of known hosts.
root@node03's password:
authorized_keys                                                       100%  786   692.6KB/s   00:00

 You can then check that passwordless login works by ssh-ing between the three VMs:

[root@node02 hadoop-2.7.5]# cd ~/.ssh
[root@node02 .ssh]# ssh node01
Last login: Thu Jun 11 10:12:27 2020 from 192.168.24.1
[root@node01 ~]# ssh node02
Last login: Thu Jun 11 14:51:58 2020 from node03

1.6 Clock synchronization across the three machines (run on all three VMs)

Why time synchronization is needed

- Many distributed systems are stateful. For example, when storing a piece of data, if node A records timestamp 1 and node B records timestamp 2 for the same event, things go wrong.
## Install ntp
[root@node03 ~]# yum install -y ntp
## Add a scheduled sync job
[root@node03 ~]# crontab -e
no crontab for root - using an empty one
crontab: installing new crontab
## Add the following line to the crontab:
*/1 * * * * /usr/sbin/ntpdate ntp4.aliyun.com;
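
To check that the job is in place and that synchronization actually works, a quick manual test (assuming ntpdate was installed by the step above):

crontab -l                           # the */1 entry should be listed
/usr/sbin/ntpdate ntp4.aliyun.com    # one-off manual sync
date                                 # compare the time across the three nodes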

Note: if you hit the following error while running a yum install ... command:

/var/run/yum.pid 已被锁定,PID 为 5396 的另一个程序正在运行。
Another app is currently holding the yum lock; waiting for it to exit...
  另一个应用程序是:yum
    内存: 70 M RSS (514 MB VSZ)
    已启动: Thu Jun 11 10:02:10 2020 - 18:48之前
    状态  :跟踪/停止,进程ID:5396
Another app is currently holding the yum lock; waiting for it to exit...
  另一个应用程序是:yum
    内存: 70 M RSS (514 MB VSZ)
    已启动: Thu Jun 11 10:02:10 2020 - 18:50之前
    状态  :跟踪/停止,进程ID:5396
^Z
[1]+  已停止               yum install -y ntp

you can fix it with:

[root@node03 ~]# rm -f /var/run/yum.pid

If you want to switch the CentOS yum repositories to a domestic (China) mirror, use the following commands:

Reference: domestic mirrors for the CentOS yum repo

Aliyun mirror:

# Back up the original repo file
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
# CentOS 5
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-5.repo
# CentOS 6
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
# CentOS 7
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all
yum makecache

NetEase (163) mirror:

cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
# CentOS 5
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS5-Base-163.repo
# CentOS 6
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS6-Base-163.repo
yum clean all
yum makecache

 2. Install the JDK

   2.1 Distribute the installation package to the other machines

  Run the following two scp commands on the first machine (192.168.24.137):

[root@node01 software]# ls
hadoop-2.7.5  jdk1.8.0_241  zookeeper-3.4.9  zookeeper-3.4.9.tar.gz
[root@node01 software]# java -version
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
[root@node01 software]# scp -r  /software/jdk1.8.0_241/ node02:/software/jdk1.8.0_241/
root@node02's password:
(output omitted ...)
[root@node01 software]# scp -r  /software/jdk1.8.0_241/ node03:/software/jdk1.8.0_241/
root@node03's password:
(output omitted ...)

  PS: JDK 1.8 is already installed and configured on my node01; see my separate JDK installation notes.

       Once the copy finishes, check node02 and node03: the /software/jdk1.8.0_241/ directory was created automatically and the JDK from node01 has been transferred. Then configure the JDK on node02 and node03 with the following commands:

[root@node02 software]# vim /etc/profile
[root@node02 software]# source /etc/profile
[root@node02 software]# java -version
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

 Lines added to /etc/profile:

export JAVA_HOME=/software/jdk1.8.0_241
export CLASSPATH="$JAVA_HOME/lib"
export PATH="$JAVA_HOME/bin:$PATH"

3. Install the ZooKeeper cluster

| Server IP      | Hostname | myid value |
|----------------|----------|------------|
| 192.168.24.137 | node01   | 1          |
| 192.168.24.138 | node02   | 2          |
| 192.168.24.139 | node03   | 3          |

  3.1 Download the ZooKeeper tarball

 Download it from the ZooKeeper download page; I use version 3.4.9. wget works fine for this.

  3.2 Extract the archive

[root@node01 software]# tar -zxvf zookeeper-3.4.9.tar.gz
[root@node01 software]# ls
hadoop-2.7.5  jdk1.8.0_241  zookeeper-3.4.9  zookeeper-3.4.9.tar.gz

  3.3 Edit the configuration file

Edit the configuration on the first machine (node01):

cd /software/zookeeper-3.4.9/conf/

cp zoo_sample.cfg zoo.cfg

mkdir -p /software/zookeeper-3.4.9/zkdatas/

vim zoo.cfg (lines added/changed):

dataDir=/software/zookeeper-3.4.9/zkdatas
# Number of snapshots to retain
autopurge.snapRetainCount=3
# Purge interval, in hours
autopurge.purgeInterval=1
# Servers in the ensemble
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888

  3.4 Add the myid file

   On the first machine (node01), create a file named myid under /software/zookeeper-3.4.9/zkdatas/ with the content 1:

echo 1 > /software/zookeeper-3.4.9/zkdatas/myid

  3.5 Distribute the installation and adjust the myid values

 Distribute the installation to the other machines.

Run the following two commands on the first machine (node01):

[root@node01 conf]# scp -r  /software/zookeeper-3.4.9/ node02:/software/zookeeper-3.4.9/
root@node02's password:
(output omitted ...)
[root@node01 conf]# scp -r  /software/zookeeper-3.4.9/ node03:/software/zookeeper-3.4.9/
root@node03's password:
(output omitted ...)

On the second machine, set myid to 2:

echo 2 > /software/zookeeper-3.4.9/zkdatas/myid

On the third machine, set myid to 3:

echo 3 > /software/zookeeper-3.4.9/zkdatas/myid
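
To double-check, the myid file on each node should contain the value from the table above:

cat /software/zookeeper-3.4.9/zkdatas/myid   # node01 prints 1, node02 prints 2, node03 prints 3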

3.6 Start the ZooKeeper service on the three machines (run on all three VMs)

#Start the service
/software/zookeeper-3.4.9/bin/zkServer.sh start

#Check the service status
/software/zookeeper-3.4.9/bin/zkServer.sh status

The status command should report one node as the leader and the other two as followers (screenshot omitted).
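
As an extra sanity check, you can also connect with the ZooKeeper CLI shipped in the same bin directory; a quick sketch (any of the three servers works):

/software/zookeeper-3.4.9/bin/zkCli.sh -server node01:2181
# inside the CLI shell:
ls /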

4. Install and configure Hadoop

A fully distributed deployment is used, with NameNode HA and ResourceManager HA. The role layout across the three machines:

|           | 192.168.24.137 | 192.168.24.138  | 192.168.24.139   |
|-----------|----------------|-----------------|------------------|
| zookeeper | zk             | zk              | zk               |
| HDFS      | JournalNode    | JournalNode     | JournalNode      |
|           | NameNode       | NameNode        |                  |
|           | ZKFC           | ZKFC            |                  |
|           | DataNode       | DataNode        | DataNode         |
| YARN      |                | ResourceManager | ResourceManager  |
|           | NodeManager    | NodeManager     | NodeManager      |
| MapReduce |                |                 | JobHistoryServer |

  4.1 Hadoop compiled from source on Linux CentOS 7.5

    Here I do not use the Hadoop binary distribution directly; I use a Hadoop package I compiled myself. Stop all services of the previous Hadoop cluster and delete the Hadoop installation on every machine first.

[root@localhost software]# cd /software/hadoop-2.7.5-src/hadoop-dist/target
[root@localhost target]# ls
antrun                    hadoop-2.7.5.tar.gz                 javadoc-bundle-options
classes                   hadoop-dist-2.7.5.jar               maven-archiver
dist-layout-stitching.sh  hadoop-dist-2.7.5-javadoc.jar       maven-shared-archive-resources
dist-tar-stitching.sh     hadoop-dist-2.7.5-sources.jar       test-classes
hadoop-2.7.5              hadoop-dist-2.7.5-test-sources.jar  test-dir
[root@localhost target]# cp -r hadoop-2.7.5 /software
[root@localhost target]# cd /software/
[root@localhost software]# ls
apache-maven-3.0.5             findbugs-1.3.9.tar.gz    jdk1.7.0_75                protobuf-2.5.0
apache-maven-3.0.5-bin.tar.gz  hadoop-2.7.5             jdk-7u75-linux-x64.tar.gz  protobuf-2.5.0.tar.gz
apache-tomcat-6.0.53.tar.gz    hadoop-2.7.5-src         mvnrepository              snappy-1.1.1
findbugs-1.3.9                 hadoop-2.7.5-src.tar.gz  mvnrepository.tar.gz       snappy-1.1.1.tar.gz
[root@localhost software]# cd hadoop-2.7.5
[root@localhost hadoop-2.7.5]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@localhost hadoop-2.7.5]# cd etc
[root@localhost etc]# ls
hadoop
[root@localhost etc]# cd hadoop/
[root@localhost hadoop]# ls
capacity-scheduler.xml      hadoop-policy.xml        kms-log4j.properties        ssl-client.xml.example
configuration.xsl           hdfs-site.xml            kms-site.xml                ssl-server.xml.example
container-executor.cfg      httpfs-env.sh            log4j.properties            yarn-env.cmd
core-site.xml               httpfs-log4j.properties  mapred-env.cmd              yarn-env.sh
hadoop-env.cmd              httpfs-signature.secret  mapred-env.sh               yarn-site.xml
hadoop-env.sh               httpfs-site.xml          mapred-queues.xml.template
hadoop-metrics2.properties  kms-acls.xml             mapred-site.xml.template
hadoop-metrics.properties   kms-env.sh               slaves

Tip: you can edit files on the remote servers with the Notepad++ plugin NppFTP:

Look for Show NppFTP Window among the toolbar icons; in my case it was not there yet.


Go to Plugins -> Plugins Admin, search for nppftp, tick it, and install it.


After reopening Notepad++ a new toolbar icon appears; click connect and you can edit files on the remote server.


That said, I do not use the NppFTP plugin here; I edit the remote files with MobaXterm instead.

4.2 Edit the Hadoop configuration files

  4.2.1 Edit core-site.xml

cd /software/hadoop-2.7.5/etc/hadoop
<configuration>
	<!-- ZooKeeper quorum used for NameNode HA -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>node01:2181,node02:2181,node03:2181</value>
	</property>
	<!-- Default filesystem URI (the HDFS nameservice) -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://ns</value>
	</property>
	<!-- Temporary file directory -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/software/hadoop-2.7.5/data/tmp</value>
	</property>
	<!-- Enable the HDFS trash; files in the trash are permanently deleted after seven days.
			The unit is minutes (7 days = 7 * 24 * 60 = 10080 minutes).
	 -->
	<property>
		<name>fs.trash.interval</name>
		<value>10080</value>
	</property>
</configuration>
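
Once the configuration has been distributed, you can confirm that these values are picked up with hdfs getconf, which simply reads the configuration on the local node; a quick sketch from the Hadoop home directory:

bin/hdfs getconf -confKey fs.defaultFS        # should print hdfs://ns
bin/hdfs getconf -confKey fs.trash.interval   # should print 10080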

 4.2.2 Edit hdfs-site.xml:

<configuration>
	<!-- Nameservice ID -->
	<property>
		<name>dfs.nameservices</name>
		<value>ns</value>
	</property>
	<!-- The two NameNodes under this nameservice -->
	<property>
		<name>dfs.ha.namenodes.ns</name>
		<value>nn1,nn2</value>
	</property>
	<!-- RPC address of the NameNode on the first server -->
	<property>
		<name>dfs.namenode.rpc-address.ns.nn1</name>
		<value>node01:8020</value>
	</property>
	<!-- RPC address of the NameNode on the second server -->
	<property>
		<name>dfs.namenode.rpc-address.ns.nn2</name>
		<value>node02:8020</value>
	</property>
	<!-- Service RPC address of nn1, used by DataNodes and other internal services -->
	<property>
		<name>dfs.namenode.servicerpc-address.ns.nn1</name>
		<value>node01:8022</value>
	</property>
	<!-- Service RPC address of nn2, used by DataNodes and other internal services -->
	<property>
		<name>dfs.namenode.servicerpc-address.ns.nn2</name>
		<value>node02:8022</value>
	</property>
	<!-- Web UI address of the NameNode on the first server -->
	<property>
		<name>dfs.namenode.http-address.ns.nn1</name>
		<value>node01:50070</value>
	</property>
	<!-- Web UI address of the NameNode on the second server -->
	<property>
		<name>dfs.namenode.http-address.ns.nn2</name>
		<value>node02:50070</value>
	</property>
	<!-- JournalNode addresses for the shared edits; this property must be configured -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value>
	</property>
	<!-- Java class clients use to find the active NameNode during failover -->
	<property>
		<name>dfs.client.failover.proxy.provider.ns</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!-- Fencing method used during failover -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<!-- Private key used by the SSH fencing method -->
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
	</property>
	<!-- Directory where the JournalNode stores its data -->
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/software/hadoop-2.7.5/data/dfs/jn</value>
	</property>
	<!-- Enable automatic failover -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<!-- Directory for the NameNode fsimage files -->
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///software/hadoop-2.7.5/data/dfs/nn/name</value>
	</property>
	<!-- Directory for the NameNode edit logs -->
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>file:///software/hadoop-2.7.5/data/dfs/nn/edits</value>
	</property>
	<!-- Directories where the DataNode stores its blocks -->
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///software/hadoop-2.7.5/data/dfs/dn</value>
	</property>
	<!-- Disable HDFS permission checking -->
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<!-- HDFS block size: 128 MB = 128 * 1024 * 1024 = 134217728 bytes -->
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
</configuration>

 4.2.3 Edit yarn-site.xml

Note: node02 and node03 need different values for one property (yarn.resourcemanager.ha.id); see section 4.2.9.

<configuration>
	<!-- Site specific YARN configuration properties -->
	<!-- Whether to enable log aggregation: after an application finishes, the container logs from every node are collected and moved to a file system such as HDFS, so they can be viewed in one place. -->
	<!-- yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix control where the aggregated logs are stored; the logs can then be accessed through the application history server. -->
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<!--Enable ResourceManager HA; default is false-->
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<!-- Cluster id; ensures this RM never becomes active for another cluster -->
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>mycluster</value>
	</property>
	<!--Logical ids of the ResourceManagers-->
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<!-- Host of the first ResourceManager (rm1) -->
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>node03</value>
	</property>
	<!-- Host of the second ResourceManager (rm2) -->
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>node02</value>
	</property>
	<!-- Addresses of the first ResourceManager (rm1) -->
	<property>
		<name>yarn.resourcemanager.address.rm1</name>
		<value>node03:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm1</name>
		<value>node03:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
		<value>node03:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm1</name>
		<value>node03:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>node03:8088</value>
	</property>
	<!-- Addresses of the second ResourceManager (rm2) -->
	<property>
		<name>yarn.resourcemanager.address.rm2</name>
		<value>node02:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm2</name>
		<value>node02:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
		<value>node02:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm2</name>
		<value>node02:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>node02:8088</value>
	</property>
	<!--Enable ResourceManager state recovery-->
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>
	<!--Set this to rm1 on node03 and rm2 on node02. Note: people usually copy the finished file to the other machines as-is, but this property must be changed on the other ResourceManager host and removed on machines that do not run a ResourceManager-->
	<property>
		<name>yarn.resourcemanager.ha.id</name>
		<value>rm1</value>
		<description>If we want to launch more than one RM in single node, we need this configuration</description>
	</property>
	<!--Class used to persist the RM state (here the ZooKeeper-based store)-->
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>node02:2181,node03:2181,node01:2181</value>
		<description>For multiple zk services, separate them with comma</description>
	</property>
	<!--Enable automatic failover of the ResourceManager-->
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
		<value>true</value>
		<description>Enable automatic failover; By default, it is enabled only when HA is enabled.</description>
	</property>
	<property>
		<name>yarn.client.failover-proxy-provider</name>
		<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
	</property>
	<!-- Number of CPU vcores a NodeManager can allocate to containers; default is 8 -->
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>4</value>
	</property>
	<!-- Memory available for containers on each node, in MB -->
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>512</value>
	</property>
	<!-- Minimum memory a single container can request; default is 1024 MB -->
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>512</value>
	</property>
	<!-- Maximum memory a single container can request; default is 8192 MB -->
	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>512</value>
	</property>
	<!--How long aggregated logs are kept before they are deleted-->
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>2592000</value>
		<!--30 day-->
	</property>
	<!--How long (in seconds) container logs are kept on the node; only applies when log aggregation is disabled-->
	<property>
		<name>yarn.nodemanager.log.retain-seconds</name>
		<value>604800</value>
		<!--7 day-->
	</property>
	<!--Compression type used for aggregated logs-->
	<property>
		<name>yarn.nodemanager.log-aggregation.compression-type</name>
		<value>gz</value>
	</property>
	<!-- Local directories used by the NodeManager -->
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/software/hadoop-2.7.5/yarn/local</value>
	</property>
	<!-- Maximum number of completed applications the ResourceManager keeps -->
	<property>
		<name>yarn.resourcemanager.max-completed-applications</name>
		<value>1000</value>
	</property>
	<!-- Comma-separated list of auxiliary services; names may only contain a-zA-Z0-9_ and must not start with a digit-->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<!--Retry interval (ms) for reconnecting to the RM after losing contact-->
	<property>
		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
		<value>2000</value>
	</property>
</configuration>

4.2.4 Edit mapred-site.xml

<configuration>
	<!--Run MapReduce on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<!-- MapReduce JobHistory Server IPC host:port -->
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>node03:10020</value>
	</property>
	<!-- MapReduce JobHistory Server Web UI host:port -->
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>node03:19888</value>
	</property>
	<!-- The directory where MapReduce stores control files. Default: ${hadoop.tmp.dir}/mapred/system -->
	<property>
		<name>mapreduce.jobtracker.system.dir</name>
		<value>/software/hadoop-2.7.5/data/system/jobtracker</value>
	</property>
	<!-- The amount of memory to request from the scheduler for each map task. Default: 1024 -->
	<property>
		<name>mapreduce.map.memory.mb</name>
		<value>1024</value>
	</property>
	<!-- <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1024m</value>
        </property> -->
	<!-- The amount of memory to request from the scheduler for each reduce task. Default: 1024 -->
	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>1024</value>
	</property>
	<!-- <property>
               <name>mapreduce.reduce.java.opts</name>
               <value>-Xmx2048m</value>
        </property> -->
	<!-- Total amount of buffer memory to use while sorting files, in megabytes. By default each merge stream gets 1 MB, which should minimize seeks. Default: 100 -->
	<property>
		<name>mapreduce.task.io.sort.mb</name>
		<value>100</value>
	</property>
	<!-- <property>
        <name>mapreduce.jobtracker.handler.count</name>
        <value>25</value>
        </property>-->
	<!-- Number of streams to merge at once while sorting files; this determines the number of open file handles. Default: 10 -->
	<property>
		<name>mapreduce.task.io.sort.factor</name>
		<value>10</value>
	</property>
	<!-- Number of parallel transfers run by a reduce during the copy (shuffle) phase. Default: 5 -->
	<property>
		<name>mapreduce.reduce.shuffle.parallelcopies</name>
		<value>25</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.command-opts</name>
		<value>-Xmx1024m</value>
	</property>
	<!-- Total amount of memory the MR ApplicationMaster needs. Default: 1536 -->
	<property>
		<name>yarn.app.mapreduce.am.resource.mb</name>
		<value>1536</value>
	</property>
	<!-- Local directories where MapReduce stores intermediate data files; directories that do not exist are ignored. Default: ${hadoop.tmp.dir}/mapred/local -->
	<property>
		<name>mapreduce.cluster.local.dir</name>
		<value>/software/hadoop-2.7.5/data/system/local</value>
	</property>
</configuration>

4.2.5 Edit slaves

PS: in later Hadoop versions this file appears to have been renamed to workers.


node01
node02
node03

4.2.6 Edit hadoop-env.sh

export JAVA_HOME=/software/jdk1.8.0_241

4.2.7 Send the Hadoop installation from the first machine (node01) to the other machines

[root@node01 software]# ls
hadoop-2.7.5  jdk1.8.0_241  zookeeper-3.4.9  zookeeper-3.4.9.tar.gz
[root@node01 software]# scp -r hadoop-2.7.5/ node02:$PWD
root@node02's password:
(output omitted ...)
[root@node01 software]# scp -r hadoop-2.7.5/ node03:$PWD
root@node03's password:
(output omitted ...)

4.2.8 Create the data directories (on all three VMs)

mkdir -p /software/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /software/hadoop-2.7.5/data/dfs/nn/edits


4.2.9 Adjust yarn.resourcemanager.ha.id on each node's yarn-site.xml

node01: comment out the yarn.resourcemanager.ha.id block entirely:

<!-- 
<property>
	<name>yarn.resourcemanager.ha.id</name>
	<value>rm1</value>
	<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
-->

node02: set the value to rm2:

<property>
	<name>yarn.resourcemanager.ha.id</name>
	<value>rm2</value>
	<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>

node03: keep the value rm1:

<property>
	<name>yarn.resourcemanager.ha.id</name>
	<value>rm1</value>
	<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>

5. Start Hadoop

 5.1 Starting HDFS

 Run the following commands on node01, from the Hadoop installation directory /software/hadoop-2.7.5:

bin/hdfs zkfc -formatZK                            # format the HA state znode in ZooKeeper

sbin/hadoop-daemons.sh start journalnode           # start the JournalNodes on all nodes

bin/hdfs namenode -format                          # format the NameNode (first start only)

bin/hdfs namenode -initializeSharedEdits -force    # initialize the shared edits on the JournalNodes

sbin/start-dfs.sh                                  # start HDFS

If you hit the following problem while running these commands, passwordless SSH between the VMs is either not configured or configured incorrectly:

[root@node01 hadoop-2.7.5]# sbin/hadoop-daemons.sh start journalnode
The authenticity of host 'node01 (192.168.24.137)' can't be established.
ECDSA key fingerprint is SHA256:GzI3JXtwr1thv7B0pdcvYQSpd98Nj1PkjHnvABgHFKI.
ECDSA key fingerprint is MD5:00:00:7b:46:99:5e:ff:f2:54:84:19:25:2c:63:0a:9e.
Are you sure you want to continue connecting (yes/no)? root@node02's password: root@node03's password: Please type 'yes' or 'no':
node01: Warning: Permanently added 'node01' (ECDSA) to the list of known hosts.
root@node01's password:
node02: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node02.out


root@node03's password: node03: Permission denied, please try again.

root@node01's password: node01: Permission denied, please try again.

Run the following commands on node02:

[root@node02 software]# cd hadoop-2.7.5/
[root@node02 hadoop-2.7.5]# bin/hdfs namenode -bootstrapStandby
(output omitted ...)
[root@node02 hadoop-2.7.5]# sbin/hadoop-daemon.sh start namenode
(output omitted ...)
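
Optionally, check which NameNode became active; a quick sketch using the nn1/nn2 ids configured in hdfs-site.xml (run from the Hadoop home directory on either NameNode host):

bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2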

5.2 Starting YARN

Run the following command on node02 and node03:

[root@node03 software]# cd hadoop-2.7.5/
[root@node03 hadoop-2.7.5]# sbin/start-yarn.sh
[root@node02 hadoop-2.7.5]# sbin/start-yarn.sh
starting yarn daemons
resourcemanager running as process 11740. Stop it first.
The authenticity of host 'node02 (192.168.24.138)' can't be established.
ECDSA key fingerprint is SHA256:GzI3JXtwr1thv7B0pdcvYQSpd98Nj1PkjHnvABgHFKI.
ECDSA key fingerprint is MD5:00:00:7b:46:99:5e:ff:f2:54:84:19:25:2c:63:0a:9e.
Are you sure you want to continue connecting (yes/no)? node01: nodemanager running as process 15655. Stop it first.
node03: nodemanager running as process 13357. Stop it first.

If you see the error messages above during startup, resolve it with the commands below:

Most blog posts online suggest the following (not recommended; it seems problematic), which also prints the warning: This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

# The processes are already running; run stop-all.sh first, then start-all.sh
[root@node02 sbin]# pwd
/software/hadoop-2.7.5/sbin
[root@node02 sbin]# ./stop-all.sh
[root@node02 sbin]# ./start-all.sh

However, those scripts are deprecated; use these instead:

./stop-yarn.sh
./stop-dfs.sh

./start-yarn.sh
./start-dfs.sh
[root@node03 sbin]# ./start-dfs.sh
Starting namenodes on [node01 node02]
node02: starting namenode, logging to /software/hadoop-2.7.5/logs/hadoop-root-namenode-node02.out
node01: starting namenode, logging to /software/hadoop-2.7.5/logs/hadoop-root-namenode-node01.out
node02: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node02.out
node01: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node01.out
node03: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node03.out
Starting journal nodes [node01 node02 node03]
node02: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node02.out
node01: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node01.out
node03: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node03.out
Starting ZK Failover Controllers on NN hosts [node01 node02]
node01: starting zkfc, logging to /software/hadoop-2.7.5/logs/hadoop-root-zkfc-node01.out
node02: starting zkfc, logging to /software/hadoop-2.7.5/logs/hadoop-root-zkfc-node02.out
[root@node03 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-resourcemanager-node03.out
node01: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node01.out
node02: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node02.out
node03: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node03.out

Note: check the three VMs with the jps command:

Problem: something is wrong here; the DataNode process is missing on all three VMs, i.e. the DataNodes did not start.

node01: 

[root@node01 hadoop-2.7.5]# jps
8083 NodeManager
8531 DFSZKFailoverController
8404 JournalNode
9432 Jps
1467 QuorumPeerMain
8235 NameNode

node02: 

[root@node02 sbin]# jps
7024 NodeManager
7472 DFSZKFailoverController
7345 JournalNode
7176 NameNode
8216 ResourceManager
8793 Jps
1468 QuorumPeerMain

node03:

[root@node03 hadoop-2.7.5]# jps
5349 NodeManager
5238 ResourceManager
6487 JobHistoryServer
6647 Jps
5997 JournalNode

Solution:

(1) First stop the services with stop-dfs.sh and stop-yarn.sh (this can be run on any node):

[root@node03 hadoop-2.7.5]# ./sbin/stop-dfs.sh
Stopping namenodes on [node01 node02]
node02: no namenode to stop
node01: no namenode to stop
node02: no datanode to stop
node01: no datanode to stop
node03: no datanode to stop
Stopping journal nodes [node01 node02 node03]
node02: no journalnode to stop
node01: no journalnode to stop
node03: no journalnode to stop
Stopping ZK Failover Controllers on NN hosts [node01 node02]
node02: no zkfc to stop
node01: no zkfc to stop
[root@node03 hadoop-2.7.5]# ./sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
node01: stopping nodemanager
node02: stopping nodemanager
node03: stopping nodemanager
no proxyserver to stop

(2) Delete the files under the DataNode data directory (on all three VMs)


Per the dfs.datanode.data.dir setting above, the files under /software/hadoop-2.7.5/data/dfs/dn need to be deleted. (This is usually caused by the DataNodes keeping a clusterID from an earlier NameNode format; the DataNode logs will show the mismatch.)

(3) Start the services again with start-dfs.sh and start-yarn.sh (on any node):

[root@node01 hadoop-2.7.5]# rm -rf data/dfs/dn
[root@node01 hadoop-2.7.5]# sbin/start-dfs.sh
Starting namenodes on [node01 node02]
node02: starting namenode, logging to /software/hadoop-2.7.5/logs/hadoop-root-namenode-node02.out
node01: starting namenode, logging to /software/hadoop-2.7.5/logs/hadoop-root-namenode-node01.out
node02: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node02.out
node03: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node03.out
node01: starting datanode, logging to /software/hadoop-2.7.5/logs/hadoop-root-datanode-node01.out
Starting journal nodes [node01 node02 node03]
node02: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node02.out
node03: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node03.out
node01: starting journalnode, logging to /software/hadoop-2.7.5/logs/hadoop-root-journalnode-node01.out
Starting ZK Failover Controllers on NN hosts [node01 node02]
node02: starting zkfc, logging to /software/hadoop-2.7.5/logs/hadoop-root-zkfc-node02.out
node01: starting zkfc, logging to /software/hadoop-2.7.5/logs/hadoop-root-zkfc-node01.out
您在 /var/spool/mail/root 中有新邮件
[root@node01 hadoop-2.7.5]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-resourcemanager-node01.out
node02: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node02.out
node03: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node03.out
node01: starting nodemanager, logging to /software/hadoop-2.7.5/logs/yarn-root-nodemanager-node01.out

Run jps on the three VMs again (this time everything is correct):

node01:

[root@node01 dfs]# jps
10561 NodeManager
9955 DataNode
10147 JournalNode
9849 NameNode
10762 Jps
1467 QuorumPeerMain
10319 DFSZKFailoverController

node02:

[root@node02 hadoop-2.7.5]# jps
9744 NodeManager
9618 DFSZKFailoverController
9988 Jps
9367 NameNode
8216 ResourceManager
9514 JournalNode
1468 QuorumPeerMain
9439 DataNode

node03:

[root@node03 hadoop-2.7.5]# jps
7953 Jps
7683 JournalNode
6487 JobHistoryServer
7591 DataNode
7784 NodeManager
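
To double-check that the DataNodes registered this time, a quick sketch from the Hadoop home directory on any node:

bin/hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'   # should report three live datanodes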

5.3 Check the ResourceManager state

Run on node03:

[root@node03 hadoop-2.7.5]# bin/yarn rmadmin -getServiceState rm1
active

Run on node02:

[root@node02 hadoop-2.7.5]# bin/yarn rmadmin -getServiceState rm2
standby

5.4 Start the JobHistory server

node03:

[root@node03 hadoop-2.7.5]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /software/hadoop-2.7.5/logs/mapred-root-historyserver-node03.out

5.5 Check HDFS status

node01 (screenshot omitted):

Open in a browser: http://192.168.24.137:50070/dfshealth.html#tab-overview

node02 (screenshot omitted):

Open in a browser: http://192.168.24.138:50070/dfshealth.html#tab-overview

5.6 Access the YARN cluster web UI

Open in a browser: http://192.168.24.139:8088/cluster/nodes

5.7 JobHistory web UI

Based on the mapreduce.jobhistory.webapp.address configured above, this is at http://192.168.24.139:19888 (screenshot omitted).

6. Hadoop command line

Delete a file:

[root@node01 bin]# ./hdfs dfs -rm /a.txt
20/06/12 14:33:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 10080 minutes, Emptier interval = 0 minutes.
20/06/12 14:33:30 INFO fs.TrashPolicyDefault: Moved: 'hdfs://ns/a.txt' to trash at: hdfs://ns/user/root/.Trash/Current/a.txt
Moved: 'hdfs://ns/a.txt' to trash at: hdfs://ns/user/root/.Trash/Current
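
Because fs.trash.interval is enabled in core-site.xml, deleted files first go to the trash, as the output above shows. To bypass the trash or empty it immediately, the standard options are:

./hdfs dfs -rm -skipTrash /a.txt    # delete without moving the file to the trash
./hdfs dfs -expunge                 # empty the trash right away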

Create a directory:

[root@node01 bin]# ./hdfs dfs -mkdir /dir

Upload a file:

[root@node01 bin]# ./hdfs dfs -put /software/a.txt /dir
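
A few more everyday commands, for reference (the paths are only examples):

./hdfs dfs -ls /dir                     # list a directory
./hdfs dfs -cat /dir/a.txt              # print a file
./hdfs dfs -get /dir/a.txt /tmp/a.txt   # download a file to the local filesystem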


Note:

  Clicking Download actually points the browser at something like http://node02:50075.... If the hostnames are not in your hosts file, the link will not open; replacing node02 with its IP works. Inside the VMs, node01, node02 and node03 resolve directly, so the problem only shows up on the host machine.


So I updated the hosts file on the host machine to match the one in the VMs. After that, clicking Download starts the download directly in the browser.
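
For reference, the entries added to the host machine's hosts file are the same ones from section 1.4 (on Windows the file is C:\Windows\System32\drivers\etc\hosts):

192.168.24.137 node01 node01.hadoop.com
192.168.24.138 node02 node02.hadoop.com
192.168.24.139 node03 node03.hadoop.com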


At this point, the distributed Hadoop environment is up and running.