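Background:

The first post in this column only built a fully distributed Hadoop cluster, installed under /usr/local/src/. Because of errors that came up later, I rebuilt the fully distributed cluster once more. Today's post covers how to build a Hadoop cluster that is genuinely highly available (HA).

Prerequisites:

The JDK and ZooKeeper must already be installed and configured on all three nodes. For Hadoop itself, you can either continue from the previous environment or delete it, unpack it again, and reconfigure from scratch. I was lazy and simply continued from the existing setup.

Without further ado, let's get started.

Setting up Hadoop HA

1. Update the environment variables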

vi /etc/profile

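Configure it as follows (first delete the environment variables that were set in Chapter 4):

#hadoop environment
export HADOOP_HOME=/opt/local/src/hadoop  #HADOOP_HOME points to the Hadoop installation directory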
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

#java environment
export JAVA_HOME=/opt/local/src/java	#JAVA_HOME points to the Java installation directory
export PATH=$PATH:$JAVA_HOME/bin	#add the Java bin directory to PATH
#zookeeper environment
export ZK_HOME=/opt/local/src/zookeeper
export PATH=$PATH:$ZK_HOME/bin

Save and exit.

2. Configure the Hadoop environment variables

cd /opt/local/src/hadoop/etc/hadoop
vi hadoop-env.sh

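Append the following at the bottom. The path must be the actual JDK installation directory, so it should agree with the JAVA_HOME set in /etc/profile:

export JAVA_HOME=/opt/local/src/jdk

Save and exit.

3. Edit configuration file 1: core-site.xml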

vi core-site.xml

Add the following configuration:

    <!-- Set the HDFS nameservice to mycluster -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/local/src/hadoop/tmp</value>
    </property>

    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>

    <!-- Session timeout for Hadoop's connection to ZooKeeper -->
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>30000</value>
        <description>ms</description>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>

Save and exit.
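
To confirm the edits were saved as intended, the values can be read back from the file. This is a minimal sketch and not part of the original steps: get_prop is a hypothetical helper, and it assumes every <name> and <value> element fits on a single line, so it will not handle multi-line values.

```shell
#!/bin/sh
# get_prop KEY FILE: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml file.
# Hypothetical helper; assumes <name> and <value> each fit on one line.
get_prop() {
    awk -v key="$1" '
        /<name>/  { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
        /<value>/ {
            if (name == key) {
                value = $0
                gsub(/.*<value>|<\/value>.*/, "", value)
                print value
                exit
            }
        }
    ' "$2"
}

# Path taken from the steps above; the check is skipped if the file is absent.
conf=/opt/local/src/hadoop/etc/hadoop/core-site.xml
if [ -f "$conf" ]; then
    for key in fs.defaultFS ha.zookeeper.quorum; do
        echo "$key = $(get_prop "$key" "$conf")"
    done
fi
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS should report the same value as Hadoop itself resolves it.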

4. Edit configuration file 2: hdfs-site.xml

vi hdfs-site.xml

Configure it as follows:

    <!-- Timeout for communication with the JournalNode cluster -->
    <property>
        <name>dfs.qjournal.start-segment.timeout.ms</name>
        <value>60000</value>
    </property>

    <!-- Set the HDFS nameservice to mycluster; this must match core-site.xml.
         dfs.ha.namenodes.[nameservice id] assigns a unique identifier to each NameNode
         in the nameservice, as a comma-separated list of NameNode IDs. DataNodes use
         this list to identify all the NameNodes. Here the nameservice ID is "mycluster"
         and the NameNode IDs are "master" and "slave1". -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>

    <!-- mycluster has two NameNodes: master and slave1 -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>master,slave1</value>
    </property>

    <!-- RPC address of master -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.master</name>
        <value>master:8020</value>
    </property>

    <!-- RPC address of slave1 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.slave1</name>
        <value>slave1:8020</value>
    </property>

    <!-- HTTP address of master -->
    <property>
        <name>dfs.namenode.http-address.mycluster.master</name>
        <value>master:50070</value>
    </property>

    <!-- HTTP address of slave1 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.slave1</name>
        <value>slave1:50070</value>
    </property>

    <!-- Shared storage location for the NameNode edits metadata, i.e. the JournalNode list.
         URL format: qjournal://host1:port1;host2:port2;host3:port3/journalId
         Using the nameservice as the journalId is recommended; the default port is 8485 -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    </property>

    <!-- Implementation used by clients for automatic failover -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods; separate multiple methods with newlines, one method per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>

    <!-- The sshfence method requires passwordless SSH login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/local/src/hadoop/tmp/hdfs/nn</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/local/src/hadoop/tmp/hdfs/dn</value>
    </property>

    <!-- Local disk location where the JournalNode stores its data -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/local/src/hadoop/tmp/hdfs/jn</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <!-- Connection timeout for the sshfence method -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>60000</value>
    </property>

Save and exit.
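
The qjournal URL is the easiest value in this file to mistype, because the hosts are joined with semicolons while the journal ID follows a slash. As a small illustration of that format only, here is a sketch that assembles the URL from a host list. build_qjournal_uri is a hypothetical helper, and it assumes every JournalNode uses the default port 8485.

```shell
#!/bin/sh
# build_qjournal_uri JOURNAL_ID HOST...: print a qjournal URI, assuming every
# JournalNode listens on the default port 8485.
# Hypothetical helper for illustration only.
build_qjournal_uri() {
    journal_id="$1"
    shift
    hosts=""
    for host in "$@"; do
        # Join entries with ';' and avoid a leading separator before the first one.
        hosts="${hosts:+$hosts;}$host:8485"
    done
    echo "qjournal://$hosts/$journal_id"
}

build_qjournal_uri mycluster master slave1 slave2
```

For the three nodes used here, the printed URL should be identical to the value configured for dfs.namenode.shared.edits.dir.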

5. Edit configuration file 3: mapred-site.xml

vi mapred-site.xml

Configure it as follows:

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>

    <!-- Web address of the JobHistory server -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>

Save and exit.

6. Edit configuration file 4: yarn-site.xml

vi yarn-site.xml

Configure it as follows:

    <!-- Site specific YARN configuration properties -->

    <!-- Enable ResourceManager (RM) high availability -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster ID of the RM -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>

    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Host of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>slave1</value>
    </property>

    <!-- ZooKeeper cluster addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store the ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

Save and exit.

7. Configure the slaves file

vi slaves

Configure it as follows:

master
slave1
slave2

8. Create the required directories

The NameNode, DataNode, JournalNode and other daemons store their data under the common directory /opt/local/src/hadoop/tmp.
Run the following on master:

mkdir -p /opt/local/src/hadoop/tmp/hdfs/nn
mkdir -p /opt/local/src/hadoop/tmp/hdfs/dn
mkdir -p /opt/local/src/hadoop/tmp/hdfs/jn
mkdir -p /opt/local/src/hadoop/tmp/logs

9. Distribute the files

scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/
scp -r /opt/local/src/hadoop root@slave1:/opt/local/src/
scp -r /opt/local/src/hadoop root@slave2:/opt/local/src/

Set the owner and group on all three nodes

chown -R hadoop:hadoop /opt/local/src/

Load the environment variables on all three nodes

source /etc/profile
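
After sourcing the profile, it is worth confirming on each node that the variables actually point at real directories, since a wrong JAVA_HOME or HADOOP_HOME only shows up later as a confusing startup error. The following is a minimal sketch rather than part of the original procedure; check_home is a hypothetical helper name.

```shell
#!/bin/sh
# check_home NAME: report whether the variable NAME is set and points to an
# existing directory. Hypothetical helper for illustration only.
check_home() {
    # Indirectly read the variable whose name was passed in.
    eval "dir=\${$1:-}"
    if [ -z "$dir" ]; then
        echo "$1: not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$1: $dir is not a directory"
        return 1
    fi
    echo "$1: ok ($dir)"
}

# Variables exported in /etc/profile in step 1.
for var in JAVA_HOME HADOOP_HOME ZK_HOME; do
    check_home "$var" || true
done
```

Any line that does not say "ok" means the corresponding export in /etc/profile, or the directory it names, needs another look before starting the daemons.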

Starting Hadoop HA

1. Start the JournalNode daemons

hadoop-daemons.sh start journalnode
master: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-master.out
slave1: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-root-journalnode-slave2.out

2. Format the NameNode

hdfs namenode -format

[Screenshot: output of the NameNode format]

3. Register the ZNode in ZooKeeper

hdfs zkfc -formatZK
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/src/hadoop/lib:/usr/local/src/hadoop/lib/native
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.el7.x86_64
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:user.name=root
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/src/hadoop/etc/hadoop
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@27ce24aa
20/07/01 17:23:15 INFO zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.1.8:2181. Will not attempt to authenticate using SASL (unknown error)
20/07/01 17:23:15 INFO zookeeper.ClientCnxn: Socket connection established to slave2/192.168.1.8:2181, initiating session
20/07/01 17:23:15 INFO zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.1.8:2181, sessionid = 0x373099bfa8c0000, negotiated timeout = 5000
20/07/01 17:23:15 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK.
20/07/01 17:23:15 INFO zookeeper.ZooKeeper: Session: 0x373099bfa8c0000 closed
20/07/01 17:23:15 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x373099bfa8c0000
20/07/01 17:23:15 INFO zookeeper.ClientCnxn: EventThread shut down

4. Start HDFS

start-dfs.sh
Starting namenodes on [master slave1]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-master.out
slave1: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-slave1.out
master: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-slave2.out
Starting journal nodes [master slave1 slave2]
master: journalnode running as process 1787. Stop it first.
slave2: journalnode running as process 1613. Stop it first.
slave1: journalnode running as process 1634. Stop it first.
Starting ZK Failover Controllers on NN hosts [master slave1]
slave1: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-root-zkfc-slave1.out
master: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-root-zkfc-master.out

5. Start YARN

start-yarn.sh

starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-root-resourcemanager-master.out
master: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-master.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-slave2.out

6. Sync the metadata from master

Copy the NameNode metadata to the other nodes (run on master)

scp -r /opt/local/src/hadoop/tmp/hdfs/nn/* slave1:/opt/local/src/hadoop/tmp/hdfs/nn/
scp -r /opt/local/src/hadoop/tmp/hdfs/nn/* slave2:/opt/local/src/hadoop/tmp/hdfs/nn/

7. Start the ResourceManager and NameNode processes on slave1

yarn-daemon.sh start resourcemanager

hadoop-daemon.sh start namenode

[Screenshot: processes running on slave1]
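
With both NameNodes and both ResourceManagers running, a healthy HA pair should have exactly one active member and one standby. The sketch below queries the states with hdfs haadmin -getServiceState and yarn rmadmin -getServiceState, using the IDs configured earlier (master and slave1 for HDFS, rm1 and rm2 for YARN). exactly_one_active is a hypothetical helper, and the cluster queries are skipped when the Hadoop commands are not on the PATH.

```shell
#!/bin/sh
# exactly_one_active: read "id state" lines on stdin and succeed only if
# exactly one of them is in the "active" state.
# Hypothetical helper for illustration only.
exactly_one_active() {
    awk '$2 == "active" { n++ } END { exit (n != 1) }'
}

# Only query the cluster when the Hadoop commands are available.
if command -v hdfs >/dev/null 2>&1 && command -v yarn >/dev/null 2>&1; then
    nn_states=$(for id in master slave1; do
        echo "$id $(hdfs haadmin -getServiceState "$id" 2>/dev/null)"
    done)
    rm_states=$(for id in rm1 rm2; do
        echo "$id $(yarn rmadmin -getServiceState "$id" 2>/dev/null)"
    done)
    printf '%s\n' "$nn_states" "$rm_states"
    echo "$nn_states" | exactly_one_active || echo "NameNode: expected exactly one active"
    echo "$rm_states" | exactly_one_active || echo "ResourceManager: expected exactly one active"
fi
```

If a state comes back empty, the corresponding daemon is probably not running or not reachable, and its log file is the next place to look.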

8. Start the MapReduce JobHistory server

yarn-daemon.sh start proxyserver
mr-jobhistory-daemon.sh start historyserver

9. Check the ports and processes

jps

[Screenshot: jps output]
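
Reading the jps output by eye gets tedious across three nodes, so here is a rough sketch that checks it against a list of daemon names. has_daemon is a hypothetical helper, and the expected list is my assumption of what should be running on master after the steps above (it assumes the proxy server and history server in step 8 were started on master); slave1 and slave2 will have shorter lists.

```shell
#!/bin/sh
# has_daemon NAME: read jps-style output ("PID ClassName") on stdin and
# succeed if a process with exactly that class name is listed.
# Hypothetical helper for illustration only.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit (!found) }'
}

# Assumed list of daemons for the master node in this setup.
expected="QuorumPeerMain JournalNode NameNode DataNode DFSZKFailoverController ResourceManager NodeManager JobHistoryServer WebAppProxyServer"

# Skip the check when jps (shipped with the JDK) is not on the PATH.
if command -v jps >/dev/null 2>&1; then
    running=$(jps)
    for daemon in $expected; do
        if echo "$running" | has_daemon "$daemon"; then
            echo "ok       $daemon"
        else
            echo "MISSING  $daemon"
        fi
    done
fi
```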

master:50070

[Screenshot: NameNode web UI on master]

slave1:50070

[Screenshot: NameNode web UI on slave1]

master:8088

[Screenshot: YARN ResourceManager web UI on master]