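I. Introduction

  This guide gives an overview of high availability (HA) for the YARN ResourceManager and describes how to configure and use the feature. The ResourceManager (RM) tracks the resources in a cluster and schedules applications (for example, MapReduce jobs). Before Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The HA feature removes that single point of failure by adding redundancy in the form of an Active/Standby ResourceManager pair.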


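II. Failover

  ResourceManager HA uses an Active/Standby architecture: at any point in time one RM is Active, and one or more RMs are in Standby mode, waiting to take over if anything happens to the Active. The trigger to transition to Active comes either from the administrator (through the CLI) or, when automatic failover is enabled, from the integrated failover controller.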

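Manual transitions and failover
  When automatic failover is not enabled, the administrator has to transition one of the RMs to Active manually. To fail over from one RM to another, first transition the Active RM to Standby, then transition a Standby RM to Active. All of this can be done with the "yarn rmadmin" CLI.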
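
The two-step manual failover described above can be sketched as a small shell function. This is only an illustration: the function name `manual_failover` is made up here, and `rm1`/`rm2` stand for the IDs listed in yarn.resourcemanager.ha.rm-ids. Note that when automatic failover is enabled (the default once HA is on), `yarn rmadmin` refuses manual transitions unless the `--forcemanual` flag is given, which should be used with caution.

```shell
#!/bin/sh
# Sketch of a manual failover; the function name and arguments are illustrative.
# Usage: manual_failover <current-active-id> <new-active-id>
manual_failover() {
  from="$1"
  to="$2"
  # Demote the current Active first so that two RMs are never Active at once.
  yarn rmadmin -transitionToStandby "$from" || return 1
  # Then promote the chosen Standby.
  yarn rmadmin -transitionToActive "$to" || return 1
  # Report the resulting state of both RMs.
  for id in "$from" "$to"; do
    echo "$id: $(yarn rmadmin -getServiceState "$id")"
  done
}
```

With the IDs used later in this guide, the call would be `manual_failover rm1 rm2`.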

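Automatic failover
  The RMs can optionally embed the ZooKeeper-based ActiveStandbyElector to decide which RM should be Active. When the Active goes down or becomes unresponsive, another RM is automatically elected Active and takes over. Unlike HDFS, there is no need to run a separate ZKFC daemon, because the ActiveStandbyElector embedded in the RM acts as both failure detector and leader elector.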
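
To see which RM currently holds the election lock, it can help to know where the embedded elector keeps it in ZooKeeper. The sketch below only builds the znode path. It assumes the default base path `/yarn-leader-election` and the lock node name `ActiveStandbyElectorLock`; both are assumptions about an unmodified configuration, and the helper name `election_lock_path` is made up for this example.

```shell
#!/bin/sh
# Build the znode path of the RM leader-election lock.
# Assumed defaults: base path /yarn-leader-election, lock node ActiveStandbyElectorLock.
# Usage: election_lock_path <cluster-id> [base-path]
election_lock_path() {
  cluster_id="$1"
  base_path="${2:-/yarn-leader-election}"
  echo "$base_path/$cluster_id/ActiveStandbyElectorLock"
}

# Example (not executed here): inspect the lock with the ZooKeeper CLI.
#   zkCli.sh -server node02:2181 get "$(election_lock_path cluster1)"
```

The cluster ID `cluster1` and the ZooKeeper address are the values configured in yarn-site.xml later in this guide.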

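Clients, ApplicationMasters and NodeManagers on RM failover
  When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes should list all of them. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try to connect to the RMs in round-robin order until they reach the Active RM. If the Active goes down, they resume the round-robin polling until they reach the "new" Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override it by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting yarn.client.failover-proxy-provider to the name of your class.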
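
The round-robin idea can be imitated from the command line. The following is a rough sketch of the polling pattern only, not the actual proxy provider logic: `rm_state` and `find_active_rm` are hypothetical helper names, and the one-second pause between rounds is an arbitrary choice.

```shell
#!/bin/sh
# rm_state is a hypothetical helper wrapping the rmadmin state query.
rm_state() {
  yarn rmadmin -getServiceState "$1" 2>/dev/null
}

# Cycle through the given RM IDs until one reports "active".
# Usage: find_active_rm <max-rounds> <rm-id>...
find_active_rm() {
  rounds="$1"
  shift
  round=0
  while [ "$round" -lt "$rounds" ]; do
    for id in "$@"; do
      if [ "$(rm_state "$id")" = "active" ]; then
        echo "$id"
        return 0
      fi
    done
    round=$((round + 1))
    sleep 1  # brief pause before polling the list again
  done
  return 1
}
```

For example, `find_active_rm 5 rm1 rm2` would poll both RMs for up to five rounds and print the ID of whichever one is Active.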

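III. Recovering the previous Active RM's state

  With ResourceManager Restart enabled, the RM being promoted to Active loads the RM internal state and continues from where the previous Active left off, as far as the RM restart feature allows. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing work. The state store must be visible from both the Active and the Standby RMs. There are currently two RMStateStore implementations for persistence: FileSystemRMStateStore and ZKRMStateStore. ZKRMStateStore implicitly allows write access to only a single RM at any point in time, and is therefore the recommended store for an HA cluster. With ZKRMStateStore, no separate fencing mechanism is needed to handle a potential split-brain situation in which several RMs assume the Active role. When using ZKRMStateStore, it is advisable not to set the "zookeeper.DigestAuthenticationProvider.superDigest" property on the ZooKeeper cluster, so that the ZooKeeper admin has no access to YARN application and user credential information.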

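IV. Node and service layout

The roles below match the jps listings shown later in this guide.

Node     NameNode01  NameNode02  DataNode  ZooKeeper  ZKFC  JournalNode  ResourceManager  NodeManager
node01   yes         -           -         -          yes   yes          -                -
node02   -           yes         yes       yes        yes   yes          -                yes
node03   -           -           yes       yes        -     yes          yes              yes
node04   -           -           yes       yes        -     -            yes              yes
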
V. Deployment
  1. Edit /opt/hadoop-3.1.2/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>

    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>

    <property>
      <name>mapreduce.application.classpath</name>
      <value>
		/opt/hadoop-3.1.2/etc/hadoop,
		/opt/hadoop-3.1.2/share/hadoop/common/*,
		/opt/hadoop-3.1.2/share/hadoop/common/lib/*,
		/opt/hadoop-3.1.2/share/hadoop/hdfs/*,
		/opt/hadoop-3.1.2/share/hadoop/hdfs/lib/*,
		/opt/hadoop-3.1.2/share/hadoop/mapreduce/*,
		/opt/hadoop-3.1.2/share/hadoop/mapreduce/lib/*,
		/opt/hadoop-3.1.2/share/hadoop/yarn/*,
		/opt/hadoop-3.1.2/share/hadoop/yarn/lib/*
      </value>
    </property>

</configuration>
  2. Edit /opt/hadoop-3.1.2/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster ID of the RM group -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>

    <!-- Logical IDs of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Hostname of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node03</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node04</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node03:8088</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node04:8088</value>
    </property>
   
    <!-- ZooKeeper quorum address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node02:2181,node03:2181,node04:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <!-- Enable RM state recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store the ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    
    <!-- Whether virtual memory limits will be enforced for containers.  -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>

</configuration>
  3. Add the daemon users to /opt/hadoop-3.1.2/etc/hadoop/hadoop-env.sh
# JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64

# HDFS
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
# export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
# export HADOOP_SECURE_DN_USER=hdfs
export HDFS_DATANODE_SECURE_USER=hdfs

# YARN
export YARN_RESOURCEMANAGER_USER=root
# export HADOOP_SECURE_DN_USER=yarn
# Note: this overrides the HDFS_DATANODE_SECURE_USER=hdfs assignment above.
export HDFS_DATANODE_SECURE_USER=yarn
export YARN_NODEMANAGER_USER=root
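  4. Copy the changed files to the other nodes (node02, node03, node04); run this from /opt/hadoop-3.1.2/etc/hadoop so that `pwd` resolves to the same directory on the target
scp mapred-site.xml yarn-site.xml hadoop-env.sh node02:`pwd`
scp mapred-site.xml yarn-site.xml hadoop-env.sh node03:`pwd`
scp mapred-site.xml yarn-site.xml hadoop-env.sh node04:`pwd`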
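
After copying, it can be useful to spot-check that a node really sees the HA settings. The helper below is a minimal sketch, not a real XML parser: `get_property` is a made-up name, and it assumes each `<name>` and `<value>` sits on its own line with a single-line value, as in the listings above.

```shell
#!/bin/sh
# Print the <value> that follows <name>NAME</name> in a Hadoop site file.
# Assumes <name> and <value> are on separate lines and the value fits on one line.
# Usage: get_property <file> <property-name>
get_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

For instance, `get_property /opt/hadoop-3.1.2/etc/hadoop/yarn-site.xml yarn.resourcemanager.ha.rm-ids` should print `rm1,rm2` with the configuration from step 2.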
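VI. Startup

Start the services in this order: ZooKeeper -> JournalNode -> format the NameNode -> create the HA namespace in ZooKeeper (zkfc) -> NameNode -> DataNode -> ResourceManager -> NodeManager.

  1. Start YARN from either of the two ResourceManager hosts
start-yarn.sh
  2. If the ResourceManager on the other host did not come up, start it manually on that host (the older yarn-daemon.sh start resourcemanager still works in Hadoop 3 but is deprecated)
yarn --daemon start resourcemanager
  3. Start the MapReduce JobHistory server (the older mr-jobhistory-daemon.sh start historyserver still works in Hadoop 3 but is deprecated)
mapred --daemon start historyserver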
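
Once both ResourceManagers are running, one way to check which one is Active is the RM REST endpoint `/ws/v1/cluster/info`, whose JSON response carries an `haState` field. The sketch below assumes that `curl` is installed and that the endpoint answers on the web addresses configured above; the helper names `parse_ha_state` and `rm_ha_state` are made up for this example, and the parsing is a simple `sed` match rather than a full JSON parser.

```shell
#!/bin/sh
# Extract the haState field from the JSON returned by /ws/v1/cluster/info (read from stdin).
parse_ha_state() {
  sed -n 's/.*"haState" *: *"\([A-Z_]*\)".*/\1/p'
}

# Query one RM web address and print its HA state.
# Usage: rm_ha_state node03:8088
rm_ha_state() {
  curl -s "http://$1/ws/v1/cluster/info" | parse_ha_state
}
```

Running `rm_ha_state node03:8088` and `rm_ha_state node04:8088` should then show one ACTIVE and one STANDBY if the pair is healthy. `yarn rmadmin -getServiceState rm1` gives the same information through the CLI.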
VII. Verify the startup
  1. List the running Java processes with jps
# node01
[root@node01 hadoop]# jps
21715 NameNode
22646 Jps
21993 JournalNode
2586 JobHistoryServer
22173 DFSZKFailoverController

# node02
[root@node02 hadoop]# jps
15730 DataNode
15667 NameNode
15989 NodeManager
15798 JournalNode
15863 DFSZKFailoverController
1449 QuorumPeerMain
16252 Jps

# node03
[root@node03 ~]# jps
11362 JournalNode
11939 Jps
11555 NodeManager
11300 DataNode
1419 QuorumPeerMain
11471 ResourceManager

# node04
[root@node04 ~]# jps
9670 DataNode
1418 QuorumPeerMain
10027 Jps
9742 ResourceManager
9807 NodeManager
  2. Check the web UIs
    a. HDFS: http://node01:50070 (the Hadoop 3 default NameNode HTTP port is 9870; 50070 applies only if dfs.namenode.http-address is set to it)
    b. YARN: http://node03:8088/cluster
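
The jps listings from step 1 can also be checked mechanically. The function below is a small sketch under the assumption that jps prints one "<pid> <main class>" pair per line; the name `missing_daemons` is made up for this example.

```shell
#!/bin/sh
# Print every expected daemon name that is absent from the jps output on stdin.
# Usage: jps | missing_daemons ResourceManager NodeManager
missing_daemons() {
  listing=$(cat)
  for name in "$@"; do
    # Compare against the second field so that partial names do not match.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}
```

On node04, for example, `jps | missing_daemons DataNode QuorumPeerMain ResourceManager NodeManager` should print nothing when all four daemons are up.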