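1. Prerequisites
Software:
CentOS 7.2 (64-bit), JDK 1.8 (64-bit), Hadoop 2.7.6, ZooKeeper 3.4.9, Hive 2.2
Hardware (VMware virtual machines):
1 master node: 1 CPU, 2 GB RAM, 50 GB disk
2 worker nodes: one of them has 2 GB RAM; the rest of the configuration is the same
The node IP addresses are as follows:
Server name | IP address | Notes (hostname changed to the following for convenience) |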
hadoop001 | 192.168.0.211 | hadoop001 |
hadoop002 | 192.168.0.212 | hadoop002 |
hadoop003 | 192.168.0.213 | hadoop003 |
Target layout
Hostname | Software | Processes |
hadoop001 | JDK, Hadoop | NameNode, ZKFC, ResourceManager |
hadoop002 | JDK, Hadoop | ZooKeeper, DataNode, JournalNode, QuorumPeerMain, NodeManager |
hadoop003 | JDK, Hadoop | ZooKeeper, DataNode, JournalNode, QuorumPeerMain, NodeManager |
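2. Server preparation
Before installing, set up the VMware virtual machines and the Linux server cluster, and create a user named hadoop. Unless stated otherwise, all of the following steps are run as the hadoop user.
2.1 Disable the server firewall
CentOS 7 uses firewalld as its default firewall, so it has to be turned off. Only the stop and disable commands below are required (they need root privileges); the others are listed for reference.
List the ports that are already open: firewall-cmd --list-ports
Open a port: firewall-cmd --zone=public --add-port=80/tcp --permanent
Meaning of the options:
--zone # zone the rule applies to
--add-port=80/tcp # port to add, in the form port/protocol
--permanent # make the rule persistent; without it the rule is lost after a restart
Reload firewalld: firewall-cmd --reload
Stop firewalld: systemctl stop firewalld.service
Disable firewalld at boot: systemctl disable firewalld.service
Check the firewall state (shows "not running" when stopped, "running" when started): firewall-cmd --state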
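2.2 Change the hostname
Log in to host 192.168.0.211 remotely with Xshell, then run:
vim /etc/hostname
and set the file content to hadoop001. Then run: hostname hadoop001
This completes the hostname change on the 192.168.0.211 server. Change the hostnames of the other servers in the same way.
2.3 Edit the hosts file
Log in to hadoop001 and run:
vim /etc/hosts to edit the file, then add the following lines: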
192.168.0.211 hadoop001
192.168.0.212 hadoop002
192.168.0.213 hadoop003
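Save and exit.
Then copy the hosts file to each of the other nodes, overwriting their own hosts file (the account used must be allowed to write to /etc on the target, for example root):
scp -r /etc/hosts hadoop@192.168.0.212:/etc/
scp -r /etc/hosts hadoop@192.168.0.213:/etc/
Notes: 1. If the ssh command is not available, the SSH client may not be installed on the machine; install it with: yum install openssh-clients
2. Make sure the nodes can ping each other. If they cannot, check whether the firewall has been turned off.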
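Editing /etc/hosts by hand on three machines is easy to get wrong. The following is a minimal sketch, not part of the original procedure, that appends the three entries only when they are missing, so it can be re-run safely. The function names hosts_entries and add_hosts_entries are made up for this example; the addresses are the ones from the table in section 1.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add the cluster entries to a hosts-style file.

# Print one "IP hostname" line per node (addresses from the table in section 1).
hosts_entries() {
  local i
  for i in 1 2 3; do
    printf '192.168.0.21%s hadoop00%s\n' "$i" "$i"
  done
}

# Append each entry to the file given as $1 unless that exact line already exists.
add_hosts_entries() {
  local target="$1" line
  hosts_entries | while IFS= read -r line; do
    grep -qxF "$line" "$target" 2>/dev/null || printf '%s\n' "$line" >> "$target"
  done
}

# Example (as root): add_hosts_entries /etc/hosts
```

Running add_hosts_entries twice leaves the file unchanged the second time, which makes it convenient to run on every node.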
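2.4 Passwordless SSH login
Log in to hadoop001 as the hadoop user and run:
ssh-keygen -t rsa
Go to /home/hadoop/.ssh/; it now contains the key pair id_rsa and id_rsa.pub.
Run: cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
The directory now holds three files: authorized_keys, id_rsa, id_rsa.pub
Verify with: ssh localhost, and check that this node can log in without a password.
Copy the key files to the other nodes (run from ~/.ssh):
scp authorized_keys hadoop@hadoop002:~/.ssh/
scp id_rsa hadoop@hadoop002:~/.ssh/
scp id_rsa.pub hadoop@hadoop002:~/.ssh/
Do the same for the remaining nodes. When finished, test whether the nodes can log in to each other without a password.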
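The final check can be scripted. The sketch below is an illustration rather than part of the original steps: it tries a non-interactive login to each host and reports the result. BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout limits the wait. The function name check_passwordless and the SSH_CMD override (used only so the function can be exercised without a real cluster) are assumptions of this example.

```shell
#!/usr/bin/env bash
# Sketch: report which hosts accept a key-based login without prompting.

check_passwordless() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key shows up as a failure.
    if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Example: check_passwordless hadoop001 hadoop002 hadoop003
```

Run it as the hadoop user on every node, since each NameNode host needs to reach the other one for the sshfence method configured later.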
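2.5 NTP time synchronization
(1) Install the NTP service
As the Linux root user, run the following command to install the NTP service:
yum install ntp -y
(2) Configure the NTP server
Choose one machine in the cluster to act as the NTP server; the other machines are NTP clients. On the server, edit the configuration file:
Run vim /etc/ntp.conf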
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# Specify one or more NTP servers.
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
# Default time sources shipped with the distribution; comment them out
#pool 0.ubuntu.pool.ntp.org iburst
#pool 1.ubuntu.pool.ntp.org iburst
#pool 2.ubuntu.pool.ntp.org iburst
#pool 3.ubuntu.pool.ntp.org iburst
# Use Ubuntu's ntp server as a fallback.
#pool ntp.ubuntu.com
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
# This is an internal network, so use the local clock as the server time. Note that this is 127.127.1.0, not 127.0.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 8
# Open the whole 192.168.0.0 subnet so that every machine in it can use 192.168.0.214 as its time server
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
# Needed for adding pool entries
restrict source notrap nomodify noquery
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
#Changes recquired to use pps synchonisation as explained in documentation:
#http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3918
#server 127.127.8.1 mode 135 prefer # Meinberg GPS167 with PPS
#fudge 127.127.8.1 time1 0.0042 # relative to PPS for my hardware
#server 127.127.22.1 # ATOM(PPS)
#fudge 127.127.22.1 flag3 1 # enable PPS API
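Save and exit, then restart the NTP service (on CentOS 7 the service is named ntpd):
Run systemctl restart ntpd
(3) Configure the NTP clients
With the server configured, the client configuration is comparatively simple:
Run vim /etc/ntp.conf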
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# Specify one or more NTP servers.
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
# Default time sources shipped with the distribution; comment them out
#pool 0.ubuntu.pool.ntp.org iburst
#pool 1.ubuntu.pool.ntp.org iburst
#pool 2.ubuntu.pool.ntp.org iburst
#pool 3.ubuntu.pool.ntp.org iburst
# Use Ubuntu's ntp server as a fallback.
#pool ntp.ubuntu.com
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
# Add 192.168.0.214 as the time server
server 192.168.0.214
# Needed for adding pool entries
restrict source notrap nomodify noquery
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
#Changes recquired to use pps synchonisation as explained in documentation:
#http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3918
#server 127.127.8.1 mode 135 prefer # Meinberg GPS167 with PPS
#fudge 127.127.8.1 time1 0.0042 # relative to PPS for my hardware
#server 127.127.22.1 # ATOM(PPS)
#fudge 127.127.22.1 flag3 1 # enable PPS API
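Save and exit, then restart the NTP service:
Run systemctl restart ntpd
(4) Check that the NTP service is configured correctly
On the NTP server run ntpq -p
On each NTP client run ntpq -p
This completes the NTP configuration.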
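In the ntpq -p listing, the peer currently selected for synchronization is marked with a leading asterisk. The sketch below, an addition to the original steps, extracts that peer from the listing; the function name synced_peer is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the peer that ntpd has currently selected (line starting with "*").

synced_peer() {
  # Reads "ntpq -p" output on stdin and strips the leading "*" from the first column.
  awk '/^\*/ { sub(/^\*/, "", $1); print $1 }'
}

# Example: ntpq -p | synced_peer
```

An empty result means no peer has been selected yet; synchronization can take a few minutes after ntpd is restarted.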
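2.6 Upload the installation files
Log in to hadoop001 with WinSCP to transfer files between the local machine and the remote host. Copy the local Hadoop 2.7.6, JDK 1.8, ZooKeeper 3.4.9 and Hive 2.2 packages to the /opt/soft directory on hadoop001.
3. ZooKeeper cluster setup
3.1 Extract the ZooKeeper package
Copy the ZooKeeper package to /opt/soft on hadoop001, then extract it:
tar -zxvf zookeeper-3.4.9.tar.gz
This produces the zookeeper-3.4.9 directory.
3.2 Configure ZooKeeper
Change directory: cd /opt/soft/zookeeper-3.4.9/conf/
Run: cp zoo_sample.cfg zoo.cfg
This makes a copy of the sample ZooKeeper configuration file to edit.
Edit the file: vim zoo.cfg
Set the following parameters: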
dataDir=/opt/data/zookeeper
dataLogDir=/opt/data/zookeeper/logs
Append the following at the end of the file:
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
Save and exit.
Then create the ZooKeeper data directories:
mkdir -p /opt/data/zookeeper
mkdir -p /opt/data/zookeeper/logs
Next create an empty myid file for ZooKeeper:
touch /opt/data/zookeeper/myid
Finally write the ID into the file (note the space before >; without it the shell writes an empty line instead of the number):
echo 1 > /opt/data/zookeeper/myid
3.3 Copy the configured ZooKeeper to the other nodes
scp -r zookeeper-3.4.9 hadoop002:/opt/soft/
scp -r zookeeper-3.4.9 hadoop003:/opt/soft/
Then, on each of those machines, create the ZooKeeper data directories:
mkdir -p /opt/data/zookeeper
mkdir -p /opt/data/zookeeper/logs
Create the empty myid file:
touch /opt/data/zookeeper/myid
Finally write the node's ID into the file:
hadoop002: echo 2 > /opt/data/zookeeper/myid
hadoop003: echo 3 > /opt/data/zookeeper/myid
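The ID written to myid has to match the N in the server.N line for that host in zoo.cfg. As a sketch (not part of the original steps), the ID can be derived from zoo.cfg instead of being typed by hand; the function name myid_for_host is made up here, and the pattern assumes plain hostnames without dots, as used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: look up a host's ZooKeeper ID from the server.N lines in zoo.cfg.

# $1 = path to zoo.cfg, $2 = hostname; prints N for the line "server.N=<hostname>:..."
myid_for_host() {
  sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Example, run on each node:
#   myid_for_host /opt/soft/zookeeper-3.4.9/conf/zoo.cfg "$(hostname)" > /opt/data/zookeeper/myid
```

This keeps myid and zoo.cfg consistent even if the server list changes later.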
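3.4 Set environment variables
On each ZooKeeper server, as the hadoop user, run:
cd /home/hadoop
vim .bash_profile
Add the following lines:
export ZOOKEEPER_HOME=/opt/soft/zookeeper-3.4.9/
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Save and exit, then run source .bash_profile to apply the change.
3.5 Start and test ZooKeeper
Run the following command on every machine:
zkServer.sh start
Then run zkServer.sh status. If there is one leader and two followers, the cluster has started correctly.
3.6 Change the ZooKeeper log output path
By default ZooKeeper writes all of its log output to the zookeeper.out file. The output path and file size cannot be controlled this way because the file is never rotated, so the logging setup needs to be changed as follows:
1. In the zkEnv.sh file under $ZOOKEEPER_HOME/bin, set ZOO_LOG_DIR to the directory the logs should go to, and set ZOO_LOG4J_PROP to INFO,ROLLINGFILE to select the rolling-file appender.
2. In $ZOOKEEPER_HOME/conf/log4j.properties, set zookeeper.root.logger to the same value as ZOO_LOG4J_PROP in the previous file. This configuration rotates logs by file size; to rotate daily instead, change the appender to DailyRollingFileAppender.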
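For illustration, the two settings from step 1 might look like the lines below once edited into zkEnv.sh. This is a sketch under assumptions: the directory /opt/data/zookeeper/app-logs is a hypothetical choice, not a path from the original text, and has to be created beforehand.

```shell
# Lines to set in $ZOOKEEPER_HOME/bin/zkEnv.sh (values are examples).
ZOO_LOG_DIR="/opt/data/zookeeper/app-logs"   # assumed directory; create it first
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"            # log level and appender name

# Matching line for step 2 in $ZOOKEEPER_HOME/conf/log4j.properties:
#   zookeeper.root.logger=INFO, ROLLINGFILE
```

Restart ZooKeeper on each node after the change so the new log settings take effect.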
4. Hadoop cluster setup
4.1 Extract the Hadoop archive
Log in to hadoop001 and move the Hadoop package to /opt/soft.
Extract it: tar -zxvf hadoop-2.7.6.tar.gz
Then create the working directories:
mkdir -p /opt/data/hadoop/tmp
mkdir -p /opt/data/hadoop/dfs/data
mkdir -p /opt/data/hadoop/dfs/name
4.2 Hadoop configuration files
4.2.1 Configure JAVA_HOME
Go to the directory: cd /opt/soft/hadoop-2.7.6/etc/hadoop
Open hadoop-env.sh and set JAVA_HOME and the log directory as follows:
export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HADOOP_LOG_DIR=/opt/data/hadoop/logs
The log directory is the same one configured in yarn-site.xml below.
Open yarn-env.sh and set JAVA_HOME as follows:
export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HADOOP_LOG_DIR=/opt/data/hadoop/logs
4.2.2 Configure slaves
Open the slaves file and list the worker (DataNode) hostnames, one per line:
Run vim slaves and add the following:
hadoop001
hadoop002
hadoop003
Save and exit.
4.2.3 Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://beh</value> ### HDFS nameservice
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/tmp</value> ### temporary directory created earlier
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name> ### ZooKeeper ensemble
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Note: do not copy the ### annotations into the file; adjust the annotated values to your environment.
4.2.4 Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>beh</value> ### nameservice, same as in core-site.xml
</property>
<property>
<name>dfs.ha.namenodes.beh</name>
<value>hadoop001,hadoop002</value> ### NameNode IDs (the master hostnames are used here)
</property>
<property>
<name>dfs.namenode.rpc-address.beh.hadoop001</name>
<value>hadoop001:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.beh.hadoop001</name>
<value>hadoop001:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.beh.hadoop002</name>
<value>hadoop002:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.beh.hadoop002</name>
<value>hadoop002:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name> ### keep the host list consistent with the ZooKeeper nodes
<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/beh</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.beh</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value> ### the key used for passwordless login; usually the default path
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/data/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
</configuration>
Note: do not copy the ### annotations into the file; adjust the annotated values to your environment.
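The annotation on dfs.nameservices requires the nameservice to match the one used in core-site.xml. A rough way to check this from the shell is sketched below. It is an illustration only: get_prop and same_nameservice are hypothetical helper names, and the parser assumes the one-tag-per-line layout used in this guide rather than handling XML in general.

```shell
#!/usr/bin/env bash
# Sketch: read a property value from a Hadoop *-site.xml and compare nameservices.

# $1 = XML file, $2 = property name; prints the first <value> after the matching <name>.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")   # keep only the text between the value tags
      print
      exit
    }
  ' "$1"
}

# $1 = core-site.xml, $2 = hdfs-site.xml; succeeds when fs.defaultFS is hdfs://<nameservice>.
same_nameservice() {
  [ "$(get_prop "$1" fs.defaultFS)" = "hdfs://$(get_prop "$2" dfs.nameservices)" ]
}

# Example: same_nameservice core-site.xml hdfs-site.xml && echo "nameservice matches"
```

On a node where Hadoop is already installed, hdfs getconf -confKey dfs.nameservices reports the value Hadoop itself resolves, which is the more reliable check.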
4.2.5 Configure mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
4.2.6 Configure yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value> ## the two ResourceManager IDs
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop001</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop002</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value> ## ID of the ResourceManager on this machine; change it to rm2 on the standby
<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>beh-yarn</value> ## named after the nameservice used earlier
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop001:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop001:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop001:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop001:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop001:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>hadoop001:23142</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop002:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop002:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop002:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop002:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop002:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>hadoop002:23142</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/data/hadoop/yarn</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/data/hadoop/logs</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>
</configuration>
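Note: do not copy the ## annotations into the file; adjust the annotated values to your environment. The directories referenced in the file must be created by hand.
4.3 Distribute Hadoop to the other machines
Run the following commands to distribute the installation: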
scp -r hadoop-2.7.6 hadoop002:/opt/soft/
scp -r hadoop-2.7.6 hadoop003:/opt/soft/
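Because the whole directory is copied, hadoop002 receives a yarn-site.xml whose yarn.resourcemanager.ha.id is still rm1, although the annotation in that file says the standby must use rm2. The sketch below shows one way to flip the value after the copy. It is an illustration only: set_rm_ha_id is a hypothetical helper, and it assumes the <value> line directly follows the <name> line, as in the listing above.

```shell
#!/usr/bin/env bash
# Sketch: rewrite yarn.resourcemanager.ha.id in a yarn-site.xml laid out one tag per line.

# $1 = path to yarn-site.xml, $2 = new ID (for example rm2)
set_rm_ha_id() {
  local file="$1" id="$2" tmp
  tmp="$(mktemp)" || return 1
  # On the line after the matching <name>, replace whatever is inside <value>...</value>.
  sed "/<name>yarn\.resourcemanager\.ha\.id<\/name>/{
n
s#<value>[^<]*</value>#<value>${id}</value>#
}" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, run on hadoop002:
#   set_rm_ha_id /opt/soft/hadoop-2.7.6/etc/hadoop/yarn-site.xml rm2
```

The yarn.resourcemanager.ha.rm-ids property is left alone because only the exact ha.id name is matched.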
4.4 Hadoop environment variables
On every server, as the hadoop user, run:
cd /home/hadoop
vim .bash_profile
Add the following lines:
export HADOOP_HOME=/opt/soft/hadoop-2.7.6/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and exit, then run source .bash_profile to apply the change.
4.5 Start and test the cluster
4.5.1 Start the ZooKeeper cluster
On hadoop001, hadoop002 and hadoop003 run:
zkServer.sh start
Then check the status: zkServer.sh status
One leader and two followers means ZooKeeper has started correctly.
4.5.2 Format the HDFS HA state in ZooKeeper
On hadoop001 (running this on one node is enough): hdfs zkfc -formatZK
4.5.3 Start the JournalNode cluster
On every JournalNode host run:
hadoop-daemon.sh start journalnode
4.5.4 Format and start the first NameNode
Use hadoop001.
## Format the NameNode data on this node
hdfs namenode -format
## Initialize the JournalNode data; this is required for HA
hdfs namenode -initializeSharedEdits
## Start the NameNode service on this node
hadoop-daemon.sh start namenode
4.5.5 Format and start the second NameNode
On hadoop002 run:
## hadoop001 has already been formatted; sync its metadata to hadoop002
hdfs namenode -bootstrapStandby
## Start the NameNode service on this node
hadoop-daemon.sh start namenode
4.5.6 Start all DataNodes
# On every DataNode run: hadoop-daemon.sh start datanode
4.5.7 Start the ZooKeeperFailoverController
On every NameNode host run:
hadoop-daemon.sh start zkfc
4.5.8 Check the NameNode status in the web UI
Open http://hadoop001:50070 and http://hadoop002:50070
One of them should be active and the other standby.
When browsing from a PC that cannot resolve these hostnames, enter IP_ADDRESS:50070 instead.
4.5.9 Start YARN
On hadoop001 run:
start-yarn.sh
4.5.10 Start the ResourceManager on hadoop002
yarn-daemon.sh start resourcemanager
4.5.11 Check the ResourceManager status in the web UI
Open http://hadoop001:23188 and http://hadoop002:23188
One of them should be active and the other standby. The active node can be accessed normally; the standby node redirects automatically to the web address of the active node.
http://resourcemanager_ipaddress:23188
When browsing from a PC that cannot resolve these hostnames, enter IP_ADDRESS:23188 instead.
4.5.12 Test the cluster
Check that the cluster is usable and that failover to the standby works.
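The active/standby state can also be read from the command line with hdfs haadmin -getServiceState and yarn rmadmin -getServiceState, using the IDs from the configuration above (hadoop001 and hadoop002 for the NameNodes, rm1 and rm2 for the ResourceManagers). The sketch below is an addition to the original steps; the helper name exactly_one_active is hypothetical. It reads one state per line and succeeds only when exactly one of them is active.

```shell
#!/usr/bin/env bash
# Sketch: verify that exactly one member of an HA pair reports "active".

# Reads service states (one per line) on stdin.
exactly_one_active() {
  [ "$(grep -cx 'active')" -eq 1 ]
}

# Example for the NameNodes:
#   { hdfs haadmin -getServiceState hadoop001
#     hdfs haadmin -getServiceState hadoop002; } | exactly_one_active && echo "HDFS HA ok"
# Example for the ResourceManagers:
#   { yarn rmadmin -getServiceState rm1
#     yarn rmadmin -getServiceState rm2; } | exactly_one_active && echo "YARN HA ok"
```

To exercise failover, stop the active NameNode with hadoop-daemon.sh stop namenode, run the check again to confirm the former standby has become active, then start the stopped NameNode again.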
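5. Hive setup
5.1 Extract the Hive archive and set environment variables
Log in to hadoop001 and move the Hive package to /opt/soft.
Extract it: tar -zxvf apache-hive-2.2.0-bin.tar.gz
The paths used below assume the extracted directory has been renamed to /opt/soft/hive-2.2.0.
On every server, as the hadoop user, run:
cd /home/hadoop
vim .bash_profile
Add the following lines: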
export HIVE_HOME=/opt/soft/hive-2.2.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=$PATH:$HIVE_HOME/bin
5.2 Install MySQL
Configure MySQL (note: switch to the root user).
Remove the MySQL packages that ship with CentOS. The first command lists them; pass the reported package name to the second, for example:
rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
yum -y install mysql-server
Initialize MySQL
(1) Change the MySQL password (run as root)
cd /usr/bin
./mysql_secure_installation
(2) Enter the current password of the MySQL root user. Initially root has no password, so just press Enter.
Enter current password for root (enter for none):
(3) Set the password of the MySQL root user (it must match the password in the Hive configuration below; 123456 is used here as an example)
Set root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!
(4) Remove anonymous users
Remove anonymous users? [Y/n] Y
... Success!
(5) Whether to disallow remote root login; choose N
Disallow root login remotely? [Y/n] N
... Success!
(6) Remove the test database
Remove test database and access to it? [Y/n] Y
Dropping test database...
... Success!
Removing privileges on test database...
... Success!
(7) Reload the privilege tables
Reload privilege tables now? [Y/n] Y
... Success!
(8) Done
All done! If you've completed all of the above steps, your MySQL
installation should now be secure.
Thanks for using MySQL!
(9) Log in to MySQL and allow root to connect from other hosts
mysql -uroot -p
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
FLUSH PRIVILEGES;
exit;
MySQL configuration is now complete.
5.3 Configure Hive
5.3.1 Edit hive-env.sh
Copy hive-env.sh.template to hive-env.sh, then edit hive-env.sh:
JAVA_HOME=/opt/soft/jdk1.8.0_171
HADOOP_HOME=/opt/soft/hadoop-2.7.6
HIVE_HOME=/opt/soft/hive-2.2.0
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"
5.3.2 Edit hive-site.xml
Copy hive-default.xml.template to hive-site.xml, then edit hive-site.xml (delete all of its content and keep only an empty <configuration></configuration> element).
Configuration reference:
hive.server2.thrift.port: TCP listen port, default 10000.
hive.server2.thrift.bind.host: host to bind for TCP, default localhost.
hive.server2.thrift.min.worker.threads: minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads: maximum number of worker threads, default 500.
hive.server2.transport.mode: default binary (TCP); the alternative is HTTP.
hive.server2.thrift.http.port: HTTP listen port, default 10001.
hive.server2.thrift.http.path: service endpoint name, default cliservice.
hive.server2.thrift.http.min.worker.threads: minimum number of worker threads in the server pool, default 5.
hive.server2.thrift.http.max.worker.threads: maximum number of worker threads in the server pool, default 500.
The hive-site.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop003:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>
<!-- Location of the Hive warehouse on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive</value>
<description>location of default database for the warehouse</description>
</property>
<!-- Location for temporary resource files -->
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/data/hive/tmp/resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<!-- Before Hive 0.9, hive.exec.dynamic.partition had to be set to true; from 0.9 on it defaults to true -->
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<!-- Change the log locations -->
<property>
<name>hive.exec.local.scratchdir</name>
<value>/opt/data/hive/tmp/HiveJobsLog</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/data/hive/tmp/ResourcesLog</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/opt/data/hive/tmp/HiveRunLog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/opt/data/hive/tmp/OpertitionLog</value>
<description>Top level directory where operation tmp are stored if logging functionality is enabled</description>
</property>
<!-- Configure the HWI interface -->
<property>
<name>hive.hwi.war.file</name>
<value>/opt/soft/hive-2.2.0/lib/hive-hwi-2.2.0.jar</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>hadoop003</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop003</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
</property>
<!-- HiveServer2 web UI -->
<property>
<name>hive.server2.webui.host</name>
<value>hadoop003</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>755</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>false</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
<description>Allocate resources dynamically</description>
</property>
<!-- With Hive on Spark, leaving out the following setting causes out-of-memory errors -->
<property>
<name>spark.driver.extraJavaOptions</name>
<value>-XX:PermSize=128M -XX:MaxPermSize=512M</value>
</property>
</configuration>
5.4 Configure hive-config.sh
Edit the hive-config.sh file (in the Hive distribution it is located at $HIVE_HOME/bin/hive-config.sh)
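## Add the following three lines
export JAVA_HOME=/opt/soft/jdk1.8.0_171
export HIVE_HOME=/opt/soft/hive-2.2.0
export HADOOP_HOME=/opt/soft/hadoop-2.7.6
## Change the following line
HIVE_CONF_DIR=$HIVE_HOME/conf
5.5 Copy the JDBC driver
Put the JDBC driver jar into the $HIVE_HOME/lib directory:
cp /home/hadoop/mysql-connector-java-5.1.6-bin.jar /opt/soft/hive-2.2.0/lib/
5.6 Copy the jline jar
Copy jline-2.12.jar from $HIVE_HOME/lib to $HADOOP_HOME/share/hadoop/yarn/lib, and delete the older jline jar from $HADOOP_HOME/share/hadoop/yarn/lib.
5.7 Copy tools.jar
Copy tools.jar from $JAVA_HOME/lib to $HIVE_HOME/lib.
5.8 Initialize Hive
Use either MySQL or Derby as the metastore database.
Note: first check whether MySQL contains leftover Hive metadata; if it does, delete it first.
schematool -dbType mysql -initSchema ## MySQL as the metastore database
Here mysql means that MySQL stores the Hive metadata. If MySQL is not used as the metastore database, run instead:
schematool -dbType derby -initSchema ## Derby as the metastore database
The schema script (hive-schema-1.2.1.mysql.sql; the version in the file name depends on the Hive release) creates the tables in the configured Hive metastore database.
5.9 Start the metastore service
The metastore service must be started before running Hive, otherwise Hive reports an error:
./hive --service metastore
Then open another terminal window and start the Hive process.
5.10 Test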
hive
show databases;
show tables;
create table book (id bigint, name string) row format delimited fields terminated by '\t';
select * from book;
select count(*) from book;
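The queries above run against an empty table. To see actual rows, a small tab-separated file can be generated and loaded into book. This is a sketch that goes beyond the original test: the helper name make_book_tsv, the path /tmp/book.tsv and the two sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create a tab-delimited sample file matching the book table (id bigint, name string).

# $1 = output path
make_book_tsv() {
  printf '1\tHadoop\n2\tHive\n' > "$1"
}

# Example:
#   make_book_tsv /tmp/book.tsv
#   hive -e "load data local inpath '/tmp/book.tsv' into table book; select * from book;"
```

The tab separator matches the fields terminated by '\t' clause used when the table was created; with a different delimiter the columns would load as NULL.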