Preface

Deploying and installing Hadoop is an indispensable part of working with it. Hadoop supports three deployment modes: Standalone mode, Pseudo-Distributed mode, and Cluster mode; the first two run on a single machine. This chapter covers deployment in Standalone mode and Pseudo-Distributed mode.

Hadoop's core components include:

  • HDFS (NameNode / SecondaryNameNode / DataNode)
  • YARN (ResourceManager / NodeManager)

Below, we will install these components.


Preparation

  • Prerequisite: a working JDK.
  • Download the Hadoop tarball and unpack it.
  • Configure the Hadoop environment variables:
export HADOOP_HOME_2_7_5=/Users/Sean/Software/hadoop/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME_2_7_5/bin

Standalone mode

  • Standalone mode (local mode)
    In this mode a single node runs everything in one Java process, which is mainly useful for debugging. No additional installation steps are required.
# Run a bundled example job
# Create the input directory
mkdir input
# Copy the config files into it
cp etc/hadoop/*.xml input
# Run the examples jar (mind the version number). This job uses Hadoop's bundled grep example to count strings matching dfs[a-z.]+ in input.
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar grep input output 'dfs[a-z.]+'
cat output/*
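The bundled grep job is, conceptually, a distributed version of an ordinary grep-and-count pipeline. As a local sketch (using a made-up sample file here, not the real etc/hadoop configs), this shows what the 'dfs[a-z.]+' pattern extracts and how the counts that land in output/ arise:

```shell
# Local approximation of the Hadoop grep example:
# extract every string matching dfs[a-z.]+, then count distinct matches.
mkdir -p demo_input
printf '<name>dfs.replication</name>\n<name>dfs.permissions</name>\n<name>dfs.replication</name>\n' \
  > demo_input/sample.xml
grep -ohE 'dfs[a-z.]+' demo_input/*.xml | sort | uniq -c | sort -rn
# dfs.replication appears twice in this sample, dfs.permissions once
```

Hadoop's version produces the same kind of (count, match) pairs in output/, just computed as a map step (extract) followed by a reduce step (count).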

Pseudo-Distributed mode

In pseudo-distributed mode, we need to install the following on a single machine:

  • HDFS: NameNode and DataNode
  • YARN: ResourceManager and NodeManager

With that, let's get into the installation.

Installing HDFS
  • Edit the configuration files (core-site.xml and hdfs-site.xml)

core-site.xml (sets the default filesystem URI)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- Directory where Hadoop stores its runtime files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/Sean/Software/hadoop/hadoop-2.7.5/tmp</value>
    </property>
</configuration>

hdfs-site.xml (sets the replication factor)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
  • Set up passwordless SSH to localhost
# ssh-keygen -t rsa
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
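start-dfs.sh uses ssh to launch each daemon, which is why the login must be passwordless. A slightly more defensive variant of the two commands above (a sketch; useful if you want the setup non-interactive and idempotent):

```shell
# Generate a key only if none exists (-q quiet, -N '' empty passphrase),
# then authorize it for logins to this machine.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# sshd refuses keys stored with loose permissions
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
```

Afterwards `ssh localhost` should log in without prompting; on macOS also make sure Remote Login is enabled in System Preferences.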
  • Format the filesystem, then start HDFS
# Format the filesystem
hdfs namenode -format
# Start HDFS
 sbin/start-dfs.sh
  • A quick look at the startup log
localhost:sbin Sean$ ./start-dfs.sh
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/25 11:09:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# Starting the local namenode
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/Sean/Software/hadoop/hadoop-2.7.5/logs/hadoop-Sean-namenode-localhost.out
# Starting the local datanode
localhost: starting datanode, logging to /Users/Sean/Software/hadoop/hadoop-2.7.5/logs/hadoop-Sean-datanode-localhost.out
# Starting the local secondary namenode
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/Sean/Software/hadoop/hadoop-2.7.5/logs/hadoop-Sean-secondarynamenode-localhost.out
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/25 11:09:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  • After startup
localhost:sbin Sean$ jps
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
3233 DataNode
3147 NameNode
3692 Jps
3340 SecondaryNameNode
localhost:hadoop-2.7.5 Sean$ echo "1" >> 1.log
localhost:hadoop-2.7.5 Sean$ hdfs dfs -put 1.log /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/25 11:55:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
localhost:hadoop-2.7.5 Sean$ hdfs dfs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/25 11:55:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 /1.log
  • Note: the logs are written to the logs folder under the installation directory; for example, the NameNode log is hadoop-Sean-namenode-localhost.log.
Installing YARN
  • Edit the configuration files

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
  • Run the startup script (sbin/start-yarn.sh); jps should then show the new daemons:
localhost:sbin Sean$ jps
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
4048 Jps
3233 DataNode
3928 ResourceManager
4010 NodeManager
3147 NameNode
3340 SecondaryNameNode
  • Verify via the ResourceManager web UI (http://localhost:8088)
  • Using YARN
# Create the user's home directory in HDFS
hdfs dfs -mkdir -p  /user/Sean
# Copy the config files in as input
hdfs dfs -put etc/hadoop /user/Sean/input

# Run the wordcount example
localhost:hadoop-2.7.5 Sean$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount /user/Sean/input  output
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/25 13:27:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/25 13:27:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/25 13:27:04 INFO input.FileInputFormat: Total input paths to process : 30
19/03/25 13:27:04 INFO mapreduce.JobSubmitter: number of splits:30
19/03/25 13:27:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553491599720_0001
19/03/25 13:27:05 INFO impl.YarnClientImpl: Submitted application application_1553491599720_0001
19/03/25 13:27:05 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1553491599720_0001/
19/03/25 13:27:05 INFO mapreduce.Job: Running job: job_1553491599720_0001
19/03/25 13:27:14 INFO mapreduce.Job: Job job_1553491599720_0001 running in uber mode : false
19/03/25 13:27:14 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 13:27:28 INFO mapreduce.Job:  map 13% reduce 0%
19/03/25 13:27:29 INFO mapreduce.Job:  map 20% reduce 0%
19/03/25 13:27:40 INFO mapreduce.Job:  map 27% reduce 0%
19/03/25 13:27:41 INFO mapreduce.Job:  map 37% reduce 0%
19/03/25 13:27:47 INFO mapreduce.Job:  map 37% reduce 12%
19/03/25 13:27:49 INFO mapreduce.Job:  map 40% reduce 12%
19/03/25 13:27:50 INFO mapreduce.Job:  map 43% reduce 13%
19/03/25 13:27:51 INFO mapreduce.Job:  map 53% reduce 13%
19/03/25 13:27:53 INFO mapreduce.Job:  map 53% reduce 18%
19/03/25 13:27:59 INFO mapreduce.Job:  map 60% reduce 18%
19/03/25 13:28:02 INFO mapreduce.Job:  map 63% reduce 20%
19/03/25 13:28:03 INFO mapreduce.Job:  map 70% reduce 20%
19/03/25 13:28:05 INFO mapreduce.Job:  map 70% reduce 23%
19/03/25 13:28:12 INFO mapreduce.Job:  map 77% reduce 23%
19/03/25 13:28:14 INFO mapreduce.Job:  map 77% reduce 26%
19/03/25 13:28:16 INFO mapreduce.Job:  map 87% reduce 26%
19/03/25 13:28:17 INFO mapreduce.Job:  map 87% reduce 29%
19/03/25 13:28:23 INFO mapreduce.Job:  map 93% reduce 29%
19/03/25 13:28:25 INFO mapreduce.Job:  map 100% reduce 29%
19/03/25 13:28:27 INFO mapreduce.Job:  map 100% reduce 100%
19/03/25 13:28:28 INFO mapreduce.Job: Job job_1553491599720_0001 completed successfully
19/03/25 13:28:28 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=75804
		FILE: Number of bytes written=3927609
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=82078
		HDFS: Number of bytes written=36903
		HDFS: Number of read operations=93
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=30
		Launched reduce tasks=1
		Data-local map tasks=30
		Total time spent by all maps in occupied slots (ms)=325766
		Total time spent by all reduces in occupied slots (ms)=55271
		Total time spent by all map tasks (ms)=325766
		Total time spent by all reduce tasks (ms)=55271
		Total vcore-milliseconds taken by all map tasks=325766
		Total vcore-milliseconds taken by all reduce tasks=55271
		Total megabyte-milliseconds taken by all map tasks=333584384
		Total megabyte-milliseconds taken by all reduce tasks=56597504
	Map-Reduce Framework
		Map input records=2109
		Map output records=8027
		Map output bytes=106964
		Map output materialized bytes=75978
		Input split bytes=3594
		Combine input records=8027
		Combine output records=4043
		Reduce input groups=1585
		Reduce shuffle bytes=75978
		Reduce input records=4043
		Reduce output records=1585
		Spilled Records=8086
		Shuffled Maps =30
		Failed Shuffles=0
		Merged Map outputs=30
		GC time elapsed (ms)=2980
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=6136266752
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=78484
	File Output Format Counters
		Bytes Written=36903
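What the wordcount example computes can be sketched with a plain coreutils pipeline (a local approximation over an inline sample, not how Hadoop actually executes it): the map phase emits one word per line, the shuffle groups equal words together (here: sort), and the reduce phase counts each group (here: uniq -c):

```shell
# map: split into one word per line; shuffle: sort groups identical words;
# reduce: count each group
printf 'hadoop yarn hadoop hdfs yarn hadoop\n' \
  | tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c | sort -rn
# hadoop occurs 3 times, yarn 2, hdfs 1
```

In the real job each map task runs this "split into words" step on one input split (30 splits above, hence 30 map tasks), and the single reduce task sees each word's occurrences grouped together.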



Q&A

NameNode fails to start: storage directory missing or inaccessible?

2019-03-25 11:37:38,713 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /private/tmp/hadoop-Sean/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:382)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:233)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:984)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:686)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:586)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:820)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:804)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1516)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1582)
2019-03-25 11:37:38,720 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2019-03-25 11:37:38,826 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2019-03-25 11:37:38,827 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2019-03-25 11:37:38,827 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2019-03-25 11:37:38,827 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /private/tmp/hadoop-Sean/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:382)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:233)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:984)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:686)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:586)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:820)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:804)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1516)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1582)
2019-03-25 11:37:38,830 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-03-25 11:37:38,832 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Fix (either point hadoop.tmp.dir at a writable location in core-site.xml, or fix the directory's permissions):

<!-- core-site.xml: set the storage location -->
  <property>
      <name>hadoop.tmp.dir</name>
      <value>/Users/Sean/Software/hadoop/hadoop-2.7.5/tmp</value>
  </property>

# Create the directory
mkdir -p /Users/Sean/Software/hadoop/hadoop-2.7.5/tmp/dfs/name
Filesystem not initialized? java.io.IOException: NameNode is not formatted. Fix: run hdfs namenode -format.

Container fails with Exit code: 127 (Stack trace: ExitCodeException exitCode=127:)?

Fix: inspect the container's stderr to find the cause:

localhost:container_1553486900247_0023_01_000001 Sean$ cat stderr
/bin/bash: /bin/java: No such file or directory
localhost:container_1553486900247_0023_01_000001 Sean$ pwd
/Users/Sean/Software/hadoop/hadoop-2.7.5/logs/userlogs/application_1553486900247_0023/container_1553486900247_0023_01_000001

Method 1: create a symlink so /bin/java resolves: ln -s /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/bin/java /bin/java
Method 2: in etc/hadoop/hadoop-env.sh, set export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home and comment out the original export JAVA_HOME=${JAVA_HOME}.
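Both exit-code-127 fixes come down to making java resolvable from the container's shell. A small helper (check_java is a hypothetical name for this sketch, not a Hadoop tool) to sanity-check a candidate JAVA_HOME before wiring it into hadoop-env.sh:

```shell
# Hypothetical helper: check_java DIR prints "ok" if DIR/bin/java exists
# and is executable, "missing" otherwise.
check_java() {
  if [ -x "$1/bin/java" ]; then echo ok; else echo missing; fi
}
# Example: probe the JDK path used above (adjust to your machine)
check_java "/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home"
```

If this prints "missing" for the path you put in hadoop-env.sh, containers will keep dying with exit code 127.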