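While searching online for a configuration tutorial, I found the blog post below. It is very well written and works exactly as described, so I have copied its content here, with some annotations of my own, for future reference. http://www.micmiu.com/bigdata/hadoop/hadoop2x-single-node-setup/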

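This article records the detailed steps for installing, configuring, and starting a single-node Hadoop 2.2.0 on Mac OSX, and demonstrates running a simple job. The outline is as follows: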

  • Basic environment setup
  • Hadoop installation and configuration
  • Startup and demo

[1] Basic environment setup

1. OS: Mac OSX 10.9.1

2. JDK 1.6.0_65

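Installing from a package or building from source both work. I will not go into detail here, since a search turns up plenty of articles on the subject; just make sure the environment variables are configured correctly. My JAVA_HOME is configured as follows: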

 



micmiu-mbp:~ micmiu$ echo $JAVA_HOME
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
micmiu-mbp:~ micmiu$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
micmiu-mbp:~ micmiu$
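To make the "environment variables are configured correctly" check above repeatable, here is a minimal sketch of a shell helper. The name check_java_home is my own and is not part of the JDK or Hadoop; it only tests that the given directory (or $JAVA_HOME when no argument is passed) contains an executable bin/java.

```shell
# Hypothetical helper: succeed only if the directory looks like a JDK home.
check_java_home() {
  # Use the first argument if given, otherwise fall back to $JAVA_HOME.
  dir=${1:-$JAVA_HOME}
  if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
    printf 'ok: %s\n' "$dir"
  else
    printf 'JAVA_HOME is unset or has no bin/java: %s\n' "$dir" >&2
    return 1
  fi
}
```

Calling check_java_home with no arguments in a new terminal is a quick way to see whether the profile changes took effect.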



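  (Note: to install the JDK, I simply installed Eclipse 4.3 from the official site. If no JDK is present, launching Eclipse prompts you to download and install one; just click to confirm, and version 1.6.0_65 is installed by default.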

Contents of .bash_profile:



export CLICOLOR=1
export LSCOLORS=GxFxDxBxegedabagaced
export PS1="\[\e[0;31m\]\u@\h\[\e[0;33m\]:\[\e[1;34m\]\w \[\e[1;37m\]$ \[\e[m\]"
export HADOOP_HOME=~/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=`/usr/libexec/java_home`



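HADOOP_HOME is set to make the commands easier to type, so it is worth configuring. The more detailed variable settings further below can also be used as a reference.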

  )



3. Passwordless SSH login



Because this is a single-node setup, passwordless SSH login to localhost is all that is needed, and it is fairly simple:



micmiu-mbp:~ micmiu$ cd ~
micmiu-mbp:~ micmiu$ ssh-keygen -t rsa -P ''
micmiu-mbp:~ micmiu$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys



 

Verify that it works:



micmiu-mbp:~ micmiu$ ssh localhost
Last login: Sat Jan 18 10:17:19 2014
micmiu-mbp:~ micmiu$



This shows that passwordless SSH login is working.

For a detailed introduction to passwordless SSH login, see the article "Configuring passwordless OpenSSH login on Linux (CentOS)".
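One detail of the key setup above: every run of the cat ... >> ~/.ssh/authorized_keys command appends the same key again. The following is a minimal sketch of an idempotent variant. The helper name append_key_once is hypothetical, and the chmod calls assume the common sshd behaviour (StrictModes) of rejecting key files whose permissions are too open.

```shell
# Hypothetical helper: append a public key to an authorized_keys file only once.
append_key_once() {
  pub_file=$1
  auth_file=$2
  auth_dir=$(dirname "$auth_file")

  mkdir -p "$auth_dir"
  touch "$auth_file"

  # -x matches whole lines only, -F treats the key as a fixed string.
  key=$(cat "$pub_file")
  if ! grep -qxF -e "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi

  # Assumption: sshd runs with StrictModes, so keep permissions tight.
  chmod 700 "$auth_dir"
  chmod 600 "$auth_file"
}
```

For the setup in this article the call would be append_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys.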

[2] Hadoop installation and configuration

1. Download the release package

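Open the official download page http://hadoop.apache.org/releases.html#Download, choose the 2.2.0 release package, and after downloading extract it to the target path: micmiu$ tar -zxf hadoop-2.2.0.tar.gz -C /usr/local/share. In this article, HADOOP_HOME is therefore /usr/local/share/hadoop-2.2.0/.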

2. Configure the system environment variables: run vi ~/.profile and add the following:



# Hadoop settings by Michael@micmiu.com
export HADOOP_HOME="/usr/local/share/hadoop-2.2.0"
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX} 
export HADOOP_HDFS_HOME=${HADOOP_PREFIX} 
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX} 
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=${HADOOP_CONF_DIR}

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
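After editing ~/.profile it is easy to miss one of the variables above, for example because the file has not been re-sourced in the current terminal. The following is a small sketch of a check; the function name unset_hadoop_vars is my own, and the list of names is simply copied from the settings above.

```shell
# Hypothetical helper: print the name of every Hadoop variable that is unset or empty.
unset_hadoop_vars() {
  for name in HADOOP_HOME HADOOP_PREFIX HADOOP_COMMON_HOME HADOOP_HDFS_HOME \
              HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR YARN_CONF_DIR; do
    # Indirect lookup of the variable named by $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

An empty result from unset_hadoop_vars means all eight variables are visible in the current shell.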



 

3. Edit <HADOOP_HOME>/etc/hadoop/hadoop-env.sh

On Mac OSX, configure it as follows:



# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=$(/usr/libexec/java_home -d 64 -v 1.6)
# Find the HADOOP_OPTS setting and add the parameters below
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"



 

For more, see the article on problems with setting the $JAVA_HOME environment variable on Mac OS X.

On Linux|Unix, configure it as follows:



# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=<actual path of the JDK on your system>



 

4. Edit <HADOOP_HOME>/etc/hadoop/core-site.xml

Under the <configuration> node, add or update the following settings:



<!-- The new property fs.defaultFS replaces the old fs.default.name | micmiu.com -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.</description>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/micmiu/tmp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>io.native.lib.available</name>
    <value>false</value>
    <description>default value is true:Should native hadoop libraries, if present, be used.</description>
  </property>



 

5. Edit <HADOOP_HOME>/etc/hadoop/hdfs-site.xml

Under the <configuration> node, add or update the following settings:



<property>
        <name>dfs.replication</name>
        <value>1</value>
        <!-- Set to 1 for a single node; for a cluster, set it according to the actual number of nodes | micmiu.com -->
</property>



 

6. Edit <HADOOP_HOME>/etc/hadoop/yarn-site.xml

Under the <configuration> node, add or update the following settings:



<!-- micmiu.com -->
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>

<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>



 

7. Edit <HADOOP_HOME>/etc/hadoop/mapred-site.xml

There is no mapred-site.xml file by default; just copy mapred-site.xml.template to mapred-site.xml.
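The copy step can be sketched as a small helper that leaves an existing mapred-site.xml untouched, so that re-running the setup does not overwrite earlier edits. The name ensure_mapred_site is hypothetical; the argument is the configuration directory, i.e. <HADOOP_HOME>/etc/hadoop.

```shell
# Hypothetical helper: create mapred-site.xml from the template only if it is missing.
ensure_mapred_site() {
  conf_dir=$1
  if [ ! -f "$conf_dir/mapred-site.xml" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
  fi
}
```

With the environment variables from step 2 in place, the call would be ensure_mapred_site "$HADOOP_CONF_DIR".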

Under the <configuration> node, add or update the following settings:



<!-- micmiu.com -->
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
</property>



 

[3] Startup and demo

1. Start Hadoop

First, run hdfs namenode -format



micmiu-mbp:~ micmiu$ hdfs namenode -format
14/01/18 23:07:07 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = micmiu-mbp.local/192.168.1.103
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
.................
.................
.................
14/01/18 23:07:08 INFO util.GSet: VM type       = 64-bit
14/01/18 23:07:08 INFO util.GSet: 0.029999999329447746% max memory = 991.7 MB
14/01/18 23:07:08 INFO util.GSet: capacity      = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /Users/micmiu/tmp/hadoop/dfs/name ? (Y or N) Y
14/01/18 23:07:26 INFO common.Storage: Storage directory /Users/micmiu/tmp/hadoop/dfs/name has been successfully formatted.
14/01/18 23:07:26 INFO namenode.FSImage: Saving image file /Users/micmiu/tmp/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/01/18 23:07:26 INFO namenode.FSImage: Image file /Users/micmiu/tmp/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/01/18 23:07:27 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/01/18 23:07:27 INFO util.ExitUtil: Exiting with status 0
14/01/18 23:07:27 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at micmiu-mbp.local/192.168.1.103
************************************************************/



Then run start-dfs.sh



micmiu-mbp:~ micmiu$ start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/share/hadoop-2.2.0/logs/hadoop-micmiu-namenode-micmiu-mbp.local.out
localhost: starting datanode, logging to /usr/local/share/hadoop-2.2.0/logs/hadoop-micmiu-datanode-micmiu-mbp.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/share/hadoop-2.2.0/logs/hadoop-micmiu-secondarynamenode-micmiu-mbp.local.out
micmiu-mbp:~ micmiu$ jps
1522 NameNode
1651 DataNode
1794 SecondaryNameNode
1863 Jps
micmiu-mbp:~ micmiu$



 

Next, run start-yarn.sh



micmiu-mbp:~ micmiu$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/share/hadoop-2.2.0/logs/yarn-micmiu-resourcemanager-micmiu-mbp.local.out
localhost: starting nodemanager, logging to /usr/local/share/hadoop-2.2.0/logs/yarn-micmiu-nodemanager-micmiu-mbp.local.out
micmiu-mbp:~ micmiu$ jps
2033 NodeManager
1900 ResourceManager
1522 NameNode
1651 DataNode
2058 Jps
1794 SecondaryNameNode
micmiu-mbp:~ micmiu$



 

If the startup logs contain no errors and the processes listed above are all present, startup has succeeded.
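Checking the jps listing by eye works, but it can also be scripted. Below is a minimal sketch that reads jps output on standard input and prints any of the five daemons shown above that are missing. The function name missing_daemons is my own; the daemon names are taken from the listings in this article.

```shell
# Hypothetical helper: read `jps` output on stdin, print each expected daemon not found.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare against the whole second column so that "NameNode"
    # is not satisfied by "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
    fi
  done
}
```

Running jps | missing_daemons after start-yarn.sh should print nothing when everything is up.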

2. Demo

First, a demonstration of some common hdfs commands, in preparation for the wordcount demo:



micmiu-mbp:~ micmiu$ hdfs dfs -ls /
micmiu-mbp:~ micmiu$ hdfs dfs -mkdir /user
micmiu-mbp:~ micmiu$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - micmiu supergroup          0 2014-01-18 23:20 /user
micmiu-mbp:~ micmiu$ hdfs dfs -mkdir -p /user/micmiu/wordcount/in
micmiu-mbp:~ micmiu$ hdfs dfs -ls /user/micmiu/wordcount
Found 1 items
drwxr-xr-x   - micmiu supergroup          0 2014-01-18 23:21 /user/micmiu/wordcount/in



 

Create a local file named micmiu-word.txt with the following content:



Hi Michael welcome to Hadoop 
Hi Michael welcome to BigData
Hi Michael welcome to Spark 
more see micmiu.com



Upload micmiu-word.txt to hdfs:
hdfs dfs -put micmiu-word.txt  /user/micmiu/wordcount/in

Then cd to the Hadoop root directory and run:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount  /user/micmiu/wordcount/in /user/micmiu/wordcount/out

P.S.: the /user/micmiu/wordcount/out directory must not already exist; otherwise the job fails with an error.
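When re-running the job, the old output directory therefore has to be removed first. Here is a minimal sketch of a helper for that, assuming the hdfs command is on the PATH; the name clear_output_dir is hypothetical, while -test -d and -rm -r are standard hdfs dfs options as far as I know. Note that it deletes the directory and everything in it.

```shell
# Hypothetical helper: remove an HDFS output directory if it already exists.
clear_output_dir() {
  out_dir=$1
  # `hdfs dfs -test -d` exits with status 0 when the path is an existing directory.
  if hdfs dfs -test -d "$out_dir"; then
    hdfs dfs -rm -r "$out_dir"
  fi
}
```

For this demo the call would be clear_output_dir /user/micmiu/wordcount/out, run just before the hadoop jar command.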

You should see log output similar to the following:



micmiu-mbp:hadoop-2.2.0 micmiu$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount  /user/micmiu/wordcount/in /user/micmiu/wordcount/out
14/01/19 20:02:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/01/19 20:02:29 INFO input.FileInputFormat: Total input paths to process : 1
14/01/19 20:02:29 INFO mapreduce.JobSubmitter: number of splits:1
............
............
............
14/01/19 20:02:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1390131922557_0001
14/01/19 20:02:30 INFO impl.YarnClientImpl: Submitted application application_1390131922557_0001 to ResourceManager at /0.0.0.0:8032
14/01/19 20:02:30 INFO mapreduce.Job: The url to track the job: http://micmiu-mbp.local:8088/proxy/application_1390131922557_0001/
14/01/19 20:02:30 INFO mapreduce.Job: Running job: job_1390131922557_0001
14/01/19 20:02:38 INFO mapreduce.Job: Job job_1390131922557_0001 running in uber mode : false
14/01/19 20:02:38 INFO mapreduce.Job:  map 0% reduce 0%
14/01/19 20:02:43 INFO mapreduce.Job:  map 100% reduce 0%
14/01/19 20:02:50 INFO mapreduce.Job:  map 100% reduce 100%
14/01/19 20:02:50 INFO mapreduce.Job: Job job_1390131922557_0001 completed successfully
14/01/19 20:02:51 INFO mapreduce.Job: Counters: 43
    File System Counters
        FILE: Number of bytes read=129
        FILE: Number of bytes written=158647
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=228
        HDFS: Number of bytes written=83
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3346
        Total time spent by all reduces in occupied slots (ms)=3799
    Map-Reduce Framework
        Map input records=4
        Map output records=18
        Map output bytes=179
        Map output materialized bytes=129
        Input split bytes=120
        Combine input records=18
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=129
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=30
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=283127808
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=108
    File Output Format Counters 
        Bytes Written=83
micmiu-mbp:hadoop-2.2.0 micmiu$



 

The wordcount job has now finished. Run the following commands to view its results:



micmiu-mbp:hadoop-2.2.0 micmiu$ hdfs dfs -ls /user/micmiu/wordcount/out
Found 2 items
-rw-r--r--   1 micmiu supergroup          0 2014-01-19 20:02 /user/micmiu/wordcount/out/_SUCCESS
-rw-r--r--   1 micmiu supergroup         83 2014-01-19 20:02 /user/micmiu/wordcount/out/part-r-00000
micmiu-mbp:hadoop-2.2.0 micmiu$ hdfs dfs -cat /user/micmiu/wordcount/out/part-r-00000
BigData    1
Hadoop    1
Hi    3
Michael    3
Spark    1
micmiu.com    1
more    1
see    1
to    3
welcome    3
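As a cross-check of the result above, the same counts can be reproduced locally without Hadoop. This is only a sketch using standard Unix tools, not how the MapReduce example is implemented; the function name count_words is my own. It splits on whitespace, sorts in byte order (LC_ALL=C, so uppercase words come before lowercase ones, as in the job output), and prints each word and its count separated by a tab.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin.
count_words() {
  tr -s '[:space:]' '\n' |          # one word per line, squeezing runs of whitespace
    grep -v '^$' |                  # drop an empty line left by leading whitespace
    LC_ALL=C sort |                 # byte-order sort: uppercase before lowercase
    uniq -c |                       # prefix each distinct word with its count
    awk '{ print $2 "\t" $1 }'      # reorder to "word<TAB>count"
}
```

Running count_words < micmiu-word.txt on the sample file should give the same ten words and counts as part-r-00000.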