Downloading Hadoop 2.2

Hadoop 2.2.0 can be downloaded directly from the Apache website.

Download link: http://mirror.esocc.com/apache/hadoop/common/stable/

Preparing the Hadoop Cluster Environment

Here we build a cluster of three machines:

 

IP address      Username   Hostname   OS
192.168.1.105   hadoop     master     Ubuntu 64-bit
192.168.1.120   hadoop     slave2     CentOS 64-bit
192.168.1.104   hadoop     slave1     CentOS 64-bit

 

1. Configure master, slave1 and slave2 as follows.

1) Set the hostname by editing /etc/hostname (vim /etc/hostname).

2) Edit /etc/hosts to add the IP-to-hostname mappings for all three machines.
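For reference, with the addresses from the table above, each node's /etc/hosts gains entries like these (a sketch; substitute your own IPs if they differ):

```
192.168.1.105   master
192.168.1.104   slave1
192.168.1.120   slave2
```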

Install the ssh service on master, slave1 and slave2 so that the nodes can log in to each other without passwords.

On slave1, enter the ~/.ssh directory and run:

$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@slave2:~/.ssh/


On slave2, enter the ~/.ssh directory and run:


$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@master:~/.ssh/

On master, enter the ~/.ssh directory and run:

$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@slave2:~/.ssh/
$ scp authorized_keys hadoop@slave1:~/.ssh/

After these steps, all three nodes can access each other over ssh without passwords.
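A quick way to confirm this is a sketch like the following; BatchMode makes ssh fail immediately instead of prompting when key authentication is not working:

```shell
# From any node, each hop should print the remote hostname without a password prompt.
NODES="master slave1 slave2"
for h in $NODES; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 hadoop@"$h" hostname \
    || echo "$h: passwordless login not working yet"
done
```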

JDK installation: needed on master, slave1 and slave2. Use the same path on every node if possible; in this guide the JDK is installed under /opt/java/jdk.
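A quick sanity check on each node (a sketch, assuming the /opt/java/jdk path above):

```shell
# Print the JDK version from the agreed location; warn if it is missing.
JDK=${JDK:-/opt/java/jdk}
if [ -x "$JDK/bin/java" ]; then
  "$JDK/bin/java" -version
else
  echo "JDK not found at $JDK"
fi
```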

Installing Hadoop 2.2

The configured installation is later copied to the other nodes, so the steps below effectively have to happen on every machine.

1. Unpack the archive

Unpack the archive into the installation path (substitute your own directory as needed), or place the output of a build done on a 64-bit machine there. To save space, you can then delete the archive or keep a copy of it elsewhere as a backup.

Note: the installation path must be identical on every machine!
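The unpacking step can be sketched as follows, assuming the stable tarball hadoop-2.2.0.tar.gz was downloaded into the home directory (adjust the paths to your setup):

```shell
# Unpack the Hadoop tarball into the home directory, creating ~/hadoop-2.2.0.
TARBALL=${TARBALL:-$HOME/hadoop-2.2.0.tar.gz}
if [ -f "$TARBALL" ]; then
  tar -zxf "$TARBALL" -C "$HOME"
else
  echo "tarball not found at $TARBALL"
fi
```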

2. Hadoop configuration

Before configuring, create the following directories on master's local file system:


Six configuration files are involved here:

~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

~/hadoop-2.2.0/etc/hadoop/slaves

~/hadoop-2.2.0/etc/hadoop/core-site.xml

~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

~/hadoop-2.2.0/etc/hadoop/mapred-site.xml

~/hadoop-2.2.0/etc/hadoop/yarn-site.xml

If any of these files does not exist by default, copy it from the corresponding .template file in the same directory.
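For example, mapred-site.xml typically has to be created from its template (a sketch; run on master and adjust the path if your installation differs):

```shell
# Create mapred-site.xml from the shipped template if it is missing.
CONF=${CONF:-$HOME/hadoop-2.2.0/etc/hadoop}
if [ -f "$CONF/mapred-site.xml.template" ] && [ ! -f "$CONF/mapred-site.xml" ]; then
  cp "$CONF/mapred-site.xml.template" "$CONF/mapred-site.xml"
else
  echo "nothing to do in $CONF"
fi
```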

Configuration file 1: hadoop-env.sh (slave1 and slave2 need this change too)

Set the JAVA_HOME value (export JAVA_HOME=/opt/java/jdk).

Configuration file 2: slaves

Add the following lines:

slave1

slave2

Configuration file 3: core-site.xml


<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/hadoop/tmp/hadoop-${user.name}</value>
		<description>A base for other temporary directories.</description>
	</property>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://master:8010</value>
		<description>The name of the default file system. A URI whose
		scheme and authority determine the FileSystem implementation. The
		uri's scheme determines the config property (fs.SCHEME.impl) naming
		the FileSystem implementation class. The uri's authority is used to
		determine the host, port, etc. for a filesystem.</description>
	</property>
</configuration>

Configuration file 4: hdfs-site.xml


<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
		<description>Default block replication.
		The actual number of replications can be specified when the file is created.
		The default is used if replication is not specified at create time.
		</description>
	</property>
</configuration>

Configuration file 5: mapred-site.xml



<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>master:54311</value>
		<description>The host and port that the MapReduce job tracker runs
		at. If "local", then jobs are run in-process as a single map
		and reduce task.
		</description>
	</property>
	<property>
		<name>mapred.map.tasks</name>
		<value>10</value>
		<description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
		</description>
	</property>
	<property>
		<name>mapred.reduce.tasks</name>
		<value>2</value>
		<description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
		</description>
	</property>
</configuration>

Configuration file 6: yarn-site.xml (no changes needed for now)

3. Copy to the other nodes

From master, use scp (ssh must already be set up as above) to push the configured installation to the two child nodes, slave1 and slave2:


$ scp -r ~/hadoop-2.2.0 hadoop@slave1:~/
$ scp -r ~/hadoop-2.2.0 hadoop@slave2:~/

4. Start and verify


1. Start Hadoop

Enter the installation directory:

cd ~/hadoop-2.2.0/

Format the namenode:

./bin/hdfs namenode -format

2. Start HDFS: ./sbin/start-dfs.sh

Processes now running on master: namenode, secondarynamenode

Processes running on slave1 and slave2: datanode

3. Start YARN: ./sbin/start-yarn.sh

Processes now running on master: namenode, secondarynamenode, resourcemanager

Processes running on slave1 and slave2: datanode, nodemanager
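The running daemons can be listed with jps, which ships with the JDK (a sketch, assuming the /opt/java/jdk path used earlier):

```shell
# List the Java daemons on this node. On master expect NameNode,
# SecondaryNameNode and ResourceManager; on the slaves, DataNode and NodeManager.
JPS=${JAVA_HOME:-/opt/java/jdk}/bin/jps
if [ -x "$JPS" ]; then
  "$JPS"
else
  echo "jps not found at $JPS (is JAVA_HOME set?)"
fi
```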

5. Check the cluster status: ./bin/hdfs dfsadmin -report


Run the command; if its output looks like the following, the installation succeeded:

 


hadoop@master:~/hadoop-2.2.0/bin$ ./hdfs dfsadmin -report
14/03/20 21:24:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 139443990528 (129.87 GB)
Present Capacity: 123374796800 (114.90 GB)
DFS Remaining: 123374747648 (114.90 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0


-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)


Live datanodes:
Name: 192.168.1.120:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 100111396864 (93.24 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 8277905408 (7.71 GB)
DFS Remaining: 91833466880 (85.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.73%
Last contact: Thu Mar 20 21:24:44 CST 2014




Name: 192.168.1.104:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 39332593664 (36.63 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7791288320 (7.26 GB)
DFS Remaining: 31541280768 (29.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 80.19%
Last contact: Thu Mar 20 21:24:43 CST 2014