I recently decided to try installing Hadoop 3.0 on a server that already had Hadoop 2.6 set up, installed as the root user. I did not want to remove the existing Hadoop 2.6, so the plan was to create two users, hadoop2.6 and hadoop3.0; move the Hadoop 2.6 installation from /usr/local/Hadoop/ into the hadoop2.6 user's home directory; delete the environment variables that had been configured in /etc/profile; and set them instead in the hadoop2.6 user's .bashrc. That way each user works with its own version.
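A minimal sketch of that relocation, run as root. The exact directory name (hadoop-2.6.0) and target layout are assumptions based on the description above; adjust them to match the actual paths on your server:

    # Create the hadoop2.6 user and move the old install into its home directory.
    useradd -d /home/hadoop2.6 -m hadoop2.6
    mkdir -p /home/hadoop2.6/usr/local
    mv /usr/local/Hadoop/hadoop-2.6.0 /home/hadoop2.6/usr/local/
    chown -R hadoop2.6:hadoop2.6 /home/hadoop2.6/usr/local
    # Remove the HADOOP_*/PATH lines from /etc/profile by hand, then append the
    # equivalent exports to /home/hadoop2.6/.bashrc instead.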

Preface


The cluster is built from four servers: one master node and three slave nodes. The hosts entries from the earlier setup are as follows (in /etc/hosts format, IP first):




116.57.56.220  master
116.57.86.221  slave1
116.57.86.222  slave2
116.57.86.223  slave3



I. Create the new user





    sudo useradd -d /home/hadoop3.0 -m hadoop3.0   # -d sets the home directory path, -m creates it






    sudo passwd hadoop3.0   # set the password


After creating the account I noticed its shell behaved oddly; some searching revealed that the login shell had been set to sh, so I changed /bin/sh to /bin/bash for this user in /etc/passwd and things went back to normal.





    hadoop3.0:x:1002:1002::/home/hadoop3.0:/bin/bash


A side note on the difference between sh and bash: on many systems /bin/sh is a minimal POSIX shell (often dash), while bash provides the richer interactive features and syntax most guides assume.
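Editing /etc/passwd by hand works; as an aside, the same change can usually be made with chsh or usermod (a small sketch, not from the original notes):

    # Either of these changes the login shell of the hadoop3.0 user to bash.
    sudo chsh -s /bin/bash hadoop3.0
    # or
    sudo usermod -s /bin/bash hadoop3.0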



Next, using sudo as the new user failed; the reason is that a newly created user has to be granted sudo rights in /etc/sudoers:





    # Allow members of group sudo to execute any command
    %sudo   ALL=(ALL:ALL) ALL
    hadoop3.0   ALL=(ALL:ALL) ALL
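Rather than editing /etc/sudoers directly, it is safer to go through visudo, which validates the syntax before saving. The group-based variant below is an alternative I am adding for reference, not what the notes above do:

    # Open /etc/sudoers with syntax checking.
    sudo visudo
    # Alternatively, on Debian/Ubuntu-style systems, adding the user to the
    # sudo group has the same effect as the explicit rule above:
    sudo usermod -aG sudo hadoop3.0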


Set the environment variables in ~/.bashrc:




    export JAVA_HOME=/usr/local/java/jdk1.8.0_101   # Hadoop 3.0 requires Java 8
    export JRE_HOME=${JAVA_HOME}/jre
    export HADOOP_HOME=~/usr/local/hadoop/hadoop-3.0.0-alpha1
    export SCALA_HOME=~/usr/local/scala/scala-2.10.5
    export SPARK_HOME=~/usr/local/spark/spark-2.0.1-bin-hadoop2.7
    export SQOOP_HOME=~/usr/local/sqoop/sqoop-1.4.6
    export HIVE_HOME=~/usr/local/hive/hive-1.2.1
    export HBASE_HOME=~/usr/local/hbase/hbase-1.0.1.1
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:${HIVE_HOME}/lib   # HIVE_HOME must be defined before it is used here
    export PATH=${SPARK_HOME}/bin:${SCALA_HOME}/bin:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SQOOP_HOME}/bin:${HADOOP_HOME}/lib:${HIVE_HOME}/bin:${HBASE_HOME}/bin:$PATH
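After saving, reload the file and check that the binaries resolve (assuming the paths above match the actual install locations):

    source ~/.bashrc
    hadoop version    # should report 3.0.0-alpha1
    java -version     # should report 1.8.x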


II. Set up passwordless SSH for the new user




    ssh-keygen -t rsa   # generates the private key id_rsa and the public key id_rsa.pub

Append the contents of the public key to the ~/.ssh/authorized_keys file on every machine that should accept passwordless SSH logins.




For example: generate the key pair on machine A and append the public key to machine B's authorized_keys file; A can then log in to B over SSH without a password.
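A minimal sketch of doing this with ssh-copy-id from the master, assuming the hadoop3.0 user exists on every node and the hostnames from the hosts table above:

    # Run on master as the hadoop3.0 user, once per node (each prompts for the password once).
    for host in slave1 slave2 slave3; do
        ssh-copy-id hadoop3.0@$host
    done
    # The master also needs to ssh into itself for start-dfs.sh / start-yarn.sh:
    ssh-copy-id hadoop3.0@master
    # Verify:
    ssh hadoop3.0@slave1 hostname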



III. Hadoop configuration


Hadoop 3.0 needs the following files configured: core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh and workers.


1. core-site.xml





    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
      </property>

      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/tmp</value>
      </property>
    </configuration>



2. hdfs-site.xml






    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/hdfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/hdfs/data</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:9001</value>
      </property>
    </configuration>



3. workers: list the slave nodes by writing each slave's hostname into the file, one per line





    slave1
    slave2
    slave3



4. mapred-site.xml






    cp mapred-site.xml.template mapred-site.xml





    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>

      <property>
        <name>mapreduce.application.classpath</name>
        <value>
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/etc/hadoop,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/common/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/common/lib/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/hdfs/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/hdfs/lib/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/mapreduce/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/mapreduce/lib/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/yarn/*,
          /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/yarn/lib/*
        </value>
      </property>
    </configuration>



Without the mapreduce.application.classpath property above, MapReduce jobs initially failed with:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

I eventually found the cause explained in this post: http://youling87.blog.51cto.com/5271987/1548227 (in short, the YARN containers could not see the MapReduce jars, which is what the classpath property above fixes).
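If the error persists, one sanity check worth doing (my suggestion, not from the linked post) is to confirm that the jar containing MRAppMaster actually exists under the paths listed in mapreduce.application.classpath on every node:

    # MRAppMaster lives in the hadoop-mapreduce-client-app jar; it must be
    # reachable through the classpath entries above on each NodeManager host.
    ls /home/hadoop3.0/usr/local/hadoop/hadoop-3.0.0-alpha1/share/hadoop/mapreduce/hadoop-mapreduce-client-app-*.jar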






5. yarn-site.xml




    <configuration>

      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8040</value>
      </property>
    </configuration>


6. Set JAVA_HOME in hadoop-env.sh





    export JAVA_HOME=/usr/local/java/jdk1.8.0_101
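One step the notes above gloss over: the configured Hadoop directory (or at least everything under etc/hadoop) has to be the same on all four machines, and each slave also needs the same JDK at /usr/local/java/jdk1.8.0_101. A minimal sketch of copying the install out to the slaves, assuming the same home-directory layout exists on each of them:

    # Run on master as hadoop3.0, after passwordless SSH is working.
    for host in slave1 slave2 slave3; do
        ssh hadoop3.0@$host "mkdir -p ~/usr/local/hadoop"
        scp -r ~/usr/local/hadoop/hadoop-3.0.0-alpha1 hadoop3.0@$host:~/usr/local/hadoop/
    done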







IV. Start Hadoop



1. Format the NameNode



    hdfs namenode -format



If $HADOOP_HOME/bin is not on the PATH, run it from the $HADOOP_HOME directory instead:

    bin/hdfs namenode -format



2. Start HDFS and YARN

    start-dfs.sh
    start-yarn.sh


If $HADOOP_HOME/sbin has not been added to the PATH, run these from the $HADOOP_HOME directory instead:




    sbin/start-dfs.sh
    sbin/start-yarn.sh
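A quick way to confirm the daemons actually came up is the JDK's jps tool (assuming it is on the PATH via JAVA_HOME above):

    # Run on master; expect NameNode and ResourceManager here.
    # On each slave, expect DataNode and NodeManager
    # (plus SecondaryNameNode on slave1, per hdfs-site.xml above).
    jps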



Now open http://master:8088 and http://master:9870 in a browser (the YARN ResourceManager and HDFS NameNode web UIs, respectively); if both pages load, the installation succeeded.











Testing with the bundled examples



1. Create the HDFS directories required to execute MapReduce jobs




    hdfs dfs -mkdir /user
    hdfs dfs -mkdir /user/hduser


2. Copy the input files into the distributed filesystem





    hdfs dfs -mkdir /user/hduser/input
    hdfs dfs -put etc/hadoop/*.xml /user/hduser/input


3. Run one of the provided example jobs





    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar grep /user/hduser/input output 'dfs[a-z.]+'




4. Check the output files:

Copy the output files from the distributed filesystem to the local filesystem and inspect them:






    hdfs dfs -get output output
    cat output/*
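Alternatively, the results can be inspected directly on HDFS without copying them locally. Because the grep job was given the relative path "output", the results land under the running user's HDFS home directory (here assumed to be /user/hadoop3.0):

    hdfs dfs -cat output/*
    # or, with the assumed absolute path:
    hdfs dfs -ls /user/hadoop3.0/output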