Hadoop VM Setup and IDEA Remote Debugging Configuration

  • My environment
  • Hadoop configuration
  • Troubleshooting
  • IDEA configuration
  • Step 1: Importing the jars
  • Step 2: Importing the core *.xml files
  • Step 3: Installing the HDFS tool: Big Data Tools
  • Configuration steps
  • Step 4: Fixing HDFS directory permissions
  • Resolving IDEA configuration errors
  • Program input and output settings
  • Configuring core-site.xml and hdfs-site.xml


My environment

Note: this is a learning setup; be very cautious about using any of it in production!

Win10 + VMware Workstation Pro 15 + Ubuntu 18.04 LTS + Hadoop 2.7.7 (1 master, 2 slaves) + IDEA 2019.3 Ultimate

Hadoop configuration

When cloning the VM, be sure to change the clone's MAC address; otherwise the guest's IP address may increment by one on every boot.

Troubleshooting

  1. Hadoop hangs while computing pi, stalls at "running job", or sits at map 0% reduce 0%
  2. JAVA_HOME-style errors are fixed by setting the variable in the env script (hadoop-env.sh)
    Note: remember to apply the same change on the Slave nodes as well
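For item 2 above, a minimal sketch of the hadoop-env.sh change (the JDK path is an example for an OpenJDK 8 install on Ubuntu 18.04; substitute your own):

```
# In HADOOP_HOME/etc/hadoop/hadoop-env.sh, replace the default
#   export JAVA_HOME=${JAVA_HOME}
# with an absolute path, since ${JAVA_HOME} is often not set in the
# environment the Hadoop daemons start from:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example path, adjust to your JDK

# Remember to make the same edit on every Slave node.
```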

IDEA configuration

First create an ordinary Java project, then create a resources folder under the project root.

Step 1: Importing the jars

HADOOP_HOME is the Hadoop install directory, e.g. /usr/local/hadoop/

  • hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar under HADOOP_HOME/share/hadoop/common (match the version numbers in the jar names to your install, e.g. 2.7.7);
  • all JAR files under HADOOP_HOME/share/hadoop/common/lib;
  • hadoop-hdfs-2.7.1.jar and hadoop-hdfs-nfs-2.7.1.jar under HADOOP_HOME/share/hadoop/hdfs;
  • all JAR files under HADOOP_HOME/share/hadoop/hdfs/lib;
  • all JAR files under HADOOP_HOME/share/hadoop/mapreduce except hadoop-mapreduce-examples-2.7.1.jar;
  • all JAR files under HADOOP_HOME/share/hadoop/mapreduce/lib.

Gather all of the jars above into a single folder (this post uses jar as the folder name).

Open Project Structure -> Project Settings -> Modules -> Dependencies and add the jar folder from the previous step, as shown in the screenshot below.

(screenshot)
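The jar collection described above can be sketched as a small shell function; collect_hadoop_jars is a hypothetical helper taking HADOOP_HOME and the destination folder, and its case filter implements the "everything except the examples jar" rule:

```shell
# Sketch: collect the client-side Hadoop jars listed above into one folder.
# Usage: collect_hadoop_jars <HADOOP_HOME> <dest-folder>
collect_hadoop_jars() {
  hh=$1; dest=$2
  mkdir -p "$dest"
  # common: hadoop-common and hadoop-nfs, plus everything under lib/
  cp "$hh"/share/hadoop/common/hadoop-common-*.jar \
     "$hh"/share/hadoop/common/hadoop-nfs-*.jar \
     "$hh"/share/hadoop/common/lib/*.jar "$dest"/
  # hdfs: hadoop-hdfs and hadoop-hdfs-nfs, plus everything under lib/
  cp "$hh"/share/hadoop/hdfs/hadoop-hdfs-*.jar \
     "$hh"/share/hadoop/hdfs/lib/*.jar "$dest"/
  # mapreduce: every jar except the examples jar, plus everything under lib/
  for j in "$hh"/share/hadoop/mapreduce/*.jar "$hh"/share/hadoop/mapreduce/lib/*.jar; do
    case $j in *-examples-*) ;; *) cp "$j" "$dest"/ ;; esac
  done
}
```

Run it once after unpacking Hadoop, e.g. collect_hadoop_jars /usr/local/hadoop ./jar, then point the IDEA dependency at the resulting jar folder.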

Step 2: Importing the core *.xml files

Copy the following files from HADOOP_HOME/etc/hadoop on the remote Master host:

log4j.properties
core-site.xml
hdfs-site.xml

and place all three of them in the resources folder of the local project.

(screenshot)
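Step 2 can be scripted as well; the sketch below assumes you already have a local copy of the Master's HADOOP_HOME/etc/hadoop directory (fetched e.g. with scp as in the comment; the hadoop user name and the /usr/local/hadoop install path are assumptions carried over from earlier):

```shell
# Sketch: copy the three config files into the project's resources folder.
# Usage: copy_hadoop_confs <hadoop-conf-dir> <resources-dir>
copy_hadoop_confs() {
  src=$1; dest=$2
  mkdir -p "$dest"
  for f in log4j.properties core-site.xml hdfs-site.xml; do
    cp "$src/$f" "$dest/"
  done
}

# Fetching the files from the remote Master would look something like:
#   scp hadoop@Master:/usr/local/hadoop/etc/hadoop/core-site.xml ./resources/
```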

Step 3: Installing the HDFS tool: Big Data Tools

(screenshot)

Configuration steps:

(screenshot)


The project structure is shown above.

View -> Tool Windows -> Big Data Tools -> click '+' -> RemoteFS -> HDFS; set Config Path to the resources folder created earlier and leave everything else at its defaults.

(screenshot)

Step 4: Fixing HDFS directory permissions
  1. Create a directory on HDFS: hdfs dfs -mkdir -p /user/hadoop
  2. Grant permissions on it with hadoop fs -chmod 777 /user so the local Win10 machine can upload and download files remotely
  3. To inspect other files, use hadoop fs -ls [-R] /

Resolving IDEA configuration errors

  1. (null) entry in command string: null chmod 0700
  2. Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

See here for the errors above; both usually come down to the Hadoop native Windows binaries (winutils.exe and hadoop.dll) being missing on the local machine.

  3. Running MapReduce on Windows reports pathname…is not a valid DFS filename

Program input and output settings

Note: before writing any code I had already added the Master's IP address to the hosts file on my Win10 machine; see the reference for how to edit it.
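For reference, a hosts entry of this shape is what the note above means (the IP address is a placeholder; use your Master VM's real address):

```
# C:\Windows\System32\drivers\etc\hosts
192.168.56.100   Master
```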

  1. To keep the output on the local machine, set the paths as follows (cf. the code earlier in this post):
.......
        // Input and output paths
        FileInputFormat.setInputPaths(conf, new Path("hdfs://Master:9000/user/hadoop/input/"));
        FileOutputFormat.setOutputPath(conf, new Path("file:///E:/JavaCode/HadoopDemo/result/wordCount"));
        // Run the job
        JobClient.runJob(conf);
    }
}

file://: a path on the local filesystem (Win10 here; under Linux it would be a local Linux path)

hdfs://: a path on the Hadoop distributed file system

Note: the wordCount folder itself must not exist beforehand, or the job will fail with an error; that is, the path should exist only up to result, as shown below:

(screenshot)
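Because of the restriction above, a small helper that clears the stale output directory before each run can save some frustration; a minimal sketch (the example path is hypothetical):

```shell
# Sketch: MapReduce refuses to start if the output directory already exists,
# so remove any leftover result folder from a previous run.
clean_output_dir() {
  if [ -d "$1" ]; then
    rm -rf "$1"
  fi
}

# Example (hypothetical local path):
# clean_output_dir "E:/JavaCode/HadoopDemo/result/wordCount"
```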

Configuring core-site.xml and hdfs-site.xml

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>Master:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
</configuration>

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://Master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
</configuration>

References:
Hadoop HDFS upload-permission problems and running MapReduce against a remote Hadoop from IDEA
Big Data Tools configuration: https://www.jetbrains.com/help/idea/big-data-tools-configuration.html