Environment and Software Preparation
System information:
Linux promote 4.1.12-1-default #1 SMP PREEMPT Thu Oct 29 06:43:42 UTC 2015 (e24bad1) x86_64 x86_64 x86_64 GNU/Linux
Required software:
jdk-8u101-linux-x64.rpm
scala-2.11.8.rpm
hadoop-2.6.4.tar.gz
spark-2.0.0-bin-hadoop2.6.tgz
ideaIC-2016.2.2.tar.gz
Creating the spark User
As root, run the following commands to create the spark user. The spark user's home directory is referred to as $HOME in the rest of this guide.
useradd -m -d /home/spark -s /bin/bash spark
passwd spark
Configuring SSH
As the spark user, run the following commands
ssh-keygen -t rsa -P ""
cd /home/spark/.ssh
cat id_rsa.pub >> authorized_keys
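If ssh localhost still prompts for a password afterwards, the usual cause is overly permissive file modes on the key files; a minimal fix, assuming the default paths above:
chmod 700 /home/spark/.ssh
chmod 600 /home/spark/.ssh/authorized_keys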
Run the following command to check whether SSH is configured correctly
ssh localhost
If the configuration succeeded, you will log in and see the welcome message
spark@promote:~/.ssh> ssh localhost
Last failed login: Fri Aug 19 23:13:27 CST 2016 from localhost on ssh:notty
There were 3 failed login attempts since the last successful login.
Have a lot of fun...
spark@promote:~>
Installing the JDK
Upload jdk-8u101-linux-x64.rpm to the machine and, as root, run the following command to install the JDK
rpm -ivh jdk-8u101-linux-x64.rpm
Edit the /etc/profile file and append the following at the end
export JAVA_HOME=/usr/java/jdk1.8.0_101
export PATH=$JAVA_HOME/bin:$PATH
Run the following commands to verify the installation
su - spark
echo $JAVA_HOME
java -version
If the installation succeeded, you will see output like the following
promote:~ # su - spark
spark@promote:~> echo $JAVA_HOME
/usr/java/jdk1.8.0_101
spark@promote:~> java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Installing Scala
Upload scala-2.11.8.rpm to the machine and, as root, run the following command to install Scala
rpm -ivh scala-2.11.8.rpm
Edit the /etc/profile file and append the following at the end
export SCALA_HOME=/usr/share/scala
export PATH=$SCALA_HOME/bin:$PATH
Run the following commands to verify the installation
su - spark
echo $SCALA_HOME
scala -version
If the installation succeeded, you will see output like the following
promote:~ # su - spark
spark@promote:~> echo $SCALA_HOME
/usr/share/scala
spark@promote:~> scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
Installing Hadoop
All operations in this section are performed as the spark user.
Upload hadoop-2.6.4.tar.gz to the $HOME directory and unpack it:
tar zxvf hadoop-2.6.4.tar.gz
Then edit the $HOME/.profile file and append the following at the end
export HADOOP_HOME=/home/spark/hadoop-2.6.4
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$HADOOP_HOME/bin:$PATH
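Note: if Hadoop later complains that JAVA_HOME is not set, you may also need to set it explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, for example:
export JAVA_HOME=/usr/java/jdk1.8.0_101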
Run the following commands to verify that the Hadoop environment variables are configured correctly
source $HOME/.profile
hadoop version
If the configuration is correct, you will see output like the following
spark@promote:~> hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /home/spark/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar
Modifying the Hadoop Configuration Files
Since this guide sets up a pseudo-distributed environment, the following configuration files need to be modified
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
Configuring core-site.xml
Edit $HADOOP_HOME/etc/hadoop/core-site.xml and set the <configuration> element to the following
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/spark/hadoop-2.6.4/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configuring hdfs-site.xml
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and set the <configuration> element to the following
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/spark/hadoop-2.6.4/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/spark/hadoop-2.6.4/hdfs/data</value>
</property>
</configuration>
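The name and data directories above (and hadoop.tmp.dir from core-site.xml) do not need to exist beforehand; formatting the NameNode and starting the daemons will create them. If you prefer to create them explicitly, a sketch assuming the paths above:
mkdir -p /home/spark/hadoop-2.6.4/tmp
mkdir -p /home/spark/hadoop-2.6.4/hdfs/name /home/spark/hadoop-2.6.4/hdfs/data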
Configuring mapred-site.xml
Run the following command to create the mapred-site.xml file
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml and set the <configuration> element to the following
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configuring yarn-site.xml
Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml and set the <configuration> element to the following
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Formatting the NameNode
Run the following command to format the NameNode
hdfs namenode -format
Starting Hadoop
Run the following commands to start Hadoop
cd $HADOOP_HOME/sbin
./start-all.sh
If the following prompt appears during startup, type yes and press Enter
Are you sure you want to continue connecting (yes/no)?
If the startup succeeds, you will see output like the following
spark@promote:~> cd $HADOOP_HOME/sbin
spark@promote:~/hadoop-2.6.4/sbin> ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-namenode-promote.out
localhost: starting datanode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-datanode-promote.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 0f:2a:39:18:ac:17:70:0f:24:d7:45:3c:d6:c7:16:59 [MD5].
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-secondarynamenode-promote.out
starting yarn daemons
starting resourcemanager, logging to /home/spark/hadoop-2.6.4/logs/yarn-spark-resourcemanager-promote.out
localhost: starting nodemanager, logging to /home/spark/hadoop-2.6.4/logs/yarn-spark-nodemanager-promote.out
spark@promote:~/hadoop-2.6.4/sbin>
Run the jps command; you should see all of the following processes
spark@promote:~> jps
6946 Jps
6647 NodeManager
6378 SecondaryNameNode
6203 DataNode
6063 NameNode
6527 ResourceManager
Visit http://localhost:50070 to view the Hadoop web UI.
Visit http://localhost:8088 to view the YARN resource management UI.
HDFS Operations
Creating a user directory on HDFS
Run the following command to create a user directory on HDFS
hdfs dfs -mkdir -p /user/liyanjie
Uploading local files to the HDFS user directory
Run the following command to upload local files to HDFS
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/liyanjie
Listing files on HDFS
Run the following command to list the files on HDFS
hdfs dfs -ls /user/liyanjie
If the command succeeds, you will see output like the following
spark@promote:~> hdfs dfs -ls /user/liyanjie
Found 9 items
-rw-r--r-- 1 spark supergroup 4436 2016-08-19 23:41 /user/liyanjie/capacity-scheduler.xml
-rw-r--r-- 1 spark supergroup 1077 2016-08-19 23:41 /user/liyanjie/core-site.xml
-rw-r--r-- 1 spark supergroup 9683 2016-08-19 23:41 /user/liyanjie/hadoop-policy.xml
-rw-r--r-- 1 spark supergroup 1130 2016-08-19 23:41 /user/liyanjie/hdfs-site.xml
-rw-r--r-- 1 spark supergroup 620 2016-08-19 23:41 /user/liyanjie/httpfs-site.xml
-rw-r--r-- 1 spark supergroup 3523 2016-08-19 23:41 /user/liyanjie/kms-acls.xml
-rw-r--r-- 1 spark supergroup 5511 2016-08-19 23:41 /user/liyanjie/kms-site.xml
-rw-r--r-- 1 spark supergroup 862 2016-08-19 23:41 /user/liyanjie/mapred-site.xml
-rw-r--r-- 1 spark supergroup 758 2016-08-19 23:41 /user/liyanjie/yarn-site.xml
Running the wordcount Example
Run the following command to run the wordcount example
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/liyanjie /output
If the command succeeds, you will see output like the following
spark@promote:~/hadoop-2.6.4/sbin> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/liyanjie /output
16/08/20 11:30:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/20 11:30:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/20 11:30:45 INFO input.FileInputFormat: Total input paths to process : 9
16/08/20 11:30:45 INFO mapreduce.JobSubmitter: number of splits:9
16/08/20 11:30:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471663809891_0001
16/08/20 11:30:46 INFO impl.YarnClientImpl: Submitted application application_1471663809891_0001
16/08/20 11:30:46 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1471663809891_0001/
16/08/20 11:30:46 INFO mapreduce.Job: Running job: job_1471663809891_0001
16/08/20 11:30:52 INFO mapreduce.Job: Job job_1471663809891_0001 running in uber mode : false
16/08/20 11:30:52 INFO mapreduce.Job: map 0% reduce 0%
16/08/20 11:31:02 INFO mapreduce.Job: map 44% reduce 0%
16/08/20 11:31:03 INFO mapreduce.Job: map 67% reduce 0%
16/08/20 11:31:05 INFO mapreduce.Job: map 78% reduce 0%
16/08/20 11:31:06 INFO mapreduce.Job: map 89% reduce 0%
16/08/20 11:31:08 INFO mapreduce.Job: map 100% reduce 0%
16/08/20 11:31:09 INFO mapreduce.Job: map 100% reduce 100%
16/08/20 11:31:09 INFO mapreduce.Job: Job job_1471663809891_0001 completed successfully
16/08/20 11:31:09 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=21822
FILE: Number of bytes written=1111057
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=28641
HDFS: Number of bytes written=10525
HDFS: Number of read operations=30
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=9
Launched reduce tasks=1
Data-local map tasks=9
Total time spent by all maps in occupied slots (ms)=50573
Total time spent by all reduces in occupied slots (ms)=4388
Total time spent by all map tasks (ms)=50573
Total time spent by all reduce tasks (ms)=4388
Total vcore-milliseconds taken by all map tasks=50573
Total vcore-milliseconds taken by all reduce tasks=4388
Total megabyte-milliseconds taken by all map tasks=51786752
Total megabyte-milliseconds taken by all reduce tasks=4493312
Map-Reduce Framework
Map input records=789
Map output records=2880
Map output bytes=36676
Map output materialized bytes=21870
Input split bytes=1041
Combine input records=2880
Combine output records=1262
Reduce input groups=603
Reduce shuffle bytes=21870
Reduce input records=1262
Reduce output records=603
Spilled Records=2524
Shuffled Maps =9
Failed Shuffles=0
Merged Map outputs=9
GC time elapsed (ms)=1389
CPU time spent (ms)=5120
Physical memory (bytes) snapshot=2571784192
Virtual memory (bytes) snapshot=18980491264
Total committed heap usage (bytes)=1927282688
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=27600
File Output Format Counters
Bytes Written=10525
Run the following command to confirm that the wordcount result files have been generated under /output
hdfs dfs -ls /output
The screen shows:
spark@promote:~> hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 1 spark supergroup 0 2016-08-20 11:17 /output/_SUCCESS
-rw-r--r-- 1 spark supergroup 10525 2016-08-20 11:17 /output/part-r-00000
Run the following command to download the wordcount result file to $HOME
hdfs dfs -get /output/part-r-00000 ${HOME}
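The result can also be inspected directly on HDFS, and note that the output directory must be removed before the example can be re-run (the job fails if /output already exists). For example:
hdfs dfs -cat /output/part-r-00000 | head
hdfs dfs -rm -r /output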
Alternatively, visit http://localhost:50070 and use "Utilities" → "Browse the file system" in the menu bar to download the wordcount result file.
Installing Spark
All operations in this section are performed as the spark user.
Upload spark-2.0.0-bin-hadoop2.6.tgz to $HOME and unpack it:
tar zxvf spark-2.0.0-bin-hadoop2.6.tgz
Then edit the $HOME/.profile file and append the following at the end
export SPARK_HOME=/home/spark/spark-2.0.0-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
Run the following commands to verify that the Spark environment variables are configured correctly
source $HOME/.profile
echo $SPARK_HOME
If the configuration is correct, you will see output like the following
spark@promote:~> source $HOME/.profile
spark@promote:~> echo $SPARK_HOME
/home/spark/spark-2.0.0-bin-hadoop2.6
Run the following command to copy spark-env.sh.template to spark-env.sh
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
Edit $SPARK_HOME/conf/spark-env.sh and append the following at the end of the file. The paths below match the installation locations used in this guide; SPARK_LOCAL_IP should be set to the machine's own IP address.
export SPARK_MASTER_HOST=localhost
export SPARK_LOCAL_IP=192.168.2.74
export SPARK_WORKER_MEMORY=1G
export SCALA_HOME=/usr/share/scala
export JAVA_HOME=/usr/java/jdk1.8.0_101
export HADOOP_HOME=/home/spark/hadoop-2.6.4
export SPARK_HOME=/home/spark/spark-2.0.0-bin-hadoop2.6
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Run the following commands to start Spark
cd $SPARK_HOME/sbin
./start-all.sh
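As a quick sanity check (assuming the standalone scripts above started normally), running jps again should now also list Master and Worker processes alongside the Hadoop daemons started earlier:
jps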
Run the following commands to test whether Spark is installed correctly
cd $SPARK_HOME/bin
./run-example SparkPi
If Spark is installed correctly, you will see output like the following
spark@promote:~/spark-2.0.0-bin-hadoop2.6/bin> ./run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/20 11:48:54 INFO SparkContext: Running Spark version 2.0.0
16/08/20 11:48:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/20 11:48:54 WARN Utils: Your hostname, promote resolves to a loopback address: 127.0.0.1; using 192.168.0.108 instead (on interface eth0)
16/08/20 11:48:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/20 11:48:54 INFO SecurityManager: Changing view acls to: spark
16/08/20 11:48:54 INFO SecurityManager: Changing modify acls to: spark
16/08/20 11:48:54 INFO SecurityManager: Changing view acls groups to:
16/08/20 11:48:54 INFO SecurityManager: Changing modify acls groups to:
16/08/20 11:48:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
16/08/20 11:48:55 INFO Utils: Successfully started service 'sparkDriver' on port 54474.
16/08/20 11:48:55 INFO SparkEnv: Registering MapOutputTracker
16/08/20 11:48:55 INFO SparkEnv: Registering BlockManagerMaster
16/08/20 11:48:55 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5aa2d4f8-4ccb-48dd-b1e7-95ba3d75fa9c
16/08/20 11:48:55 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/20 11:48:55 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/20 11:48:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/20 11:48:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.108:4040
16/08/20 11:48:55 INFO SparkContext: Added JAR file:/home/spark/spark-2.0.0-bin-hadoop2.6/examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar with timestamp 1471664935636
16/08/20 11:48:55 INFO SparkContext: Added JAR file:/home/spark/spark-2.0.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.0.0.jar at spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar with timestamp 1471664935636
16/08/20 11:48:55 INFO Executor: Starting executor ID driver on host localhost
16/08/20 11:48:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45881.
16/08/20 11:48:55 INFO NettyBlockTransferService: Server created on 192.168.0.108:45881
16/08/20 11:48:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.108:45881 with 366.3 MB RAM, BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/20 11:48:55 INFO SharedState: Warehouse path is 'file:/home/spark/spark-2.0.0-bin-hadoop2.6/bin/spark-warehouse'.
16/08/20 11:48:56 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
16/08/20 11:48:56 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
16/08/20 11:48:56 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
16/08/20 11:48:56 INFO DAGScheduler: Parents of final stage: List()
16/08/20 11:48:56 INFO DAGScheduler: Missing parents: List()
16/08/20 11:48:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
16/08/20 11:48:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
16/08/20 11:48:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 366.3 MB)
16/08/20 11:48:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.108:45881 (size: 1169.0 B, free: 366.3 MB)
16/08/20 11:48:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/20 11:48:56 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
16/08/20 11:48:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/08/20 11:48:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5478 bytes)
16/08/20 11:48:56 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1, PROCESS_LOCAL, 5478 bytes)
16/08/20 11:48:56 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/08/20 11:48:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/08/20 11:48:56 INFO Executor: Fetching spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar with timestamp 1471664935636
16/08/20 11:48:56 INFO TransportClientFactory: Successfully created connection to /192.168.0.108:54474 after 45 ms (0 ms spent in bootstraps)
16/08/20 11:48:56 INFO Utils: Fetching spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar to /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/fetchFileTemp8233164352392794360.tmp
16/08/20 11:48:56 INFO Executor: Adding file:/tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/spark-examples_2.11-2.0.0.jar to class loader
16/08/20 11:48:56 INFO Executor: Fetching spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar with timestamp 1471664935636
16/08/20 11:48:56 INFO Utils: Fetching spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar to /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/fetchFileTemp1317824548914840322.tmp
16/08/20 11:48:56 INFO Executor: Adding file:/tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/scopt_2.11-3.3.0.jar to class loader
16/08/20 11:48:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 959 bytes result sent to driver
16/08/20 11:48:56 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 959 bytes result sent to driver
16/08/20 11:48:57 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 321 ms on localhost (1/2)
16/08/20 11:48:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 495 ms on localhost (2/2)
16/08/20 11:48:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/08/20 11:48:57 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.531 s
16/08/20 11:48:57 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.818402 s
Pi is roughly 3.139915699578498
16/08/20 11:48:57 INFO SparkUI: Stopped Spark web UI at http://192.168.0.108:4040
16/08/20 11:48:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/20 11:48:57 INFO MemoryStore: MemoryStore cleared
16/08/20 11:48:57 INFO BlockManager: BlockManager stopped
16/08/20 11:48:57 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/20 11:48:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/20 11:48:57 INFO SparkContext: Successfully stopped SparkContext
16/08/20 11:48:57 INFO ShutdownHookManager: Shutdown hook called
16/08/20 11:48:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86
Viewing the Spark Web UI
Spark's web control page and the spark-shell web UI can be checked in a browser; the spark-shell UI is at
http://192.168.2.74:4040/jobs/
The port 4040 UI only appears while a driver such as spark-shell is running, so start the interactive shell first:
$SPARK_HOME/bin/spark-shell
Testing Spark Together with Hadoop
1. Create directories on HDFS and upload a test file
hadoop fs -mkdir -p /usr/hadoop
hadoop fs -mkdir -p /usr/data/input
(hadoop fs -ls /usr/ can be used to check that the directories were created.)
Upload the local file to HDFS:
hadoop fs -put /home/data/spark/spark_test.txt /usr/data/input
2. Test Spark by reading the uploaded file from HDFS in spark-shell, as shown in the sketch below.
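A minimal word-count sketch, typed at the spark-shell prompt; it assumes spark_test.txt was uploaded to /usr/data/input as above and that fs.defaultFS is hdfs://localhost:9000, as configured earlier:
// Read the test file that was uploaded to HDFS above
val lines = sc.textFile("hdfs://localhost:9000/usr/data/input/spark_test.txt")
// Classic word count: split into words, pair each word with 1, sum the counts per word
val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
// Print a few results and write the full result back to HDFS
counts.take(10).foreach(println)
counts.saveAsTextFile("hdfs://localhost:9000/usr/data/output")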
While the job runs, watch it in the Spark web UI; if it completes successfully, the configuration is working.
Installing IntelliJ IDEA
Upload ideaIC-2016.2.2.tar.gz to the $HOME directory and run the following command to unpack it
tar zxvf ideaIC-2016.2.2.tar.gz
Run the following commands to start the IDEA installation
cd $HOME/idea-IC-162.1628.40/bin
./idea.sh
The setup wizard will then appear:
Choose whether to import previous settings
Accept the license agreement
Choose a UI theme
Create a desktop entry
Create a launcher script (if this is checked, the root password will be required when setup finishes)
Select the tools you want to install
Select plugins to install: click the Install button under Scala to install the Scala plugin, then click "Start using IntelliJ IDEA"
Enter the root password and click "OK"
IDEA will then start and show the welcome screen
Creating a Maven-managed Scala Project
Click "Create New Project" on the IDEA welcome screen to open the project creation page
Select "Maven" as the project type on the left, click "New…" next to "Project SDK" to add the system's JDK, then click "Next"
Enter the GroupId, ArtifactId, and Version, then click "Next"
Set the project name and location, then click "Finish"
The project is now created, but Scala classes cannot be added to it yet; the Scala SDK must first be added to the project's External Libraries
Click "File" → "Project Structure…" in the menu bar to open the project settings
Go to the "Libraries" tab, click the green + button, and add the Scala SDK
Just click OK
Click OK to add the Scala SDK to the current project
Click OK again
When writing Scala code, you may want to keep it under a separate sources root named "scala"; it can be added as follows
Right-click the src directory and create a new directory named scala
Then right-click the scala directory and choose "Mark Directory as" → "Sources Root"
Under the scala directory, create a package com.liyanjie.test, then add a Scala class of kind Object to the package
Write the HelloWorld object (a minimal sketch is shown below), then run it via the "Run" menu or "Alt+Shift+F10".
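A minimal sketch of such a HelloWorld object, using the com.liyanjie.test package from the example above:
package com.liyanjie.test

// A minimal runnable Scala object; running it should print the greeting to the console
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}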