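1. Environment configuration
Hadoop nodes: sg202 (NameNode, SecondaryNameNode), sg206 (DataNode), sg207 (DataNode), sg208 (DataNode)
Spark nodes: sg201 (Master), sg211 (Worker)
2. Reading a file from HDFS and running wordcount
a. Log in to the Hadoop master node sg202 and upload the file to be word-counted to HDFS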
[root@sg202 hadoop-1.0.4]# hadoop fs -put /home/hadoop-1.0.4/README.txt input
b. Log in to the Spark Master node (sg201) and start the Spark shell
[root@sg201 spark-0.7.3]# MASTER=spark://172.16.48.201:7077 ./spark-shell
c. Run wordcount
scala> val file=sc.textFile("hdfs://172.16.48.202:9000/user/root/input/README.txt")
scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
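To see what the flatMap / map / reduceByKey chain computes, here is a minimal local sketch using plain Scala collections instead of an RDD. It is not Spark's implementation: groupBy followed by a per-key sum stands in for reduceByKey, which on a cluster combines values per partition before shuffling. The object name WordCountSketch, the function wordCount, and the sample lines are hypothetical and used only for illustration.

```scala
// Local sketch of the word-count pipeline on ordinary Scala collections.
// Assumption: `lines` stands in for the RDD returned by sc.textFile.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // split every line into words
      .map(word => (word, 1))            // pair each word with a count of 1
      .groupBy(_._1)                     // collect the pairs that share a word
      .map { case (word, pairs) =>       // sum the 1s for each word
        (word, pairs.map(_._2).reduce(_ + _))
      }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from README.txt.
    val lines = Seq("hadoop spark hadoop", "spark shell")
    wordCount(lines).foreach { case (w, n) => println(w + "\t" + n) }
  }
}
```

Note that split(" ") produces empty strings when a line contains consecutive spaces, so the shell version above will also count an empty-string "word" for such input.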
scala>