spark-shell:
scala> sc.textFile("data/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
20/12/26 17:39:43 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res1: Array[(String, Int)] = Array((scala,1), (zxl,1), (hello,4), (spark,2))
scala>
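Each link in the chain performs one step of the word count (a commented breakdown of the same expression; sc is the SparkContext that spark-shell creates automatically):

sc.textFile("data/word.txt")   // RDD[String]: one element per input line
  .flatMap(_.split(" "))       // RDD[String]: split each line into words
  .map((_, 1))                 // RDD[(String, Int)]: pair every word with a count of 1
  .reduceByKey(_ + _)          // RDD[(String, Int)]: sum the counts per word
  .collect                     // Array[(String, Int)]: run the job, return results to the driver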
WordCount code example
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.zxl</groupId>
    <artifactId>spark</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
    </dependencies>
</project>
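Note: spark-core_2.12 is built against Scala 2.12, so the project's sources should be compiled with a matching 2.12.x Scala compiler. To have Maven compile the Scala code, a build plugin such as scala-maven-plugin is usually added as well; a minimal sketch (the plugin version shown is an assumption, pick a current release):

<build>
    <plugins>
        <!-- scala-maven-plugin compiles the Scala sources; version 4.4.0 is an assumed example -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.4.0</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>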
Scala code:
package com.zxl.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDemo01 {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and context (local mode, app name "wc")
    val sparkConf: SparkConf = new SparkConf().setMaster("local").setAppName("wc")
    val sparkContext: SparkContext = new SparkContext(sparkConf)
    // Read the input file; each element is one line of text
    val fileRDD: RDD[String] = sparkContext.textFile("input/word.txt")
    // Split each line into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(" "))
    // Pair each word with an initial count of 1
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Sum the counts for each word
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Trigger the job and bring the results back to the driver
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    word2Count.foreach(println)
    sparkContext.stop()
  }
}
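collect() returns the pairs in partition order, which is not sorted. If a deterministic ordering is wanted, the result can be sorted before collecting; a minimal variant using the standard RDD sortBy:

// Sort by count, descending, before collecting the results
val sortedCounts: Array[(String, Int)] = word2CountRDD.sortBy(_._2, ascending = false).collect()
sortedCounts.foreach(println)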
word.txt:
hello world
hello spark
hello zxl
hello scala
hello java
hello golang
hello python
hello spark
Run result (the counts follow from word.txt above; the ordering of the pairs may vary between runs):
(hello,8)
(world,1)
(spark,2)
(zxl,1)
(scala,1)
(java,1)
(golang,1)
(python,1)