spark-shell:

scala> sc.textFile("data/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
20/12/26 17:39:43 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res1: Array[(String, Int)] = Array((scala,1), (zxl,1), (hello,4), (spark,2))

scala>
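
The same pipeline can also be built one transformation at a time to inspect the intermediate RDDs; a sketch against the same data/word.txt (the val names are illustrative, not from the original post):

scala> val lines = sc.textFile("data/word.txt")   // RDD[String], one element per line
scala> val words = lines.flatMap(_.split(" "))    // split each line into words
scala> val pairs = words.map((_, 1))              // pair every word with a 1
scala> val counts = pairs.reduceByKey(_ + _)      // sum the 1s per word
scala> counts.collect                             // fetch the results to the driver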

WordCount code example

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.zxl</groupId>
    <artifactId>spark</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
    </dependencies>
</project>
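
Note that spark-core_2.12 is built for Scala 2.12, so the project's Scala sources must be compiled with a matching Scala version. Plain Maven does not compile Scala by default; a build plugin such as scala-maven-plugin is typically added for that. A minimal sketch (not part of the original pom; the version shown is only an example):

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.4.0</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>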

Scala code:

package com.zxl.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDemo01 {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and context (local mode)
    val sparkConf: SparkConf = new SparkConf().setMaster("local").setAppName("wc")
    val sparkContext: SparkContext = new SparkContext(sparkConf)
    // Read the input file as an RDD of lines
    val fileRDD: RDD[String] = sparkContext.textFile("input/word.txt")
    // Split each line into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(" "))
    // Pair each word with the count 1
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Sum the counts per word
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Bring the results back to the driver and print them
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    word2Count.foreach(println)
    sparkContext.stop()
  }
}
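
collect() returns the pairs in no particular order. If sorted output is wanted, the counts can be ordered on the cluster before collecting; a minimal sketch (the sortBy step is an addition, not part of the original program):

// Sort by count, highest first, then collect to the driver
val sorted: Array[(String, Int)] = word2CountRDD.sortBy(_._2, ascending = false).collect()
sorted.foreach(println)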

word.txt:

hello world
hello spark
hello zxl
hello scala
hello java
hello golang
hello python
hello spark

Running result:

(screenshot of the console output in the original post)
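
For reference, the counts implied by word.txt above are as follows (derived from the input, not copied from the screenshot; collect() ordering may vary):

(hello,8)
(world,1)
(spark,2)
(zxl,1)
(scala,1)
(java,1)
(golang,1)
(python,1)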