【Structured Streaming】-- 输出模式
原创
©著作权归作者所有:来自51CTO博客作者high2011的原创作品,请联系作者获取转载授权,否则将追究法律责任
1、环境
- spark 2.4.0
- scala 2.11.8
- jdk 1.8
- maven
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.11</artifactId>
<version>2.4.0</version>
</dependency>
2、源码说明
package org.apache.spark.sql.streaming;
import org.apache.spark.annotation.InterfaceStability;
import org.apache.spark.sql.catalyst.streaming.InternalOutputModes;
/**
* OutputMode describes what data will be written to a streaming sink when there is
* new data available in a streaming DataFrame/Dataset.
*
* @since 2.0.0
*/
@InterfaceStability.Evolving
public class OutputMode {
/**
* OutputMode in which only the new rows in the streaming DataFrame/Dataset will be
* written to the sink. This output mode can be only be used in queries that do not
* contain any aggregation.
*
* @since 2.0.0
*/
public static OutputMode Append() {
return InternalOutputModes.Append$.MODULE$;
}
/**
* OutputMode in which all the rows in the streaming DataFrame/Dataset will be written
* to the sink every time there are some updates. This output mode can only be used in queries
* that contain aggregations.
*
* @since 2.0.0
*/
public static OutputMode Complete() {
return InternalOutputModes.Complete$.MODULE$;
}
/**
* OutputMode in which only the rows that were updated in the streaming DataFrame/Dataset will
* be written to the sink every time there are some updates. If the query doesn't contain
* aggregations, it will be equivalent to `Append` mode.
*
* @since 2.1.1
*/
public static OutputMode Update() {
return InternalOutputModes.Update$.MODULE$;
}
}
3、区别
- Append :默认模式这种模式保证每一行只输出一次(假设是容错接收器)。只输出结果表中本批次新增的数据,即本批次中的数据。支持使用:select、where、map、flatMap、filter、join等的查询。
- Completed:每次触发后,输出最新的完整的结果表数据。 支持使用:聚合查询。
- Updated:(自Spark 2.1.1起可用)只输出结果表中被本批次修改的数据。
4、参考
http://spark.apache.org/docs/2.4.0/structured-streaming-programming-guide.html#output-modes