Architecture
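The client connects to one of the proxy servers and sends it a query for the state associated with a specific key. Internally, Flink organizes keyed state in Key Groups, and these key groups are distributed across all TaskManagers, so each TaskManager stores the state of several key groups. To find the TaskManager that holds the key group for the queried key, the proxy asks the JobManager. It then queries the QueryableStateServer on that TaskManager directly for the state and returns the result to the client.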
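- QueryableStateClient - runs outside the Flink cluster and submits user queries to the cluster.
- QueryableStateClientProxy - a proxy service running on each TaskManager inside the Flink cluster. It receives client queries, fetches the requested state from the responsible TaskManager, and returns that state to the client.
- QueryableStateServer - a service running on each TaskManager inside the Flink cluster that only serves reads of the state stored locally on that TaskManager.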
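To make the proxy's lookup concrete, here is a plain-Scala sketch of the two mappings involved: key to key group, and key group to parallel operator instance. It is modeled on Flink's KeyGroupRangeAssignment and MathUtils.murmurHash as we understand them; KeyGroupSketch, keyGroupFor and operatorIndexFor are our own names, and 128 below is only an illustrative max parallelism.

```scala
// Simplified model (not Flink's code) of key -> key group -> operator index.
object KeyGroupSketch {
  // Murmur-style integer hash, written after Flink's MathUtils.murmurHash(int);
  // always returns a non-negative value
  def murmurHash(input: Int): Int = {
    var code = input
    code *= 0xcc9e2d51
    code = Integer.rotateLeft(code, 15)
    code *= 0x1b873593
    code = Integer.rotateLeft(code, 13)
    code = code * 5 + 0xe6546b64
    code ^= 4
    code = bitMix(code)
    if (code >= 0) code
    else if (code != Int.MinValue) -code
    else 0
  }

  // final avalanche step of the hash
  def bitMix(input: Int): Int = {
    var h = input
    h ^= h >>> 16
    h *= 0x85ebca6b
    h ^= h >>> 13
    h *= 0xc2b2ae35
    h ^= h >>> 16
    h
  }

  // key group of a key, in [0, maxParallelism)
  def keyGroupFor(key: Any, maxParallelism: Int): Int =
    murmurHash(key.hashCode) % maxParallelism

  // index of the parallel subtask that owns the given key group
  def operatorIndexFor(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism
}
```

With maxParallelism = 128 and parallelism = 4, key groups 0 to 31 map to subtask 0, 32 to 63 to subtask 1, and so on. Knowing which subtask, and therefore which TaskManager, owns a key group is exactly what the proxy asks the JobManager for.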
Activating Queryable State
- Copy opt/flink-queryable-state-runtime_2.11-1.10.0.jar from the Flink distribution into Flink's lib/ directory:
[root@CentOS flink-1.10.0]# cp opt/flink-queryable-state-runtime_2.11-1.10.0.jar lib/
- Add the following setting to Flink's flink-conf.yaml:
queryable-state.enable: true
- Restart the Flink cluster. To verify that the service is enabled, check the TaskManager log for the message "Started the Queryable State Proxy Server @ ...".
[root@CentOS flink-1.10.0]# ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host CentOS.
Starting taskexecutor daemon on host CentOS.
Then check the TaskManager startup log for that message.
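The verification step can be scripted. The sketch below scans TaskManager log lines for the start-up message quoted above and returns whatever follows the "@". ProxyLogCheck, proxyAddress and fromFile are hypothetical names, and the log file location depends on your installation (standalone clusters write their logs under Flink's log/ directory), so the path is passed in explicitly.

```scala
import scala.io.Source

// Hypothetical helper: find the queryable state proxy start-up line in a TaskManager log
object ProxyLogCheck {
  val Marker = "Started the Queryable State Proxy Server @"

  // returns the text after "@" on the first matching line, if any
  def proxyAddress(lines: Iterator[String]): Option[String] =
    lines.collectFirst {
      case line if line.contains(Marker) =>
        line.substring(line.indexOf(Marker) + Marker.length).trim
    }

  // reads the given log file (path is an assumption about your installation)
  def fromFile(path: String): Option[String] = {
    val source = Source.fromFile(path)
    try proxyAddress(source.getLines()) finally source.close()
  }
}
```

A result of None means the proxy did not start on that TaskManager, which usually points to the runtime jar missing from lib/ or queryable-state.enable not being set.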
Making State Queryable
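To make state visible to the outside world, you must explicitly make it queryable in one of two ways:
- create a QueryableStateStream, which acts as a sink and simply stores the incoming records in state; or
- call stateDescriptor.setQueryable(String queryableStateName) on a state descriptor to make that state queryable.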
Queryable State Stream
Calling .asQueryableState(stateName, stateDescriptor) on a KeyedStream exposes its state for querying. The available overloads are:
// ValueState
QueryableStateStream asQueryableState(
    String queryableStateName,
    ValueStateDescriptor stateDescriptor)

// Shortcut for explicit ValueStateDescriptor variant
QueryableStateStream asQueryableState(String queryableStateName)

// FoldingState (deprecated)
QueryableStateStream asQueryableState(
    String queryableStateName,
    FoldingStateDescriptor stateDescriptor)

// ReducingState
QueryableStateStream asQueryableState(
    String queryableStateName,
    ReducingStateDescriptor stateDescriptor)
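The returned QueryableStateStream can be seen as a sink, because it cannot be transformed any further. Internally, it is turned into an operator that uses every incoming record to update the queryable state instance. The update logic is implied by the type of StateDescriptor passed to asQueryableState. In a program like the following, every record of the keyed stream updates the state instance via ValueState.update(value):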
stream.keyBy(0).asQueryableState("query-name")
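The idea that "the descriptor type decides the update logic" can be modeled without Flink. In this sketch (a conceptual model, not Flink's operator; QueryableSinkModel and SinkModels are made-up names), a ValueState-backed sink overwrites the stored value with each record, while a ReducingState-backed sink folds each record into the current value with the reduce function, which is our understanding of how the ValueState and ReducingState variants behave.

```scala
import scala.collection.mutable

// Per-key state updated by every incoming record; the update rule is pluggable
final class QueryableSinkModel[K, V](update: (Option[V], V) => V) {
  private val state = mutable.Map.empty[K, V]

  // called once per incoming record of the keyed stream
  def invoke(key: K, value: V): Unit =
    state(key) = update(state.get(key), value)

  // what a client query for `key` would see (None if the key was never seen)
  def query(key: K): Option[V] = state.get(key)
}

object SinkModels {
  // ValueState-like: the latest record for a key wins
  def valueState[K, V]: QueryableSinkModel[K, V] =
    new QueryableSinkModel[K, V]((_, v) => v)

  // ReducingState-like: each record is reduced into the current value
  def reducingState[K, V](reduce: (V, V) => V): QueryableSinkModel[K, V] =
    new QueryableSinkModel[K, V]((old, v) => old.fold(v)(reduce(_, v)))
}
```

Feeding the values 1 and then 5 under the same key leaves 5 in the value-style sink and 6 in a reducing sink built with `_ + _`.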
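import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.api.common.state.ReducingStateDescriptor
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
import org.apache.flink.streaming.api.scala._

object FlinkWordCountQueryableStream {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // checkpoint every 5 s with exactly-once semantics
    env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE)
    // abort a checkpoint that takes longer than 4 s
    env.getCheckpointConfig.setCheckpointTimeout(4000)
    // the next checkpoint may start no earlier than 2 s after the previous one completed;
    // this takes precedence over the checkpoint interval
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    // fail the job on the first checkpoint failure (replaces setFailOnCheckpointingErrors(true))
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
    // what to do with checkpoint data when the job is cancelled:
    // RETAIN_ON_CANCELLATION: keep the checkpoint data when the job is cancelled without --savepoint
    // DELETE_ON_CANCELLATION: delete the checkpoint data on cancellation (not recommended)
    env.getCheckpointConfig.enableExternalizedCheckpoints(
      ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)

    // running (word, count) per key; the client must use the same type information
    val rsd = new ReducingStateDescriptor[(String, Int)]("reducestate",
      new ReduceFunction[(String, Int)] {
        override def reduce(v1: (String, Int), v2: (String, Int)): (String, Int) =
          (v1._1, v1._2 + v2._2)
      },
      createTypeInformation[(String, Int)])

    env.socketTextStream("CentOS", 9999)
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(_._1) // key by the word itself, so the key type is String (as queried by the client)
      .asQueryableState("wordcount", rsd) // queryable state name, needed later for queries

    // execute the streaming job
    env.execute("Stream WordCount")
  }
}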
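To see what the "wordcount" state will hold, the sketch below replays the job's logic on plain Scala collections (no Flink involved; WordCountStateSimulation and stateAfter are illustrative names). It splits lines on whitespace, emits (word, 1), groups by word, and applies the same reduce logic as the job's ReduceFunction.

```scala
// Plain-Scala replay of the job above: per-word reducing state after some input lines
object WordCountStateSimulation {
  // same logic as the job's ReduceFunction: keep the word, add the counts
  def reduce(v1: (String, Int), v2: (String, Int)): (String, Int) =
    (v1._1, v1._2 + v2._2)

  // state per key (the word), as it would look after consuming `lines`
  def stateAfter(lines: Seq[String]): Map[String, (String, Int)] =
    lines
      .flatMap(line => line.split("\\s+").toSeq)
      .map(word => (word, 1))
      .foldLeft(Map.empty[String, (String, Int)]) { (state, record) =>
        val key = record._1
        state.updated(key, state.get(key).fold(record)(reduce(_, record)))
      }
}
```

For the input lines "this is this" and "this", the state for key "this" is ("this", 3), which is the kind of tuple a client query for "this" should get back, assuming no other input.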
Managed Keyed State
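import org.apache.flink.api.common.functions.{RichMapFunction, RuntimeContext}
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

class WordCountMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  var vs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // 1. create the state descriptor
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    // expose this state to clients under the queryable name "query-wc"
    vsd.setQueryable("query-wc")
    // 2. get the RuntimeContext
    val context: RuntimeContext = getRuntimeContext
    // 3. obtain the state of the requested type
    vs = context.getState(vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    // read the previous count (an empty state unboxes to 0 for Int)
    val historyData = vs.value()
    // update the state
    vs.update(historyData + value._2)
    // emit the latest count
    (value._1, vs.value())
  }
}

object FlinkWordCountQueryable {
  def main(args: Array[String]): Unit = {
    // 1. create the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. create the source DataStream
    val text = env.socketTextStream("CentOS", 9999)
    // 3. apply the DataStream transformations
    val counts = text.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(_._1) // key by the word itself, so the key type is String
      .map(new WordCountMapFunction)
    // 4. print the results to the console
    counts.print()
    // 5. execute the streaming job
    env.execute("Stream WordCount")
  }
}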
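Two names are involved here: the descriptor name "wordcount" identifies the state inside the operator, while the name passed to setQueryable, "query-wc", is the one clients use. The following plain-Scala model (not Flink's API; QueryableRegistryModel, Registered and lookup are made-up names) illustrates that lookup order: first by queryable name, then by key.

```scala
// Conceptual model of how queryable state is addressed: queryable name first, then key
object QueryableRegistryModel {
  // the descriptor name is kept only for information; it is not used for lookups
  final case class Registered(descriptorName: String, values: Map[String, Int])

  def lookup(registry: Map[String, Registered],
             queryableName: String,
             key: String): Either[String, Int] =
    registry.get(queryableName) match {
      case None      => Left(s"no queryable state registered under '$queryableName'")
      case Some(reg) => reg.values.get(key).toRight(s"no value for key '$key'")
    }
}
```

A client for this job would therefore call getKvState with "query-wc" (not "wordcount") and a ValueStateDescriptor[Int] built with the same type information as the job.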
Querying State
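- Add the dependencies:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>1.10.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-queryable-state-client-java</artifactId>
    <version>1.10.0</version>
</dependency>
<!-- the Scala query code also uses createTypeInformation from Flink's Scala API -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.10.0</version>
</dependency>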
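- Query code:
import org.apache.flink.api.common.JobID
import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.api.common.state.{ReducingState, ReducingStateDescriptor}
import org.apache.flink.api.scala._
import org.apache.flink.queryablestate.client.QueryableStateClient

object FlinkWordCountQueryClient {
  def main(args: Array[String]): Unit = {
    // connect to the queryable state proxy (9069 is the default proxy port)
    val client = new QueryableStateClient("CentOS", 9069)
    // ID of the running job, e.g. as shown by bin/flink list or the web UI
    val jobID = JobID.fromHexString("dc60cd61dc2d591014c062397e3bd6b9")
    val queryName = "wordcount" // queryable state name passed to asQueryableState
    val queryKey = "this"       // the key the user wants to look up
    // must use the same type information (and thus serializer) as the job's descriptor
    val rsd = new ReducingStateDescriptor[(String, Int)]("reducestate",
      new ReduceFunction[(String, Int)] {
        override def reduce(v1: (String, Int), v2: (String, Int)): (String, Int) =
          (v1._1, v1._2 + v2._2)
      },
      createTypeInformation[(String, Int)])
    val resultFuture = client.getKvState(jobID, queryName, queryKey,
      createTypeInformation[String], rsd)
    // fetch the result synchronously (blocks until it arrives)
    val state: ReducingState[(String, Int)] = resultFuture.get()
    println("result: " + state.get())
    client.shutdownAndWait()
  }
}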
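Asynchronous query: instead of blocking on resultFuture.get(), register a callback on resultFuture and wait for it before shutting down the client:
import java.util.concurrent.TimeUnit
import java.util.function.Consumer

val done = resultFuture.thenAccept(new Consumer[ReducingState[(String, Int)]] {
  override def accept(t: ReducingState[(String, Int)]): Unit = {
    println("result: " + t.get())
  }
})
// wait (at most 10 s) until the callback has run instead of sleeping for a fixed time
done.get(10, TimeUnit.SECONDS)
client.shutdownAndWait()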
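Since client.getKvState(...) returns a java.util.concurrent.CompletableFuture, the usual CompletableFuture patterns apply. The sketch below isolates the asynchronous pattern: register a callback with thenAccept, then wait on the future that thenAccept returns, with a timeout, so the client is shut down only after the callback has run. fakeQuery is a hypothetical stand-in for getKvState, and AsyncQueryPattern and awaitCallback are illustrative names.

```scala
import java.util.concurrent.{CompletableFuture, TimeUnit}
import java.util.function.{Consumer, Supplier}

object AsyncQueryPattern {
  // hypothetical stand-in for client.getKvState(...): completes on another thread
  def fakeQuery[T](result: T): CompletableFuture[T] =
    CompletableFuture.supplyAsync(new Supplier[T] {
      override def get(): T = result
    })

  // run `callback` when the result arrives, and block until the callback has finished;
  // throws TimeoutException if that takes longer than the timeout, and
  // ExecutionException if the query (or the callback) failed
  def awaitCallback[T](future: CompletableFuture[T], timeoutSeconds: Long)(callback: T => Unit): Unit = {
    val done: CompletableFuture[Void] = future.thenAccept(new Consumer[T] {
      override def accept(t: T): Unit = callback(t)
    })
    done.get(timeoutSeconds, TimeUnit.SECONDS)
  }
}
```

In the query program, resultFuture takes the place of fakeQuery(...), and client.shutdownAndWait() is called after awaitCallback returns. Anonymous Consumer and Supplier classes are used because Scala 2.11 (the version matching the _2.11 Flink artifacts) does not convert lambdas to Java functional interfaces by default.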