Flink Standalone Cluster
Component | Version | Download |
JDK | 1.8 | |
Flink | 1.10.2 | |
Requirements
- Java 1.8.x or later
- ssh (sshd must be running to use the Flink scripts that manage remote components)
1. Extract the Flink archive
[root@master ~]#
tar -xzvf /chinaskills/flink-1.10.2-bin-scala_2.11.tgz -C /usr/local/src/
2. Rename the directory to flink
[root@master ~]#
mv /usr/local/src/flink-1.10.2 /usr/local/src/flink
3. Configure and load environment variables
[root@master ~]#
vim /root/.bash_profile
Add the following:
export FLINK_HOME=/usr/local/src/flink
export PATH=$PATH:$FLINK_HOME/bin
Reload the profile:
source /root/.bash_profile
4. Configure flink-conf.yaml
[root@master ~]#
vim /usr/local/src/flink/conf/flink-conf.yaml
# JobManager address
jobmanager.rpc.address: master
# JobManager RPC port
jobmanager.rpc.port: 6123
# JVM heap size for the JobManager
jobmanager.heap.size: 1024m
# Total process memory for each TaskManager
taskmanager.memory.process.size: 1728m
# Number of task slots offered by each TaskManager
taskmanager.numberOfTaskSlots: 1
# Default parallelism for jobs
parallelism.default: 1
# Directory for temporary files
io.tmp.dirs: /usr/local/src/flink/tmp
Create the temporary directory:
mkdir -p /usr/local/src/flink/tmp
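With slave1 and slave2 each offering one slot, this cluster provides 2 task slots in total; a job whose parallelism exceeds that will wait for resources. A quick sanity check of the arithmetic, with values mirroring the config above:

```shell
# Total slots = number of TaskManagers x slots per TaskManager
TASK_MANAGERS=2      # slave1 and slave2
SLOTS_PER_TM=1       # taskmanager.numberOfTaskSlots
TOTAL_SLOTS=$((TASK_MANAGERS * SLOTS_PER_TM))
echo "total task slots: $TOTAL_SLOTS"
```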
5. Configure the slaves file
[root@master ~]#
vim /usr/local/src/flink/conf/slaves
slave1
slave2
6. Distribute Flink to the worker nodes
[root@master ~]#
scp -r /usr/local/src/flink root@slave1:/usr/local/src/
scp -r /usr/local/src/flink root@slave2:/usr/local/src/
7. Start the cluster
[root@master ~]#
start-cluster.sh
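If startup succeeded, `jps` on master should list the JobManager process (StandaloneSessionClusterEntrypoint in Flink 1.10) and each slave a TaskManagerRunner; the web UI is served on port 8081 by default. A minimal sketch of such a check, run here against a sample `jps` line rather than a live cluster:

```shell
# On a real node replace the sample text with: JPS_OUTPUT=$(jps)
JPS_OUTPUT="2301 StandaloneSessionClusterEntrypoint
2456 Jps"
if echo "$JPS_OUTPUT" | grep -q StandaloneSessionClusterEntrypoint; then
  RESULT="JobManager is running"
else
  RESULT="JobManager NOT found"
fi
echo "$RESULT"
```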
8. Run a test job
The code:
package wordcount

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.api.scala._

/**
 * Streaming WordCount
 */
object StreamWordCount {
  def main(args: Array[String]): Unit = {
    // Read the socket host and port from the command-line arguments
    val parameterTool: ParameterTool = ParameterTool.fromArgs(args)
    val host: String = parameterTool.get("host")
    val port: Int = parameterTool.getInt("port")
    // Create a streaming execution environment
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Receive data from the remote server over a socket
    val dataStream: DataStream[String] = environment.socketTextStream(host, port)
    // Transform the data
    val result: DataStream[(String, Int)] = dataStream
      .flatMap(_.split(" "))
      .filter(_.nonEmpty) // drop empty strings
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    // Print the result
    result.print()
    // Start the job
    environment.execute("Stream Word Count")
  }
}
After building the jar, run it from the command line.
On the server, start a netcat listener and type the test lines:
[root@master ~]# nc -lk 7777
hello word
flume hadoop
hello word
flume hadoop
flink run -c wordcount.StreamWordCount -p 2 /root/target/FlinkProject-1.0-SNAPSHOT-jar-with-dependencies.jar --host 192.168.222.201 --port 7777
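For reference, the aggregate counts over the four test lines can be reproduced with standard tools (the streaming job instead emits a running count per key as each line arrives, e.g. (hello,1) and later (hello,2)):

```shell
# Batch-style word count over the same test input, for comparison
COUNTS=$(printf 'hello word\nflume hadoop\nhello word\nflume hadoop\n' \
  | tr ' ' '\n' | sort | uniq -c)
echo "$COUNTS"
```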
9. Flink on YARN
9.1 Configure the Hadoop environment
[root@master ~]#
vim /root/.bash_profile
Add the following:
export HADOOP_CLASSPATH=`hadoop classpath`
source /root/.bash_profile
9.2 Verify the Hadoop classpath and submit a test job
echo $HADOOP_CLASSPATH
flink run -m yarn-cluster -e yarn-per-job /usr/local/src/flink/examples/batch/WordCount.jar
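Besides per-job mode, Flink 1.10 also supports a YARN session: `yarn-session.sh` starts a long-lived Flink cluster on YARN, and subsequent `flink run` calls attach to it. A sketch of the two commands (the memory values are illustrative, not required; the commands are only assembled and printed here, since executing them needs a live YARN cluster):

```shell
# -d runs the session detached; -jm/-tm set JobManager/TaskManager memory
YARN_SESSION_CMD="yarn-session.sh -d -jm 1024m -tm 1728m"
SUBMIT_CMD="flink run /usr/local/src/flink/examples/batch/WordCount.jar"
echo "$YARN_SESSION_CMD"
echo "$SUBMIT_CMD"
```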