Flink Standalone Cluster

Component   Version   Download
---------   -------   --------------
JDK         1.8       Download JDK
Flink       1.10.2    Download Flink

Requirements

  • Java 1.8.x or higher
  • ssh (sshd must be running so the Flink scripts that manage remote components can work)

1. Extract Flink

[root@master ~]#

tar -xzvf /chinaskills/flink-1.10.2-bin-scala_2.11.tgz -C /usr/local/src/

2. Rename the directory to flink

[root@master ~]#

mv /usr/local/src/flink-1.10.2 /usr/local/src/flink

3. Configure and load the environment variables

[root@master ~]#

vim /root/.bash_profile

Add the following:

export FLINK_HOME=/usr/local/src/flink
export PATH=$PATH:$FLINK_HOME/bin

Load the environment variables:

source /root/.bash_profile
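After sourcing the profile, it is worth sanity-checking the result. A minimal sketch, assuming the paths configured above (the two exports mirror what `.bash_profile` should already have applied):

```shell
# Verify FLINK_HOME is set and that its bin directory is on PATH.
export FLINK_HOME=/usr/local/src/flink
export PATH=$PATH:$FLINK_HOME/bin
echo "FLINK_HOME=$FLINK_HOME"
case ":$PATH:" in
  *":$FLINK_HOME/bin:"*) echo "flink bin is on PATH" ;;
  *)                     echo "flink bin is NOT on PATH" ;;
esac
```

If the check fails, the profile was edited but not sourced, or the path in the export does not match the actual install directory.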

4. Configure flink-conf.yaml

[root@master ~]#

vim /usr/local/src/flink/conf/flink-conf.yaml

# JobManager address
jobmanager.rpc.address: master
# JobManager RPC port
jobmanager.rpc.port: 6123
# heap size of each JobManager
jobmanager.heap.size: 1024m
# total process memory of each TaskManager
taskmanager.memory.process.size: 1728m
# number of task slots per TaskManager (typically one per CPU core)
taskmanager.numberOfTaskSlots: 1
# default parallelism for jobs that do not set their own
parallelism.default: 1
# temporary directories
io.tmp.dirs: /usr/local/src/flink/tmp

Create the temporary directory:

mkdir -p /usr/local/src/flink/tmp
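The settings above are flat `key: value` lines. A quick shell check can flag any malformed line before starting the cluster; this is only a sketch, run here against a temporary copy of the values above (point it at the real `conf/flink-conf.yaml` on the cluster):

```shell
# Flag any non-comment, non-blank line that is not "key: value".
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
jobmanager.rpc.address: master
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
io.tmp.dirs: /usr/local/src/flink/tmp
EOF
awk '!/^[[:space:]]*(#|$)/ && !/^[A-Za-z][A-Za-z0-9._-]*:[[:space:]]/ {bad++; print "bad line: " $0}
     END { if (!bad) print "all lines look like key: value" }' "$CONF"
rm -f "$CONF"
```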

5. Configure the worker list (conf/slaves)

[root@master ~]#

vim /usr/local/src/flink/conf/slaves
slave1
slave2

6. Distribute Flink to the workers

[root@master ~]#

scp -r /usr/local/src/flink root@slave1:/usr/local/src/
scp -r /usr/local/src/flink root@slave2:/usr/local/src/
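The two copies can also be driven from a loop over the worker list. Shown here as a dry run (`echo`-prefixed) so the generated commands can be inspected; drop the `echo` to actually copy (this assumes passwordless SSH to both workers, as set up for Hadoop):

```shell
# Dry run: print the scp command for each worker instead of executing it.
for host in slave1 slave2; do
  echo scp -r /usr/local/src/flink "root@${host}:/usr/local/src/"
done
```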

7. Start the cluster

[root@master ~]#

start-cluster.sh

Once started, the JobManager web UI is available at http://master:8081.

8. Run a test job

Code:

package wordcount

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.api.scala._

/**
 * Streaming WordCount
 */
object StreamWordCount {
  def main(args: Array[String]): Unit = {
    // extract the socket host and port from the command-line arguments
    val parameterTool: ParameterTool = ParameterTool.fromArgs(args)
    val host: String = parameterTool.get("host")
    val port: Int = parameterTool.getInt("port")
    // create a streaming execution environment
    val environment: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // receive data from the remote server over a socket
    val dataStream: DataStream[String] = environment.socketTextStream(host, port)
    // transform the data: split, filter, and count per word
    val result: DataStream[(String, Int)] = dataStream
      .flatMap(_.split(" "))
      .filter(_.nonEmpty) // drop empty strings
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    // print the results
    result.print()
    // start job execution
    environment.execute("Stream Word Count")
  }
}

After building the jar, run it from the command line.

On the server:

[root@master ~]# nc -lk 7777
hello word
flume hadoop
hello word
flume hadoop

In another terminal, submit the job:

flink run -c wordcount.StreamWordCount -p 2 /root/target/FlinkProject-1.0-SNAPSHOT-jar-with-dependencies.jar --host 192.168.222.201 --port 7777
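For reference, the totals the job should converge to for those four sample lines can be checked with a plain-shell stand-in for the pipeline. This is only a sanity check, not part of the deployment; note the streaming job itself prints running counts, e.g. (hello,1) followed by (hello,2):

```shell
# Split the sample input on spaces and count occurrences of each word.
printf 'hello word\nflume hadoop\nhello word\nflume hadoop\n' \
  | tr ' ' '\n' | sort | uniq -c
# each of hello, word, flume, hadoop appears twice
```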

9. Flink on YARN

9.1. Configure the Hadoop environment

[root@master ~]#

vim /root/.bash_profile

Add the following (the backticks run `hadoop classpath` when the profile is sourced, so Hadoop must already be on the PATH):

export HADOOP_CLASSPATH=`hadoop classpath`

Reload the profile:

source /root/.bash_profile

9.2. Verify the Hadoop classpath and run a test job on YARN

echo $HADOOP_CLASSPATH
flink run -m yarn-cluster $FLINK_HOME/examples/batch/WordCount.jar
