Table of Contents

  • Preface
  • Environment Preparation
  • Getting Started
  • Create the Extraction Directory
  • Extract the Packages to the Target Directory
  • Configure Environment Variables
  • Configure the Hadoop Distributed File System
  • Configure Spark
  • Configure Flink
  • Configure Kafka
  • Distribution
  • Distribute the Environment Variable File
  • Distribute the Configured Files
  • Modify the Slave Configuration Files
  • Start the Services
  • Start the Hadoop Cluster
  • Format the NameNode File System
  • Start the DFS
  • WebUI
  • Start the YARN Resource Scheduler
  • WebUI
  • Start Spark
  • WebUI
  • Start Flink
  • WebUI
  • Start Kafka
  • Notes on Kafka
  • Start ZooKeeper
  • Start Kafka
  • Configure Flume
  • Extract
  • Configure
  • Test
  • Notes on Other Software


Preface

  • Competition platform software and requirements [Phase 1]

[Screenshots: competition platform software and requirements]

  1. My environment:
  • Guest OS: CentOS 7 [three hosts: Master, Slave1, Slave2]
  • Host OS: Windows 11 Home [21H2]
  • Tools: Xshell 7 and VMware 16 [used to create the VMs]
  • Hadoop deployment: fully distributed
  2. Software used in this environment:

Tianyi Cloud Drive download: click here


Baidu Cloud Drive download: click here [password: 6273]



  • apache-flume-1.7.0-bin.tar.gz
  • apache-hive-2.3.4-bin.tar.gz
  • flink-1.10.2-bin-scala_2.11.tgz
  • hadoop-2.7.7.tar.gz
  • jdk-8u291-linux-x64.tar.gz
  • kafka_2.11-2.0.0.tgz
  • mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar
  • mysql-5.7.34-el7-x86_64.tar.gz
  • mysql-connector-java-5.1.49.jar
  • redis-4.0.1.tar.gz
  • scala-2.11.8.tgz
  • spark-2.1.1-bin-hadoop2.7.tgz
  3. My working conventions:
  • tar packages are kept under /usr/tar/
  • extracted software is kept under /usr/apps/
  • the firewall is simply turned off in this article (see the note after this list)
# Run on all three hosts
systemctl stop firewalld
  • my hosts are on a private LAN
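  • A small optional sketch: systemctl stop only lasts until the next reboot, so if you want firewalld to stay off across reboots (acceptable here only because the hosts sit on a private LAN), you can additionally disable it.
# Optional: keep firewalld off after a reboot (all three hosts)
systemctl disable firewalld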

Environment Preparation

  1. Rename the hosts
  • Master node - [any node name will do]
# Master node
hostnamectl set-hostname master
# Refresh the shell
bash
# Result
[root@master ~]#
  • Slave1 node - [any node name will do]
# Slave1 node
hostnamectl set-hostname slave1
# Refresh the shell
bash
# Result
[root@slave1 ~]#
  • Slave2 node - [any node name will do]
# Slave2 node
hostnamectl set-hostname slave2
# Refresh the shell
bash
# Result
[root@slave2 ~]#
  2. Install the required helper packages

Install on all three hosts

# Install vim for colored editing
yum install -y vim
# Install the NTP time-synchronization service
yum install -y ntp
# Install the network tools needed by MySQL
yum install -y net-tools
# Install lrzsz for uploading files
yum install -y lrzsz
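  • Optional sketch: the ntp package above only installs the daemon; if you want the clocks to keep syncing automatically, you can start and enable the service (service name assumed from the stock CentOS 7 ntp package).
# Optional: start the NTP daemon and enable it at boot (all three hosts)
systemctl start ntpd
systemctl enable ntpd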

Install on the Master host only

# Install gcc (C compiler), needed later to build Redis
yum install -y gcc
  3. Edit the IP mapping file and distribute it to the other two hosts
  • Edit
vim /etc/hosts
  • Example (replace the IPs with your own)
[root@master ~]# vim /etc/hosts
# Append the following:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.38.144 master
192.168.38.145 slave1
192.168.38.146 slave2
  • Distribute
scp /etc/hosts slave1:/etc/
  • Note

I did not create any additional Linux users. If you did, send the file as the user you actually work with (usually root) and prefix the username, e.g. username@slave1:/etc/


  • Example
# Send to the Slave1 host
[root@master ~]# scp /etc/hosts slave1:/etc/

# ---------- separator ----------

# Send to the Slave2 host
[root@master ~]# scp /etc/hosts slave2:/etc/
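  • A quick hedged check that the new mappings resolve on every node (plain ping, available by default on CentOS 7):
# Run on each host; every name should resolve and answer
for h in master slave1 slave2; do ping -c 1 $h; done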
  4. Configure passwordless SSH among the three hosts
  • Generate a key pair on each host [just press Enter at every prompt]
# Generate the key pair on the Master host
[root@master ~]# ssh-keygen -t rsa

# ---------- separator ----------

# Generate the key pair on the Slave1 host
[root@slave1 ~]# ssh-keygen -t rsa

# ---------- separator ----------

# Generate the key pair on the Slave2 host
[root@slave2 ~]# ssh-keygen -t rsa
  • Example [Master only; in practice do this on all three hosts]
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
bf:cb:43:5b:6e:7f:17:66:d2:c0:b4:71:f7:0d:1a:22 root@master
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|        E . .o...|
|         . .oo+.+|
|            .+  o|
|        S     o  |
|         .. .. = |
|         ..+  + .|
|         .o.o   o|
|          ++ ....|
+-----------------+
  • All three hosts send their public keys to the master host
  • Note

Whether you type on the Linux console or through Xshell, the password is not echoed, so you will not see what you type.

# Master host
ssh-copy-id master

# Slave1 host
ssh-copy-id master

# Slave2 host
ssh-copy-id master
  • Example [Master only; in practice do this on all three hosts]
[root@master ~]# ssh-copy-id master
The authenticity of host 'master (192.168.38.141)' can't be established.
ECDSA key fingerprint is 37:7c:ab:d9:86:14:b2:fe:9c:17:3d:5d:3a:ff:ce:c1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'master'"
and check to make sure that only the key(s) you wanted were added.
  • The Master host sends the merged public keys to the other two hosts
  • Note

By default the keys are stored under /root/.ssh/; the key-generation output above shows the exact paths.

The merged public keys live in the file authorized_keys under /root/.ssh/.

# Send the public keys gathered on the Master host to Slave1
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/

# ---------- separator ----------

# Send the public keys gathered on the Master host to Slave2
[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
  • Example
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/
root@slave1's password: 
authorized_keys                                                                                                                                            100% 1179     1.2KB/s   00:00    

[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
root@slave2's password: 
authorized_keys
  • Test passwordless login [success means no password prompt]

Type exit to log out again.

ssh master
ssh slave1
ssh slave2
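  • A one-line sketch to confirm passwordless SSH from the current host to all three nodes; it should print the three hostnames without ever asking for a password:
for h in master slave1 slave2; do ssh $h hostname; done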
  5. Upload the required software to the Master host
  • Explanation

Upload the packages listed above to /usr/tar/ on the Master host.

Xshell users can cd into /usr/tar, select all the packages in Windows, and drag them into the Xshell window.

Because the Master host will later distribute everything to the two slaves, the packages only need to be uploaded to the Master host; this avoids unnecessary traffic and transfer time.

  • Upload complete
[root@master tar]# pwd
/usr/tar
[root@master tar]# ll
总用量 1732092
-rw-r--r--. 1 root root  55711670 10月 19 21:42 apache-flume-1.7.0-bin.tar.gz
-rw-r--r--. 1 root root 232234292 10月 19 21:42 apache-hive-2.3.4-bin.tar.gz
-rw-r--r--. 1 root root 289890742 11月 21 18:10 flink-1.10.2-bin-scala_2.11.tgz
-rw-r--r--. 1 root root 218720521 10月 19 21:42 hadoop-2.7.7.tar.gz
-rw-r--r--. 1 root root 144935989 10月 19 21:42 jdk-8u291-linux-x64.tar.gz
-rw-r--r--. 1 root root  55751827 10月 19 21:42 kafka_2.11-2.0.0.tgz
-rw-r--r--. 1 root root 543856640 10月 19 21:43 mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar
-rw-r--r--. 1 root root   1006904 10月 19 21:44 mysql-connector-java-5.1.49.jar
-rw-r--r--. 1 root root   1711660 10月 19 21:44 redis-4.0.1.tar.gz
-rw-r--r--. 1 root root  28678231 11月  9 18:55 scala-2.11.8.tgz
-rw-r--r--. 1 root root 201142612 10月 19 21:41 spark-2.1.1-bin-hadoop2.7.tgz



VMware users may want to take a "snapshot" at this point; once the build is finished, the environment can be restored at any time for more practice.



Getting Started

Create the Extraction Directory

  • Location where the packages will be extracted
mkdir -p /usr/apps/
  • Explanation

Hive, MySQL, and Flume are needed only on the Master host, so they are not extracted yet; this keeps the later distribution smaller and faster. They will be installed on the Master host separately later on.

Extract the Packages to the Target Directory

[root@master tar]# tar -zxf jdk-8u291-linux-x64.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf hadoop-2.7.7.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf scala-2.11.8.tgz -C /usr/apps/
[root@master tar]# tar -zxf spark-2.1.1-bin-hadoop2.7.tgz -C /usr/apps/
[root@master tar]# tar -zxf flink-1.10.2-bin-scala_2.11.tgz -C /usr/apps/
[root@master tar]# tar -zxf kafka_2.11-2.0.0.tgz -C /usr/apps/

Configure Environment Variables

  1. Edit the environment variable file
vim /etc/profile
  • Example
[root@master apps]# vim /etc/profile
  2. Add the following content [append at the end of the file]; an uppercase "G" in vim jumps straight to the bottom
  • Example [end of the file]
for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

# JAVA_HOME
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH

# HADOOP_HOME
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# SCALA_HOME
export SCALA_HOME=/usr/apps/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

# SPARK_HOME
export SPARK_HOME=/usr/apps/spark-2.1.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

# FLINK_HOME
export FLINK_HOME=/usr/apps/flink-1.10.2
export PATH=$FLINK_HOME/bin:$PATH

# KAFKA_HOME
export KAFKA_HOME=/usr/apps/kafka_2.11-2.0.0
export PATH=$KAFKA_HOME/bin:$PATH
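  • A quick hedged sanity check (assuming the paths above are correct): reload the profile and confirm the main tools resolve from PATH.
# Reload the profile in the current shell, then spot-check versions
source /etc/profile
java -version      # should report 1.8.0_291
hadoop version     # should report Hadoop 2.7.7
scala -version     # should report Scala 2.11.8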

Configure the Hadoop Distributed File System

  1. Go to the Hadoop configuration directory
cd /usr/apps/hadoop-2.7.7/etc/hadoop/
  • Example
[root@master apps]# cd /usr/apps/hadoop-2.7.7/etc/hadoop/
[root@master hadoop]# ll
总用量 152
-rw-r--r--. 1 1000 ftp  4436 7月  19 2018 capacity-scheduler.xml
-rw-r--r--. 1 1000 ftp  1335 7月  19 2018 configuration.xsl
-rw-r--r--. 1 1000 ftp   318 7月  19 2018 container-executor.cfg
-rw-r--r--. 1 1000 ftp   774 7月  19 2018 core-site.xml
-rw-r--r--. 1 1000 ftp  3670 7月  19 2018 hadoop-env.cmd
-rw-r--r--. 1 1000 ftp  4224 7月  19 2018 hadoop-env.sh
-rw-r--r--. 1 1000 ftp  2598 7月  19 2018 hadoop-metrics2.properties
-rw-r--r--. 1 1000 ftp  2490 7月  19 2018 hadoop-metrics.properties
-rw-r--r--. 1 1000 ftp  9683 7月  19 2018 hadoop-policy.xml
-rw-r--r--. 1 1000 ftp   775 7月  19 2018 hdfs-site.xml
-rw-r--r--. 1 1000 ftp  1449 7月  19 2018 httpfs-env.sh
-rw-r--r--. 1 1000 ftp  1657 7月  19 2018 httpfs-log4j.properties
-rw-r--r--. 1 1000 ftp    21 7月  19 2018 httpfs-signature.secret
-rw-r--r--. 1 1000 ftp   620 7月  19 2018 httpfs-site.xml
-rw-r--r--. 1 1000 ftp  3518 7月  19 2018 kms-acls.xml
-rw-r--r--. 1 1000 ftp  1527 7月  19 2018 kms-env.sh
-rw-r--r--. 1 1000 ftp  1631 7月  19 2018 kms-log4j.properties
-rw-r--r--. 1 1000 ftp  5540 7月  19 2018 kms-site.xml
-rw-r--r--. 1 1000 ftp 11801 7月  19 2018 log4j.properties
-rw-r--r--. 1 1000 ftp   951 7月  19 2018 mapred-env.cmd
-rw-r--r--. 1 1000 ftp  1383 7月  19 2018 mapred-env.sh
-rw-r--r--. 1 1000 ftp  4113 7月  19 2018 mapred-queues.xml.template
-rw-r--r--. 1 1000 ftp   758 7月  19 2018 mapred-site.xml.template
-rw-r--r--. 1 1000 ftp    10 7月  19 2018 slaves
-rw-r--r--. 1 1000 ftp  2316 7月  19 2018 ssl-client.xml.example
-rw-r--r--. 1 1000 ftp  2697 7月  19 2018 ssl-server.xml.example
-rw-r--r--. 1 1000 ftp  2250 7月  19 2018 yarn-env.cmd
-rw-r--r--. 1 1000 ftp  4567 7月  19 2018 yarn-env.sh
-rw-r--r--. 1 1000 ftp   690 7月  19 2018 yarn-site.xml
  2. Copy the mapred-site.xml.template template to mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
  • Example
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  3. Edit hadoop-env.sh
vim hadoop-env.sh
  • The JAVA_HOME on line 25 must be changed. [:set nu] shows line numbers in vim; this tip is not repeated below.
  • Example
19 # The only required environment variable is JAVA_HOME.  All others are
 20 # optional.  When running a distributed configuration it is best to
 21 # set JAVA_HOME in this file, so that it is correctly defined on
 22 # remote nodes.
 23 
 24 # The java implementation to use.
 25 export JAVA_HOME=/usr/apps/jdk1.8.0_291
 26 
 27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
 28 # that bind to privileged ports to provide authentication of data transfer
 29 # protocol.  Jsvc is not required if SASL is configured for authentication of
 30 # data transfer protocol using non-privileged ports.
 31 #export JSVC_HOME=${JSVC_HOME}
  4. Edit core-site.xml
vim core-site.xml
  • A snippet template you may find handy [for easy copying]
  • Almost every Hadoop configuration file uses this template [it is not provided in the real competition]
	<property>
		<name></name>
		<value></value>
	</property>
  • Example [append at the end of the file, and mind the enclosing tags]
<configuration>

	<property>
		<!-- Address of the HDFS NameNode -->
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>

	<property>
		<!-- Directory where Hadoop stores its runtime data -->
		<name>hadoop.tmp.dir</name>
		<value>/usr/apps/data/hadoop</value>
	</property>
	
</configuration>
  5. Edit hdfs-site.xml
vim hdfs-site.xml
  • Example [append at the end, mind the tags]; set the replication factor below to however many nodes you actually use
<configuration>

	<property>
	<!-- Number of block replicas -->
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	
	<property>
		<!-- Host and port of the SecondaryNameNode -->
		<!-- the SecondaryNameNode assists the NameNode running on the master node -->
		<name>dfs.namenode.secondary.http-address</name>
		<value>slave1:50090</value>
	</property>
	
</configuration>
  6. Edit mapred-site.xml
vim mapred-site.xml
  • Example [append at the end, mind the tags]
<configuration>

	<property>
		<!-- MapReduce runtime framework; set to yarn here (the default is local) -->
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	
</configuration>
  7. Edit yarn-site.xml
vim yarn-site.xml
  • Example [append at the end, mind the tags]
<configuration>

	<!-- Site specific YARN configuration properties -->
	<property>
		<!-- The YARN ResourceManager runs on the master host -->
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	
</configuration>
  8. Edit slaves
vim slaves
  • Example [replace the file contents with the following]
master
slave1
slave2
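  • A hedged sanity check that the settings above are picked up (the environment variables from the previous section must already be loaded); hdfs getconf only reads the configuration and starts nothing:
# Should print hdfs://master:9000 and 3 respectively
hdfs getconf -confKey fs.default.name
hdfs getconf -confKey dfs.replication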





Configure Spark

  1. Go to the Spark configuration directory
cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
  • Example
[root@master hadoop]# cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
[root@master conf]# ll
总用量 32
-rw-r--r--. 1 500 500  987 4月  26 2017 docker.properties.template
-rw-r--r--. 1 500 500 1105 4月  26 2017 fairscheduler.xml.template
-rw-r--r--. 1 500 500 2025 4月  26 2017 log4j.properties.template
-rw-r--r--. 1 500 500 7313 4月  26 2017 metrics.properties.template
-rw-r--r--. 1 500 500  865 4月  26 2017 slaves.template
-rw-r--r--. 1 500 500 1292 4月  26 2017 spark-defaults.conf.template
-rwxr-xr-x. 1 500 500 3960 4月  26 2017 spark-env.sh.template
  2. Copy the spark-env.sh.template template to spark-env.sh
cp spark-env.sh.template spark-env.sh
  • Example
[root@master conf]# cp spark-env.sh.template spark-env.sh
  3. Edit spark-env.sh
vim spark-env.sh
  • Example [append at the end of the file]
# Paths of the software installed above, nothing new here
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.7/etc/hadoop
export SCALA_HOME=/usr/apps/scala-2.11.8
# Spark master host (an IP address works just as well)
export SPARK_MASTER_IP=master
# Memory available to each Spark worker
export SPARK_WORKER_MEMORY=8G
# Cores available to each Spark worker
export SPARK_WORKER_CORES=4
# Worker instances per machine; with 2, every slave runs two Worker processes
export SPARK_WORKER_INSTANCES=1
  4. Edit slaves
  • Attentive readers may have noticed the file slaves.template; there is no need to copy that template, simply create and edit a new slaves file directly
vim slaves
  • Example
slave1
slave2
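  • Optional hedged check before moving on: spark-submit is already on PATH and its --version flag prints the Spark and Scala versions without touching any cluster.
# Should report Spark 2.1.1 built against Scala 2.11
spark-submit --version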





Configure Flink

  1. Go to the Flink configuration directory
cd /usr/apps/flink-1.10.2/conf
  • Example
[root@master hadoop]# cd /usr/apps/flink-1.10.2/conf
[root@master conf]# ll
总用量 60
-rw-r--r--. 1 root root 10202 8月  15 2020 flink-conf.yaml
-rw-r--r--. 1 root root  2138 8月  15 2020 log4j-cli.properties
-rw-r--r--. 1 root root  1884 8月  15 2020 log4j-console.properties
-rw-r--r--. 1 root root  1939 8月  15 2020 log4j.properties
-rw-r--r--. 1 root root  1709 8月  15 2020 log4j-yarn-session.properties
-rw-r--r--. 1 root root  2294 8月  15 2020 logback-console.xml
-rw-r--r--. 1 root root  2331 8月  15 2020 logback.xml
-rw-r--r--. 1 root root  1550 8月  15 2020 logback-yarn.xml
-rw-r--r--. 1 root root    15 8月  15 2020 masters
-rw-r--r--. 1 root root    10 8月  15 2020 slaves
-rw-r--r--. 1 root root  5424 8月  15 2020 sql-client-defaults.yaml
-rw-r--r--. 1 root root  1434 8月  15 2020 zoo.cfg
  2. Edit flink-conf.yaml
vim flink-conf.yaml
  • Example [adjust the memory sizes and task slots to the requirements]
# JobManager runs.
# RPC address of the JobManager
jobmanager.rpc.address: master

# The RPC port where the JobManager is reachable.
# RPC port
jobmanager.rpc.port: 6123


# The heap size for the JobManager JVM
# JobManager heap size
jobmanager.heap.size: 2048m


# The total process memory size for the TaskManager.
#
# Note this accounts for all memory usage within the TaskManager process, including JVM metaspace and other overhead.
# TaskManager process memory; make this as large as you can afford
taskmanager.memory.process.size: 4096m

# To exclude JVM metaspace and overhead, please, use total Flink memory size instead of 'taskmanager.memory.process.size'.
# It is not recommended to set both 'taskmanager.memory.process.size' and Flink memory.
#
# taskmanager.memory.flink.size: 1280m

# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
# Number of task slots ("slots") per TaskManager
taskmanager.numberOfTaskSlots: 5

# The parallelism used for programs that did not specify and other parallelism.
# Default parallelism; precedence: code > WebUI > this default
parallelism.default: 3

# The default file system scheme and authority.
  3. Edit masters
vim masters
  • Example
master:8081
  4. Edit slaves
vim slaves
  • Example
slave1
slave2

Configure Kafka

  1. Go to the Kafka configuration directory
cd /usr/apps/kafka_2.11-2.0.0/config/
  • Example
[root@master config]# pwd
/usr/apps/kafka_2.11-2.0.0/config
[root@master config]# ll
总用量 68
-rw-r--r--. 1 root root  906 7月  24 2018 connect-console-sink.properties
-rw-r--r--. 1 root root  909 7月  24 2018 connect-console-source.properties
-rw-r--r--. 1 root root 5321 7月  24 2018 connect-distributed.properties
-rw-r--r--. 1 root root  883 7月  24 2018 connect-file-sink.properties
-rw-r--r--. 1 root root  881 7月  24 2018 connect-file-source.properties
-rw-r--r--. 1 root root 1111 7月  24 2018 connect-log4j.properties
-rw-r--r--. 1 root root 2262 7月  24 2018 connect-standalone.properties
-rw-r--r--. 1 root root 1221 7月  24 2018 consumer.properties
-rw-r--r--. 1 root root 4727 7月  24 2018 log4j.properties
-rw-r--r--. 1 root root 1919 7月  24 2018 producer.properties
-rw-r--r--. 1 root root 6851 7月  24 2018 server.properties
-rw-r--r--. 1 root root 1032 7月  24 2018 tools-log4j.properties
-rw-r--r--. 1 root root 1169 7月  24 2018 trogdor.conf
-rw-r--r--. 1 root root 1023 7月  24 2018 zookeeper.properties
  2. Edit server.properties
vim server.properties
  • Example [the upper part lists only the changed settings; the full file follows below]
# Line 21: unique ID of this broker; Slave1 and Slave2 must use different values (changed later, see "Modify the Slave Configuration Files")
broker.id=1
# Line 31: uncomment and set to this host's ([Master's]) address; Slave1 and Slave2 must use their own IP or hostname
listeners=PLAINTEXT://master:9092
# Add host.name; other machines must use their own IP or hostname.
# host.name must be this host's address (important); otherwise clients throw:
# "Producer connection to localhost:9092 unsuccessful"
# Added at line 32
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# Line 36: uncomment; other machines must use their own IP or hostname
advertised.listeners=PLAINTEXT://master:9092
# Line 60: where Kafka stores its data (this is not the log4j log directory)
log.dirs=/usr/apps/data/kafka-logs
# Line 65: default number of partitions per topic; set it to the number of machines
num.partitions=3
# Lines 74-76
# Replication factor of the __consumer_offsets topic
offsets.topic.replication.factor=3
# Replication factor and minimum ISR of the transaction state log
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
# Line 123: ZooKeeper connection string
zookeeper.connect=master:2181,slave1:2181,slave2:2181
  • Example [full file]
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://master:9092
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://master:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/usr/apps/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=3

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=master:2181,slave1:2181,slave2:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
  3. Edit zookeeper.properties
vim zookeeper.properties
  • Example
# Directory holding ZooKeeper's unique-ID file "myid"
dataDir=/usr/apps/data/zk/zkdata
# Directory for the ZooKeeper logs
dataLogDir=/usr/apps/data/zk/zklog
# the port at which the clients will connect
# ZooKeeper client port
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
# Maximum number of client connections; comment it out
# maxClientCnxns=0
# Heartbeat interval between servers and clients (in ms)
tickTime=2000
# Initial sync limit
# Maximum number of heartbeats (in tickTime units) a follower (F) may take for its initial connection to the leader (L).
initLimit=10
# Sync limit
# Maximum number of heartbeats (in tickTime units) allowed between a request and its reply between follower (F) and leader (L).
syncLimit=5
# Addresses and ports of the ensemble members
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
  4. Create the zkdata and zklog directories
mkdir -p /usr/apps/data/zk/zkdata
mkdir -p /usr/apps/data/zk/zklog
  • Example
[root@master config]# mkdir -p /usr/apps/data/zk/zkdata
[root@master config]# mkdir -p /usr/apps/data/zk/zklog
  5. Write the ID into the myid file (redirecting output creates the file automatically)
echo 1 > /usr/apps/data/zk/zkdata/myid
  • Example
# Write "1" into the "myid" file; the value must be unique, so the other two slaves need different values.
echo 1 > /usr/apps/data/zk/zkdata/myid

# Check the contents of the "myid" file
[root@master config]# cat /usr/apps/data/zk/zkdata/myid
1





Distribution

Distribute the Environment Variable File

scp /etc/profile slave1:/etc/
scp /etc/profile slave2:/etc/
  • Example
[root@master ~]# scp /etc/profile slave1:/etc/
profile                                                                                                                                                      100% 2319     2.3KB/s   00:00    
[root@master ~]# scp /etc/profile slave2:/etc/
profile                                                                                                                                                      100% 2319     2.3KB/s   00:00

Distribute the Configured Files

  • The transfer can take quite a while, so be patient. If you are asked for a password, go back and fix the passwordless SSH setup!
scp -r /usr/apps/ slave1:/usr/
scp -r /usr/apps/ slave2:/usr/
  • Example [too much output to show in full]
[root@master ~]# scp -r /usr/apps/ slave1:/usr/
# ... files being copied ...
# ... files being copied ...
# ... files being copied ...
[root@master ~]# scp -r /usr/apps/ slave2:/usr/
# ... files being copied ...
# ... files being copied ...
# ... files being copied ...
  • Reload the environment variables so they take effect [run on Slave1 and Slave2]
source /etc/profile


Modify the Slave Configuration Files

  1. Modify Kafka's server.properties
vim /usr/apps/kafka_2.11-2.0.0/config/server.properties
  • Example [Slave1]
[root@slave1 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# Change the following settings
broker.id=2
listeners=PLAINTEXT://slave1:9092
host.name=slave1
advertised.listeners=PLAINTEXT://slave1:9092
  • Example [Slave2]
[root@slave2 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# Change the following settings
broker.id=3
listeners=PLAINTEXT://slave2:9092
host.name=slave2
advertised.listeners=PLAINTEXT://slave2:9092
  2. Modify the value in ZooKeeper's myid file
vim /usr/apps/data/zk/zkdata/myid
  • Example [Slave1]
[root@slave1 ~]# vim /usr/apps/data/zk/zkdata/myid
# Result
[root@slave1 ~]# cat /usr/apps/data/zk/zkdata/myid 
2
  • Example [Slave2]
[root@slave2 ~]# vim /usr/apps/data/zk/zkdata/myid
# Result
[root@slave2 ~]# cat /usr/apps/data/zk/zkdata/myid 
3





Start the Services

  • Before doing anything, make sure the firewall is stopped on all three hosts!
# Check the firewall status
systemctl status firewalld
# Stop the firewall
systemctl stop firewalld
# Start the firewall
systemctl start firewalld

Start the Hadoop Cluster

  1. Start the HDFS distributed file system
  • Run on the Master host only
Format the NameNode File System
  • Run this only before the very first start. In a real working environment this operation is on the same level as "rm -rf /*", so think twice before running it.
hdfs namenode -format
  • Example [the output also contains logs and plenty of other useful information]
[root@master ~]# hdfs namenode -format
21/12/08 05:02:07 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.38.144
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.7
STARTUP_MSG:   classpath = /usr/apps/hadoop-2.7.7/etc/hadoop:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/activation-1.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/hadoop-annotations-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/xz-1.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/junit-4.11.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/hadoop-auth-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jetty-sslengine-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/avro-1.7.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/guava-11.0.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-io-2.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/paranamer-2.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jettison-1.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/asm-3.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jsch-0.1.54.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/apps/hadoop-2.7.7/share/hadoop/c
ommon/lib/curator-framework-2.7.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-net-3.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/gson-2.2.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7-tests.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/hadoop-nfs-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/hadoop-hdfs-2.7.7-tests.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/hadoop-hdfs-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/activation-1.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/xz-1.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jaxb-
api-2.2.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/guice-3.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/asm-3.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-common-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-client-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-common-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-api-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-registry-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/apps/hadoop-2.7
.7/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.7-tests.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.7.jar:/usr/apps/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.7.jar:/usr/apps/hadoop-2.7.7/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac; compiled by 'stevel' on 2018-07-18T22:47Z
STARTUP_MSG:   java = 1.8.0_291
************************************************************/
21/12/08 05:02:07 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
21/12/08 05:02:07 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-44e90c45-4082-4fd0-ae6b-3379c1b55cb2
21/12/08 05:02:08 INFO namenode.FSNamesystem: No KeyProvider found.
21/12/08 05:02:08 INFO namenode.FSNamesystem: fsLock is fair: true
21/12/08 05:02:08 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
21/12/08 05:02:08 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
21/12/08 05:02:08 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
21/12/08 05:02:08 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
21/12/08 05:02:08 INFO blockmanagement.BlockManager: The block deletion will start around 2021 十二月 08 05:02:08
21/12/08 05:02:08 INFO util.GSet: Computing capacity for map BlocksMap
21/12/08 05:02:08 INFO util.GSet: VM type       = 64-bit
21/12/08 05:02:08 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
21/12/08 05:02:08 INFO util.GSet: capacity      = 2^21 = 2097152 entries
21/12/08 05:02:08 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
21/12/08 05:02:08 INFO blockmanagement.BlockManager: defaultReplication         = 3
21/12/08 05:02:08 INFO blockmanagement.BlockManager: maxReplication             = 512
21/12/08 05:02:08 INFO blockmanagement.BlockManager: minReplication             = 1
21/12/08 05:02:08 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
21/12/08 05:02:08 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
21/12/08 05:02:08 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
21/12/08 05:02:08 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
21/12/08 05:02:08 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
21/12/08 05:02:08 INFO namenode.FSNamesystem: supergroup          = supergroup
21/12/08 05:02:08 INFO namenode.FSNamesystem: isPermissionEnabled = true
21/12/08 05:02:08 INFO namenode.FSNamesystem: HA Enabled: false
21/12/08 05:02:08 INFO namenode.FSNamesystem: Append Enabled: true
21/12/08 05:02:09 INFO util.GSet: Computing capacity for map INodeMap
21/12/08 05:02:09 INFO util.GSet: VM type       = 64-bit
21/12/08 05:02:09 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
21/12/08 05:02:09 INFO util.GSet: capacity      = 2^20 = 1048576 entries
21/12/08 05:02:09 INFO namenode.FSDirectory: ACLs enabled? false
21/12/08 05:02:09 INFO namenode.FSDirectory: XAttrs enabled? true
21/12/08 05:02:09 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
21/12/08 05:02:09 INFO namenode.NameNode: Caching file names occuring more than 10 times
21/12/08 05:02:09 INFO util.GSet: Computing capacity for map cachedBlocks
21/12/08 05:02:09 INFO util.GSet: VM type       = 64-bit
21/12/08 05:02:09 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
21/12/08 05:02:09 INFO util.GSet: capacity      = 2^18 = 262144 entries
21/12/08 05:02:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
21/12/08 05:02:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
21/12/08 05:02:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
21/12/08 05:02:09 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
21/12/08 05:02:09 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
21/12/08 05:02:09 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
21/12/08 05:02:09 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
21/12/08 05:02:09 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
21/12/08 05:02:09 INFO util.GSet: Computing capacity for map NameNodeRetryCache
21/12/08 05:02:09 INFO util.GSet: VM type       = 64-bit
21/12/08 05:02:09 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
21/12/08 05:02:09 INFO util.GSet: capacity      = 2^15 = 32768 entries
21/12/08 05:02:09 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1995135241-192.168.38.144-1638910929378
21/12/08 05:02:09 INFO common.Storage: Storage directory /usr/apps/data/hadoop/dfs/name has been successfully formatted.
21/12/08 05:02:09 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/apps/data/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
21/12/08 05:02:09 INFO namenode.FSImageFormatProtobuf: Image file /usr/apps/data/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
21/12/08 05:02:09 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/12/08 05:02:09 INFO util.ExitUtil: Exiting with status 0
21/12/08 05:02:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.38.144
************************************************************/
Start the DFS
start-dfs.sh
  • Example [the output also contains logs and other useful information]
[root@master ~]# start-dfs.sh 
Starting namenodes on [master]
master: starting namenode, logging to /usr/apps/hadoop-2.7.7/logs/hadoop-root-namenode-master.out
master: starting datanode, logging to /usr/apps/hadoop-2.7.7/logs/hadoop-root-datanode-master.out
slave2: starting datanode, logging to /usr/apps/hadoop-2.7.7/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/apps/hadoop-2.7.7/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/apps/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-slave1.out
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5440 Jps
5108 NameNode
5237 DataNode
[root@slave1 ~]# jps
4562 Jps
4392 DataNode
4510 SecondaryNameNode
[root@slave2 jdk1.8.0_291]# jps
4133 DataNode
4216 Jps
WebUI
  • When browsing from the Windows host, if the page cannot be reached, replace master below with the IP address or add the IP mapping to the Windows hosts file
# The "hosts" file lives in this directory
C:\Windows\System32\drivers\etc
  • When browsing on the Master host itself, if the page cannot be reached, check the configuration file /etc/profile for mistakes
http://master:50070

[Screenshot: HDFS NameNode WebUI]
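  • If no browser is handy, a hedged alternative is to probe the port from the Master host itself (curl is assumed to be installed, which it usually is on CentOS 7):
# An HTML response means the NameNode WebUI is up
curl -s http://master:50070 | head -n 5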

Start the YARN Resource Scheduler
start-yarn.sh
  • Example [the output also contains logs and other useful information]
[root@master ~]# start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/apps/hadoop-2.7.7/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/apps/hadoop-2.7.7/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/apps/hadoop-2.7.7/logs/yarn-root-nodemanager-slave2.out
master: starting nodemanager, logging to /usr/apps/hadoop-2.7.7/logs/yarn-root-nodemanager-master.out
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5108 NameNode
5492 ResourceManager
5237 DataNode
5878 Jps
5593 NodeManager
[root@slave1 ~]# jps
4643 NodeManager
4392 DataNode
4749 Jps
4510 SecondaryNameNode
[root@slave2 jdk1.8.0_291]# jps
4258 NodeManager
4133 DataNode
4359 Jps
WebUI
  • When browsing from the Windows host, if the page cannot be reached, replace master below with the IP address or add the IP mapping to the Windows hosts file
# The "hosts" file lives in this directory
C:\Windows\System32\drivers\etc
  • When browsing on the Master host itself, if the page cannot be reached, check the configuration file /etc/profile for mistakes
http://master:8088

[Screenshot: YARN ResourceManager WebUI]





Start Spark

  • Run on the Master host only
  • Because Hadoop and Spark use the same start-all.sh script name, Spark's sbin directory was deliberately left out of the environment variables; call the script by its full path
[root@master ~]# /usr/apps/spark-2.1.1-bin-hadoop2.7/sbin/start-all.sh
  • Example
[root@master ~]# /usr/apps/spark-2.1.1-bin-hadoop2.7/sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/apps/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5906 Master
5971 Jps
5108 NameNode
5492 ResourceManager
5237 DataNode
5593 NodeManager
[root@slave1 ~]# jps
4865 Jps
4818 Worker
4643 NodeManager
4392 DataNode
4510 SecondaryNameNode
[root@slave2 ~]# jps
4258 NodeManager
4452 Jps
4133 DataNode
4395 Worker
WebUI
  • When browsing from the Windows host, if the page cannot be reached, replace master below with the IP address or add the IP mapping to the Windows hosts file
# The "hosts" file lives in this directory
C:\Windows\System32\drivers\etc
  • When browsing on the Master host itself, if the page cannot be reached, check the configuration file /etc/profile for mistakes
http://master:8080

[Screenshot: Spark Master WebUI]





Start Flink

  • Run on the Master host only
  • Since Flink's environment variables are configured, the script can be run directly
start-cluster.sh
  • Example
[root@master ~]# start-cluster.sh 
Starting cluster.
Starting standalonesession daemon on host master.
Starting taskexecutor daemon on host slave1.
Starting taskexecutor daemon on host slave2.
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5906 Master
5108 NameNode
5492 ResourceManager
5237 DataNode
5593 NodeManager
6426 Jps
6365 StandaloneSessionClusterEntrypoint
[root@slave1 ~]# jps
4818 Worker
4643 NodeManager
5238 TaskManagerRunner
5286 Jps
4392 DataNode
4510 SecondaryNameNode
[root@slave2 ~]# jps
4258 NodeManager
4836 Jps
4133 DataNode
4791 TaskManagerRunner
4395 Worker
WebUI
  • When browsing from the Windows host, if the page cannot be reached, replace master below with the IP address or add the IP mapping to the Windows hosts file
# The "hosts" file lives in this directory
C:\Windows\System32\drivers\etc
  • When browsing on the Master host itself, if the page cannot be reached, check the configuration file /etc/profile for mistakes
http://master:8081

[Screenshots: Flink Dashboard WebUI]





Start Kafka

Notes on Kafka


Kafka depends on ZooKeeper to pass messages and commands, so ZooKeeper has to be started first.

Start ZooKeeper
  • Command breakdown [the Kafka start command follows exactly the same pattern, so it is not explained again]
# ZooKeeper start script; the counterpart is zookeeper-server-stop.sh
zookeeper-server-start.sh
# Run in the background as a daemon.
# Without this flag the process blocks the terminal: nothing else can be typed, you have to open a new window, and that window cannot be closed.
-daemon
# Start ZooKeeper with the zookeeper.properties config file from the Kafka installation
$KAFKA_HOME/config/zookeeper.properties
  • Run this command on all three hosts
zookeeper-server-start.sh -daemon $KAFKA_HOME/config/zookeeper.properties
  • If you run it without -daemon, the screen is flooded with errors; do not worry, they stop as soon as ZooKeeper is up on the other two hosts.
  • The errors appear because ZooKeeper cannot reach the other hosts while they are still down; once they are started, the errors disappear.
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5906 Master
5108 NameNode
5492 ResourceManager
10804 Jps
5237 DataNode
5593 NodeManager
10697 QuorumPeerMain
6365 StandaloneSessionClusterEntrypoint
[root@slave1 ~]# jps
4818 Worker
4643 NodeManager
5238 TaskManagerRunner
4392 DataNode
7624 Jps
7547 QuorumPeerMain
4510 SecondaryNameNode
[root@slave2 ~]# jps
5729 Jps
4258 NodeManager
4133 DataNode
4791 TaskManagerRunner
5690 QuorumPeerMain
4395 Worker
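  • An optional hedged check of the ensemble using the zookeeper-shell.sh script that ships in Kafka's bin directory; any reply confirms ZooKeeper is answering on port 2181:
# Lists the znodes under the root
zookeeper-shell.sh master:2181 ls /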
Start Kafka
  • Run this command on all three hosts
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
  • Check the processes [all three hosts]
jps
[root@master ~]# jps
5906 Master
5108 NameNode
5492 ResourceManager
5237 DataNode
11095 Kafka
11159 Jps
5593 NodeManager
10697 QuorumPeerMain
6365 StandaloneSessionClusterEntrypoint
[root@slave1 ~]# jps
7904 Kafka
4818 Worker
4643 NodeManager
5238 TaskManagerRunner
4392 DataNode
7547 QuorumPeerMain
7964 Jps
4510 SecondaryNameNode
[root@slave2 ~]# jps
6065 Jps
4258 NodeManager
4133 DataNode
6005 Kafka
4791 TaskManagerRunner
5690 QuorumPeerMain
4395 Worker
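  • A hedged end-to-end check: ask for the topic list through ZooKeeper (Kafka 2.0 still uses the --zookeeper flag here). Empty output is fine at this stage; the command returning without errors shows the brokers registered correctly.
kafka-topics.sh --zookeeper master:2181 --list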





Configure Flume

Extract
tar -zxf /usr/tar/apache-flume-1.7.0-bin.tar.gz -C /usr/apps/
Configure
  • Configure the Flume environment variables
vim /etc/profile
  • Example
# FLUME_HOME
export FLUME_HOME=/usr/apps/apache-flume-1.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
  • Reload so the change takes effect
source /etc/profile
  • Go to the Flume configuration directory
cd /usr/apps/apache-flume-1.7.0-bin/conf/
  • Example
[root@master apps]# cd /usr/apps/apache-flume-1.7.0-bin/conf/
[root@master conf]# ll
总用量 16
-rw-r--r--. 1 root root 1661 9月  26 2016 flume-conf.properties.template
-rw-r--r--. 1 root root 1455 9月  26 2016 flume-env.ps1.template
-rw-r--r--. 1 root root 1565 9月  26 2016 flume-env.sh.template
-rw-r--r--. 1 root root 3107 9月  26 2016 log4j.properties
  • Copy the configuration template
cp flume-env.sh.template flume-env.sh
  • Edit the Flume configuration file
vim flume-env.sh
  • Example
  • Line 22 [example]
# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

export JAVA_HOME=/usr/apps/jdk1.8.0_291

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
  • Write your own agent configuration file
  • Because competition requirements differ from what is written here, this configuration file is created in a job directory under the Flume installation. Note: [you must create this directory yourself]
  • Create the directory that will hold the configuration
mkdir /usr/apps/apache-flume-1.7.0-bin/job
  • Write the configuration file, named file-flume-kafka.conf

The name means: a file is the data source, Flume does the collecting, and the data ends up in Kafka — all described by this conf file.

vim /usr/apps/apache-flume-1.7.0-bin/job/file-flume-kafka.conf
  • Configuration file contents [template]
  • In the template below, /File_Path/ and Topic_Name must be replaced with the actual file path and topic name
# Component names; a is the agent name, the values after = name the individual components
a.sources = s1
a.channels = c1
a.sinks = k1

# exec source; see the Flume user guide for the exact format
a.sources.s1.type = exec
# File to collect; tail -F follows the file by name and survives rotation, -f only follows the open file descriptor
a.sources.s1.command = tail -F /File_Path/

# Memory channel: events are buffered in memory
a.channels.c1.type = memory
# Maximum number of events the channel can hold
a.channels.c1.capacity = 1000
# Maximum number of events handled per transaction
a.channels.c1.transactionCapacity = 100

# Kafka sink type as defined by Flume; see the Flume user guide
a.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka topic the collected events are written to
a.sinks.k1.kafka.topic = Topic_Name
# Kafka bootstrap servers; list as many brokers as you have
a.sinks.k1.kafka.bootstrap.servers = master:9092,slave1:9092,slave2:9092
# The sink writes events to Kafka in batches of at most 20
a.sinks.k1.kafka.flumeBatchSize = 20
# Producer acknowledgement mode; common values are 0, 1 and all
a.sinks.k1.kafka.producer.acks = 1
# How long (in ms) the producer waits to fill a batch before sending it
a.sinks.k1.kafka.producer.linger.ms = 1
# Compress the data with snappy
a.sinks.k1.kafka.producer.compression.type = snappy

# Wiring between the components
# Source s1 writes its events to channel c1
a.sources.s1.channels = c1
# Sink k1 reads events from channel c1
a.sinks.k1.channel = c1
# Note the plural vs. singular: a source can fan out to several channels, but a sink reads from exactly one channel.
  • Example
  • Collecting the apache.log file [example]
a.sources = s1
a.channels = c1
a.sinks = k1

a.sources.s1.type = exec
a.sources.s1.command = tail -F /usr/apps/data/apache.log

a.channels.c1.type = memory
a.channels.c1.capacity = 1000
a.channels.c1.transactionCapacity = 100

a.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a.sinks.k1.kafka.topic = file-test
a.sinks.k1.kafka.bootstrap.servers = master:9092,slave1:9092,slave2:9092
a.sinks.k1.kafka.flumeBatchSize = 20
a.sinks.k1.kafka.producer.acks = 1
a.sinks.k1.kafka.producer.linger.ms = 1
a.sinks.k1.kafka.producer.compression.type = snappy

a.sources.s1.channels = c1
a.sinks.k1.channel = c1
Test
  • Create the topic referenced in the configuration
kafka-topics.sh --create --zookeeper master:2181 --topic file-test --replication-factor 3 --partitions 3
  • Example
[root@master ~]# kafka-topics.sh --create --zookeeper master:2181 --topic file-test --replication-factor 3 --partitions 3
Created topic "file-test".
  • Start collecting
flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/job/file-flume-kafka.conf -n a -Dflume.root.logger=INFO,console
  • Start a Kafka consumer [any host will do; I use Slave1]
kafka-console-consumer.sh --bootstrap-server master:9092 --topic file-test --from-beginning
  • Append data to the monitored file and save it; the new lines should appear in the consumer window (a small sketch follows below)
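  • For instance, appending a line to the file from the example above should show up in the consumer within a few seconds — a minimal sketch, assuming the /usr/apps/data/apache.log path from the example:
# Run on the Master host; the consumer started above should print this line
echo "hello from flume $(date)" >> /usr/apps/data/apache.log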

[Screenshot: Kafka consumer printing the collected data]





Notes on Other Software

Because this article has grown long enough to make the editor lag, and the first module is now complete, the setup guides for the other software mentioned in this project are covered in a separate post; if you want them "just in case", follow this link: article title