1. DolphinScheduler Deployment Guide
1.1 Hardware and Software Requirements
1.1.1 Operating System Requirements
Operating system | Version |
Red Hat Enterprise Linux | 7.0 or later |
CentOS | 7.0 or later |
Oracle Enterprise Linux | 7.0 or later |
Ubuntu LTS | 16.04 or later |
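The version table above can be checked on a node with a few lines of shell. This is a minimal sketch, assuming the host provides /etc/os-release; os_supported is a hypothetical helper, and the IDs it matches (rhel, centos, ol, ubuntu) are the ID= values those distributions report.

```shell
#!/usr/bin/env bash
# Compare the local OS against the version table above.
# os_supported is a hypothetical helper, not a DolphinScheduler command.

# os_supported ID VERSION: succeed when the pair is in the supported list.
os_supported() {
  major=${2%%.*}          # "7.9" -> "7", "16.04" -> "16"
  case "$1" in
    rhel|centos|ol) [ "${major:-0}" -ge 7 ] 2>/dev/null ;;
    # Only the major version is compared; LTS status is not verified.
    ubuntu)         [ "${major:-0}" -ge 16 ] 2>/dev/null ;;
    *)              return 1 ;;
  esac
}

if [ -r /etc/os-release ]; then
  # Source in a subshell so the file's variables do not leak into this script.
  os_id=$( . /etc/os-release; echo "${ID:-}" )
  os_ver=$( . /etc/os-release; echo "${VERSION_ID:-}" )
  if os_supported "$os_id" "$os_ver"; then
    echo "supported: $os_id $os_ver"
  else
    echo "not in the supported list: $os_id $os_ver"
  fi
fi
```

The check is deliberately conservative: any distribution not named in the table is reported as unsupported rather than guessed at.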
1.1.2 Server Hardware Requirements
CPU | Memory | Network |
4+ cores | 8 GB+ | Gigabit NIC |
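The CPU and memory minimums above can be verified before installation. The following is a rough sketch, assuming a Linux host with nproc and /proc/meminfo; meets_minimum is a hypothetical helper, and the network requirement is not checked.

```shell
#!/usr/bin/env bash
# Pre-flight check against the minimums in the table above (4 cores, 8 GB).
# meets_minimum is a hypothetical helper, not a DolphinScheduler command.

# meets_minimum CORES MEM_GB: succeed when both values reach the minimum.
meets_minimum() {
  [ "$1" -ge 4 ] 2>/dev/null && [ "$2" -ge 8 ] 2>/dev/null
}

cores=$(nproc)
# MemTotal is reported in kB. Round to the nearest GiB, because the kernel
# reserves part of the physical memory and reports slightly less than nominal.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gb=$(( (mem_kb + 524288) / 1048576 ))

if meets_minimum "$cores" "$mem_gb"; then
  echo "OK: ${cores} cores, ~${mem_gb} GB RAM"
else
  echo "Below minimum: ${cores} cores, ~${mem_gb} GB RAM"
fi
```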
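1.2 Deployment Modes
DolphinScheduler supports several deployment modes, including standalone mode (Standalone), pseudo-cluster mode (Pseudo-Cluster), and cluster mode (Cluster).
1.2.1 Standalone Mode
In standalone mode, all services run inside a single StandaloneServer process, which embeds the Zookeeper registry and an H2 database. Only a JDK needs to be configured; DolphinScheduler can then be started with a single command for a quick trial of its features.
1.2.2 Pseudo-Cluster Mode
Pseudo-cluster mode deploys all DolphinScheduler services on a single machine: the master, worker, api server, and logger server all run on that one host. Zookeeper and the database must be installed and configured separately.
1.2.3 Cluster Mode
Cluster mode differs from pseudo-cluster mode in that the DolphinScheduler services are deployed across multiple machines, and multiple Masters and multiple Workers can be configured.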
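2. Deploying DolphinScheduler in Cluster Mode
2.1 Cluster Planning
Cluster mode allows multiple Masters and multiple Workers; a typical setup has 2 to 3 Masters and several Workers. Because resources in this cluster are limited, this guide configures one Master and three Workers, laid out as follows.
node1 master, worker
node2 worker
node3 worker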
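2.2 Prerequisites
1) JDK (1.8+) must be installed on all three nodes, with the related environment variables configured.
2) A database must be deployed; MySQL (5.7+) and PostgreSQL (8.2.15+) are supported.
3) Zookeeper (3.4.6+) must be deployed.
4) The process management package psmisc must be installed on all three nodes.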
[linux@node1 ~]$ sudo yum install -y psmisc
[linux@node2 ~]$ sudo yum install -y psmisc
[linux@node3 ~]$ sudo yum install -y psmisc
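The three identical psmisc commands above can also be issued from node1 in a loop. This is only a sketch, assuming passwordless SSH between the nodes and passwordless sudo for the deployment user; DRY_RUN, run_cmd, and install_psmisc are illustrative names. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Install psmisc on every node from a single host.
# Assumes passwordless SSH and passwordless sudo for the deployment user.
# DRY_RUN, run_cmd and install_psmisc are illustrative, not DolphinScheduler tools.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run_cmd COMMAND...: print the command in dry-run mode, otherwise execute it.
run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_psmisc HOST...: run the yum install on each host over SSH.
install_psmisc() {
  for host in "$@"; do
    run_cmd ssh "$host" sudo yum install -y psmisc
  done
}

install_psmisc node1 node2 node3
```

Running it with DRY_RUN=0 executes the installs; on Ubuntu hosts the yum command would need to be replaced with the distribution's package manager.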
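2.3 Extracting the DolphinScheduler Package
1) Upload the DolphinScheduler package to the /opt/software directory on node1
2) Extract the package to /opt/server/
Note: the extraction directory is not the final installation directory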
[linux@node1 software]$ tar -zxvf apache-dolphinscheduler-1.3.9-bin.tar.gz -C /opt/server/
3) Rename the extracted directory
[linux@node1 server]$ mv apache-dolphinscheduler-1.3.9-bin dolphinscheduler-bin
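2.4 Initializing the Database
DolphinScheduler stores its metadata in a relational database, so a dedicated database and user must be created.
1) Create the database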
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
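2) Create the user
mysql> CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
Note:
If the following error appears, the new user's password is too simple for the current policy.
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
Either choose a more complex password or run the following commands to lower the MySQL password strength level (variable names as in MySQL 5.7).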
mysql> set global validate_password_length=4;
mysql> set global validate_password_policy=0;
3) Grant the user the required privileges
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
mysql> flush privileges;
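4) Edit the data source configuration file
Change to the conf directory under the DolphinScheduler extraction directory
[linux@node1 ~]$ cd /opt/server/dolphinscheduler-bin/conf/
Edit the datasource.properties file in the conf directory
[linux@node1 conf]$ vim datasource.properties
Set the following properties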
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://node1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler
5) Copy the MySQL driver into the lib directory under the DolphinScheduler extraction directory
[linux@node1 conf]$ cp /opt/software/mysql/mysql-connector-java-5.1.27-bin.jar /opt/server/dolphinscheduler-bin/lib/
6) Run the database initialization script from the extraction directory (/opt/server/dolphinscheduler-bin)
[linux@node1 dolphinscheduler-bin]$ sh script/create-dolphinscheduler.sh
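Before the initialization script in step 6 is run, the data source settings from step 4 can be sanity-checked. The sketch below only confirms that the four properties are present; get_prop is a hypothetical helper, and the default path is the extraction directory used in this guide.

```shell
#!/usr/bin/env bash
# Confirm that datasource.properties defines the four properties from step 4.
# get_prop is a hypothetical helper, not a DolphinScheduler command.

# get_prop FILE KEY: print the value of KEY from a Java-style properties file.
# Only the first "=" separates key and value, so JDBC URLs stay intact.
get_prop() {
  awk -v key="$2" 'index($0, key "=") == 1 {
    print substr($0, length(key) + 2)
    exit
  }' "$1"
}

conf=${1:-/opt/server/dolphinscheduler-bin/conf/datasource.properties}

if [ -f "$conf" ]; then
  for key in spring.datasource.driver-class-name spring.datasource.url \
             spring.datasource.username spring.datasource.password; do
    # Report presence only; the password value is never printed.
    if [ -n "$(get_prop "$conf" "$key")" ]; then
      echo "set:     $key"
    else
      echo "MISSING: $key"
    fi
  done
else
  echo "not found: $conf"
fi
```

This does not test connectivity; it only catches a mistyped or commented-out key before the initialization script fails on it.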
2.5 Configuring the One-Click Deployment Script
Edit the install_config.conf file in the conf/config directory under the extraction directory
[linux@node1 dolphinscheduler-bin]$ cd conf/config/
[linux@node1 config]$ vim install_config.conf
# postgresql or mysql
dbtype="mysql"
# db config
# db username
username="dolphinscheduler"
# database name
dbname="dolphinscheduler"
# db password
# NOTICE: if there are special characters, please use the \ to escape, for example, `[` escape to `\[`
password="dolphinscheduler"
# zk cluster
zkQuorum="node1:2181,node2:2181,node3:2181"
# Note: the target installation path for dolphinscheduler; do not set it to the current path (pwd)
installPath="/opt/server/dolphinscheduler"
# deployment user
deployUser="linux"
# alert config
# mail server host
mailServerHost="smtp.exmail.qq.com"
# mail server port
mailServerPort="25"
# user
mailUser="xxxxxxxxxx"
# sender password
# note: The mail.passwd is email service authorization code, not the email login password.
mailPassword="xxxxxxxxxx"
# TLS mail protocol support
starttlsEnable="true"
# SSL mail protocol support
# only one of TLS and SSL can be in the true state.
sslEnable="false"
# user data local directory path; make sure the directory exists and has read and write permissions
dataBasedirPath="/tmp/dolphinscheduler"
# resource storage type: HDFS, S3, NONE
resourceStorageType="HDFS"
resourceUploadPath="/dolphinscheduler"
# if S3, write S3 address, HA, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://node1:9820"
# if resourceStorageType is S3, the following three settings are required; otherwise ignore them
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# resourcemanager port, the default value is 8088 if not specified
resourceManagerHttpAddressPort="8088"
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarnHaIps=""
singleYarnIp="node2"
# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
hdfsRootUser="linux"
# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# kerberos expire time, the unit is hour
kerberosExpireTime="2"
# api server port
apiServerPort="12345"
# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
ips="node1,node2,node3"
# ssh port, default 22
# Note: if ssh port is not default, modify here
sshPort="22"
# run master machine
# Note: list of hosts hostname for deploying master
masters="node1"
# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
workers="node1:default,node2:default,node3:default"
# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="node1"
# run api machine
# note: list of machine hostnames for deploying api server
apiServers="node1"
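A common mistake in this file is naming a host in masters, workers, alertServer, or apiServers that is missing from ips. The sketch below checks for that, using the values from the configuration above; hosts_not_in_ips is a hypothetical helper and is not part of install.sh.

```shell
#!/usr/bin/env bash
# Check that every host named in the role lists also appears in ips.
# hosts_not_in_ips is a hypothetical helper, not part of install.sh.

# hosts_not_in_ips IPS LIST
# IPS and LIST are comma-separated; LIST entries may carry a ":group" suffix.
# Prints each host from LIST that is missing from IPS.
hosts_not_in_ips() {
  for entry in $(printf '%s' "$2" | tr ',' ' '); do
    host=${entry%%:*}               # "node1:default" -> "node1"
    case ",$1," in
      *",$host,"*) ;;               # listed in ips
      *) echo "$host" ;;
    esac
  done
}

# Values copied from the configuration above.
ips="node1,node2,node3"
masters="node1"
workers="node1:default,node2:default,node3:default"
alertServer="node1"
apiServers="node1"

for list in "$masters" "$workers" "$alertServer" "$apiServers"; do
  hosts_not_in_ips "$ips" "$list"
done
echo "host list check finished"
```

Any host name printed before the final line is referenced by a role but absent from ips. Because the file consists of plain key="value" assignments, the literal values could presumably be replaced by sourcing install_config.conf itself.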
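2.6 One-Click Deployment of DolphinScheduler
1) Start the Zookeeper cluster and the Hadoop cluster (zookeeper.sh and cluster.sh are this environment's own helper scripts, not part of DolphinScheduler)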
[linux@node1 dolphinscheduler-bin]$ zookeeper.sh start
[linux@node1 dolphinscheduler-bin]$ cluster.sh start
2) Deploy and start DolphinScheduler in one step
[linux@node1 dolphinscheduler-bin]$ ./install.sh
3) Check the DolphinScheduler processes (jps output from each node)
==========node1============
3009 NodeManager
3187 JobHistoryServer
2708 DataNode
3750 MasterServer
3798 WorkerServer
3846 LoggerServer
3975 Jps
2344 QuorumPeerMain
2556 NameNode
3900 AlertServer
3949 ApiApplicationServer
==========node2============
2432 NodeManager
3072 LoggerServer
2018 QuorumPeerMain
3028 WorkerServer
2118 DataNode
3110 Jps
2300 ResourceManager
==========node3============
2130 SecondaryNameNode
2227 NodeManager
2643 LoggerServer
2599 WorkerServer
2682 Jps
2027 DataNode
1933 QuorumPeerMain
4) Access the DolphinScheduler UI
The DolphinScheduler UI is available at
http://node1:12345/dolphinscheduler
The initial user name is admin and the password is dolphinscheduler123
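The process listing in step 3 can be compared against the cluster plan automatically. The sketch below is one way to do it, assuming jps is on the PATH of the node being checked; missing_procs is a hypothetical helper, and the expected process names are taken from the listing in step 3.

```shell
#!/usr/bin/env bash
# Report expected DolphinScheduler processes that are absent from jps output.
# missing_procs is a hypothetical helper, not a DolphinScheduler command.

# missing_procs JPS_OUTPUT NAME...
# Prints each NAME that does not appear as the second column of JPS_OUTPUT.
missing_procs() {
  jps_out=$1
  shift
  for name in "$@"; do
    if ! printf '%s\n' "$jps_out" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Expected processes per role, taken from the listing in step 3.
master_node="MasterServer WorkerServer LoggerServer AlertServer ApiApplicationServer"
worker_node="WorkerServer LoggerServer"

# On node1 (pass $worker_node instead on node2 and node3).
# Skipped entirely when jps is not installed.
if command -v jps >/dev/null 2>&1; then
  # $master_node is intentionally unquoted so it splits into separate names.
  missing_procs "$(jps)" $master_node
fi
```

No output means every expected process was found; each printed name is a service that did not start on that node.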
2.7 Starting and Stopping DolphinScheduler
All DolphinScheduler start and stop scripts are in the bin directory under its installation directory.
1) Start or stop all services
./bin/start-all.sh
./bin/stop-all.sh
Take care not to confuse these with the Hadoop start and stop scripts of the same names.
2) Start or stop the Master
./bin/dolphinscheduler-daemon.sh start master-server
./bin/dolphinscheduler-daemon.sh stop master-server
3) Start or stop the Worker
./bin/dolphinscheduler-daemon.sh start worker-server
./bin/dolphinscheduler-daemon.sh stop worker-server
4) Start or stop the Api server
./bin/dolphinscheduler-daemon.sh start api-server
./bin/dolphinscheduler-daemon.sh stop api-server
5) Start or stop the Logger
./bin/dolphinscheduler-daemon.sh start logger-server
./bin/dolphinscheduler-daemon.sh stop logger-server
6) Start or stop the Alert server
./bin/dolphinscheduler-daemon.sh start alert-server
./bin/dolphinscheduler-daemon.sh stop alert-server
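Restarting a single service is a stop followed by a start with the same service name. The sketch below wraps that in a small helper that rejects mistyped names; ds_cmd and restart_service are hypothetical helpers, and the accepted service names are exactly the five used in the commands above.

```shell
#!/usr/bin/env bash
# Restart one DolphinScheduler service via dolphinscheduler-daemon.sh.
# ds_cmd and restart_service are hypothetical helpers; the service names are
# the ones used by the commands above.

# ds_cmd ACTION SERVICE: validate the arguments and print the daemon command.
ds_cmd() {
  case "$1" in
    start|stop) ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
  case "$2" in
    master-server|worker-server|api-server|logger-server|alert-server) ;;
    *) echo "unknown service: $2" >&2; return 1 ;;
  esac
  echo "./bin/dolphinscheduler-daemon.sh $1 $2"
}

# restart_service SERVICE: stop, then start.
# Must be run from the DolphinScheduler installation directory.
restart_service() {
  stop_cmd=$(ds_cmd stop "$1") || return 1
  start_cmd=$(ds_cmd start "$1") || return 1
  $stop_cmd
  $start_cmd
}

# Dry run: show the two commands a worker restart would execute.
ds_cmd stop worker-server
ds_cmd start worker-server
```

Calling restart_service worker-server from the installation directory would execute the two printed commands in order.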
2.8 Starting Standalone DolphinScheduler
Stop the cluster-mode DolphinScheduler and Zookeeper, then run the following from the installation directory
[linux@node1 dolphinscheduler]$ bin/dolphinscheduler-daemon.sh start standalone-server