I. DolphinScheduler Deployment Guide

1.1 Software and Hardware Requirements

1.1.1 Operating System Requirements

Operating System            Version
Red Hat Enterprise Linux    7.0 or later
CentOS                      7.0 or later
Oracle Enterprise Linux     7.0 or later
Ubuntu LTS                  16.04 or later

1.1.2 Server Hardware Requirements

CPU         Memory    Network
4+ cores    8 GB+     Gigabit NIC
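
The minimums above can be checked on a candidate host before installation. The following is a minimal sketch, not part of DolphinScheduler: the function name meets_minimums and the two threshold variables are illustrative, and the memory figure is rounded up to whole GiB, which makes the check slightly lenient.

```shell
#!/bin/sh
# Sketch: compare a host against the minimums listed above
# (4 CPU cores, 8 GB memory). Names and thresholds are illustrative.

MIN_CORES=4
MIN_MEM_GB=8

# meets_minimums CORES MEM_KB
# Returns 0 when both values reach the minimums, 1 otherwise.
# MEM_KB is rounded up to whole GiB because the kernel reports slightly
# less than the nominal RAM size (an "8 GB" host shows roughly 7.6 GiB).
meets_minimums() {
    cores="$1"
    mem_gb=$(( ($2 + 1048575) / 1048576 ))
    [ "$cores" -ge "$MIN_CORES" ] && [ "$mem_gb" -ge "$MIN_MEM_GB" ]
}

# Example on a Linux host:
#   cores=$(nproc)
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   meets_minimums "$cores" "$mem_kb" && echo "OK" || echo "below minimum"
```

The network requirement is not covered here, since link speed is reported differently across drivers and tools.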

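1.2 Deployment Modes

DolphinScheduler supports several deployment modes: standalone, pseudo-cluster, and cluster.
1.2.1 Standalone Mode
In standalone mode, all services run inside a single StandaloneServer process, which also embeds the ZooKeeper registry and an H2 database. Only a JDK needs to be configured, so DolphinScheduler can be started with one command to try out its features quickly.
1.2.2 Pseudo-Cluster Mode
Pseudo-cluster mode deploys all DolphinScheduler services on a single machine: the master, worker, api server, logger server, and other services all run on the same host. ZooKeeper and the database must be installed and configured separately.
1.2.3 Cluster Mode
Cluster mode differs from pseudo-cluster mode in that the DolphinScheduler services are deployed across multiple machines, and multiple Masters and multiple Workers can be configured.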

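II. DolphinScheduler Cluster Mode Deployment

2.1 Cluster Plan

In cluster mode, multiple Masters and multiple Workers can be configured; a typical setup has 2 to 3 Masters and several Workers. Because cluster resources are limited here, this guide configures one Master and three Workers, planned as follows.
node1    master, worker
node2    worker
node3    worker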

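2.2 Prerequisites

1) JDK (1.8+) must be installed on all three nodes, with the related environment variables configured.
2) A database must be deployed; MySQL (5.7+) and PostgreSQL (8.2.15+) are supported.
3) ZooKeeper (3.4.6+) must be deployed.
4) The process-management package psmisc must be installed on all three nodes.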

[linux@node1 ~]$ sudo yum install -y psmisc
[linux@node2 ~]$ sudo yum install -y psmisc
[linux@node3 ~]$ sudo yum install -y psmisc
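
Before moving on, it can help to confirm on each node that the prerequisite tools are actually on the PATH. The sketch below assumes nothing about DolphinScheduler itself; missing_commands is a hypothetical helper name, and killall and pstree are used as representative commands shipped by psmisc.

```shell
#!/bin/sh
# missing_commands CMD...
# Prints each command name that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example: psmisc provides killall and pstree; the JDK provides java.
#   missing=$(missing_commands java killall pstree)
#   [ -z "$missing" ] || echo "missing on $(hostname): $missing"
```

Running the example on node1, node2, and node3 in turn gives a quick view of which node still lacks a prerequisite.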

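2.3 Extracting the DolphinScheduler Package

1) Upload the DolphinScheduler package to the /opt/software directory on node1.
2) Extract the package into /opt/server/.
Note: the extraction directory is not the final installation directory.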

[linux@node1 software]$ tar -zxvf apache-dolphinscheduler-1.3.9-bin.tar.gz -C /opt/server/

3) Rename the extracted directory

[linux@node1 server]$ mv apache-dolphinscheduler-1.3.9-bin dolphinscheduler-bin

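2.4 Initializing the Database

DolphinScheduler stores its metadata in a relational database, so a database and a user must be created for it.
1) Create the database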

mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

2) Create the user

mysql> CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';

Note:
If the following error appears, the new user's password is too simple.

ERROR 1819 (HY000): Your password does not satisfy the current policy requirements

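Either choose a more complex password or run the following commands to lower the MySQL password-strength level (the variable names shown are those of MySQL 5.7).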

mysql> set global validate_password_length=4;
mysql> set global validate_password_policy=0;

3) Grant the user the required privileges

mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
mysql> flush privileges;

4) Edit the datasource configuration file
Go to the conf directory under the DolphinScheduler extraction directory

[linux@node1 ~]$ cd /opt/server/dolphinscheduler-bin/conf/

Edit the datasource.properties file in the conf directory

[linux@node1 conf]$ vim datasource.properties

Set the following properties

spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://node1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler
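
After editing, the values can be read back to catch typos before the initialization script runs. This is a minimal sketch: get_prop is a hypothetical helper, and it assumes each property sits on one line as KEY=VALUE with no spaces around the equals sign, as in the snippet above.

```shell
#!/bin/sh
# get_prop FILE KEY
# Prints the value of KEY from a Java-style .properties file.
# The match is a literal prefix "KEY=", so dots in the key are not treated
# as wildcards and "=" characters inside the value (as in the JDBC URL)
# are preserved. Commented-out lines starting with "#" never match.
get_prop() {
    awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Example, run from the conf directory:
#   get_prop datasource.properties spring.datasource.url
#   get_prop datasource.properties spring.datasource.username
```

An empty result for a key means the property is missing or still commented out.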

5) Copy the MySQL driver into the lib directory under the DolphinScheduler extraction directory

[linux@node1 conf]$ cp /opt/software/mysql/mysql-connector-java-5.1.27-bin.jar /opt/server/dolphinscheduler-bin/lib/

6) Run the database initialization script

[linux@node1 dolphinscheduler-bin]$ sh script/create-dolphinscheduler.sh

2.5 Configuring the One-Click Deployment Script

Edit the install_config.conf file in the conf/config directory under the extraction directory

[linux@node1 dolphinscheduler-bin]$ cd conf/config/
[linux@node1 config]$ vim install_config.conf

Set the following values

# postgresql or mysql
dbtype="mysql"

# db config

# db username
username="dolphinscheduler"

# database name
dbname="dolphinscheduler"

# db password
# NOTICE: if there are special characters, please use the \ to escape, for example, `[` escape to `\[`
password="dolphinscheduler"

# zk cluster
zkQuorum="node1:2181,node2:2181,node3:2181"

# Note: the target installation path for dolphinscheduler; do not set it to the current path (pwd)
installPath="/opt/server/dolphinscheduler"

# deployment user
deployUser="linux"


# alert config
# mail server host
mailServerHost="smtp.exmail.qq.com"

# mail server port
mailServerPort="25"


# user
mailUser="xxxxxxxxxx"

# sender password
# note: The mail.passwd is email service authorization code, not the email login password.
mailPassword="xxxxxxxxxx"

# TLS mail protocol support
starttlsEnable="true"

# SSL mail protocol support
# only one of TLS and SSL can be in the true state.
sslEnable="false"


# user data local directory path, please make sure the directory exists and have read write permissions
dataBasedirPath="/tmp/dolphinscheduler"

# resource storage type: HDFS, S3, NONE
resourceStorageType="HDFS"

resourceUploadPath="/dolphinscheduler"

# if S3, write S3 address, HA, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://node1:9820"

# if resourceStorageType is S3, the following three configuration is required, otherwise please ignore
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"

# resourcemanager port, the default value is 8088 if not specified
resourceManagerHttpAddressPort="8088"

# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarnHaIps=""

singleYarnIp="node2"

# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
hdfsRootUser="linux"

# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# kerberos expire time, the unit is hour
kerberosExpireTime="2"

# api server port
apiServerPort="12345"


# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
ips="node1,node2,node3"

# ssh port, default 22
# Note: if ssh port is not default, modify here
sshPort="22"

# run master machine
# Note: list of hosts hostname for deploying master
masters="node1"

# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
workers="node1:default,node2:default,node3:default"

# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="node1"

# run api machine
# note: list of machine hostnames for deploying api server
apiServers="node1"
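
The host lists in this file need to agree with one another: a host named in masters, workers, alertServer, or apiServers presumably also has to appear in ips, since ips is the list of machines the installer targets. The following sketch checks that; hosts_not_in_ips is a hypothetical helper, not something shipped with DolphinScheduler, and the usage example assumes install_config.conf can be sourced as plain shell variable assignments.

```shell
#!/bin/sh
# hosts_not_in_ips IPS LIST
# Prints every host in the comma-separated LIST that is absent from the
# comma-separated IPS. Worker entries look like "host:group", so anything
# after the first ":" is dropped before comparing.
# The body runs in a subshell so the IFS change does not leak to the caller.
hosts_not_in_ips() (
    ips="$1"
    IFS=','
    for entry in $2; do
        host="${entry%%:*}"
        case ",$ips," in
            *",$host,"*) ;;
            *) echo "$host" ;;
        esac
    done
)

# Example, run from conf/config:
#   . ./install_config.conf
#   for list in "$masters" "$workers" "$alertServer" "$apiServers"; do
#       hosts_not_in_ips "$ips" "$list"
#   done
```

With the values shown above, the example prints nothing, because node1, node2, and node3 all appear in ips.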

2.6 One-Click Deployment of DolphinScheduler

1) Start the ZooKeeper cluster and the Hadoop cluster

[linux@node1 dolphinscheduler-bin]$ zookeeper.sh start
[linux@node1 dolphinscheduler-bin]$ cluster.sh start

2) Deploy and start DolphinScheduler with one command

[linux@node1 dolphinscheduler-bin]$ ./install.sh

3) Check the DolphinScheduler processes

==========node1============
3009 NodeManager
3187 JobHistoryServer
2708 DataNode
3750 MasterServer
3798 WorkerServer
3846 LoggerServer
3975 Jps
2344 QuorumPeerMain
2556 NameNode
3900 AlertServer
3949 ApiApplicationServer
==========node2============
2432 NodeManager
3072 LoggerServer
2018 QuorumPeerMain
3028 WorkerServer
2118 DataNode
3110 Jps
2300 ResourceManager
==========node3============
2130 SecondaryNameNode
2227 NodeManager
2643 LoggerServer
2599 WorkerServer
2682 Jps
2027 DataNode
1933 QuorumPeerMain
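
A listing like the one above can also be checked mechanically. The sketch below takes jps-style output ("<pid> <main class>" per line) and reports which expected process names are absent; missing_processes is a hypothetical helper, and the expected names are taken from the listing above.

```shell
#!/bin/sh
# missing_processes JPS_OUTPUT NAME...
# Prints each NAME that does not appear as the main class (second column)
# of any line in JPS_OUTPUT. Runs in a subshell to keep variables local.
missing_processes() (
    jps_output="$1"
    shift
    for name in "$@"; do
        printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            echo "$name"
    done
)

# Example on node1 (master, worker, alert, and api roles):
#   missing_processes "$(jps)" MasterServer WorkerServer LoggerServer \
#       AlertServer ApiApplicationServer
# Example on node2 or node3 (worker role only):
#   missing_processes "$(jps)" WorkerServer LoggerServer
```

No output means every expected process was found on that node.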

4) Access the DolphinScheduler UI
The DolphinScheduler UI address is

http://node1:12345/dolphinscheduler

The initial username is admin and the password is dolphinscheduler123.

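2.7 DolphinScheduler Start and Stop Commands

The DolphinScheduler start and stop scripts are all located in the bin directory under its installation directory.

1) Start or stop all services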

./bin/start-all.sh
./bin/stop-all.sh

Note: do not confuse these with Hadoop's start and stop scripts, which share the same names.

2) Start or stop the Master

./bin/dolphinscheduler-daemon.sh start master-server
./bin/dolphinscheduler-daemon.sh stop master-server

3) Start or stop the Worker

./bin/dolphinscheduler-daemon.sh start worker-server
./bin/dolphinscheduler-daemon.sh stop worker-server

4) Start or stop the Api server

./bin/dolphinscheduler-daemon.sh start api-server
./bin/dolphinscheduler-daemon.sh stop api-server

5) Start or stop the Logger

./bin/dolphinscheduler-daemon.sh start logger-server
./bin/dolphinscheduler-daemon.sh stop logger-server

6) Start or stop the Alert server

./bin/dolphinscheduler-daemon.sh start alert-server
./bin/dolphinscheduler-daemon.sh stop alert-server
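
Since the daemon script is shown above only with start and stop, restarting a single service means running both in sequence. The sketch below generates that pair of commands for one of the five service names listed above and rejects anything else; restart_commands is a hypothetical helper, not part of DolphinScheduler.

```shell
#!/bin/sh
# restart_commands SERVICE
# Prints the stop and start commands for one DolphinScheduler service.
# Returns 1 (with a message on stderr) for a name that is not one of the
# five services listed above.
restart_commands() {
    case "$1" in
        master-server|worker-server|api-server|logger-server|alert-server) ;;
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
    echo "./bin/dolphinscheduler-daemon.sh stop $1"
    echo "./bin/dolphinscheduler-daemon.sh start $1"
}

# Example, run from the installation directory:
#   restart_commands worker-server        # review the commands first
#   restart_commands worker-server | sh   # then execute them
```

Printing the commands rather than running them directly makes it easy to review what will happen before anything is stopped.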

2.8 Starting Standalone DolphinScheduler

Stop the DolphinScheduler cluster and ZooKeeper, then run the following in the installation directory

[linux@node1 dolphinscheduler]$ bin/dolphinscheduler-daemon.sh start standalone-server