Azkaban Basics

Workflow overview:

  • Everyday business workflows: leave requests, loan applications
  • JavaEE workflow engines: jBPM, Activiti


Why a workflow scheduling system matters

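Strengths and weaknesses of crontab:

crontab handles time-based scheduling well, but it has no way to express dependencies between jobs. You can only estimate how long the upstream job takes and schedule the downstream job after that.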
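
To make the difference concrete, here is a minimal sketch of what dependency-based scheduling decides: the run order comes from declared prerequisites, not from guessed clock times. The job names are hypothetical, and this only illustrates the idea; it is not how Azkaban is implemented internally. It relies on Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it must wait for.
dependencies = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "build_report": {"clean_logs"},
}


def execution_order(deps):
    """Return a run order in which every job follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

With crontab, the same pipeline would need three fixed start times chosen with enough slack for the slowest expected run of each upstream job.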

Common scheduling frameworks:

Azkaban: open-sourced by LinkedIn

Oozie: open source, from Apache

Zeus: open-sourced by Alibaba

Azkaban overview:

Features

Note in particular that Azkaban is modular and pluggable.

Azkaban architecture

The WebServer mainly provides the user interface.

Azkaban run modes

Test

First, create a project.

The new project now appears under Projects.

Open the project and click Upload.

Select the zip package.
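
For reference, a zip package of this kind could be produced as in the sketch below. It assumes the package holds a single command-type job that echoes Hello World, which matches the output seen later in this test; the names hello.job and hello.zip are hypothetical, since the original file names are not given.

```python
import os
import tempfile
import zipfile

# Assumed job definition: one command job that prints Hello World.
# The file name hello.job is hypothetical.
HELLO_JOB = "type=command\ncommand=echo 'Hello World'\n"


def build_zip(zip_path, files):
    """Write each {file name: content} pair into a zip archive for upload."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return zip_path


# Build the archive in a temporary directory and read it back as a check.
zip_path = build_zip(
    os.path.join(tempfile.mkdtemp(), "hello.zip"),
    {"hello.job": HELLO_JOB},
)
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
    content = archive.read("hello.job").decode("utf-8")
print(names)
```

The .job file sits at the top level of the archive, so the job name in the Azkaban UI follows the file name without its extension.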

After the upload, click Execute Flow.

Click Execute.

The execution is reported as successful.

Click into the execution to take a look.

Flow Log

Then click Details in the Job List.

The log shows that Hello World was printed.

Azkaban in Practice

Dependency jobs

The dependencies property lists the jobs that a job depends on, that is, the jobs that must finish before it runs.
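
As a sketch of what such a pair of job files could contain, the snippet below renders two command jobs where bar depends on foo. The job names foo and bar and their echo commands are assumptions chosen to match the output shown later in this section; the original job files are not reproduced in the text.

```python
def job_file(command, dependencies=()):
    """Render the text of a command-type .job file."""
    lines = ["type=command"]
    if dependencies:
        # Multiple prerequisites are written as a comma-separated list.
        lines.append("dependencies=" + ",".join(dependencies))
    lines.append("command=" + command)
    return "\n".join(lines) + "\n"


# Hypothetical two-job flow: bar runs only after foo has finished.
files = {
    "foo.job": job_file("echo foo"),
    "bar.job": job_file("echo bar", dependencies=["foo"]),
}
for name, text in files.items():
    print(f"# {name}\n{text}")
```

Both files go into the same zip package, and the value of dependencies refers to the other job by its file name without the .job extension.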

The procedure is the same as before.

When jobs have dependencies, the flow takes its name from the last job in the chain.
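
The naming rule can be sketched as follows: the last job is the one that no other job lists as a dependency. The helper flow_names is hypothetical and only illustrates the rule; the foo and bar job names are assumed from the output shown later in this section.

```python
def flow_names(job_dependencies):
    """Return the jobs that no other job depends on; each one names a flow."""
    depended_on = {
        dep for deps in job_dependencies.values() for dep in deps
    }
    return sorted(job for job in job_dependencies if job not in depended_on)


# bar depends on foo, so bar is the last job and gives the flow its name.
names = flow_names({"foo": [], "bar": ["foo"]})
print(names)
```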

Click Execute.

After the flow succeeds, open the Job List.

The job bar successfully printed bar.

HDFS jobs

Upload the zip to Azkaban and execute it in the same way; the output of the hadoop command is printed successfully.
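
A job of this kind might look like the sketch below, which wraps an HDFS shell command in a command-type job. The Hadoop install path and the -ls / listing are assumptions for illustration; the command used in the original job is not given in the text.

```python
# Assumed install location; replace with the real path on the executor host.
HADOOP_BIN = "/opt/hadoop/bin/hadoop"


def hdfs_job(hdfs_args):
    """Render a command-type job that runs one `hadoop fs` command."""
    return f"type=command\ncommand={HADOOP_BIN} fs {hdfs_args}\n"


# Hypothetical example: list the HDFS root directory.
fs_job = hdfs_job("-ls /")
print(fs_job)
```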

MapReduce jobs

Run a Hadoop job.

Note that paths in the job file must be written out in full.

Compress the job into a zip package.

If the command is wrong, you can also edit it directly in Azkaban.

Next, try Hadoop's wordcount example.

Rewrite the job file accordingly.
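
A rewritten job file for wordcount could be generated as sketched below. All four paths are hypothetical placeholders (the real examples jar normally carries a version suffix in its file name), and the check simply enforces the full-path rule mentioned earlier in this section.

```python
import posixpath

# Assumed locations; adjust to the actual Hadoop installation.
HADOOP_BIN = "/opt/hadoop/bin/hadoop"
EXAMPLES_JAR = (
    "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar"
)


def wordcount_job(input_dir, output_dir):
    """Render a command-type job that runs the wordcount example."""
    for path in (HADOOP_BIN, EXAMPLES_JAR, input_dir, output_dir):
        # Reject relative paths so the job does not depend on the
        # working directory of the executor.
        if not posixpath.isabs(path):
            raise ValueError(f"path must be absolute: {path}")
    command = (
        f"{HADOOP_BIN} jar {EXAMPLES_JAR} wordcount {input_dir} {output_dir}"
    )
    return f"type=command\ncommand={command}\n"


# Hypothetical HDFS input and output directories.
mr_job = wordcount_job("/input", "/output")
print(mr_job)
```

Keep in mind that a MapReduce job fails if its output directory already exists, so a rerun needs a fresh output path or a cleanup step first.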

Use Edit in Azkaban to modify the command.

Run it again; this time it executes successfully.

Hive jobs

Start with a txt file.

Create the table with an HQL statement.

Load the data into the table.

Run a SQL statement in Hive.

Create test.sql.

Package the SQL file together with the Hive job.
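
The packaging step can be sketched as below. The query, the table name test_table, the job file name hive.job, and the Hive install path are all assumptions, since the original files are not reproduced in the text. The job runs the script with hive -f and refers to test.sql by its relative name, on the assumption that the job executes in the directory where the package was unpacked.

```python
import io
import zipfile

# Hypothetical query; the table name test_table is an assumption.
TEST_SQL = "SELECT COUNT(*) FROM test_table;\n"

# Assumed Hive install path; test.sql is shipped in the same package.
HIVE_JOB = "type=command\ncommand=/opt/hive/bin/hive -f test.sql\n"

files = {"test.sql": TEST_SQL, "hive.job": HIVE_JOB}

# Build the archive in memory; write buffer.getvalue() to disk to upload it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Read the archive back to confirm both files are at the top level.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    packaged = sorted(archive.namelist())
print(packaged)
```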

Upload the package to Azkaban and run it; it executes successfully.

Scheduled jobs

On the Execute Flow page, find the Schedule button.

The schedule is now listed in the Scheduling module.

It shows the exact time at which the flow is scheduled to run.
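
The walkthrough above uses the web UI. As an alternative, recent Azkaban versions document an AJAX call named scheduleCronFlow that takes a Quartz cron expression. The sketch below only builds the form body for such a request and does not send it. The project name, flow name, and session id are hypothetical, passing the session as a session.id parameter is an assumption, and whether the call is available depends on the Azkaban version in use.

```python
from urllib.parse import urlencode


def schedule_request(project, flow, cron_expression, session_id):
    """Build the form body for a scheduleCronFlow request (not sent here)."""
    return urlencode({
        "ajax": "scheduleCronFlow",
        "session.id": session_id,
        "projectName": project,
        "flow": flow,
        "cronExpression": cron_expression,
    })


# Quartz fields: second minute hour day-of-month month day-of-week.
# "0 0 2 ? * *" means every day at 02:00.
body = schedule_request("demo_project", "bar", "0 0 2 ? * *", "SESSION_ID")
print(body)
```

The body would be posted to the /schedule path of the web server after logging in to obtain a session id.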

Email alerts

Go back to the Execute Flow dialog.
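
Besides the settings in this dialog, Azkaban job files accept the properties failure.emails and success.emails, each a comma-separated list of addresses. The sketch below appends them to an existing job definition. The helper name, the sample job, and the address are hypothetical, and mail is only delivered if the Azkaban server has its mail settings configured.

```python
def with_alert_emails(job_text, failure=(), success=()):
    """Append email notification properties to the text of a .job file."""
    lines = [job_text.rstrip("\n")]
    if failure:
        lines.append("failure.emails=" + ",".join(failure))
    if success:
        lines.append("success.emails=" + ",".join(success))
    return "\n".join(lines) + "\n"


# Hypothetical job and recipient address.
alert_job = with_alert_emails(
    "type=command\ncommand=echo bar\n",
    failure=["ops@example.com"],
)
print(alert_job)
```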

SLA settings in Azkaban
