Airflow provides a rich set of commands. This article goes through them, building on the earlier article about installing Airflow in an Anaconda virtual environment.
First, activate the python36 virtual environment in Anaconda and run airflow -h:
airflow -h
(python36) [root@localhost airflow]# airflow -h
usage: airflow [-h] GROUP_OR_COMMAND ...
positional arguments:
GROUP_OR_COMMAND
Groups:
celery Celery components
config View configuration
connections Manage connections
dags Manage DAGs
db Database operations
jobs Manage jobs
kubernetes Tools to help run the KubernetesExecutor
pools Manage pools
providers Display providers
roles Manage roles
tasks Manage tasks
users Manage users
variables Manage variables
Commands:
cheat-sheet Display cheat sheet
info Show information about current Airflow and environment
kerberos Start a kerberos ticket renewer
plugins Dump information about loaded plugins
rotate-fernet-key
Rotate encrypted connection credentials and variables
scheduler Start a scheduler instance
standalone Run an all-in-one copy of Airflow
sync-perm Update permissions for existing roles and optionally DAGs
triggerer Start a triggerer instance
version Show the version
webserver Start a Airflow webserver instance
optional arguments:
-h, --help show this help message and exit
(python36) [root@localhost airflow]#
The help output is divided into two parts: Groups and Commands.
Groups
celery        Celery components
config        View configuration
connections   Manage connections
dags          Manage DAGs
db            Database operations
jobs          Manage jobs
kubernetes    Tools to help run the KubernetesExecutor
pools         Manage pools
providers     Display providers
roles         Manage roles
tasks         Manage tasks
users         Manage users
variables     Manage variables
Commands
cheat-sheet        Display a cheat sheet
info               Show information about the current Airflow installation and environment
kerberos           Start a Kerberos ticket renewer
plugins            Dump information about loaded plugins
rotate-fernet-key  Rotate encrypted connection credentials and variables
scheduler          Start a scheduler instance
standalone         Run an all-in-one copy of Airflow
sync-perm          Update permissions for existing roles and optionally DAGs
triggerer          Start a triggerer instance
version            Show the version
webserver          Start an Airflow webserver instance
Each Group contains its own subcommands, which are introduced one by one below.
1. Viewing configuration
To view the help for the config Group, run airflow config -h or simply airflow config.
airflow config -h
(python36) [root@localhost airflow]# airflow config -h
usage: airflow config [-h] COMMAND ...
View configuration
positional arguments:
COMMAND
get-value
Print the value of the configuration
list List options for the configuration
optional arguments:
-h, --help show this help message and exit
(python36) [root@localhost airflow]#
You can see there are two subcommands: get-value and list. Let's try list first.
airflow config list
(python36) [root@localhost airflow]# airflow config list
[core]
dags_folder = /root/airflow/dags
hostname_callable = socket.getfqdn
default_timezone = utc
executor = SequentialExecutor
sql_alchemy_conn = sqlite:////root/airflow/airflow.db
sql_engine_encoding = utf-8
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10
sql_alchemy_pool_recycle = 1800
sql_alchemy_pool_pre_ping = True
sql_alchemy_schema =
parallelism = 32
max_active_tasks_per_dag = 16
dags_are_paused_at_creation = True
max_active_runs_per_dag = 16
max_queued_runs_per_dag = 16
load_examples = True
load_default_connections = True
plugins_folder = /root/airflow/plugins
execute_tasks_new_python_interpreter = False
fernet_key =
donot_pickle = True
dagbag_import_timeout = 30.0
dagbag_import_error_tracebacks = True
dagbag_import_error_traceback_depth = 2
dag_file_processor_timeout = 50
task_runner = StandardTaskRunner
default_impersonation =
security =
unit_test_mode = False
enable_xcom_pickling = False
killed_task_cleanup_time = 60
dag_run_conf_overrides_params = True
dag_discovery_safe_mode = True
default_task_retries = 0
default_task_weight_rule = downstream
min_serialized_dag_update_interval = 30
min_serialized_dag_fetch_interval = 10
max_num_rendered_ti_fields_per_task = 30
check_slas = True
xcom_backend = airflow.models.xcom.BaseXCom
lazy_load_plugins = True
lazy_discover_providers = True
max_db_retries = 3
hide_sensitive_var_conn_fields = True
sensitive_var_conn_names =
default_pool_task_slot_count = 128
[logging]
base_log_folder = /root/airflow/logs
remote_logging = False
remote_log_conn_id =
google_key_path =
remote_base_log_folder =
encrypt_s3_logs = False
logging_level = INFO
fab_logging_level = WARNING
logging_config_class =
colored_console_log = True
colored_log_format = [%(blue)s%(asctime)s%(reset)s] {%(blue)s%(filename)s:%(reset)s%(lineno)d} %(log_color)s%(levelname)s%(reset)s - %(log_color)s%(message)s%(reset)s
colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter
log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
simple_log_format = %(asctime)s %(levelname)s - %(message)s
task_log_prefix_template =
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
dag_processor_manager_log_location = /root/airflow/logs/dag_processor_manager/dag_processor_manager.log
task_log_reader = task
extra_logger_names =
worker_log_server_port = 8793
[metrics]
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
statsd_allow_list =
stat_name_handler =
statsd_datadog_enabled = False
statsd_datadog_tags =
[secrets]
backend =
backend_kwargs =
[cli]
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
[debug]
fail_fast = False
[api]
enable_experimental_api = False
auth_backend = airflow.api.auth.backend.deny_all
maximum_page_limit = 100
fallback_page_limit = 100
google_oauth2_audience =
google_key_path =
access_control_allow_headers =
access_control_allow_methods =
access_control_allow_origins =
[lineage]
backend =
[atlas]
sasl_enabled = False
host =
port = 21000
username =
password =
[operators]
default_owner = airflow
default_cpus = 1
default_ram = 512
default_disk = 512
default_gpus = 0
default_queue = default
allow_illegal_arguments = False
[hive]
default_hive_mapred_queue =
[webserver]
base_url = http://localhost:8080
default_ui_timezone = UTC
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_ssl_cert =
web_server_ssl_key =
web_server_master_timeout = 120
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 6000
reload_on_plugin_change = False
secret_key = i4saxhrEonFCbCERqOzDAw==
workers = 4
worker_class = sync
access_logfile = -
error_logfile = -
access_logformat =
expose_config = False
expose_hostname = True
expose_stacktrace = True
dag_default_view = tree
dag_orientation = LR
log_fetch_timeout_sec = 5
log_fetch_delay_sec = 2
log_auto_tailing_offset = 30
log_animation_speed = 1000
hide_paused_dags_by_default = False
page_size = 100
navbar_color = #fff
default_dag_run_display_number = 25
enable_proxy_fix = False
proxy_fix_x_for = 1
proxy_fix_x_proto = 1
proxy_fix_x_host = 1
proxy_fix_x_port = 1
proxy_fix_x_prefix = 1
cookie_secure = False
cookie_samesite = Lax
default_wrap = False
x_frame_enabled = True
show_recent_stats_for_completed_runs = True
update_fab_perms = True
session_lifetime_minutes = 43200
auto_refresh_interval = 3
[email]
email_backend = airflow.utils.email.send_email_smtp
email_conn_id = smtp_default
default_email_on_retry = True
default_email_on_failure = True
[smtp]
smtp_host = localhost
smtp_starttls = True
smtp_ssl = False
smtp_port = 25
smtp_mail_from = airflow@example.com
smtp_timeout = 30
smtp_retry_limit = 5
[sentry]
sentry_on = False
sentry_dsn =
[celery_kubernetes_executor]
kubernetes_queue = kubernetes
[celery]
celery_app_name = airflow.executors.celery_executor
worker_concurrency = 16
worker_umask = 0o077
broker_url = redis://redis:6379/0
result_backend = db+postgresql://postgres:airflow@postgres/airflow
flower_host = 0.0.0.0
flower_url_prefix =
flower_port = 5555
flower_basic_auth =
sync_parallelism = 0
celery_config_options = airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
ssl_active = False
ssl_key =
ssl_cert =
ssl_cacert =
pool = prefork
operation_timeout = 1.0
task_track_started = True
task_adoption_timeout = 600
task_publish_max_retries = 3
worker_precheck = False
[celery_broker_transport_options]
[dask]
cluster_address = 127.0.0.1:8786
tls_ca =
tls_cert =
tls_key =
[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
num_runs = -1
scheduler_idle_sleep_time = 1
min_file_process_interval = 30
dag_dir_list_interval = 300
print_stats_interval = 30
pool_metrics_interval = 5.0
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300.0
child_process_log_directory = /root/airflow/logs/scheduler
scheduler_zombie_task_threshold = 300
catchup_by_default = True
max_tis_per_query = 512
use_row_level_locking = True
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
schedule_after_task_execution = True
parsing_processes = 2
file_parsing_sort_mode = modified_time
use_job_schedule = True
allow_trigger_in_future = False
dependency_detector = airflow.serialization.serialized_objects.DependencyDetector
trigger_timeout_check_interval = 15
[triggerer]
default_capacity = 1000
[kerberos]
ccache = /tmp/airflow_krb5_ccache
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab
forwardable = True
include_ip = True
[github_enterprise]
api_rev = v3
[elasticsearch]
host =
log_id_template = {dag_id}-{task_id}-{execution_date}-{try_number}
end_of_log_mark = end_of_log
frontend =
write_stdout = False
json_format = False
json_fields = asctime, filename, lineno, levelname, message
host_field = host
offset_field = offset
[elasticsearch_configs]
use_ssl = False
verify_certs = True
[kubernetes]
pod_template_file =
worker_container_repository =
worker_container_tag =
namespace = default
delete_worker_pods = True
delete_worker_pods_on_failure = False
worker_pods_creation_batch_size = 1
multi_namespace_mode = False
in_cluster = True
kube_client_request_args =
delete_option_kwargs =
enable_tcp_keepalive = True
tcp_keep_idle = 120
tcp_keep_intvl = 30
tcp_keep_cnt = 6
verify_ssl = True
worker_pods_pending_timeout = 300
worker_pods_pending_timeout_check_interval = 120
worker_pods_queued_check_interval = 60
worker_pods_pending_timeout_batch_size = 100
[smart_sensor]
use_smart_sensor = False
shard_code_upper_limit = 10000
shards = 5
sensors_enabled = NamedHivePartitionSensor
(python36) [root@localhost airflow]#
This command lists all of Airflow's configuration, which is a very convenient and clear way to inspect the settings. Next, let's try get-value to fetch the shard_code_upper_limit parameter.
airflow config get-value shard_code_upper_limit
(python36) [root@localhost airflow]# airflow config get-value shard_code_upper_limit
usage: airflow config get-value [-h] section option
Print the value of the configuration
positional arguments:
section The section name
option The option name
optional arguments:
-h, --help show this help message and exit
airflow config get-value command error: the following arguments are required: option, see help above.
(python36) [root@localhost airflow]#
The command fails with a usage error and the relevant help is printed: get-value also requires the specific section and option to be supplied.
airflow config get-value smart_sensor shard_code_upper_limit
(python36) [root@localhost airflow]# airflow config get-value smart_sensor shard_code_upper_limit
10000
(python36) [root@localhost airflow]#
Now the value is read successfully. These configuration parameters follow the same layout as a classic Windows-style .ini file, roughly:
[section1]
key1=xxxxx11
key2=xxxxx12
[section2]
key1=xxxx13
key2=xxxx14
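Besides the CLI, the same values can be read from Python. The snippet below is a minimal sketch, assuming Airflow 2.x is importable in the current environment; it reads the same [section]/option pairs that airflow config get-value prints.
# Minimal sketch: programmatic access to the Airflow configuration (assumes Airflow 2.x)
from airflow.configuration import conf

# equivalent to: airflow config get-value smart_sensor shard_code_upper_limit
print(conf.getint("smart_sensor", "shard_code_upper_limit"))

# equivalent to: airflow config get-value core dags_folder
print(conf.get("core", "dags_folder"))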
2. Managing connections
airflow connections -h
(python36) [root@localhost airflow]# airflow connections -h
usage: airflow connections [-h] COMMAND ...
Manage connections
positional arguments:
COMMAND
add Add a connection
delete Delete a connection
export Export all connections
get Get a connection
import Import connections from a file
list List connections
optional arguments:
-h, --help show this help message and exit
(python36) [root@localhost airflow]#
Six subcommands are provided:
add      Add a new connection
delete   Delete a connection
export   Export all connections
get      Get the details of a connection
import   Import connections from a file
list     List all connections
Listing all connections:
airflow connections list |more
(python36) [root@localhost ~]# airflow connections list |more
id | conn_id              | conn_type    | description | host  | schema  | login          | password | port | is_encrypted | is_extra_encrypted | extra_dejson                     | get_uri
===+======================+==============+=============+=======+=========+================+==========+======+==============+====================+==================================+===========================
1  | airflow_db           | mysql        | None        | mysql | airflow | root           | None     | None | False        | False              | {}                               | mysql://root@mysql/airflow
2  | aws_default          | aws          | None        | None  | None    | None           | None     | None | False        | False              | {}                               | aws://
3  | azure_batch_default  | azure_batch  | None        | None  | None    | <ACCOUNT_NAME> | None     | None | False        | False              | {'account_url': '<ACCOUNT_URL>'} | azure-batch://%3CACCOUNT_NAME%3E@?account_url=%3CACCOUNT_URL%3E
4  | azure_cosmos_default | azure_cosmos | None        | None  | None    | None           | None     | None | False        | False              | {'database_name': '<DATABASE_NAME>', 'collection_name': '<COLLECTION_NAME>... | azure-cosmos://?database_name=%3CDATABASE_NAME%3E&collection_name=%3CCOL...
(python36) [root@localhost ~]#
That is indeed a lot of output. Let's try exporting the connection information to a JSON file instead.
(python36) [root@localhost ~]# airflow connections export connections.json
Connections successfully exported to connections.json.
(python36) [root@localhost ~]#
This creates a file named connections.json in the current directory. The file is fairly long, so its contents are not reproduced here.
To show the details of a single connection, use airflow connections get <connection name>.
For example: airflow connections get airflow_db
(python36) [root@localhost ~]# airflow connections get airflow_db
id | conn_id | conn_type | description | host | schema | login | password | port | is_encrypted | is_extra_encrypted | extra_dejson | get_uri
===+============+===========+=============+=======+=========+=======+==========+======+==============+====================+==============+===========================
1 | airflow_db | mysql | None | mysql | airflow | root | None | None | False | False | {} | mysql://root@mysql/airflow
(python36) [root@localhost ~]#
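The connections managed here are normally consumed from DAG or hook code rather than from the CLI. Below is a minimal sketch, assuming Airflow 2.x and the default airflow_db connection shown above, of looking a connection up programmatically:
# Minimal sketch: look up a stored connection from Python (assumes Airflow 2.x)
from airflow.hooks.base import BaseHook

# Reads the same record that `airflow connections get airflow_db` printed above
conn = BaseHook.get_connection("airflow_db")
print(conn.conn_type, conn.host, conn.schema, conn.login)
print(conn.get_uri())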
3. Managing DAGs
airflow dags list
(python36) [root@localhost ~]# airflow dags list
dag_id | filepath | owner | paused
========================================+============================================================================================================+=========+=======
example_bash_operator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_bash_operator. | airflow | False
| py | |
example_branch_datetime_operator_2 | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_branch_datetim | airflow | True
| e_operator.py | |
example_branch_dop_operator_v3 | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_branch_python_ | airflow | True
| dop_operator_3.py | |
example_branch_labels | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_branch_labels. | airflow | True
| py | |
example_branch_operator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_branch_operato | airflow | True
| r.py | |
example_complex | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_complex.py | airflow | True
example_dag_decorator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_dag_decorator. | airflow | True
| py | |
example_external_task_marker_child | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_external_task_ | airflow | True
| marker_dag.py | |
example_external_task_marker_parent | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_external_task_ | airflow | True
| marker_dag.py | |
example_kubernetes_executor | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_kubernetes_exe | airflow | True
| cutor.py | |
example_nested_branch_dag | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_nested_branch_ | airflow | True
| dag.py | |
example_passing_params_via_test_command | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_passing_params | airflow | True
| _via_test_command.py | |
example_short_circuit_operator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_short_circuit_ | airflow | True
| operator.py | |
example_skip_dag | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_skip_dag.py | airflow | True
example_subdag_operator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operato | airflow | True
| r.py | |
example_subdag_operator.section-1 | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operato | airflow | True
| r.py | |
example_subdag_operator.section-2 | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_subdag_operato | airflow | True
| r.py | |
example_task_group | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_task_group.py | airflow | True
example_task_group_decorator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_task_group_dec | airflow | True
| orator.py | |
example_time_delta_sensor_async | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_time_delta_sen | airflow | True
| sor_async.py | |
example_trigger_controller_dag | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_trigger_contro | airflow | True
| ller_dag.py | |
example_trigger_target_dag | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_trigger_target | airflow | True
| _dag.py | |
example_weekday_branch_operator | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_branch_day_of_ | airflow | True
| week_operator.py | |
example_xcom | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_xcom.py | airflow | True
example_xcom_args | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py | airflow | True
example_xcom_args_with_operators | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_xcomargs.py | airflow | True
latest_only | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_latest_only.py | airflow | True
latest_only_with_trigger | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_latest_only_wi | airflow | True
| th_trigger.py | |
test_utils | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/test_utils.py | airflow | True
tutorial | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/tutorial.py | airflow | True
tutorial_etl_dag | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/tutorial_etl_dag.py | airflow | True
tutorial_taskflow_api_etl | /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/tutorial_taskflow_api_ | airflow | True
| etl.py | |
(python36) [root@localhost ~]#
This command lists every DAG's ID, source file path, owner, and current state: paused=True means the DAG is paused and will not be scheduled, while paused=False means it is active.
All of these DAGs are demos that ship with Airflow. Take the first one, example_bash_operator; its file path is:
/usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_bash_operator.py
The DAG's source code can be viewed there:
vim /usr/lib64/anaconda3/envs/python36/lib/python3.6/site-packages/airflow/example_dags/example_bash_operator.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id='example_bash_operator',
    schedule_interval='0 0 * * *',
    start_date=datetime(2021, 1, 1),
    catchup=False,
    dagrun_timeout=timedelta(minutes=60),
    tags=['example', 'example2'],
    params={"example_key": "example_value"},
) as dag:
    run_this_last = DummyOperator(
        task_id='run_this_last',
    )

    # [START howto_operator_bash]
    run_this = BashOperator(
        task_id='run_after_loop',
        bash_command='echo 1',
    )
    # [END howto_operator_bash]

    run_this >> run_this_last

    for i in range(3):
        task = BashOperator(
            task_id='runme_' + str(i),
            bash_command='echo "{{ task_instance_key_str }}" && sleep 1',
        )
        task >> run_this

    # [START howto_operator_bash_template]
    also_run_this = BashOperator(
        task_id='also_run_this',
        bash_command='echo "run_id={{ run_id }} | dag_run={{ dag_run }}"',
    )
    # [END howto_operator_bash_template]
    also_run_this >> run_this_last

    # [START howto_operator_bash_skip]
    this_will_skip = BashOperator(
        task_id='this_will_skip',
        bash_command='echo "hello world"; exit 99;',
        dag=dag,
    )
    # [END howto_operator_bash_skip]
    this_will_skip >> run_this_last

if __name__ == "__main__":
    dag.cli()
We can use this source code as a reference for creating our own DAGs.
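As an illustration, here is a minimal sketch of a custom DAG modeled on the example above. The file name, dag_id and the echo commands are made up for demonstration; saving a file like this under the dags_folder shown in the configuration earlier (/root/airflow/dags) should make it appear in airflow dags list after the next parse.
# /root/airflow/dags/my_first_dag.py -- hypothetical file name and dag_id, for illustration only
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='my_first_dag',                  # hypothetical DAG id
    schedule_interval='0 0 * * *',          # run once a day at midnight
    start_date=datetime(2021, 1, 1),
    catchup=False,                          # do not backfill runs before today
    dagrun_timeout=timedelta(minutes=60),
    tags=['example'],
) as dag:
    say_hello = BashOperator(
        task_id='say_hello',
        bash_command='echo "hello from my_first_dag"',
    )
    say_goodbye = BashOperator(
        task_id='say_goodbye',
        bash_command='echo "goodbye"',
    )
    # say_hello must finish before say_goodbye starts
    say_hello >> say_goodbye

if __name__ == "__main__":
    dag.cli()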