I'm just getting started with Airbnb's airflow, and I'm still not clear on how/when backfilling is done.
Specifically, there are 2 use-cases that confuse me:
If I run airflow scheduler
for a few minutes, stop it for a minute, then restart it again, my DAG seems to run extra tasks for the first 30 seconds or so, then it continues as normal (runs every 10 sec). Are these extra tasks "backfilled" tasks that weren't able to complete in an earlier run? If so, how would I tell airflow not to backfill those tasks?
If I run airflow scheduler
for a few minutes, then run airflow clear MY_tutorial
, then restart airflow scheduler
, it seems to run a TON of extra tasks. Are these tasks also somehow "backfilled" tasks? Or am I missing something.
Currently, I have a very simple dag:
default_args = {
'owner': 'me',
'depends_on_past': False,
'start_date': datetime(2016, 10, 4),
'email': ['[email protected]'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
}
dag = DAG(
'MY_tutorial', default_args=default_args, schedule_interval=timedelta(seconds=10))
# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
dag=dag)
t2 = BashOperator(
task_id='sleep',
bash_command='sleep 5',
retries=3,
dag=dag)
templated_command = """
{% for i in range(5) %}
echo "{{ ds }}"
echo "{{ macros.ds_add(ds, 8)}}"
echo "{{ params.my_param }}"
{% endfor %}
"""
t3 = BashOperator(
task_id='templated',
bash_command=templated_command,
params={'my_param': 'Parameter I passed in'},
dag=dag)
second_template = """
touch ~/airflow/logs/test
echo $(date) >> ~/airflow/logs/test
"""
t4 = BashOperator(
task_id='write_test',
bash_command=second_template,
dag=dag)
t1.set_upstream(t4)
t2.set_upstream(t1)
t3.set_upstream(t1)
The only two things I've changed in my airflow config are
- I changed from using a sqlite db to using a postgres db
- I'm using a
CeleryExecutor
instead of a SequentialExecutor
Thanks so much for you help!
question from:
https://stackoverflow.com/questions/39882204/airflow-backfill-clarification 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…