Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

cron - How do I stop Airflow from running a task the first time I unpause it?

I have a DAG. Here is a sample of the parameters.

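from datetime import datetime

from airflow import DAG

default_args = {'owner': 'airflow'}  # placeholder: the real default_args were not shown

dag = DAG(
    'my_dag',  # DAG ids may only contain letters, digits, '_', '-' and '.'
    default_args=default_args,
    description='Cron Job : My Dag',
    schedule_interval='45 07 * * *',  # every day at 07:45
    # start_date=days_ago(0),
    start_date=datetime(2021, 4, 6, 10, 45),
    tags=['My Dag Tag'],
    concurrency=1,
    is_paused_upon_creation=True,
    catchup=False,  # don't backfill missed intervals; schedule only the latest one
)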

From reading the Airflow documentation, I think I have set the DAG to run at 07:45 every day. However, if I pause the DAG and unpause it a couple of days later, it still runs as soon as I unpause it (only for that day, of course, since catchup=False prevents backfills). That is not the expected behaviour, right? I scheduled it for 07:45, so when I unpause it at 10:00 it should not run at all until the next 07:45.

What am I missing here?

1 Answer

I assume you are familiar with Airflow's scheduling mechanism. If not, please read "Problem with start date and scheduled date in Apache airflow" before reading the rest of this answer.

In your case, you had one or more runs, as expected, when you deployed the DAG. You paused the DAG on 2021-04-07 and unpaused it today (2021-04-19). When you unpaused it, Airflow executed a DAG run with execution_date='2021-04-18'.

This is expected.

The reason lies in Airflow's scheduling mechanism. Your last run was on 2021-04-07, and the schedule is 45 07 * * * (every day at 07:45). Because the DAG was paused, the runs for 2021-04-08, 2021-04-09, ..., 2021-04-17 were never created, and when you unpaused it Airflow did not create them either, because of catchup=False. However, the run that started today (2021-04-19) is not part of the catchup. It is the run with execution_date=2021-04-18, whose interval (2021-04-18 07:45 to 2021-04-19 07:45) had already ended when you unpaused the DAG at 10:00, so Airflow scheduled it and started it immediately.
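To make the interval arithmetic concrete, here is a minimal pure-Python sketch. It does not import Airflow, and latest_completed_interval is a hypothetical helper, not an Airflow API. It assumes the daily 45 07 * * * schedule, naive datetimes in the DAG's timezone, and that with catchup=False the scheduler picks the most recent interval that has already ended:

```python
from datetime import datetime, timedelta

# Sketch only (not Airflow's scheduler code): for a daily '45 07 * * *'
# schedule, find the most recent interval that has fully ended at `now`.
# The interval's start is the run's execution_date; the run is due at its end.
# Assumes naive datetimes expressed in the DAG's timezone.


def latest_completed_interval(now: datetime, hour: int = 7, minute: int = 45):
    # Most recent 07:45 tick that is not in the future.
    tick = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tick > now:
        tick -= timedelta(days=1)
    # The one-day interval ending at that tick started a day earlier.
    return tick - timedelta(days=1), tick


# Unpausing at 10:00 on 2021-04-19, as in the question.
start, end = latest_completed_interval(datetime(2021, 4, 19, 10, 0))
print(start, end)  # → 2021-04-18 07:45:00 2021-04-19 07:45:00
```

Called with the unpause time, it returns the interval from 2021-04-18 07:45 to 2021-04-19 07:45. Its start corresponds to the execution_date='2021-04-18' run, and because its end has already passed, that run is due immediately rather than at the next 07:45.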

The behavior you are experiencing is no different from deploying this fresh DAG:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 1),
}

with DAG(dag_id='stackoverflow_question',
         default_args=default_args,
         schedule_interval='45 07 * * *',  # every day at 07:45
         catchup=False,  # don't backfill; only the latest interval is scheduled
         ) as dag:
    DummyOperator(task_id='some_task')

As soon as you deploy it, a single run is created.

The DAG's start_date is 2020-01-01 and catchup=False. I deployed the DAG today (2021-04-19), so it created a run with execution_date='2021-04-18', which started running today, 2021-04-19.
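The same reasoning explains the fresh-DAG case. The sketch below is again plain Python, with a hypothetical execution_dates helper rather than Airflow's scheduler code. It lists the execution_dates a daily 07:45 DAG would get on first deployment, assuming naive datetimes and that a run is due only once its one-day interval has ended:

```python
from datetime import datetime, timedelta

# Sketch only (not Airflow's scheduler): which execution_dates a daily
# '45 07 * * *' DAG gets when first deployed, with and without catchup.
# Assumes naive datetimes in the DAG's timezone.


def execution_dates(start_date: datetime, now: datetime, catchup: bool):
    # The first 07:45 tick at or after start_date is the first execution_date.
    first = start_date.replace(hour=7, minute=45, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(days=1)
    # A run is due once its interval (execution_date + 1 day) has ended.
    dates = []
    d = first
    while d + timedelta(days=1) <= now:
        dates.append(d)
        d += timedelta(days=1)
    # Without catchup, only the latest completed interval is kept.
    return dates if catchup else dates[-1:]


start = datetime(2020, 1, 1)
now = datetime(2021, 4, 19, 10, 0)
print(execution_dates(start, now, catchup=False))  # → [datetime.datetime(2021, 4, 18, 7, 45)]
print(len(execution_dates(start, now, catchup=True)))  # → 474
```

With catchup=False only the 2021-04-18 07:45 execution_date remains, matching the single run described above; with catchup=True the same function returns every daily interval since 2020-01-01, which is the backfill that catchup=False avoids.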

...