Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
197 views
in Technique[技术] by (71.8m points)

How do I stop Airflow from triggering my python scripts?

I am currently using Airflow to run a DAG (say dag.py) which has a few tasks, and then, it has a python script to execute (done via bash_operator). The python script (say report.py) basically takes data from a cloud (s3) location as a dataframe, does a few transformations, and then sends them out as a report over email.

But the issue I'm having is that airflow is basically running this python script, report.py, everytime Airflow scans the repository for changes (i.e. every 2 mins). So, the script is being run every 2 mins (and hence the email is being sent out every two minutes!).

Is there any work around to this? Can we use something apart from a bash operator (bare in mind that we need to do a few dataframe transformations before sending out the report)?

Thanks!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Just make sure you do everything serious in the tasks. It in the python script. The script will be executed often by scheduler but it should simply create tasks and build dependencies between them. The actual work is done in the 'execute' methods of the tasks.

For example rather than sending email in the script you should add the 'EmailOperator' as a task and the right dependencies, so the execute method of the operator will be executed not when the file is parsed by scheduler, but when all dependencies (other tasks ) will complete


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...