I am currently using Airflow to run a DAG (say dag.py) which has a few tasks, and then, it has a python script to execute (done via bash_operator). The python script (say report.py) basically takes data from a cloud (s3) location as a dataframe, does a few transformations, and then sends them out as a report over email.
But the issue I'm having is that airflow is basically running this python script, report.py, everytime Airflow scans the repository for changes (i.e. every 2 mins). So, the script is being run every 2 mins (and hence the email is being sent out every two minutes!).
Is there any work around to this? Can we use something apart from a bash operator (bare in mind that we need to do a few dataframe transformations before sending out the report)?
Thanks!
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…