I'm scheduling tasks with Prefect like this:
# Python script
import fnmatch
import os
import sys
from datetime import datetime, timedelta

from prefect import task, Flow
from prefect.tasks.shell import ShellTask
from prefect.schedules import IntervalSchedule

# process_common is assumed to be a @task defined elsewhere in the script (not shown)

# First run 10 seconds from now, then one run every minute
schedule = IntervalSchedule(
    start_date=datetime.now() + timedelta(seconds=10),
    interval=timedelta(minutes=1),
)

can_start = True
with Flow("List files", schedule) as flow:
    if can_start:
        can_start = False
        # List the raw directory and keep only the "fact" files
        file_names = os.listdir("/home/admin/data/raw")
        file_names = fnmatch.filter(file_names, "*fact*")
        process_common.map(file_names)
        can_start = True

out = flow.run()
But if files arrive in my directory after the first Prefect run, file_names stays empty on the second run and on every run after that.
I have tried fetching the file names with a grep command instead, and then it works:
file_names = ShellTask(command="ls /home/admin/data/raw | grep fact", return_all=True, log_stderr=True, stream_output=True)
Does anyone know why this happens? Many thanks for your help.
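A likely explanation, assuming Prefect 1.x (which the `with Flow(...)` block and the `prefect.schedules` import suggest): plain Python statements inside the `with Flow(...)` block run only once, while the flow graph is being built. `os.listdir` and `fnmatch.filter` therefore produce a fixed list at build time, and because `flow.run()` with a schedule keeps reusing that same flow, every later run maps over the same frozen list. `ShellTask`, by contrast, is a Task, so its command is executed at each scheduled run. The toy model below illustrates the difference with the standard library only; `ToyFlow` and `list_fact_files` are hypothetical names and are not Prefect's API.

```python
import fnmatch
import os
import tempfile


def list_fact_files(directory):
    """Return the names in `directory` that match *fact* (hypothetical helper)."""
    return fnmatch.filter(os.listdir(directory), "*fact*")


class ToyFlow:
    """Hypothetical stand-in for a flow: it records callables and runs them later."""

    def __init__(self):
        self.steps = []

    def add(self, func, *args):
        # Only record the call; nothing is executed yet.
        self.steps.append((func, args))

    def run(self):
        # Execute every recorded step now, so each run sees the current state.
        return [func(*args) for func, args in self.steps]


with tempfile.TemporaryDirectory() as raw_dir:
    # Evaluated once, while the flow is being defined (like os.listdir in the question).
    snapshot = list_fact_files(raw_dir)

    flow = ToyFlow()
    flow.add(lambda: snapshot)          # always returns the frozen build-time list
    flow.add(list_fact_files, raw_dir)  # re-lists the directory on every run

    first_run = flow.run()
    # A new file arrives after the first run.
    open(os.path.join(raw_dir, "sales_fact_2021.csv"), "w").close()
    second_run = flow.run()

print(first_run)   # [[], []]
print(second_run)  # [[], ['sales_fact_2021.csv']]
```

Under the same Prefect 1.x assumption, the analogous change would be to move the `os.listdir`/`fnmatch.filter` lines into a `@task`-decorated function and call it inside the `with Flow(...)` block, passing its result to `process_common.map`, so the listing happens at run time just as the `ShellTask` command does. The `can_start` flag has the same limitation: it is only checked once, at build time.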
question from:
https://stackoverflow.com/questions/65899426/os-listdir-vs-grep-in-a-prefect-schedule