Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - PySpark job aborted error due to stage failure

I have the following piece of code:

from pyspark.sql import functions as f

# fact table
df = (spark.table(f'nn_squad7_{country}.fact_table')
     .filter(f.col('date_key').between(start_date,end_date))
     #.filter(f.col('is_lidl_plus')==1)
     .filter(f.col('source')=='tickets')
     .filter(f.col('subtype')=='trx')
     .filter(f.col('is_trx_ok') == 1)
     .join(dim_stores,'store_id','inner')
     .join(dim_customers,'customer_id','inner')
     .withColumn('week', f.expr('DATE_FORMAT(DATE_SUB(date_key, 1), "Y-ww")'))
     .withColumn('quarter', f.expr('DATE_FORMAT(DATE_SUB(date_key, 1), "Q")')))


# checking metrics
df2 =(df
      .groupby('is_client_plus','quarter')
      .agg(
        f.countDistinct('store_id'),
        f.sum('customer_id'),
        f.sum('ticket_id')))

display(df2)

When I execute the query I get the following error:

SparkException: Job aborted due to stage failure: Task 58 in stage 13.0 failed 4 times, most recent failure: Lost task 58.3 in stage 13.0 (TID 488, 10.32.14.43, executor 4): java.lang.IllegalArgumentException: Illegal pattern character 'Q'

I'm not sure why I'm getting this error, because when I run the fact table chunk alone I don't get any error.

Any advice? Thanks!



1 Answer

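According to the Spark 3 docs, 'Q' is a valid datetime format pattern, but it is not a valid pattern letter for java.text.SimpleDateFormat, which Spark 2.x uses for DATE_FORMAT. The "Illegal pattern character 'Q'" message is what SimpleDateFormat raises, so this is most likely a Spark version issue. The fact table chunk runs cleanly on its own because Spark evaluates transformations lazily: the format string is only parsed once an action such as display(df2) actually runs the job. Try using the quarter function instead, which should give the same expected output: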

df = (spark.table(f'nn_squad7_{country}.fact_table')
     .filter(f.col('date_key').between(start_date,end_date))
     #.filter(f.col('is_lidl_plus')==1)
     .filter(f.col('source')=='tickets')
     .filter(f.col('subtype')=='trx')
     .filter(f.col('is_trx_ok') == 1)
     .join(dim_stores,'store_id','inner')
     .join(dim_customers,'customer_id','inner')
     .withColumn('week', f.expr('DATE_FORMAT(DATE_SUB(date_key, 1), "Y-ww")'))
     .withColumn('quarter', f.expr('quarter(DATE_SUB(date_key, 1))')))
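
To sanity-check what quarter(DATE_SUB(date_key, 1)) should return for a given date, here is a minimal plain-Python sketch of the same arithmetic. The helper name quarter_of_previous_day is hypothetical and not part of Spark; it only assumes that date_key is a calendar date and that quarters are numbered 1 to 4.

```python
from datetime import date, timedelta


def quarter_of_previous_day(d: date) -> int:
    """Hypothetical helper: quarter (1-4) of the day before d."""
    # Same shift as DATE_SUB(date_key, 1)
    previous = d - timedelta(days=1)
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4
    return (previous.month - 1) // 3 + 1


# The first day of a quarter still belongs to the previous quarter
# once one day has been subtracted.
print(quarter_of_previous_day(date(2021, 4, 1)))
print(quarter_of_previous_day(date(2021, 1, 1)))
```

Note that the one-day shift moves boundary dates such as 1 January into the last quarter of the previous year. In PySpark the same expression can also be written with column functions as f.quarter(f.date_sub('date_key', 1)).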
