Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - how to select day for week Pyspark

I need to create a column for the day of the week, with values Monday, Tuesday, Wednesday, ..., and then filter for Fridays only.

The code I'm using is the following:

df = (
    spark.table(f'nn_squad7_{country}.fact_table')
    .filter(f.col('date_key').between(start, end))
    .filter(f.col('is_client_plus') == 1)
    .filter(f.col('source') == 'tickets')
    .filter(f.col('subtype') == 'trx')
    .filter(f.col('is_trx_ok') == 1)
    .withColumn('week', f.date_format(f.date_sub(f.col('date_key'), 1), 'YYYY-ww'))
    .withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'M'))
    .withColumn('HP_client', f.col('customer_id').isNotNull())
    .withColumn('local_time', f.from_utc_timestamp(f.col('trx_begin_date_time'), 'Europe/Brussels'))
    .withColumn('Hour', f.hour(f.col('local_time')))
    .withColumn('Day', f.day(f.col('local_time')))
    .filter(f.col('Hour').between(4, 8))
)


Here is the error I get:

AttributeError: module 'pyspark.sql.functions' has no attribute 'day'

How can I create a column with the day of the week? Thanks

question from:https://stackoverflow.com/questions/65918594/how-to-select-day-for-week-pyspark


1 Answer


You can use F.dayofweek, which returns an integer (1 = Sunday, 2 = Monday, ..., 7 = Saturday).

Alternatively, you can use F.date_format('local_time', 'E'), which returns a string like 'Sun', 'Mon', etc.

The pattern 'EEEE' returns the full name, e.g. 'Sunday'.

