Join on minimum date between two dates - Spark SQL

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have a table of daily data and a table of monthly data. I'm trying to retrieve one daily record corresponding to each monthly record. The wrinkles are that some days are missing from the daily data and the field I care about, new_status, is sometimes null on the month_end_date.

month_df

| ID | month_end_date |
| -- | -------------- |
| 1  | 2019-07-31     |
| 1  | 2019-06-30     |
| 2  | 2019-10-31     |

daily_df

| ID | daily_date | new_status |
| -- | ---------- | ---------- |
| 1  | 2019-07-29 | 1          |
| 1  | 2019-07-30 | 1          |
| 1  | 2019-08-01 | 2          |
| 1  | 2019-08-02 | 2          |
| 1  | 2019-08-03 | 2          |
| 1  | 2019-06-29 | 0          |
| 1  | 2019-06-30 | 0          |
| 2  | 2019-10-30 | 5          |
| 2  | 2019-10-31 | NULL       |
| 2  | 2019-11-01 | 6          |
| 2  | 2019-11-02 | 6          |

I want to fuzzy join daily_df to month_df where daily_date is >= month_end_date and less than month_end_date plus some buffer (say, 5 days). For each monthly record, I want to keep only the matching daily record with the minimum daily_date and a non-null new_status.

A related post solves this using an OUTER APPLY in SQL Server, but that does not seem to be an option in Spark SQL. Is there a similarly computationally efficient method that works in Spark?
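One possible way to express the rule above without OUTER APPLY is a range join followed by a ROW_NUMBER() window that keeps the earliest qualifying daily row per monthly record. The sketch below is only an illustration under stated assumptions: it runs the query through Python's built-in sqlite3 module (standing in for Spark so that it works without a cluster), stores the dates as ISO strings, and uses a hypothetical helper name, first_status_after_month_end. The date arithmetic is SQLite-specific; in Spark SQL the upper bound would presumably be written as date_add(m.month_end_date, 5) instead.

```python
import sqlite3

# Sample rows copied from the tables in the question.
MONTH_ROWS = [(1, "2019-07-31"), (1, "2019-06-30"), (2, "2019-10-31")]
DAILY_ROWS = [
    (1, "2019-07-29", 1), (1, "2019-07-30", 1), (1, "2019-08-01", 2),
    (1, "2019-08-02", 2), (1, "2019-08-03", 2), (1, "2019-06-29", 0),
    (1, "2019-06-30", 0), (2, "2019-10-30", 5), (2, "2019-10-31", None),
    (2, "2019-11-01", 6), (2, "2019-11-02", 6),
]

# Range join on ID plus a 5-day window, then rank the non-null candidates
# by daily_date within each (ID, month_end_date) and keep the first one.
# Assumption: date(x, '+5 days') is SQLite syntax, not Spark SQL.
QUERY = """
SELECT ID, month_end_date, daily_date, new_status
FROM (
    SELECT m.ID, m.month_end_date, d.daily_date, d.new_status,
           ROW_NUMBER() OVER (
               PARTITION BY m.ID, m.month_end_date
               ORDER BY d.daily_date
           ) AS rn
    FROM month_df AS m
    JOIN daily_df AS d
      ON d.ID = m.ID
     AND d.daily_date >= m.month_end_date
     AND d.daily_date < date(m.month_end_date, '+5 days')
     AND d.new_status IS NOT NULL
) AS ranked
WHERE rn = 1
ORDER BY ID, month_end_date
"""


def first_status_after_month_end():
    """Hypothetical helper: load the sample data and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE month_df (ID INTEGER, month_end_date TEXT)")
    conn.execute(
        "CREATE TABLE daily_df (ID INTEGER, daily_date TEXT, new_status INTEGER)"
    )
    conn.executemany("INSERT INTO month_df VALUES (?, ?)", MONTH_ROWS)
    conn.executemany("INSERT INTO daily_df VALUES (?, ?, ?)", DAILY_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_status_after_month_end():
        print(row)
```

Because this uses an inner join, a monthly record with no qualifying daily row inside the buffer is dropped; switching to a LEFT JOIN would keep it with NULL daily columns. Whether this matches the efficiency of OUTER APPLY in Spark is not something this sketch establishes.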

Question from: https://stackoverflow.com/questions/65907854/join-on-minimum-date-between-two-dates-spark-sql
