When iterating through a DataFrame using .foreach
in Spark Scala, is it possible to access another DataFrame, or load a DataFrame via Spark SQL, to make comparisons? For example, DF1 has available days; if a day is marked as not available in DF1 but appears in DF2, I would like to ignore that row of DF1. I have the logic working when I do a .collect
on DF1 and iterate over the result, but DF1 will be a large dataset and I do not want to pull all of that data back to the driver.
DF1 Schema
|-- id: integer (nullable = false)
|-- monday: boolean (nullable = false)
|-- tuesday: boolean (nullable = false)
|-- wednesday: boolean (nullable = false)
|-- thursday: boolean (nullable = false)
|-- friday: boolean (nullable = false)
|-- saturday: boolean (nullable = false)
|-- sunday: boolean (nullable = false)
DF2 Schema
|-- start: timestamp (nullable = false)
|-- end: timestamp (nullable = false)
|-- dayStart: string (nullable = false)
|-- dayEnd: string (nullable = false)
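Roughly, the working .collect version looks like this (a simplified sketch: the helper name, the use of dayStart as the comparison key, and the exact matching rule are illustrative assumptions, not my real code):

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Sketch of the current driver-side approach. The comparison rule
// against DF2 is an assumption for illustration only.
def compareOnDriver(df1: DataFrame, df2: DataFrame): Array[Row] = {
  // Day names that appear in DF2, pulled to the driver once.
  val busyDays: Set[String] =
    df2.select("dayStart").collect().map(_.getString(0)).toSet

  val days = Seq("monday", "tuesday", "wednesday", "thursday",
                 "friday", "saturday", "sunday")

  // Pulls ALL of DF1 to the driver -- this is the part that will
  // not scale once DF1 is large.
  df1.collect().filterNot { row =>
    // Drop the row if any day marked not available also appears in DF2.
    days.exists(d => !row.getAs[Boolean](d) && busyDays.contains(d))
  }
}
```

This works for small inputs, but since it materializes all of DF1 on the driver, I am looking for a way to express the same comparison without collecting.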