You can join the regex dataframe to the df dataframe using an rlike condition, and then count the matches for each regex:
import pyspark.sql.functions as F

# Read the regex patterns, one per line, into column _c0
regex = spark.read.csv("regex.csv", header=False)

# Non-equi join: each row of df is tested against every pattern,
# then the matching rows are counted per pattern
result = df.alias('df').join(
    regex.alias('regex'),
    F.expr('df._c0 rlike regex._c0')
).groupBy('regex._c0').count()

result.show()
+------------+-----+
| _c0|count|
+------------+-----+
|Arizona.*hot| 2|
| Mahi*| 3|
| WeLove*| 1|
+------------+-----+
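Note that the inner join silently drops any pattern that matches nothing. If you also want those patterns to appear with a count of 0, one sketch (assuming the same df and regex dataframes as above) is to join from the regex side with a left join and count only the matched df rows:

result_all = regex.alias('regex').join(
    df.alias('df'),
    F.expr('df._c0 rlike regex._c0'),
    how='left'
).groupBy('regex._c0').agg(
    # count() on a column ignores nulls, so unmatched patterns get 0
    F.count('df._c0').alias('count')
)

Also, since this is a non-equi join, Spark cannot use a hash join; with a small pattern list it will typically broadcast the regex side and run a nested-loop join, so keep the pattern file small.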