With those specific string patterns:
import org.apache.spark.sql.functions.udf
import spark.implicits._   // for toDF and $; already in scope in spark-shell

// Group 1 captures either "NN/N" or a single "N"; group 2 captures the upper bound.
val pattern = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1}) to ([0-9.]{1})".r

def createArray = udf { str: String =>
  // Bind the two capture groups (throws a MatchError if the string doesn't match).
  val pattern(from, _to) = str
  val prefix = from.split("/")
  // Expand the numeric range and re-attach the "NN/" prefix when present.
  (prefix.last.toInt to _to.toInt)
    .map(el => if (prefix.length > 1) s"${prefix(0)}/$el" else el.toString)
    .toArray
}

val r_df = Seq((1, "1 to 6"), (2, "44/1 to 3")).toDF("id", "range")
r_df.withColumn("array", createArray($"range")).show(false)
gives:
+---+---------+------------------+
|id |range |array |
+---+---------+------------------+
|1 |1 to 6 |[1, 2, 3, 4, 5, 6]|
|2 |44/1 to 3|[44/1, 44/2, 44/3]|
+---+---------+------------------+
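The first line of the UDF uses the regex as an extractor, so the whole string must match and the two capture groups are bound to from and _to (a non-matching string throws a scala.MatchError). The same logic can be checked in plain Scala before going through Spark; a minimal sketch, where prefix and expanded are just illustrative names:

val pattern = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1}) to ([0-9.]{1})".r

// Whole-string match; binds the two capture groups.
val pattern(from, _to) = "44/1 to 3"   // from = "44/1", _to = "3"

// Expand the numeric range and re-attach the "44/" prefix to each element.
val prefix   = from.split("/")
val expanded = (prefix.last.toInt to _to.toInt)
  .map(el => if (prefix.length > 1) s"${prefix(0)}/$el" else el.toString)

println(expanded.mkString(", "))   // 44/1, 44/2, 44/3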
To also support strings in the format "3a to 5a", just update the regex to:
val pattern = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1})[a-zA-Z0-9_]* to ([0-9.]{1})[a-zA-Z0-9_]*".r
For example:
+---+---------+------------------+
|id |range |array |
+---+---------+------------------+
|1 |1 to 6 |[1, 2, 3, 4, 5, 6]|
|2 |44/1 to 3|[44/1, 44/2, 44/3]|
|3 |3a to 5a |[3, 4, 5] |
+---+---------+------------------+
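For reference, the result above can be reproduced by re-running the snippet with the updated regex and one extra input row; a minimal sketch, where r_df2 is just an illustrative name (in the shell you will typically need to re-define createArray after changing pattern so the UDF picks up the new regex):

val r_df2 = Seq((1, "1 to 6"), (2, "44/1 to 3"), (3, "3a to 5a")).toDF("id", "range")
r_df2.withColumn("array", createArray($"range")).show(false)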