Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
579 views
in Technique[技术] by (71.8m points)

apache spark - How to find position of substring column in a another column using Pyspark?

If I have a pyspark DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text. How would I calculate the position of subtext in text column?

Input data:

+---------------------------+---------+
|           text            | subtext | 
+---------------------------+---------+
| Where is my string?       | is      |
| Hm, this one is different | on      |
+---------------------------+---------+

Expected output:

+---------------------------+---------+----------+
|           text            | subtext | position |
+---------------------------+---------+----------+
| Where is my string?       | is      |       6  |
| Hm, this one is different | on      |       9  |
+---------------------------+---------+----------+

Note: I can do this using static text/regex without issue, I have not been able to find any resources on doing this with a row-specific text/regex.

question from:https://stackoverflow.com/questions/65831272/how-to-find-position-of-substring-column-in-a-another-column-using-pyspark

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use locate. You need to subtract 1 because string index starts from 1, not 0.

import pyspark.sql.functions as F

df2 = df.withColumn('position', F.expr('locate(subtext, text) - 1'))

df2.show(truncate=False)
+-------------------------+-------+--------+
|text                     |subtext|position|
+-------------------------+-------+--------+
|Where is my string?      |is     |6       |
|Hm, this one is different|on     |9       |
+-------------------------+-------+--------+

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...