Apache Spark - how to loop through each row of a DataFrame in PySpark

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

For example:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is an existing SparkContext

sample = sqlContext.sql("select Name, age, city from user")
sample.show()

The above statement prints the entire table on the terminal, but I want to access each row in that table using a for or while loop to perform further calculations.

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:58:24+0000

You simply cannot. DataFrames, like other distributed data structures, are not iterable and can be accessed only through dedicated higher-order functions and/or SQL methods.

You can, of course, collect the rows to the driver

# Pulls every row into driver memory; do_something stands for your own per-row logic
for row in df.rdd.collect():
    do_something(row)

or use toLocalIterator

# Fetches rows one partition at a time instead of all at once
for row in df.rdd.toLocalIterator():
    do_something(row)

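and iterate locally as shown above, but that defeats the whole purpose of using Spark: all the data has to pass through the single driver process instead of being processed in parallel on the cluster.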
