Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - pyspark df.select(*) is disordered after df.sort()

This is my original pyspark dataframe.

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   1|   2|
|   1|   2|   2|
|   1|   3|   2|
|   1|   2|   1|
|   2|   1|   2|
|   2|   3|   2|
|   2|   2|   1|
|   3|   1|   2|
|   3|   3|   2|
|   3|   2|   1|
+----+----+----+

After sorting df by col2 and selecting all three columns into test:

df = df.sort('col2')
test = df.select('col1','col2','col3')
test.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   3|   1|   2|
|   2|   1|   2|
|   1|   1|   2|
|   1|   2|   1|
|   3|   2|   1|
|   1|   2|   2|
|   2|   2|   1|
|   1|   3|   2|
|   3|   3|   2|
|   2|   3|   2|
+----+----+----+
df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   2|   1|   2|
|   3|   1|   2|
|   1|   1|   2|
|   1|   2|   2|
|   3|   2|   1|
|   1|   2|   1|
|   2|   2|   1|
|   3|   3|   2|
|   2|   3|   2|
|   1|   3|   2|
+----+----+----+

The row order of test is different from that of df, even though test was selected from the already sorted df. I don't know what happened. Can someone help me understand?
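A likely explanation, offered as a sketch rather than a definitive answer: both outputs are correctly ordered by col2, and only rows that tie on col2 change places. Sorting on col2 alone does not say how tied rows should be ordered, and each show() call is a separate evaluation, so the ties can come out differently each time. The pure-Python model below illustrates the idea with the rows from the question. It is not Spark code; the `evaluate` helper and the shuffle that stands in for rows arriving from partitions in arbitrary order are assumptions made for illustration.

```python
import random

# Rows (col1, col2, col3) copied from the DataFrame in the question.
rows = [
    (1, 1, 2), (1, 2, 2), (1, 3, 2), (1, 2, 1), (2, 1, 2),
    (2, 3, 2), (2, 2, 1), (3, 1, 2), (3, 3, 2), (3, 2, 1),
]


def evaluate(rows, key, seed):
    """Model one evaluation (one show() call) of a sorted DataFrame.

    Hypothetical helper: the seeded shuffle stands in for rows reaching
    the sort in an arbitrary order. This is a model, not Spark itself.
    """
    arrived = list(rows)
    random.Random(seed).shuffle(arrived)
    # sorted() keeps tied rows in arrival order, so ties depend on the shuffle.
    return sorted(arrived, key=key)


def by_col2(row):
    # Partial order: rows with equal col2 are ties.
    return row[1]


def by_all(row):
    # Total order: col2 first, then col1 and col3 break every tie.
    return (row[1], row[0], row[2])


# Two evaluations sorted on col2 only.
first = evaluate(rows, by_col2, seed=1)
second = evaluate(rows, by_col2, seed=2)

# The col2 column itself is identical in both evaluations ...
print([r[1] for r in first] == [r[1] for r in second])
# ... but the full rows are not guaranteed to match, because ties can move.
print(first == second)

# With a total order, every evaluation returns the same rows in the same order.
stable_first = evaluate(rows, by_all, seed=1)
stable_second = evaluate(rows, by_all, seed=2)
print(stable_first == stable_second)

# Rough PySpark counterpart of the total order (all rows here are distinct):
#   df = df.sort('col2', 'col1', 'col3')
```

If a reproducible row order matters, sorting on enough columns to break every tie, as in the last comment, is one way to get it.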

Question from: https://stackoverflow.com/questions/65915215/pyspark-df-select-is-disordered-after-df-sort


