Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Order of pandas dataframe ranking does not match the order of the original dataframe

The result of ranking a pandas DataFrame looks strange to me. Here is some sample code:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.random((10, 5)))
>>> df
          0         1         2         3         4
0  0.956603  0.379341  0.268281  0.446098  0.630782
1  0.939022  0.732704  0.892836  0.813121  0.829652
2  0.628488  0.046074  0.344966  0.422442  0.942899
3  0.535603  0.473202  0.885504  0.481541  0.873048
4  0.908629  0.449296  0.740381  0.356437  0.670467
5  0.631618  0.147706  0.381521  0.723074  0.151051
6  0.276021  0.274220  0.812456  0.283248  0.609319
7  0.112798  0.855934  0.198935  0.433243  0.247930
8  0.479593  0.643699  0.068690  0.465188  0.907548
9  0.452467  0.295931  0.629863  0.565983  0.784952
>>> df.rank(pct=True)
     0    1    2    3    4
0  1.0  0.5  0.3  0.5  0.4
1  0.9  0.9  1.0  1.0  0.7
2  0.6  0.1  0.4  0.3  1.0
3  0.5  0.7  0.9  0.7  0.8
4  0.8  0.6  0.7  0.2  0.5
5  0.7  0.2  0.5  0.9  0.1
6  0.2  0.3  0.8  0.1  0.3
7  0.1  1.0  0.2  0.4  0.2
8  0.4  0.8  0.1  0.6  0.9
9  0.3  0.4  0.6  0.8  0.6
>>> df.iloc[0, :].rank(pct=True)
0    1.0
1    0.4
2    0.2
3    0.6
4    0.8
Name: 0, dtype: float64

I don't understand why the first row of the ranking computed on the whole df (with the default axis=0) is not the same as the ranking computed on the first row of the DataFrame alone.

The result of df.rank(pct=True) also looks wrong to me. In the first row of df, col0 > col4 > col3 > col1 > col2. Since the default is ascending=True, I would expect the first row of df.rank(pct=True) to have the same order, but it gives col0 > col3 = col1 > col4 > col2. The order of df.iloc[0, :].rank(pct=True), on the other hand, seems correct. So my questions are:

  1. Why is the first row of df.rank(pct=True) different from df.iloc[0, :].rank(pct=True)?
  2. Why is the order of df.rank(pct=True) not the same as the order of df?
Question from: https://stackoverflow.com/questions/65623316/order-of-pandas-dataframe-ranking-does-not-match-the-order-of-the-original-dataf
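
One way to check what is going on is to compare the two axis settings directly. The sketch below rebuilds the DataFrame from the values printed in the question (hard-coded so the run is repeatable; the names by_column, by_row and first_row are only illustrative) and assumes the behaviour described in the pandas documentation: with axis=0 each column is ranked on its own, down the rows, while axis=1 ranks each row on its own, across the columns.

```python
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame(
    [
        [0.956603, 0.379341, 0.268281, 0.446098, 0.630782],
        [0.939022, 0.732704, 0.892836, 0.813121, 0.829652],
        [0.628488, 0.046074, 0.344966, 0.422442, 0.942899],
        [0.535603, 0.473202, 0.885504, 0.481541, 0.873048],
        [0.908629, 0.449296, 0.740381, 0.356437, 0.670467],
        [0.631618, 0.147706, 0.381521, 0.723074, 0.151051],
        [0.276021, 0.274220, 0.812456, 0.283248, 0.609319],
        [0.112798, 0.855934, 0.198935, 0.433243, 0.247930],
        [0.479593, 0.643699, 0.068690, 0.465188, 0.907548],
        [0.452467, 0.295931, 0.629863, 0.565983, 0.784952],
    ]
)

# Default axis=0: every column is ranked independently, so each value is
# compared with the other nine values in the same column.
by_column = df.rank(pct=True)

# axis=1: every row is ranked independently, so each value is compared
# with the other four values in the same row.
by_row = df.rank(axis=1, pct=True)

# Ranking one row as a Series compares values within that row only.
first_row = df.iloc[0, :].rank(pct=True)

print(by_column.iloc[0].tolist())  # first row of the column-wise ranking
print(by_row.iloc[0].tolist())     # first row of the row-wise ranking
print(first_row.tolist())          # ranking of the first row on its own
```

If that reading is right, the first row of by_column says where each value stands within its own column (for example, 0.379341 is the fifth smallest of the ten values in column 1, hence 0.5), so it is not expected to follow the left-to-right order of the row. The first row of by_row should match df.iloc[0, :].rank(pct=True), because both compare the same five values with one another.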
