The result of ranking a pandas DataFrame seems weird. Here is some sample code:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.random((10, 5)))
>>> df
          0         1         2         3         4
0  0.956603  0.379341  0.268281  0.446098  0.630782
1  0.939022  0.732704  0.892836  0.813121  0.829652
2  0.628488  0.046074  0.344966  0.422442  0.942899
3  0.535603  0.473202  0.885504  0.481541  0.873048
4  0.908629  0.449296  0.740381  0.356437  0.670467
5  0.631618  0.147706  0.381521  0.723074  0.151051
6  0.276021  0.274220  0.812456  0.283248  0.609319
7  0.112798  0.855934  0.198935  0.433243  0.247930
8  0.479593  0.643699  0.068690  0.465188  0.907548
9  0.452467  0.295931  0.629863  0.565983  0.784952
>>> df.rank(pct=True)
     0    1    2    3    4
0  1.0  0.5  0.3  0.5  0.4
1  0.9  0.9  1.0  1.0  0.7
2  0.6  0.1  0.4  0.3  1.0
3  0.5  0.7  0.9  0.7  0.8
4  0.8  0.6  0.7  0.2  0.5
5  0.7  0.2  0.5  0.9  0.1
6  0.2  0.3  0.8  0.1  0.3
7  0.1  1.0  0.2  0.4  0.2
8  0.4  0.8  0.1  0.6  0.9
9  0.3  0.4  0.6  0.8  0.6
>>> df.iloc[0, :].rank(pct=True)
0    1.0
1    0.4
2    0.2
3    0.6
4    0.8
Name: 0, dtype: float64
I don't understand why the first row of the row-wise ranking of df (the default for axis is 0) is not the same as the ranking of the first row of the data frame.
Also, the result of df.rank(pct=True) seems weird. Looking at the first row of df, we see that col0 > col4 > col3 > col1 > col2. Since the default is ascending=True, I would expect the first row of df.rank(pct=True) to have the same order, but its order is col0 > col3 = col1 > col4 > col2. On the other hand, the order of df.iloc[0, :].rank(pct=True) seems correct.
So my questions are:
- Why is the first row of df.rank(pct=True) different from df.iloc[0, :].rank(pct=True)?
- Why is the order of df.rank(pct=True) not the same as the order of df?
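One check that might help narrow this down (a sketch, not a definitive answer): the pandas documentation describes axis=0 as ranking along the index, which means each value is ranked within its own column, while axis=1 ranks each value within its own row. The snippet below rebuilds the frame from the values printed above (the original data came from an unseeded np.random.random, so the literal list is a reconstruction) and compares both directions against the single-row ranking. The names col_wise, row_wise and first_row are only illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the printed frame in the question, so the check is
# reproducible (the original call to np.random.random was not seeded).
data = [
    [0.956603, 0.379341, 0.268281, 0.446098, 0.630782],
    [0.939022, 0.732704, 0.892836, 0.813121, 0.829652],
    [0.628488, 0.046074, 0.344966, 0.422442, 0.942899],
    [0.535603, 0.473202, 0.885504, 0.481541, 0.873048],
    [0.908629, 0.449296, 0.740381, 0.356437, 0.670467],
    [0.631618, 0.147706, 0.381521, 0.723074, 0.151051],
    [0.276021, 0.274220, 0.812456, 0.283248, 0.609319],
    [0.112798, 0.855934, 0.198935, 0.433243, 0.247930],
    [0.479593, 0.643699, 0.068690, 0.465188, 0.907548],
    [0.452467, 0.295931, 0.629863, 0.565983, 0.784952],
]
df = pd.DataFrame(data)

col_wise = df.rank(pct=True)            # default axis=0: ranked within each column
row_wise = df.rank(axis=1, pct=True)    # axis=1: ranked within each row
first_row = df.iloc[0, :].rank(pct=True)

# Does column 0 of the default result match ranking column 0 on its own?
print(np.allclose(col_wise[0], df[0].rank(pct=True)))

# Does the first row of the axis=1 result match ranking the first row alone?
print(np.allclose(row_wise.iloc[0], first_row))
```

If both comparisons hold, the first row of df.rank(pct=True) is reporting where each value sits within its column of 10 values, not within its row of 5, which would account for both the different numbers and the different order.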
question from:
https://stackoverflow.com/questions/65623316/order-of-pandas-dataframe-ranking-does-not-match-the-order-of-the-original-dataf