python - Seaborn countplot with normalized y axis per group

Question

Welcome To Ask or Share your Answers For Others

python - Seaborn countplot with normalized y axis per group

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Seaborn countplot with normalized y axis per group

I was wondering if it is possible to create a Seaborn count plot, but instead of actual counts on the y-axis, show the relative frequency (percentage) within its group (as specified with the hue parameter).

I sort of fixed this with the following approach, but I can't imagine this is the easiest approach:

# Plot percentage of occupation per income class
grouped = df.groupby(['income'], sort=False)
occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)

occupation_data = [
    {'occupation': occupation, 'income': income, 'percentage': percentage*100} for 
    (income, occupation), percentage in dict(occupation_counts).items()
]

df_occupation = pd.DataFrame(occupation_data)

p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

Result:

I'm using the well known adult data set from the UCI machine learning repository. The pandas dataframe is created like this:

# Read the adult dataset
df = pd.read_csv(
    "data/adult.data",
    engine='c',
    lineterminator='
',

    names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'race', 'sex',
           'capital_gain', 'capital_loss', 'hours_per_week',
           'native_country', 'income'],
    header=None,
    skipinitialspace=True,
    na_values="?"
)

This question is sort of related, but does not make use of the hue parameter. And in my case I cannot just change the labels on the y-axis, because the height of the bar must depend on the group.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:28:18+0000

I might be confused. The difference between your output and the output of

occupation_counts = (df.groupby(['income'])['occupation']
                     .value_counts(normalize=True)
                     .rename('percentage')
                     .mul(100)
                     .reset_index()
                     .sort_values('occupation'))
p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

is, it seems to me, only the order of the columns.

And you seem to care about that, since you pass sort=False. But then, in your code the order is determined uniquely by chance (and the order in which the dictionary is iterated even changes from run to run with Python 3.5).

Categories

python - Seaborn countplot with normalized y axis per group

python - Seaborn countplot with normalized y axis per group

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags