Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

matplotlib - Grouping boxplots in seaborn when input is a DataFrame

I want to plot multiple columns of a pandas DataFrame, all grouped by another column, using the groupby option of seaborn.boxplot. There is a nice answer to a similar problem in matplotlib (matplotlib: Group boxplots), but since seaborn.boxplot comes with a groupby option, I thought this could be much easier to do in seaborn.

Here we go with a reproducible example that fails:

import seaborn as sns
import pandas as pd
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=['a1', 'a2', 'a3', 'a4', 'b'])

# Plotting with seaborn
sns.boxplot(df[['a1', 'a2', 'a3', 'a4']], groupby=df.b)

What I get completely ignores the groupby option:

[Figure: failed groupby]

Whereas if I do this with a single column, it works, thanks to another SO question (Seaborn groupby pandas Series):

sns.boxplot(df.a1, groupby=df.b)

[Figure: seaborn that does not fail]

So I would like to get all my columns in one plot (all columns are on a similar scale).

EDIT:

The above SO question was edited and now includes a 'not clean' answer to this problem, but it would be nice if someone had a better idea.


1 Answer


As the other answers note, the boxplot function is limited to plotting a single "layer" of boxplots, and the groupby parameter only has an effect when the input is a Series and you have a second variable that you want to use to bin the observations into each box.

However, you can accomplish what I think you're hoping for with the factorplot function, using kind="box". First, though, you'll have to "melt" the sample dataframe into what is called long-form or "tidy" format, where each column is a variable and each row is an observation:

df_long = pd.melt(df, "b", var_name="a", value_name="c")
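To make the reshaping concrete, here is a minimal sketch that rebuilds the df from the question and inspects the melted result. The keyword id_vars="b" is used here for readability; it is equivalent to passing "b" positionally as in the line above.

```python
import pandas as pd

# Sample data from the question: four measurement columns and a grouping column "b".
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Keep "b" as the identifier and stack a1..a4 into two columns:
# "a" holds the original column name, "c" holds the value.
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

print(df_long.shape)  # 6 rows x 4 melted columns gives 24 rows and 3 columns
print(df_long.head())
```

Each original row now contributes four rows, one per measurement column, so every row of df_long is a single observation tagged with its group b and its source column a.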

Then it's very simple to plot:

sns.factorplot("a", hue="b", y="c", data=df_long, kind="box")
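Note that factorplot was renamed to catplot in seaborn 0.9, so on a recent install the call above may not be available. The following is a sketch of the same plot under that assumption, using keyword arguments throughout; the Agg backend line is only there so the example runs without a display and is not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook

import pandas as pd
import seaborn as sns

# Sample data from the question.
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Long-form ("tidy") version: one row per observation.
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# One box per (a, b) pair: "a" on the x axis, boxes colored by "b".
g = sns.catplot(data=df_long, x="a", y="c", hue="b", kind="box")
```

catplot returns a FacetGrid, so the figure can be written to disk with g.savefig(...) if needed. For a plain Axes-level plot, sns.boxplot(data=df_long, x="a", y="c", hue="b") should draw the same grouped boxes.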
