Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
404 views
in Technique[技术] by (71.8m points)

python - Styling of Pandas groupby boxplots

The normal matplotlib boxplot command in Python returns a dictionary with keys for the boxes, median, whiskers, fliers, and caps. This makes styling really easy.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe and subset it for a boxplot
df1 = pd.DataFrame(rand(10), columns=['Col1'] )
df1['X'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
boxes= [df1[df1['X'] == 'A'].Col1, df1[df1['X'] == 'B'].Col1]

# Call the standard matplotlib boxplot function,
# which returns a dictionary including the parts of the graph
mbp = plt.boxplot(boxes)
print(type(mbp))

# This dictionary output makes styling the boxplot easy
plt.setp(mbp['boxes'], color='blue')
plt.setp(mbp['medians'], color='red')
plt.setp(mbp['whiskers'], color='blue')
plt.setp(mbp['fliers'], color='blue')

The Pandas library has an "optimized" boxplot function for its grouped (hierarchically indexed ) dataframes. Instead of returning several dictionaries for each group, however, it returns an matplotlib.axes.AxesSubplot object. This makes styling very difficult.

# Pandas has a built-in boxplot function that returns
# a matplotlib.axes.AxesSubplot object
pbp = df1.boxplot(by='X')
print(type(pbp))

# Similar attempts at styling obviously return TypeErrors
plt.setp(pbp['boxes'], color='blue')
plt.setp(pbp['medians'], color='red')
plt.setp(pbp['whiskers'], color='blue')
plt.setp(pbp['fliers'], color='blue')

Is this AxisSubplot object produced by the pandas df.boxplot(by='X') function accessible?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You could also specify the return_type as dict. This will return the boxplot properties directly in a dictionary, which is indexed by each column that was plotted in the boxplot.

To use the example above (in IPython):

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot( by='X', return_type='dict' )

>>> bp.keys()
['Col1', 'Col2']

>>> bp['Col1'].keys()
['boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps']

Now, changing linewidths is a matter of a list comprehension :

>>> [ [item.set_linewidth( 2 ) for item in bp[key]['medians']] for key in bp.keys() ]
[[None, None], [None, None]]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...