Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to update a subset of a MultiIndexed pandas DataFrame

I'm using a MultiIndexed pandas DataFrame and would like to multiply a subset of the DataFrame by a certain number.

It's the same as this but with a MultiIndex.

>>> import pandas as pd
>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009],
                      'flavour':['strawberry','strawberry','banana','banana',
                      'strawberry','strawberry','banana','banana'],
                      'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
                      'sales':[10,12,22,23,11,13,23,24]})

>>> d = d.set_index(['year','flavour','day'])                  

>>> d
                     sales
year flavour    day       
2008 strawberry sat     10
                sun     12
     banana     sat     22
                sun     23
2009 strawberry sat     11
                sun     13
     banana     sat     23
                sun     24

So far, so good. But let's say I spot that all the Saturday figures are only half what they should be! I'd like to multiply all sat sales by 2.

My first attempt at this was:

sat = d.xs('sat', level='day')
sat = sat * 2
d.update(sat)

but this doesn't work because the variable sat has lost the day level of the index:

>>> sat
                 sales
year flavour          
2008 strawberry     20
     banana         44
2009 strawberry     22
     banana         46

so pandas doesn't know how to join the new sales figures back onto the old dataframe.

I had a quick stab at:

>>> sat = d.xs('sat', level='day', copy=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2248, in xs
    raise ValueError('Cannot retrieve view (copy=False)')
ValueError: Cannot retrieve view (copy=False)

I have no idea what that error means, but I feel like I'm making a mountain out of a molehill. Does anyone know the right way to do this?

Thanks in advance, Rob


1 Answer


Note: pandas 0.13 added a drop_level argument to xs (thanks to this question!):

In [42]: d.xs('sat', level='day', drop_level=False)
Out[42]:
                     sales
year flavour    day
2008 strawberry sat     10
     banana     sat     22
2009 strawberry sat     11
     banana     sat     23
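
Because drop_level=False keeps the 'day' level, the original update approach from the question should then work. A minimal sketch, assuming pandas 0.13 or later and the same frame d as in the question (depending on the pandas version, update may leave sales as floats):

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# drop_level=False keeps 'day' in the index, so the subset's labels
# still line up with the rows of d.
sat = d.xs('sat', level='day', drop_level=False) * 2

# update() aligns on the full (year, flavour, day) index and modifies d in place.
d.update(sat)
print(d)
```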

Another option is to use select, which extracts a sub-DataFrame (a copy) that keeps the full index, so it can be passed to update correctly. Note that select was deprecated in pandas 0.21 and removed in 1.0:

In [11]: d.select(lambda x: x[2] == 'sat') * 2
Out[11]:
                     sales
year flavour    day
2008 strawberry sat     20
     banana     sat     44
2009 strawberry sat     22
     banana     sat     46

In [12]: d.update(d.select(lambda x: x[2] == 'sat') * 2)
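
On pandas versions without select, roughly the same idea can be sketched with a boolean mask built from the index labels. This is a substitute for select rather than the original call; the names mask and doubled are just illustrative:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Iterating a MultiIndex yields one tuple per row: (year, flavour, day).
# This applies the same criterion that was passed to select.
mask = np.array([label[2] == 'sat' for label in d.index], dtype=bool)

# The subset keeps all three index levels, so update() can align it with d.
doubled = d.loc[mask] * 2
d.update(doubled)
print(d)
```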

Another option is to use apply, which returns a new DataFrame rather than modifying d in place:

In [21]: d.apply(lambda x: x*2 if x.name[2] == 'sat' else x, axis=1)
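
As a sketch of how the apply version might be used end to end: with axis=1 each row arrives as a Series whose name is the index tuple, and the result has to be assigned to something because d itself is left untouched. The name result is only for illustration:

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# row.name is the (year, flavour, day) tuple for that row; position 2 is 'day'.
result = d.apply(lambda row: row * 2 if row.name[2] == 'sat' else row, axis=1)

print(result)  # Saturday rows doubled
print(d)       # original frame unchanged
```

This runs a Python function once per row, so it is likely to be slow on large frames.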

Another option is to use get_level_values (this is probably the most efficient of these):

In [22]: d[d.index.get_level_values('day') == 'sat'] *= 2
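
The same mask can also be used with .loc to target the sales column explicitly, which may be preferable if the frame has other columns that should not be scaled. A minimal sketch, where is_sat is just an illustrative name:

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Boolean array with one entry per row: True where the 'day' level is 'sat'.
is_sat = d.index.get_level_values('day') == 'sat'

# Scale only the 'sales' column of the matching rows, in place.
d.loc[is_sat, 'sales'] *= 2
print(d)
```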

Another option is to promote the 'day' level to a column and then use an apply.
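
One way this last option might look, as a sketch: reset_index moves the index levels (including 'day') into ordinary columns, apply then reads 'day' per row, and set_index rebuilds the MultiIndex. The names flat and result are illustrative only:

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Move all index levels into columns so 'day' is available on each row.
flat = d.reset_index()

# Double 'sales' on Saturday rows, leave the others as they are.
flat['sales'] = flat.apply(
    lambda row: row['sales'] * 2 if row['day'] == 'sat' else row['sales'],
    axis=1,
)

# Restore the original three-level index.
result = flat.set_index(['year', 'flavour', 'day'])
print(result)
```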

