python - Cumulative operations on dtype objects

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am trying to figure out how I can apply cumulative functions to objects. For numbers there are several alternatives like cumsum and cumcount. There is also df.expanding which can be used with apply. But the functions I pass to apply do not work on objects.

import pandas as pd
df = pd.DataFrame({"C1": [1, 2, 3, 4], 
                   "C2": [{"A"}, {"B"}, {"C"}, {"D"}], 
                   "C3": ["A", "B", "C", "D"], 
                   "C4": [["A"], ["B"], ["C"], ["D"]]})

df
Out: 
   C1   C2 C3   C4
0   1  {A}  A  [A]
1   2  {B}  B  [B]
2   3  {C}  C  [C]
3   4  {D}  D  [D]

The dataframe holds integers, sets, strings and lists. If I try expanding().apply(sum), I get the cumulative sum only for the numeric column; the object columns come back unchanged:

df.expanding().apply(sum)
Out: 
     C1   C2 C3   C4
0   1.0  {A}  A  [A]
1   3.0  {B}  B  [B]
2   6.0  {C}  C  [C]
3  10.0  {D}  D  [D]

My expectation was, since summation is defined on lists and strings, I would get something like this:

     C1   C2  C3     C4
0   1.0  {A}  A      [A]
1   3.0  {B}  AB     [A, B]
2   6.0  {C}  ABC    [A, B, C]
3  10.0  {D}  ABCD   [A, B, C, D]

I also tried something like this:

from functools import reduce
df.expanding().apply(lambda r: reduce(lambda x, y: x+y**2, r))
Out: 
     C1   C2 C3   C4
0   1.0  {A}  A  [A]
1   5.0  {B}  B  [B]
2  14.0  {C}  C  [C]
3  30.0  {D}  D  [D]

On the numeric column this works as I expect: x is the previous result and y is the current row's value. But I cannot reduce with x.union(y), for example.

So, my question is: Are there any alternatives to expanding that I can use on objects? The example is just to show that expanding().apply() is not working on object dtypes. I am looking for a general solution that supports applying functions to those two inputs: previous result and the current element.

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:35:18+0000

Turns out this cannot be done.

Continuing on the same sample:

def burndowntheworld(ser):
    print('Are you sure?')
    return ser/0

df.select_dtypes(['object']).expanding().apply(burndowntheworld)
Out: 
    C2 C3   C4
0  {A}  A  [A]
1  {B}  B  [B]
2  {C}  C  [C]
3  {D}  D  [D]

If the column's type is object, the function is never called. And pandas doesn't have an alternative that works on objects. It's the same for rolling().apply().

In some sense this is a good thing, because expanding().apply() with a custom function has O(n**2) complexity. For special cases such as cumsum or ewma, the recursive nature of the operation brings the cost down to linear time, but in the most general case the function has to be evaluated on the first n elements, then on the first n+1 elements, and so on. So expanding is quite inefficient for a function that depends only on the current value and the function's previous result. Not to mention that storing lists or sets in a DataFrame is never a good idea to begin with.
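
To make the complexity argument concrete, here is a rough sketch in plain Python. It does not reproduce pandas internals; it only emulates the "recompute on every prefix" pattern described above and compares it with a running fold. The names values, calls and concat are made up for the illustration.

```python
from itertools import accumulate

values = ["A", "B", "C", "D"]
calls = {"expanding": 0, "running": 0}  # how often the combiner is invoked


def concat(x, y, key):
    """Combine previous result x with current element y, counting the call."""
    calls[key] += 1
    return x + y


# Expanding-style: start from scratch for every prefix values[:n].
expanding_result = []
for n in range(1, len(values) + 1):
    acc = values[0]
    for v in values[1:n]:
        acc = concat(acc, v, "expanding")
    expanding_result.append(acc)

# Running fold: reuse the previous result, one combiner call per new element.
running_result = list(accumulate(values, lambda x, y: concat(x, y, "running")))

print(expanding_result)  # ['A', 'AB', 'ABC', 'ABCD']
print(running_result)    # ['A', 'AB', 'ABC', 'ABCD']
print(calls)             # {'expanding': 6, 'running': 3}
```

Both approaches produce the same values, but the prefix version needs n*(n-1)/2 combiner calls while the running fold needs n-1.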

So the answer is: if your data is not numeric and the function is dependent on the previous result and the current element, just use a for loop. It will be more efficient anyway.
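
As one possible shape for that loop, itertools.accumulate from the standard library already implements "previous result, current element" folding, so it can stand in for a hand-written for loop. The sketch below is only an illustration on the sample frame from the question; the helper name cumulative is hypothetical and not part of pandas.

```python
from itertools import accumulate
import operator

import pandas as pd

df = pd.DataFrame({"C1": [1, 2, 3, 4],
                   "C2": [{"A"}, {"B"}, {"C"}, {"D"}],
                   "C3": ["A", "B", "C", "D"],
                   "C4": [["A"], ["B"], ["C"], ["D"]]})


def cumulative(ser, func):
    """Hypothetical helper: fold func(previous_result, current) over a Series."""
    return pd.Series(list(accumulate(ser, func)), index=ser.index)


result = pd.DataFrame({
    "C1": cumulative(df["C1"], operator.add),
    "C2": cumulative(df["C2"], lambda x, y: x | y),  # set union builds a new set
    "C3": cumulative(df["C3"], operator.add),        # string concatenation
    "C4": cumulative(df["C4"], operator.add),        # list concatenation
})

print(result["C3"].tolist())  # ['A', 'AB', 'ABC', 'ABCD']
print(result["C4"].tolist())  # [['A'], ['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
```

The combiners used here return new objects rather than mutating x in place, so the original columns stay untouched. A function that mutates its first argument (list.extend, set.update) would make every row share one object, so it is safer to avoid those in the fold.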
