Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
280 views
in Technique[技术] by (71.8m points)

Counting consecutive positive values in Python/pandas array

I'm trying to count consecutive up days in equity return data; so if a positive day is 1 and a negative is 0, a list y=[0,0,1,1,1,0,0,1,0,1,1] should return z=[0,0,1,2,3,0,0,1,0,1,2].

I've come to a solution which has few lines of code, but is very slow:

import pandas
y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1])

def f(x):
    return reduce(lambda a,b:reduce((a+b)*b,x)

z = pandas.expanding_apply(y,f)

I'm guessing I'm looping through the whole list y too many times. Is there a nice Pythonic way of achieving what I want while only going through the data once? I could write a loop myself but wondering if there's a better way.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
>>> y = pandas.Series([0,0,1,1,1,0,0,1,0,1,1])

The following may seem a little magical, but actually uses some common idioms: since pandas doesn't yet have nice native support for a contiguous groupby, you often find yourself needing something like this.

>>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
0     0
1     0
2     1
3     2
4     3
5     0
6     0
7     1
8     0
9     1
10    2
dtype: int64

Some explanation: first, we compare y against a shifted version of itself to find when the contiguous groups begin:

>>> y != y.shift()
0      True
1     False
2      True
3     False
4     False
5      True
6     False
7      True
8      True
9      True
10    False
dtype: bool

Then (since False == 0 and True == 1) we can apply a cumulative sum to get a number for the groups:

>>> (y != y.shift()).cumsum()
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     4
8     5
9     6
10    6
dtype: int32

We can use groupby and cumcount to get us an integer counting up in each group:

>>> y.groupby((y != y.shift()).cumsum()).cumcount()
0     0
1     1
2     0
3     1
4     2
5     0
6     1
7     0
8     0
9     0
10    1
dtype: int64

Add one:

>>> y.groupby((y != y.shift()).cumsum()).cumcount() + 1
0     1
1     2
2     1
3     2
4     3
5     1
6     2
7     1
8     1
9     1
10    2
dtype: int64

And finally zero the values where we had zero to begin with:

>>> y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
0     0
1     0
2     1
3     2
4     3
5     0
6     0
7     1
8     0
9     1
10    2
dtype: int64

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...