Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
123 views
in Technique[技术] by (71.8m points)

python - How to select only the constant values in a timeseries

I have a time series with speed and would like to detect all parts that are constant for more than a specific time. Lets say with the following data i want to detect when there was no movement for more than 2 minutes and put those parts into another dataframe (along with all other columns)

2020-02-27 15:43:00    0.000000
2020-02-27 15:43:30    0.000000
2020-02-27 15:44:00    0.000000
2020-02-27 15:44:30    0.000000
2020-02-27 15:45:00    0.000000
2020-02-27 15:45:30    0.000000
2020-02-27 15:46:00    0.000000
2020-02-27 15:46:30    0.000000
2020-02-27 15:47:00    0.000000
2020-02-27 15:47:30    0.000000
2020-02-27 15:48:00    0.000000
2020-02-27 15:48:30    0.000000
2020-02-27 15:49:00    0.000000
2020-02-27 15:49:30    0.000000
2020-02-27 15:50:00    0.000000
2020-02-27 15:50:30    0.000000
2020-02-27 15:51:00    0.000000
2020-02-27 15:51:30    0.000000
2020-02-27 15:52:00    1.004333
2020-02-27 15:52:30    2.002667
2020-02-27 15:53:00    5.001000
2020-02-27 15:53:30    6.002667
2020-02-27 15:54:00    8.001000
2020-02-27 15:54:30    4.000667
2020-02-27 15:55:00    3.000000
2020-02-27 15:55:30    0.000000
2020-02-27 15:56:00    0.000000
2020-02-27 15:56:30    0.000000
2020-02-27 15:57:00    0.000000
2020-02-27 15:57:30    0.000000
2020-02-27 15:58:00    0.000000

So then the result would be a df_constant with data from 2020-02-27 15:43:00 until 2020-02-27 15:51:30 & 2020-02-27 15:55:30 until 2020-02-27 15:58:00

question from:https://stackoverflow.com/questions/65952609/how-to-select-only-the-constant-values-in-a-timeseries

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
  • This is a completely vectorized solution, so it will be fast compared to solutions with loops or apply.
  • The datetime column should be converted to a datetime dtype, and then sorted on that column, but that column isn't used to determine the consecutive occurrences.
  • This solution uses parts of two other Stack Overflow answers:
    1. GroupBy Pandas Count Consecutive Zero's
    2. Pandas groupby take counts greater than 1
  • The issue is, the data can't be grouped by val because, as in the example, the groups of consecutive numbers are not unique (e.g. both groups are 0.0)
    • .ne, .shift, and .cumsum are used to create a Series, where each sequence of consecutive values, is a unique value.
    • With a sequence of unique consecutive values, groupby can be used to create a Boolean mask to select the rows, where the count of consecutive values is greater than 4, in this case.
      • df['val'].groupby(g).transform('count') > 4 creates a Boolean mask, which is use to select rows from df[['datetime', 'val']]
      • Since the request is for no movement in a 2 minute period, the count should be >=4 because the time steps are 30 seconds, and 5 consecutive occurrences is 2 minutes
import pandas as pd

# sample dataframe is the same as the data in the op
data = {'datetime': ['2020-02-27 15:43:00', '2020-02-27 15:43:30', '2020-02-27 15:44:00', '2020-02-27 15:44:30', '2020-02-27 15:45:00', '2020-02-27 15:45:30', '2020-02-27 15:46:00', '2020-02-27 15:46:30', '2020-02-27 15:47:00', '2020-02-27 15:47:30', '2020-02-27 15:48:00', '2020-02-27 15:48:30', '2020-02-27 15:49:00', '2020-02-27 15:49:30', '2020-02-27 15:50:00', '2020-02-27 15:50:30', '2020-02-27 15:51:00', '2020-02-27 15:51:30', '2020-02-27 15:52:00', '2020-02-27 15:52:30', '2020-02-27 15:53:00', '2020-02-27 15:53:30', '2020-02-27 15:54:00', '2020-02-27 15:54:30', '2020-02-27 15:55:00', '2020-02-27 15:55:30', '2020-02-27 15:56:00', '2020-02-27 15:56:30', '2020-02-27 15:57:00', '2020-02-27 15:57:30', '2020-02-27 15:58:00'], 'val': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.004333, 2.002667, 5.001, 6.002667, 8.001, 4.000667, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
df = pd.DataFrame(data)

# display(df.head())
              datetime  val
0  2020-02-27 15:43:00  0.0
1  2020-02-27 15:43:30  0.0
2  2020-02-27 15:44:00  0.0
3  2020-02-27 15:44:30  0.0
4  2020-02-27 15:45:00  0.0

# create a Series with the same index as df, where the consecutive values are unique
g = df.val.ne(df.val.shift()).cumsum()

# use g with groupby to count the consecutive values and then create a Boolean using > 4 (will represent 2 minutes, when the time interval is 30 seconds).
consecutive_data = df[['datetime', 'val']][df['val'].groupby(g).transform('count') > 4]

display(consecutive_data)

               datetime  val
0   2020-02-27 15:43:00  0.0
1   2020-02-27 15:43:30  0.0
2   2020-02-27 15:44:00  0.0
3   2020-02-27 15:44:30  0.0
4   2020-02-27 15:45:00  0.0
5   2020-02-27 15:45:30  0.0
6   2020-02-27 15:46:00  0.0
7   2020-02-27 15:46:30  0.0
8   2020-02-27 15:47:00  0.0
9   2020-02-27 15:47:30  0.0
10  2020-02-27 15:48:00  0.0
11  2020-02-27 15:48:30  0.0
12  2020-02-27 15:49:00  0.0
13  2020-02-27 15:49:30  0.0
14  2020-02-27 15:50:00  0.0
15  2020-02-27 15:50:30  0.0
16  2020-02-27 15:51:00  0.0
17  2020-02-27 15:51:30  0.0
25  2020-02-27 15:55:30  0.0
26  2020-02-27 15:56:00  0.0
27  2020-02-27 15:56:30  0.0
28  2020-02-27 15:57:00  0.0
29  2020-02-27 15:57:30  0.0
30  2020-02-27 15:58:00  0.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...