I have a data set from a machine that runs in cycles. For every cycle, I want to extract features from the time series recorded while the machine is running.
I use groupby() to gather the data of each cycle. Now I want to use np.trapz() to get the area under the curve, but I got stuck.
If you know an easier way to do this, that is fine with me as well.
Here are the data:
import numpy as np
import pandas as pd

data = {'date_time':['2017-03-22 10:07',
'2017-03-23 10:08',
'2017-03-24 10:09',
'2017-03-25 10:10',
'2017-03-26 10:11',
'2017-03-27 10:12',
'2017-03-28 10:13',
'2017-03-29 10:14',
'2017-03-22 10:15',
'2017-03-22 10:16',
'2017-03-22 10:17',
'2017-03-22 10:18',
'2017-03-22 10:19',
'2017-03-22 10:20',
'2017-03-22 10:21',
'2017-03-22 10:22',
'2017-03-22 10:23',
'2017-03-22 10:24',
'2017-03-22 10:25',
'2017-03-22 10:26',
'2017-03-22 10:27',
'2017-03-22 10:28',
'2017-03-22 10:29',
'2017-03-22 10:30'],
'production_line_no':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1],
'var1':[20, 21, 4, 18, 20, 21, 2, 18, 20, 21, 1, 18, 10, 6, 9, 8, 10, 3, 9, 8, 7, 18, 20, 21],
'var2':[20, 21, 19, 18, 20, 21, 19, 18, 20, 21, 19, 18, 10, 11, 9, 8, 10, 11, 9, 8, 19, 18, 20, 21],
'running':[0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
}
df=pd.DataFrame(data)
Now I calculate several aggregations for each cycle like this and it works well:
# Convert date time string to DateTime type
df['date_time']= pd.to_datetime(df['date_time'], format='%Y-%m-%d %H:%M')
# Create cycle_ID to differentiate the cycles and prepare the use of groupby() + agg()
df['cycle_ID']=df['running'].diff().abs().cumsum()*df['running']
# Define agg type Pandas built-in
aggregations = {
'var1':['std','min'],
'var2':['std','min'],
'date_time':[lambda x:(max(x) - min(x)).days, 'min','max']
}
# Create the grouped object
grouped=df.groupby(by=['cycle_ID', 'production_line_no'],as_index=False).agg(aggregations)
grouped
Here is the output:
[image: DataFrame of aggregations]
Now I would like to add a new feature per variable: the area under the curve. I tried to use groupby() + np.trapz(), but I got stuck.
grouped_area=df.groupby(by=['cycle_ID', 'production_line_no'],as_index=False).apply(lambda x: np.trapz(x, dx=1.0))
I got this error: ValueError: Cannot add integral value to Timestamp without freq.
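For reference, the error is probably caused by apply() passing the whole group DataFrame to np.trapz(), including the date_time column, so the integration tries to add Timestamps. Below is a minimal sketch of one possible workaround, not a definitive solution: it integrates only the numeric columns and assumes unit spacing between samples (dx=1.0). The names trapezoid, running_rows, area, var1_area and var2_area are made up for this illustration, and the date_time column is left out because it is not needed here.

```python
import numpy as np
import pandas as pd

# Newer NumPy versions name this function trapezoid; older ones only have trapz.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

df = pd.DataFrame({
    "production_line_no": [1] * 12 + [2] * 8 + [1] * 4,
    "var1": [20, 21, 4, 18, 20, 21, 2, 18, 20, 21, 1, 18,
             10, 6, 9, 8, 10, 3, 9, 8, 7, 18, 20, 21],
    "var2": [20, 21, 19, 18, 20, 21, 19, 18, 20, 21, 19, 18,
             10, 11, 9, 8, 10, 11, 9, 8, 19, 18, 20, 21],
    "running": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
                0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
})

# Same cycle labelling as in the question
df["cycle_ID"] = df["running"].diff().abs().cumsum() * df["running"]

# Keep only the rows where the machine is running
running_rows = df[df["running"] == 1]

# Integrate each numeric column separately, one value per (cycle, line) group
area = running_rows.groupby(["cycle_ID", "production_line_no"]).agg(
    var1_area=("var1", lambda s: trapezoid(s.to_numpy(), dx=1.0)),
    var2_area=("var2", lambda s: trapezoid(s.to_numpy(), dx=1.0)),
)
print(area)
```

If the samples are not evenly spaced, the x argument of the trapezoid function could be used instead of dx, for example with the elapsed time in seconds computed from date_time.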
question from:
https://stackoverflow.com/questions/65875878/how-to-calculate-area-under-the-curve-integral-after-groupby