Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to calculate area under the curve (integral) after groupby()?

I have a data set from a machine that runs in cycles. For every cycle, I want to extract features from the time series recorded while the machine is running.

I then use groupby() to gather the data of each cycle. I wanted to use np.trapz() to get the area under the curve, but I got stuck.

If you know an easier way to do this, that is fine with me as well.

Here are the data:


import numpy as np
import pandas as pd

data = {'date_time':['2017-03-22 10:07',
                     '2017-03-23 10:08',
                     '2017-03-24 10:09',
                     '2017-03-25 10:10',
                     '2017-03-26 10:11',
                     '2017-03-27 10:12',
                     '2017-03-28 10:13',
                     '2017-03-29 10:14',
                     '2017-03-22 10:15',
                     '2017-03-22 10:16',
                     '2017-03-22 10:17',
                     '2017-03-22 10:18',
                     '2017-03-22 10:19',
                     '2017-03-22 10:20',
                     '2017-03-22 10:21',
                     '2017-03-22 10:22',
                     '2017-03-22 10:23',
                     '2017-03-22 10:24',
                     '2017-03-22 10:25',
                     '2017-03-22 10:26',
                     '2017-03-22 10:27',
                     '2017-03-22 10:28',
                     '2017-03-22 10:29',
                     '2017-03-22 10:30'],
     
        'production_line_no':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1], 
        'var1':[20, 21, 4, 18, 20, 21, 2, 18, 20, 21, 1, 18, 10, 6, 9, 8, 10, 3, 9, 8, 7, 18, 20, 21], 
        'var2':[20, 21, 19, 18, 20, 21, 19, 18, 20, 21, 19, 18, 10, 11, 9, 8, 10, 11, 9, 8, 19, 18, 20, 21], 
        'running':[0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
       }

df=pd.DataFrame(data)

Now I calculate several aggregations for each cycle like this and it works well:


# Convert date time string to DateTime type
df['date_time']= pd.to_datetime(df['date_time'], format='%Y-%m-%d %H:%M')

# Create cycle_ID to differentiate the cycles and prepare for groupby() + agg()
df['cycle_ID']=df['running'].diff().abs().cumsum()*df['running']

# Define the aggregations (pandas built-ins plus a lambda for the duration in days)
aggregations = {
    'var1':['std','min'],
    'var2':['std','min'],
    'date_time':[lambda x:(max(x) - min(x)).days, 'min','max']
}

# Create the grouped object
grouped=df.groupby(by=['cycle_ID', 'production_line_no'],as_index=False).agg(aggregations)
grouped

Here is the output: [image: DataFrame of aggregations]
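
To make the cycle_ID step easier to follow, here is a minimal sketch of what the diff().abs().cumsum() expression yields. The short running series is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical on/off signal: two running phases separated by an idle phase.
running = pd.Series([0, 1, 1, 1, 0, 0, 1, 1], name="running")

# diff() is +1 or -1 at every on/off transition; the first row has no
# predecessor, so it is NaN.
transitions = running.diff().abs()

# cumsum() counts the transitions seen so far. Multiplying by `running`
# sets every idle row to 0, so running phases get the odd labels 1, 3, 5, ...
cycle_id = transitions.cumsum() * running

print(pd.DataFrame({"running": running, "cycle_ID": cycle_id}))
```

Two things follow from this: all idle rows share the label 0, and the first row is NaN, which groupby() drops by default. In the question's data the last running phase also spans two production lines, which is presumably why production_line_no is part of the grouping key.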

Now I would like to add a new feature per variable: the area under the curve. I tried to use groupby() + np.trapz(), but I got stuck.

grouped_area=df.groupby(by=['cycle_ID', 'production_line_no'],as_index=False).apply(lambda x: np.trapz(x, dx=1.0))

I got this error: ValueError: Cannot add integral value to Timestamp without freq.
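
One possible way around this, offered as a sketch and not as a confirmed answer: the lambda above receives the whole group, including the date_time column, which is most likely where the Timestamp error comes from. Selecting only the numeric columns before integrating avoids that. The code below assumes unit spacing between samples (dx=1.0, as in the attempt above), keeps only rows with running == 1, and leaves out date_time since it is not needed under that assumption. The name trapezoid is a local alias, because newer NumPy releases call the function np.trapezoid while older ones only have np.trapz.

```python
import numpy as np
import pandas as pd

# NumPy 2.x provides np.trapezoid; older releases only provide np.trapz.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Numeric columns from the question (date_time omitted: dx=1.0 is assumed).
df = pd.DataFrame({
    "production_line_no": [1] * 12 + [2] * 8 + [1] * 4,
    "var1": [20, 21, 4, 18, 20, 21, 2, 18, 20, 21, 1, 18,
             10, 6, 9, 8, 10, 3, 9, 8, 7, 18, 20, 21],
    "var2": [20, 21, 19, 18, 20, 21, 19, 18, 20, 21, 19, 18,
             10, 11, 9, 8, 10, 11, 9, 8, 19, 18, 20, 21],
    "running": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
                0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
})
df["cycle_ID"] = df["running"].diff().abs().cumsum() * df["running"]

# Keep only rows where the machine is running, then integrate each
# numeric column per (cycle, production line) with the trapezoidal rule.
active = df[df["running"] == 1]
area = (
    active.groupby(["cycle_ID", "production_line_no"])[["var1", "var2"]]
    .agg(lambda s: trapezoid(s.to_numpy(), dx=1.0))
)

print(area)
```

As a sanity check, the first cycle has var1 values 21, 4, 18, so its area is (21 + 4)/2 + (4 + 18)/2 = 23.5. If the samples are not evenly spaced, passing the elapsed time as x (for example seconds since the start of the cycle) in place of dx would be the natural variation.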

question from:https://stackoverflow.com/questions/65875878/how-to-calculate-area-under-the-curve-integral-after-groupby
