I am trying to make a multi-step prediction for a time series of sensor data. To do this, I trained a scikit-learn LinearRegression model that predicts the current sensor value from its lagged values. The single-step prediction worked well, with an R^2 score of 0.98 on the test data.
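For reference, here is a minimal sketch of the setup described above: lagged values as features, a chronological train/test split, and the one-step R^2 on the test part. It assumes a synthetic noisy sine wave in place of the sensor data. It also uses hypothetical column names lag_1 to lag_n instead of the dp_lag_* columns, and make_lag_features is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def make_lag_features(series: pd.Series, n_lags: int) -> pd.DataFrame:
    """Build a frame with the target 'y' and columns lag_1..lag_<n_lags>.

    lag_k at row t holds the value observed k steps earlier (row t - k).
    Column names are hypothetical; the question uses dp_lag_<k>.
    """
    frame = pd.DataFrame({"y": series})
    for k in range(1, n_lags + 1):
        frame[f"lag_{k}"] = series.shift(k)
    # The first n_lags rows lack a full history, so drop them
    return frame.dropna()


# Synthetic stand-in for the sensor data (assumption: noisy sine wave)
rng = np.random.default_rng(0)
t = np.arange(500)
series = pd.Series(np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size))

n_lags = 10
data = make_lag_features(series, n_lags)
lag_cols = [f"lag_{k}" for k in range(1, n_lags + 1)]

# Chronological split (no shuffling), so the test rows come after the training rows
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

model = LinearRegression().fit(train[lag_cols], train["y"])
score = r2_score(test["y"], model.predict(test[lag_cols]))
print(f"one-step R^2 on test data: {score:.3f}")
```

The split is done by position rather than at random because shuffling would let the model see values from after the test period.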
Then I used the predicted values to make the next prediction, repeating this step by step. However, the prediction quickly flattened into a straight line. What is my mistake, and how can I do this correctly? I also tried forecast and an ARIMA model, but they resulted in the same issue. I have just started with machine learning, so I'm grateful for any advice.
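The recursive strategy described above, where each prediction is fed back in as the newest lag, can be sketched with a sliding window. This is an illustration under assumptions: synthetic data, lag_1 (the most recent value) as the first feature column, and the hypothetical helpers lag_matrix and recursive_forecast. One property is worth keeping in mind. For a stable linear autoregression, recursive forecasts converge toward the series mean as the horizon grows, so some flattening at long horizons is expected even when the loop itself is correct.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def lag_matrix(values: np.ndarray, n_lags: int):
    """Rows are [y[t-1], ..., y[t-n_lags]] (lag_1 first); targets are y[t]."""
    X = np.column_stack(
        [values[n_lags - k : len(values) - k] for k in range(1, n_lags + 1)]
    )
    return X, values[n_lags:]


def recursive_forecast(model, history: np.ndarray, n_lags: int, n_steps: int) -> np.ndarray:
    """Forecast n_steps ahead, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])  # oldest ... newest
    preds = []
    for _ in range(n_steps):
        # Reverse so the most recent value comes first, matching lag_matrix's column order
        x = np.array(window[::-1]).reshape(1, -1)
        y_hat = float(model.predict(x)[0])
        preds.append(y_hat)
        # Slide the window: drop the oldest value, append the prediction as the newest
        window = window[1:] + [y_hat]
    return np.array(preds)


# Synthetic stand-in for the sensor data (assumption)
rng = np.random.default_rng(0)
t = np.arange(500)
values = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

n_lags, n_steps = 10, 20
X, y = lag_matrix(values[:400], n_lags)
model = LinearRegression().fit(X, y)

forecast = recursive_forecast(model, values[:400], n_lags, n_steps)
print(np.round(forecast, 3))
```

The only hard requirement of this scheme is that the feature order at prediction time matches the order used in training; here both use lag_1 first.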
This is my code, starting with the first model training:
```python
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LinearRegression

# X_train, y_train, X_test, test, future_steps and n_lags are defined earlier (not shown)

# One-step model: predict the current value from its lagged values
lm_model = LinearRegression()
lm_result = lm_model.fit(X_train, y_train)
pred_1 = pd.Series(lm_model.predict(X_test).squeeze())
pred_1.index = test.index
test['dp_pred_1'] = pred_1

# Set up the formula string for dmatrices: with each pass, one lag is replaced by a prediction
for step in range(1, future_steps, 1):
    base_string = 'dp_pred_' + str(step) + '~dp_filter'
    pred_string = ''
    lag_string = ''
    # prediction part
    for i in range(1, step, 1):
        pred_string += '+dp_pred_' + str(i)
    # lag part
    for j in range(n_lags - step, 0, -1):
        lag_string += '+dp_lag_' + str(j)
    # full dmatrices formula string
    dm_string = base_string + pred_string + lag_string
    # generate the design matrix
    y_test2, X_test2 = dmatrices(dm_string, test)
    # write the prediction into the DataFrame
    test['dp_pred_' + str(step + 1)] = lm_result.predict(X_test2).squeeze()
```
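The loop above builds each horizon row by row from the test frame. Below is a pandas-only sketch of the same row-wise idea, without patsy. The column names lag_k and pred_h are hypothetical stand-ins for dp_lag_k and dp_pred_h. The sketch assumes that pred_h at row t means the forecast of the value at row t made h steps earlier, and it fits on the whole synthetic series purely for illustration. Under that convention, a lag that is replaced by a prediction has to come from a shifted prediction column (lag_k for k < h is pred_{h-k} moved down k rows). The feature columns also have to be passed in the same order as during training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the sensor data (assumption)
rng = np.random.default_rng(1)
t = np.arange(300)
values = np.sin(2 * np.pi * t / 40) + 0.1 * rng.standard_normal(t.size)

n_lags, horizon = 5, 4
lag_cols = [f"lag_{k}" for k in range(1, n_lags + 1)]  # hypothetical names
df = pd.DataFrame({"y": values})
for k in range(1, n_lags + 1):
    df[f"lag_{k}"] = df["y"].shift(k)
df = df.dropna().reset_index(drop=True)  # row r now holds y at time r + n_lags

model = LinearRegression().fit(df[lag_cols].to_numpy(), df["y"].to_numpy())

# pred_h at row t = forecast of y at row t, made h steps earlier
for h in range(1, horizon + 1):
    features = pd.DataFrame(index=df.index)
    for k in range(1, n_lags + 1):
        if k < h:
            # y[t-k] is not yet observed at origin t-h: use its (h-k)-step
            # forecast, stored k rows above in column pred_{h-k}
            features[f"lag_{k}"] = df[f"pred_{h - k}"].shift(k)
        else:
            # y[t-k] is already observed at origin t-h: keep the actual lag
            features[f"lag_{k}"] = df[f"lag_{k}"]
    valid = features.notna().all(axis=1)
    df[f"pred_{h}"] = np.nan
    # Columns passed in training order: lag_1, ..., lag_n
    df.loc[valid, f"pred_{h}"] = model.predict(features.loc[valid, lag_cols].to_numpy())

# Cross-check one row against an explicit step-by-step recursion
r = 100
origin_end = r + n_lags - horizon + 1  # index just past the forecast origin in `values`
window = list(values[origin_end - n_lags : origin_end])
for _ in range(horizon):
    y_hat = float(model.predict(np.array(window[::-1]).reshape(1, -1))[0])
    window = window[1:] + [y_hat]
print(df.loc[r, f"pred_{horizon}"], y_hat)
```

The final check recomputes one row with an explicit sliding-window recursion, so the two printed numbers should agree up to floating-point rounding.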
Plotted prediction vs. actual values:
[Figure: 1-step prediction vs. actual values]
[Figure: 20-step prediction vs. actual values]
Question from:
https://stackoverflow.com/questions/65836676/multi-step-timeseries-prediction-using-linear-regression-with-scikit-learn