I am working on a time series dataset of a retailer's transactions over the previous 3 years. I want to remove any trend and seasonality before I use machine learning.
These are the columns and dtypes of the DataFrame (ds1):
Int64Index: 538435 entries, 0 to 538642
Data columns (total 20 columns):
Column Non-Null Count Dtype
--- ------ -------------- -----
0 WEEK_END_DATE 538435 non-null datetime64[ns]
1 UNITS 538435 non-null int64
2 PRICE 538435 non-null float64
3 FEATURE 538435 non-null int64
4 DISPLAY 538435 non-null int64
5 TPR_ONLY 538435 non-null int64
6 DESCRIPTION 538435 non-null object
7 MANUFACTURER 538435 non-null object
8 CATEGORY 538435 non-null object
9 SUB_CATEGORY 538435 non-null object
10 PRODUCT_SIZE 538435 non-null object
11 ADDRESS_STATE_PROV_CODE 538435 non-null object
12 MSA_CODE 538435 non-null int64
13 SEG_VALUE_NAME 538435 non-null object
14 PARKING_SPACE_QTY 538435 non-null float64
15 SALES_AREA_SIZE_NUM 538435 non-null int64
16 AVG_WEEKLY_BASKETS 538435 non-null float64
17 Units_a_visit 538435 non-null float64
18 Visits_per_hhs 538435 non-null float64
19 DISCOUNT 538435 non-null float64
dtypes: datetime64[ns](1), float64(6), int64(6), object(7)
memory usage: 86.3+ MB
I have tried the following:
from pandas import Series
from matplotlib import pyplot
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
ds1.index = ds1.WEEK_END_DATE
result = seasonal_decompose(ds1, model='additive')
result.plot()
pyplot.show()
as well as
res = sm.tsa.seasonal_decompose(ds1.interpolate())
res.plot()
For both I get the following error message:
TypeError Traceback (most recent call last)
<ipython-input-76-322daa0c1fd6> in <module>()
6 ds1.index = ds1.WEEK_END_DATE
7
----> 8 result = seasonal_decompose(ds1)
9 result.plot()
10 pyplot.show()
/usr/local/lib/python3.6/dist-packages/statsmodels/tsa/seasonal.py in seasonal_decompose(x, model, filt, freq, two_sided, extrapolate_trend)
113 nobs = len(x)
114
--> 115 if not np.all(np.isfinite(x)):
116 raise ValueError("This function does not handle missing values")
117 if model.startswith('m'):
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
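For reference, the same TypeError can be reproduced on a tiny frame, which makes me suspect the object columns. This is only a minimal sketch: `toy` and `isfinite_error` are made-up names and the values are invented, not taken from ds1.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ds1 (hypothetical values): one numeric and one object column.
toy = pd.DataFrame(
    {"UNITS": [10, 12, 9], "CATEGORY": ["A", "B", "A"]},
    index=pd.to_datetime(["2009-01-14", "2009-01-21", "2009-01-28"]),
)


def isfinite_error(frame):
    """Return the TypeError message np.isfinite raises on frame, or None."""
    try:
        np.isfinite(frame.to_numpy())
    except TypeError as exc:
        return str(exc)
    return None


# Mixed int/str columns become an object-dtype array, which isfinite rejects.
mixed_error = isfinite_error(toy)
# A purely numeric selection passes the same check.
numeric_error = isfinite_error(toy[["UNITS"]])

print(mixed_error)
print(numeric_error)
```

So the check on line 115 of seasonal.py seems to fail as soon as any non-numeric column is passed in.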
I have also tried analysing only the int and float columns:
numeric = ['UNITS','PRICE','FEATURE','DISPLAY','TPR_ONLY','PARKING_SPACE_QTY','SALES_AREA_SIZE_NUM','AVG_WEEKLY_BASKETS','Units_a_visit','Visits_per_hhs','DISCOUNT']
from pandas import Series
from matplotlib import pyplot
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(ds1[numeric], model='additive')
result.plot()
pyplot.show()
I also tried converting all the object columns into dummy variables before running the code.
None of these attempts work. Does anyone have any suggestions?
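One thing that may matter: with 538435 rows covering about 3 years of weekly dates, ds1 has many rows per WEEK_END_DATE, so the index is not a single regular series. Below is a minimal sketch of collapsing to one value per week, assuming that summing UNITS across all rows is acceptable. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy stand-in for ds1 (hypothetical values): two rows per week, e.g. two stores.
toy = pd.DataFrame({
    "WEEK_END_DATE": pd.to_datetime(
        ["2009-01-14", "2009-01-14", "2009-01-21", "2009-01-21"]
    ),
    "UNITS": [10, 5, 12, 7],
})

# Collapse to one observation per week so the index has no duplicate dates.
weekly_units = toy.groupby("WEEK_END_DATE")["UNITS"].sum().sort_index()

print(weekly_units)
```

On the real data this would be ds1.groupby("WEEK_END_DATE")["UNITS"].sum(). I assume the resulting series is what should be passed to seasonal_decompose, probably with the seasonal period set to 52 through the freq argument shown in the traceback, but I am not sure this is the right approach.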