I am trying to predict values for days outside the range of my data using linear regression. I have managed to plot the data with matplotlib and wrote a predict() function to print the values for a given number of days, but the output looks wrong: the values are higher than the initial values, even though the straight line slopes downward and the highest value in the data is 85.15.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from sklearn import linear_model
# date = x, cases = y
# 1- Retrieve the data
df = pd.read_csv("cases-challenge.csv")
x = df.days
y = df.cases
# 2- Iterate the data
# get the mean of x and y
xmean = x.mean()
ymean = y.mean()
df['diffx'] = xmean - x
df['diffx_squared'] = df.diffx ** 2
SSxx = df.diffx_squared.sum()
df['diffy'] = ymean - y
SSxy = (df.diffx * df.diffy).sum()
m = SSxx / SSxy
b = ymean - m * xmean
def predict(value):
    # value = number of days to print, starting from day 0
    for i in range(value):
        # straight line: y = m*x + b
        print(i, ':', m * i + b)

predict(6)
plt.scatter(x, y)
plt.plot(x, m*x+b, 'r')
plt.show()
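For comparison, the least-squares slope is conventionally defined as SSxy / SSxx, which is the reverse of the ratio used for m above. Below is a minimal sketch of that version, cross-checked against np.polyfit. The data is inlined from the CSV listed in this question so the block runs on its own, and fit_line is a hypothetical helper name, not part of the original script.

```python
import numpy as np

# Data copied from cases-challenge.csv as listed in this question
days = np.arange(18, 28, dtype=float)
cases = np.array([85.15, 83.5, 82.5, 80.73, 78.77,
                  77.86, 75.96, 75.85, 74.79, 72.79])

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    dx = x - x.mean()
    dy = y - y.mean()
    ss_xx = (dx ** 2).sum()
    ss_xy = (dx * dy).sum()
    m = ss_xy / ss_xx            # SSxy over SSxx, not the other way round
    b = y.mean() - m * x.mean()
    return m, b

m, b = fit_line(days, cases)

# Cross-check: a degree-1 polyfit returns [slope, intercept]
ref_m, ref_b = np.polyfit(days, cases, 1)
print(m, b)
print(ref_m, ref_b)
```

If the two printed pairs agree, the hand-written formula matches the library fit for this data.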
The input cases-challenge.csv:
days, cases
18, 85.15
19, 83.5
20, 82.5
21, 80.73
22, 78.77
23, 77.86
24, 75.96
25, 75.85
26, 74.79
27, 72.79
Output in the terminal using 6 as the value in predict():
0 : 95.61300163132137
1 : 94.86531266992931
2 : 94.11762370853725
3 : 93.36993474714518
4 : 92.62224578575312
5 : 91.87455682436106
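The values above are for days 0 to 5, because range(value) starts at 0. Those days lie before the first data point (day 18), so a downward-sloping line will give values above 85.15 there. Below is a sketch of predicting the days that follow the data instead. It assumes "out of range" means days after day 27 and fits the line with np.polyfit rather than the hand-written formula; predict_after is a hypothetical name.

```python
import numpy as np

# Data copied from cases-challenge.csv as listed in this question
days = np.arange(18, 28)
cases = np.array([85.15, 83.5, 82.5, 80.73, 78.77,
                  77.86, 75.96, 75.85, 74.79, 72.79])

# Degree-1 least-squares fit: returns [slope, intercept]
m, b = np.polyfit(days, cases, 1)

def predict_after(last_day, n_days):
    """Return (day, predicted cases) pairs for the n_days following last_day."""
    future = np.arange(last_day + 1, last_day + 1 + n_days)
    return list(zip(future.tolist(), (m * future + b).tolist()))

for day, value in predict_after(int(days.max()), 6):
    print(day, ':', round(value, 2))
```

Starting the range at the day after the last observation keeps the predictions on the extrapolated side of the line, where the values continue to fall below the last observed 72.79.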
Matplotlib output: [scatter plot of the data with the fitted line drawn in red]
question from:
https://stackoverflow.com/questions/65949119/how-to-get-prediction-days-out-of-range-of-linear-regression-using-python