Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
297 views
in Technique[技术] by (71.8m points)

python 3.x - Issues fitting SIR model with curve_fit

I am trying to develop a fit for a given SIR model. Following the suggestion from various sources, notably https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/ and https://stackoverflow.com/a/34425290/3588242, I put together a testing code as I was having huge difficulties fitting anything coherently with my model.

It is shown below.

Basically, I generate data that I know fit my model, by definition of how they're generated. This gives me a infectious compartment over time (ydata and xdata). I then want to go back to the parameters (beta, gamma, mu) I used to generate the data, using the scipy function curve_fit.

However, it does not work, and I'm very unclear as to why it does not seem to do anything.

import numpy as np
from scipy import integrate, optimize
import matplotlib.pyplot as plt


# The SIR model differential equations.
def sir_model(y, t, N, beta, gamma, mu):
    S, I, R = y
    dSdt = -beta * S * I / N + mu * I
    dIdt = beta * S * I / N - gamma * I
    dRdt = gamma * I
    return dSdt, dIdt, dRdt


# The fit integration.
def fit_odeint(x, beta, gamma, mu):
    return integrate.odeint(sir_model, (S0, I0, R0), x, args=(N, beta, gamma, mu))[:,1]


if __name__ == "__main__":

    ########################################
    # Get data to be fitted from the model #
    ########################################
    # Total population, N.
    N = 1
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 1e-6, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    # Contact rate, beta, and mean recovery rate, gamma, (in 1/deltaT).
    beta, gamma, mu = -0.001524766068089,   1.115130184090387,  -0.010726414041332
    # A grid of time points (in deltaT increment)
    t = np.linspace(0, 305, 306)

    # Initial conditions vector
    y0 = S0, I0, R0
    # Integrate the SIR equations over the time grid, t.
    ret = integrate.odeint(sir_model, y0, t, args=(N, beta, gamma, mu))
    S, I, R = ret.T

    # Plot the data on three separate curves for I(t) and R(t)
    fig = plt.figure(facecolor='w')
    ax = fig.add_subplot(111, facecolor='#dddddd', axisbelow=True)
    # ax.plot(t, S/N, 'b', alpha=0.5, lw=2, label='Susceptible')
    ax.plot(t, I/N, 'r', alpha=0.5, lw=2, label='Infected')
    ax.plot(t, R/N, 'g', alpha=0.5, lw=2, label='Recovered with immunity')
    ax.set_xlabel('Time /days')
    ax.set_ylabel('Number (1000s)')
    # ax.set_ylim(0,1.2)
    ax.yaxis.set_tick_params(length=0)
    ax.xaxis.set_tick_params(length=0)
    ax.grid(b=True, which='major', c='w', lw=2, ls='-')
    legend = ax.legend()
    legend.get_frame().set_alpha(0.5)
    for spine in ('top', 'right', 'bottom', 'left'):
        ax.spines[spine].set_visible(False)
    plt.show()

    #####################################
    # Fit the data using the same model #
    #####################################

    # Define the "experimental" data
    ydata = I
    xdata = t

    # Define the initial conditions vector, note that this will be equal to y0 above.
    I0, R0 = ydata[0], 0
    S0 = N - I0 - R0

    # Try with default p0 = [1, 1, 1] (most of the times I will have no clue where the values lie)
    print('p0 default values...')
    popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata, bounds=([-0.1, 0., -np.inf], np.inf))
    psig = np.sqrt(np.diag(pcov))
    print(f'beta : {popt[0]} (+/- {psig[0]})
gamma: {popt[1]} (+/- {psig[1]})
mu   : {popt[2]} (+/- {psig[2]})

')

    # Try with p0 close to the known solution (kind of defeats the purpose of curve fit if it's too close...)
    print('p0 close to the known solution...')
    popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata, bounds=([-0.1, 0., -np.inf], np.inf), p0=[-0.01,1.,-0.01])
    psig = np.sqrt(np.diag(pcov))
    print(f'beta : {popt[0]} (+/- {psig[0]})
gamma: {popt[1]} (+/- {psig[1]})
mu   : {popt[2]} (+/- {psig[2]})

')

    # Try with p0 equal to the known solution (kind of defeats the purpose of curve fit if it's too close...)
    print('p0 equal to the known solution...')
    popt, pcov = optimize.curve_fit(fit_odeint, xdata, ydata, bounds=([-0.1, 0., -np.inf], np.inf), p0=[-0.001524766068089, 1.115130184090387, -0.010726414041332])
    psig = np.sqrt(np.diag(pcov))
    print(f'beta : {popt[0]} (+/- {psig[0]})
gamma: {popt[1]} (+/- {psig[1]})
mu   : {popt[2]} (+/- {psig[2]})

')

This code gives me the correct expected plot, and then:

p0 default values...
beta : 0.9 (+/- 13.202991356641752)
gamma: 1.0 (+/- 13.203507667858469)
mu   : 1.0 (+/- 50.75556985555176)


p0 close to the known solution...
beta : -0.01 (+/- 2.0204502661168218)
gamma: 1.0 (+/- 2.0182998608106186)
mu   : -0.01 (+/- 7.701149479142956)


p0 equal to the known solution...
beta : -0.001524766068089 (+/- 0.0)
gamma: 1.115130184090387 (+/- 0.0)
mu   : -0.010726414041332 (+/- 0.0)

So it would seem that curve_fit is like, well, that's close enough, let's stop, after doing barely anything and refusing to give it a shot. This may be due to an epsilon value somewhere (absolute versus relative or something like that, I encountered that issue with scipy previously for a different solver). However, the documentation of curve_fit doesn't seem to mention anything about being able to change the tolerance. Or this may be due to me misunderstanding how something in that code works.

If anyone has any suggestions, I would love that.

I tried looking into the lmfit package with little success getting it to work (I get some ValueError a.any() blabla error that I can't figure out).

EDIT:

I looked around and decided to use a stochastic evolution algorithm (differential evolution) to get a better initial guess automatically. The code works well, but again, curve_fit is simply not doing anything and returning the initial guess back.

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize, integrate


# The SIR model differential equations.
def sir_model(y, t, N, beta, gamma, mu):
    S, I, R = y
    dSdt = -beta * S * I / N + mu * I
    dIdt = beta * S * I / N - gamma * I
    dRdt = gamma * I
    return dSdt, dIdt, dRdt


# The fit integration.
def fit_odeint(x, beta, gamma, mu):
    return integrate.odeint(sir_model, (S0, I0, R0), x, args=(N, beta, gamma, mu))[:,1]


# function for genetic algorithm to minimize (sum of squared error)
# bounds on parameters are set in generate_Initial_Parameters() below
def sumOfSquaredError(parameterTuple):
    return np.sum((ydata - fit_odeint(xdata, *parameterTuple)) ** 2)


def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([-0.1, 10.0])  # parameter bounds for beta
    parameterBounds.append([-0.1, 20.0])  # parameter bounds for gamma
    parameterBounds.append([-0.1, 0.1])  # parameter bounds for mu

    # "seed" the numpy random number generator for repeatable results
    result = optimize.differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x


if __name__ == "__main__":

    ########################################
    # Get data to be fitted from the model #
    ########################################
    # Total population, N.
    N = 1
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 1e-6, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    # Contact rate, beta, and mean recovery rate, gamma, (in 1/deltaT).
    beta, gamma, mu = -0.001524766068089,   1.115130184090387,  -0.010726414041332
    # A grid of time points (in deltaT increment)
    t = np.linspace(0, 305, 306)

    # Initial conditions vector
    y0 = S0, I0, R0
    # Integrate the SIR equations over the time grid, t.
    ret = integrate.odeint(sir_model, y0, t, args=(N, beta, gamma, mu))
    S, I, R = ret.T

    # Plot the data on three separate curves for I(t) and R(t)
    fig = plt.figure(facecolor='w')
    ax = fig.add_subplot(111, facecolor='#dddddd', axisbelow=True)
    # ax.plot(t, S/N, 'b', alpha=0.5, lw=2, label='Susceptible')
    ax.plot(t, I/N, 'r', alpha=0.5, lw=2, label='Infected')
    ax.plot(t, R/N, 'g', alpha=0.5, lw=2, label='Recovered with immunity')
    ax.set_xlabel('Time /deltaT')
    ax.set_ylabel('Number (normalized)')
    # ax.set_ylim(0,1.2)
    ax.yaxis.set_tick_params(length=0)
    ax.xaxis.set_tick_params(length=0)
    ax.grid(b=True, which='major', c='w', lw=2, ls='-')
    legend = ax.legend()
    legend.get_frame().set_alpha(0.5)
    for spine in ('top', 'right', 'bottom', 'left'):
        ax.spines[spine].set_visible(False)
    plt.show()

    #####################################
    # Fit the data using the same model #
    #####################################

    # Define the "experimental" data
    ydata = I
    xdata = t

    # Define the initial conditions vector, note that this will be equal to y0 above.
    I0, R0 = ydata[0], 0
    S0 = N - I0 - R0

    # generate initial parameter values
    initialParameters = generate_Initial_Parameters()

    # curve fit the test data
    fittedParameters, pcov = optimize.curve_fit(fit_odeint, xdata, ydata, p0=tuple(initialParameters),
                                                bounds=([-0.1, 0., -np.inf], np.inf))

    # create values for display of fitted peak function
    b, g, m = fittedParameters
    ret = integrate.odeint(sir_model, y0, t, args=(N, b, g, m))
    S, I, R = ret.T

    plt.plot(xdata, ydata)  # plot the raw data
    plt.plot(xdata, I, linestyle='--')  # plot the equation using the fitted parameters
    plt.show()

    psig = np.sqrt(np.diag(pcov))
    print('Initial parameters:')
    print(f'beta : {initialParameters[0]}
'
          f'gamma: {initialParameters[1]}
'
          f'mu   : {initialParameters[2]}

')
    print('Fitted parameters:')
    print(f'beta : {fittedParameters[0]} (+/- {psig[0]})
'
          f'gamma: {fittedParameters[1]} (+/- {psig[1]})
'
          f'mu   : {fittedParameters[2]} (+/- {psig[2]})

')

This gives me, along with the correct and expected figures:

Initial parameters:
beta : -0.039959661364345145
gamma: 1.0766953272292845
mu   : -0.040321969786292024


Fitted parameters:
beta : -0.039959661364345145 (+/- 5.6469679624489775e-12)
gamma: 1.0766953272292845 (+/- 5.647099056919525e-12)
mu   : -0.040321969786292024 (+/- 5.720259134770649e-12)

So curve_fit did almost absolutely nothing. It's not shocking in this case since the initial guess is pretty good.

But since what I care about are the parameters, I am "surprised" by the apparent uselessness of curve_fit in my case, I was expecting more from it. It likely stems from a misunderstanding on my part of what it actually does though. I will note that as expected, if the initial guess parameters bounds are large, the genetic algorithm has trouble finding the minimum and this has a heavy computational cost.

Any enlightenment or suggestions welcome!

question from:https://stackoverflow.com/questions/65943023/issues-fitting-sir-model-with-curve-fit

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...