
python - Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution

In scipy there is no support for fitting discrete distributions to data. I know there are a lot of questions about this already.

For example, if I have an array like the one below:

x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]

I can't apply the usual fit() call to this array:

from scipy.stats import nbinom
param = nbinom.fit(x)  # AttributeError: discrete distributions in scipy have no fit() method

But I'd like to ask: as of today, is there any way to fit these three discrete distributions and then choose the best fit for a discrete dataset?


1 Answer


You can use the Method of Moments to fit any particular distribution.

Basic idea: compute the empirical first, second, etc. moments from the data, then solve for the distribution parameters in terms of those moments.

In all of these cases we only need the first two moments. Let's get them:

import pandas as pd
# for other distributions, you'd need to implement the PMF yourself
from scipy.stats import nbinom, poisson, geom

x = pd.Series([2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0])  # the array from the question
mean = x.mean()
var = x.var()
likelihoods = {}  # we'll fill this in later

Note: I used pandas instead of numpy. That is because numpy's var() and std() don't apply Bessel's correction by default (ddof=0), while pandas' do (ddof=1). With 100+ samples there isn't much difference, but on smaller samples it can matter.
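For illustration, here's a minimal sketch of the difference on the example array (numpy's ddof argument is the standard way to control Bessel's correction):

import numpy as np
import pandas as pd

data = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]

print(np.var(data))           # ddof=0: population variance, no correction
print(np.var(data, ddof=1))   # ddof=1: Bessel-corrected sample variance
print(pd.Series(data).var())  # pandas defaults to ddof=1, matches the line above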

Now let's get the parameters for these distributions. The negative binomial has two parameters, p and r. Let's estimate them and calculate the likelihood of the dataset:

# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:

p = 1 - mean / var  # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p

UPD: Wikipedia and scipy use different definitions of p, one treating it as the probability of success and the other as the probability of failure. To be consistent with scipy's convention, use:

p = mean / var
r = p * mean / (1-p)

END OF UPD
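
As a quick sanity check (a sketch assuming the scipy convention above): plugging these estimates back into scipy's nbinom should reproduce the empirical moments exactly, by construction:

# nbinom.stats returns the model's mean and variance for the given r, p
m, v = nbinom.stats(r, p, moments='mv')
assert abs(m - mean) < 1e-9
assert abs(v - var) < 1e-9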

Calculate likelihood:

likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()

Same for Poisson; it has only one parameter:

# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()

Same for the geometric distribution:

# mean = 1 / p  (this matches scipy's definition, support {1, 2, ...})
p = 1 / mean

likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()
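
Note that with the example array this product is exactly 0, because scipy's geom has support {1, 2, ...} and the data contains zeros. If your counts are failures before the first success (support starting at 0), a hedged alternative is to shift geom with its loc argument and match the shifted mean (1-p)/p:

# geometric on {0, 1, 2, ...}: mean = (1-p)/p  =>  p = 1/(mean + 1)
p0 = 1 / (mean + 1)
likelihoods['geometric_shifted'] = x.map(lambda val: geom.pmf(val, p0, loc=-1)).prod()  # illustrative key name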

Finally, let's get the best fit:

best_fit = max(likelihoods, key=likelihoods.get)  # key with the largest likelihood
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])

Let me know if you have any questions.

