INTRODUCTION
The negative binomial random variable Y is a nonnegative discrete random
variable with probability mass function given by:

P(Y = y) = [Γ(y + φ)/(Γ(φ) y!)] (φ/(φ + μ))^{φ} (μ/(φ + μ))^{y}, y = 0, 1, 2,...

The location parameter is μ and, for fixed μ, φ is the dispersion
parameter. Note that if the dispersion parameter φ is allowed to become
infinitely large, the resulting distribution is the Poisson distribution.
The variance of the distribution is given by σ^{2} = μ + μ^{2}/φ.
Indeed, when φ is known, the negative binomial distribution with parameter
μ is a member of the exponential family. In this case, the sample mean Ȳ
is a complete, sufficient statistic and a minimum variance unbiased estimator
for μ. As the parameter of interest here is φ, we briefly discuss
some problems with the existing methods for estimating φ.
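As an aside on this parametrization, the following sketch (our own helper, not from the paper) shows how NB(μ, φ) samples can be drawn with numpy, whose `negative_binomial` routine uses the (r, p) parametrization:

```python
import numpy as np

def rnegbin(mu, phi, size, rng=None):
    """Draw NB(mu, phi) variates with numpy.

    numpy's negative_binomial uses (r, p) with mean r(1-p)/p; setting
    r = phi and p = phi/(phi + mu) gives mean mu and variance
    mu + mu**2/phi, matching the parametrization in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.negative_binomial(phi, phi / (phi + mu), size=size)

# Sanity check: mu = 5, phi = 2 should give variance 5 + 25/2 = 17.5.
y = rnegbin(5.0, 2.0, 200_000, rng=np.random.default_rng(1))
```

The conversion p = φ/(φ + μ) is the only step that depends on the text's (μ, φ) convention; everything else is standard numpy usage.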
The negative binomial distribution (NB) may be viewed as a one-parameter distribution,
where either μ or φ is unknown, or a two-parameter distribution,
where both μ and φ are unknown. The two-parameter distribution
is difficult to work with, and many simplifications result when φ can
be assumed known. In most practical situations, however, both parameters
are unknown and need to be estimated. In particular, estimation of the negative
binomial dispersion parameter is especially difficult in the sense that most of the
commonly used estimators of this parameter are not well defined or do not exist
over the entire sample space.
For instance, when the sample mean equals the sample variance, the method of moments estimator becomes infinite, whereas the maximum likelihood estimator ceases to exist when the sample variance is less than the sample mean.
The dispersion parameter φ is traditionally estimated by the maximum likelihood
or method of moments approach. Notably, when the sample variance exceeds the
sample mean, these methods give fairly good estimates of φ. On the other
hand, problems occur when the sample mean exceeds the sample variance (underdispersion), or
when the mean is only slightly smaller than the variance.
In the case of underdispersion, the method of moments may yield
negative estimates and the maximum likelihood estimate does not even exist (Levin
and Reeds, 1977). Indeed, this fact was very frustrating to Pearson, who
encountered a sample he believed was from a negative binomial distribution
but was unable to estimate one of the parameters reasonably.
When the sample variance is only slightly larger than the sample mean, the estimates of φ often turn out to be very large; in simulation, for example, it is common for estimates of φ to exceed 500 or 1000 when the true value is 5.
These problems limit the usefulness of both methods, and hence other methods need to be considered in an effort to improve the estimation of φ. In this study, we attempt to estimate the dispersion parameter of the negative binomial distribution by combining the method of moments and maximum quasi-likelihood estimators in a variety of ways via appropriate weights.
Bayesian and empirical Bayesian methods were discussed by Walter
and Hamedani (1991) and Al-Saleh and Al-Batainah (2003).
We refer to Ghosh et al. (1983), Ahmed (1991, 2000) and Khan and Ahmed
(2006) for detailed studies on combined estimation methods. In recent
years, many researchers have used the negative binomial to model count data, including
Puig and Valero (2006), Greene (2008) and
Jones et al. (2009), among others.
ESTIMATION METHODS
Based on the reviewed literature, we encountered more than ten different estimators
of the dispersion parameter φ. Here we describe a few that are commonly
used.
Method of Moments Estimator (MME)
The simplest way to estimate the negative binomial parameters is by the
method of moments. By equating the sample mean Ȳ
and the sample variance S^{2} to the corresponding population mean μ
and population variance σ^{2} = μ + μ^{2}/φ,
and solving with respect to μ and φ, one gets:

μ̂ = Ȳ,  φ̂_{MME} = Ȳ^{2}/(S^{2} − Ȳ)  (2)

Equation 2 reveals several apparent problems in estimating
φ, which are listed below:

• If the sample variance equals the sample mean, then the estimator of φ is not defined

• If the sample variance is less than the sample mean, then the estimator of φ is negative

• If the sample variance is only slightly greater than the sample mean, then the estimator of φ is very large
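A minimal sketch of the MME in code (the helper name `mme_phi` is ours); it returns None, a negative number, or a positive estimate in the three cases listed above:

```python
import numpy as np

def mme_phi(y):
    """Method-of-moments estimate of the NB dispersion phi.

    Solves ybar = mu and S^2 = mu + mu^2/phi, giving
    phi_hat = ybar^2 / (S^2 - ybar).  Returns None when the estimate
    is undefined (S^2 == ybar) and a negative, unusable value when
    the sample is underdispersed (S^2 < ybar).
    """
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    s2 = y.var(ddof=1)
    if s2 == ybar:
        return None
    return ybar**2 / (s2 - ybar)

over = [0, 0, 1, 1, 2, 2, 3, 5, 8]   # overdispersed: positive estimate
under = [1, 1, 2, 2, 2, 2, 3, 3]     # underdispersed: negative estimate
```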
To illustrate the joint behavior of Ȳ and S^{2} for the NB, we generated random samples, each of size 50, from a negative binomial distribution with parameters μ = 1 and φ = 3. Figure 1 is a scatter plot of S^{2} versus Ȳ for 2000 such samples. The points falling above the line S^{2} = Ȳ come from overdispersed samples, for which the MME of φ exists. The points falling below the line S^{2} = Ȳ represent underdispersed samples, for which the MME of φ is negative. Based on 500,000 Monte Carlo replicates of the above scenario, the empirical probability of obtaining an underdispersed sample is 12.42%. This shows that although the underlying model is clearly overdispersed (σ^{2} = μ + μ^{2}/φ > μ), for small sample sizes there is a good chance of obtaining underdispersed samples, which lead to negative MMEs.
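The Monte Carlo experiment just described can be reproduced along the following lines (a sketch with 20,000 replicates rather than 500,000, and a fixed seed for repeatability):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, phi, n, reps = 1.0, 3.0, 50, 20_000

# numpy parametrizes NB(r, p) with mean r(1-p)/p; r = phi and
# p = phi/(phi + mu) reproduce mean mu and variance mu + mu^2/phi.
p = phi / (phi + mu)
samples = rng.negative_binomial(phi, p, size=(reps, n))

# Fraction of samples with S^2 < ybar (underdispersed samples,
# for which the MME of phi is negative).
underdispersed = (samples.var(axis=1, ddof=1)
                  < samples.mean(axis=1)).mean()
```

With these parameters the fraction should land near the paper's reported 12.42%, up to Monte Carlo noise.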
The expression for the large-sample variance of the MME of φ was given by
Anscombe (1950) as follows:
Maximum Likelihood Estimator (MLE)
The maximum likelihood estimates are derived by setting the partial derivatives
of the log-likelihood function equal to zero and solving the resulting estimating
equations with respect to the parameters μ and φ. The maximum likelihood
estimator of μ is μ̂ = Ȳ, and the maximum likelihood estimator of φ (say, φ̂_{MLE})
is the solution of the following equation:

n ln(1 + Ȳ/φ) = Σ_{k≥1} (n_{k} + n_{k+1} + n_{k+2} + ...)/(φ + k − 1)

where n is the sample size, n_{1} is the number of ones in the sample,
n_{2} is the number of twos in the sample and so on. The above equation
has a unique solution provided that Ȳ < S^{2}.
These estimators have been considered by Piters et al.
(1977), Clark and Perry (1989), Walter
and Hamedani (1991) and others.
The aforementioned relation can be rewritten as:

Unfortunately, there is no closed form for φ̂_{MLE},
and Eq. 5 must be solved iteratively for the ML estimate;
the Newton-Raphson and scoring methods are efficient and convenient procedures
for doing so. If the initial values of the parameters are
poor, however, the iteration may be slow and costly. Note that Eq.
5 has a unique solution provided that S^{2} > Ȳ.
For a proof of the existence of at least one solution for φ, see Anscombe
(1950); for a proof of uniqueness of the solution, see Levin
and Reeds (1977).
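As a sketch of how the ML equation for φ can be solved numerically, the following uses a bracketing root-finder rather than Newton-Raphson; the helper name `mle_phi` and the bracket endpoints are our own choices, and the code assumes an overdispersed sample (S^{2} > Ȳ) so that a unique positive root exists:

```python
import numpy as np
from scipy.optimize import brentq

def mle_phi(y, lo=1e-6, hi=1e5):
    """Profile ML estimate of phi, with mu fixed at its MLE, ybar.

    Finds the root of
      g(phi) = sum_{k>=1} N_k / (phi + k - 1) - n * ln(1 + ybar/phi),
    where N_k = n_k + n_{k+1} + ... is the number of observations >= k.
    A unique positive root exists when S^2 > ybar.
    """
    y = np.asarray(y, dtype=int)
    n, ybar = len(y), y.mean()
    ks = np.arange(1, y.max() + 1)
    N = np.array([(y >= k).sum() for k in ks])  # tail frequencies N_k

    def g(phi):
        return (N / (phi + ks - 1)).sum() - n * np.log1p(ybar / phi)

    return brentq(g, lo, hi)

phi_hat = mle_phi([0, 0, 1, 1, 2, 2, 3, 5, 8])  # an overdispersed sample
```

For underdispersed samples g has no positive root and `brentq` raises an error, mirroring the nonexistence of the MLE discussed below.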

Fig. 1: Sample variance plotted versus sample mean for 2000 Monte Carlo samples, each of size n = 50, from a negative binomial distribution with μ = 1 and φ = 3
However, for underdispersed samples, the maximum likelihood estimate of
φ does not exist at all: when the sample variance is less than the sample mean,
the log-likelihood approaches its supremum asymptotically and attains no maximum,
so no finite maximum likelihood estimate exists (Levin
and Reeds, 1977). Note that if Ȳ = 0, then S^{2} = Ȳ = 0,
the MLE does not exist and Eq. 5 is of no use. Even
when S^{2} > Ȳ,
there are other problems in estimating φ. For example, simulation results
show large upward biases and large mean squared errors for the MLE. The
points in Fig. 1 falling just above the line S^{2} = Ȳ
correspond to samples that yield very large estimates of φ, whether
the MME or the MLE is used.
Piegorsch (1990) and Anraku and
Yanagimoto (1990), following the study of Clark and
Perry (1989), suggested the reparametrization α = 1/φ; α was then
estimated by the maximum likelihood approach. Piegorsch
(1990) used the following form of the negative binomial probability mass
function:

P(Y = y) = [Γ(y + 1/α)/(Γ(1/α) y!)] (αμ/(1 + αμ))^{y} (1 + αμ)^{−1/α}, y = 0, 1, 2,...  (6)

The ML estimating equations are:
The second equation in (7) reduces to:

The ML estimate of μ is μ̂ = Ȳ, and the solution of the reduced equation
at μ̂ = Ȳ gives
the MLE of α. Such a solution is numerically available via a nonlinear
root-finder. By estimating φ through its reciprocal (α = 1/φ),
we avoid the problem posed by infinite values of the MLE when S^{2} = Ȳ.
For a comprehensive treatment of the existence and uniqueness of the MLE, we
refer to Aragan et al. (1992), Willson
et al. (1986) and Ferreri (1997).
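A minimal sketch of this approach: instead of a root-finder on the score, one can directly minimize the negative log-likelihood in α over a bounded interval, so that underdispersed samples land on the Poisson boundary α = 0 rather than failing. The function name, bounds and branching threshold are assumptions for illustration:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def mle_alpha(y, upper=20.0):
    """Profile ML estimate of alpha = 1/phi, with mu fixed at ybar.

    Minimizes the negative NB log-likelihood (constant log y! terms
    dropped) over alpha in [0, upper]; alpha = 0 is the Poisson
    boundary, so underdispersed samples give a value at or near zero.
    Requires ybar > 0.
    """
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()

    def nll(alpha):
        if alpha < 1e-10:  # Poisson limit of the NB log-likelihood
            return n * ybar - y.sum() * np.log(ybar)
        r = 1.0 / alpha
        ll = (gammaln(y + r).sum() - n * gammaln(r)
              + n * r * np.log(r / (r + ybar))
              + y.sum() * np.log(ybar / (r + ybar)))
        return -ll

    return minimize_scalar(nll, bounds=(0.0, upper), method="bounded").x

a_over = mle_alpha([0, 0, 1, 1, 2, 2, 3, 5, 8])  # interior maximum
a_under = mle_alpha([1, 1, 2, 2, 2, 2, 3, 3])    # pushed to alpha = 0
```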
Extended Maximum Quasi-Likelihood Estimator (MQLE)
The MQLE of the negative binomial parameter φ was first obtained by
Clark and Perry (1989). The MQLE
is obtained by solving the following estimating equations simultaneously:

The first equation leads to the MQLE of μ, μ̂ = Ȳ.
The MQLE of α, α̂_{MQLE},
is obtained by solving the following equation iteratively, after replacing μ by Ȳ:

This procedure protects against infinite values of α̂_{MQLE}
when the sample mean is close to the sample variance (Ross
and Preece, 1985); however, negative values of α̂_{MQLE} were encountered in the simulation.
Other Methods of Estimation
Several alternative methods have been suggested in the literature to estimate
the negative binomial dispersion parameter φ. We present a brief description
of some of these methods.
The zero-class estimator, considered by Piters et al.
(1977), can be derived by equating the observed and expected numbers
of zeros in the sample.
Willson et al. (1984) used a multistage process
to estimate φ. Because a few of the MME estimates of φ are extremely
poor, such estimates would change dramatically if a few more observations were
added to the sample; the procedure therefore continues to add observations
until the estimate of φ converges in some specified manner. However,
these good results are often obtained only by reaching very large sample sizes in
the multistage sampling. It should be noted that for some samples this procedure
fails to produce a positive, finite estimate of φ because of underdispersion.
Anscombe (1949, 1950) suggested the digamma-function estimator, computed by an iterative
method based on the following transformation:
Anraku and Yanagimoto (1990) maximized a conditional
likelihood function to estimate α. They treated underdispersed samples
as Poisson samples: in these samples, φ̂
was defined to be infinite and α̂
was defined to be zero. It can be shown by simulation that the chance of misclassifying
a negative binomial sample as Poisson in this way is large.
The use of the large likelihood estimator (LLE) was suggested by Shenton
and Bowman (1967), and Steiner (1993) applied the LLE
to the negative binomial dispersion parameter φ. The LLE is obtained by
setting the partial derivatives of the log-likelihood to a small positive constant
(say, c = 0.13), rather than to zero, and solving for φ.
Another estimator of φ can be found by reweighted estimation, which
enlarges the sample variance by increasing the frequencies (weights) of the smallest
and largest observations, i.e., the observations that produce the largest increase in the ratio S^{2}/Ȳ.
The use of reweighted estimation for the negative binomial dispersion parameter
was suggested by Steiner (1993). Repeated application of the technique will usually
yield an overdispersed sample (S^{2} > Ȳ).
However, it can be shown by simulation that some samples cannot be made overdispersed in this way.
A further problem with this method is deciding when to stop the reweighting
process.
Some Concluding Remarks
From the above survey of existing methods, it is observed that the digamma
and zero-class estimators have no redeeming features compared with the MME
and MLE, and have substantial disadvantages. These two methods are now rarely
used.

The MME and MLE of the parameter φ appear to have similar characteristics.
For small sample sizes, the MME is recommended, and when μ ≥ φ the
MLE is recommended. When the sample size is adequate (20 and above) and α
= 1/φ is not very small, the MQLE appears to be slightly more accurate
than the MLE. The MQLE is recommended when S^{2} is close to Ȳ.

In most cases, the multistage estimator produces better estimates of φ when there is no restriction on the sample size (cost), since the multistage procedure often attains its good results only at very large sample sizes.

The LLE performs well for most parameter combinations. However, there are difficulties with iteration convergence, particularly for underdispersed samples with sample size less than 150; this forces the researcher to increase the sample size, which is not always possible.

It is noted that when the sample variance is greater than the sample mean, the MME is easy to calculate and has a small mean squared error (MSE). On the other hand, when S^{2} is close to Ȳ, the MQLE performs better than both the MME and the MLE. This suggests that two or more estimators might be combined in the hope of obtaining a better estimation strategy.
SEVERAL COMBINED ESTIMATORS
Suppose that θ̂_{1} and θ̂_{2}
are two estimators of θ. Our problem is to combine θ̂_{1} and θ̂_{2}
to obtain a single improved estimator of θ. There are two ways of combining
such estimators. The first method is linear combination estimation. In general,
for a fixed number w with w ∈ (0, 1), the linear combination estimator
is defined as:

θ̂_{C} = wθ̂_{1} + (1 − w)θ̂_{2}
The combined estimator can be further improved by finding the optimal weight.
The fixed weight can also be replaced by a random weight, resulting in preliminary-test
and shrinkage-type estimators. We refer to Ghosh et al.
(1983), Ahmed (1991, 2000) and
Khan and Ahmed (2006) for detailed studies on this topic.
Generally speaking, the shrinkage estimator based on the optimal weight dominates
the classical estimator in multi-sample situations. The second method of combining
is binary choice, which is essentially classical Neyman-Pearson hypothesis
testing.
Here, we introduce several combined estimators that choose either the MME or
the MQLE depending on the outcome of a test of the null hypothesis H_{0}: σ^{2}/μ
= b against the alternative H_{1}: σ^{2}/μ > b, where
b > 0 is a prespecified constant. More detail about the use of combined estimators
can be found in Efron and Morris (1973, 1975).
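The two modes of combination can be sketched as follows. The direction of the binary-choice rule (which estimate is used on which side of the test) and the use of the raw dispersion index S^{2}/ȳ as the test statistic are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def combine_linear(est1, est2, w=0.5):
    """Linear combination of two estimates with fixed weight w in (0, 1)."""
    return w * est1 + (1.0 - w) * est2

def combine_binary(y, est_if_reject, est_if_accept, b=2.0):
    """Binary choice driven by the sample dispersion index S^2 / ybar.

    If the index exceeds the threshold b (evidence of overdispersion
    beyond b), one estimate is selected; otherwise the other is used.
    Which estimator goes on which side is a hypothetical choice here.
    """
    y = np.asarray(y, dtype=float)
    return est_if_reject if y.var(ddof=1) / y.mean() > b else est_if_accept
```

In the paper's setting, `est_if_reject` and `est_if_accept` would be the MME and MQLE values of α computed from the same sample.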
Combined Estimator 1 (Comb 1)
To estimate the dispersion parameter α = 1/φ of the negative binomial,
let α̂_{MME} and α̂_{MQLE} be the MME and MQLE of α, respectively. Then the combined
estimator for α, which depends on the variance test (VT), also called the index of dispersion
test (see Karlis and Xekalaki, 2000, for more details), is
given by:

where b is a nonnegative prespecified constant and I(A) equals one if A is true and zero otherwise.
Note that the MQLE is not defined for some samples, whereas the MME is defined in all cases. To avoid this difficulty, we define the following estimator, which always exists.
Combined Estimator 2 (Comb 2)
Further, we define the following combined estimators (Justification is given later).
Combined Estimator 3 (Comb 3)
Combined Estimator 4 (Comb 4)
Combined Estimator 5 (Comb 5)
which can be simplified as:
In the above estimators, we attempt to improve the combination of α̂_{MME} and α̂_{MQLE} by using different weights. In α̂_{3}^{C}, the use of the constant 40 is suggested by simulation: we tried different fixed weights in the simulation, and 40 produced a good estimator. In α̂_{4}^{C}, we use the critical value of the variance test instead. In α̂_{5}^{C}, we use the variances of the estimators to obtain an improved combined estimator.
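As one plausible reading of variance-based weighting (an assumption on our part, not the paper's exact formula for the fifth combined estimator), inverse-variance weights minimize the variance of a linear combination of two uncorrelated estimators:

```python
def inverse_variance_combine(est1, var1, est2, var2):
    """Inverse-variance weighting of two estimates.

    The weight w = var2 / (var1 + var2) minimizes the variance of
    w*est1 + (1 - w)*est2 when the two estimators are uncorrelated
    and unbiased: the noisier estimate receives the smaller weight.
    """
    w = var2 / (var1 + var2)
    return w * est1 + (1.0 - w) * est2
```

In practice the variances would themselves be replaced by large-sample estimates, such as Anscombe's expression for the MME.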
In the following section, an extensive Monte Carlo simulation study is carried out to investigate the behavior of the suggested estimators. We numerically compare the biases and MSEs of α̂_{MME} and α̂_{MQLE} with those of our combined estimators.
SIMULATION STUDY
To compare the various estimators, we use the Mean Squared Error (MSE) criterion
to evaluate the proposed estimators. The MSE is defined as:

MSE(θ̂) = E(θ̂ − θ)^{2}

where θ̂ is an estimate of θ. Further, the Relative Efficiency (RE) of an estimator θ̂_{1}
with respect to another estimator θ̂_{2}
is defined by:

RE(θ̂_{1}, θ̂_{2}) = MSE(θ̂_{2})/MSE(θ̂_{1})
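These criteria can be computed from a vector of simulated estimates as follows; the orientation of the RE ratio here is the conventional one (values above one favor the first estimator) and may differ from the paper's equation:

```python
import numpy as np

def mse(estimates, theta):
    """Monte Carlo MSE: average squared error over replicates."""
    return float(np.mean((np.asarray(estimates, dtype=float) - theta) ** 2))

def relative_efficiency(est1, est2, theta):
    """RE of estimator 1 with respect to estimator 2: MSE2 / MSE1."""
    return mse(est2, theta) / mse(est1, theta)
```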
A simulation study is carried out by taking samples of size n = 30, 50, 100,
150 from negative binomial distributions with parameters μ and φ, where
μ = (1, 3, 5, 10, 20) and φ = (1, 2,..., 10, 15, 20). Each simulation
is repeated 2000 times. Note that underdispersed samples (S^{2} < Ȳ)
were discarded from the simulation study. More importantly, only a few samples
were discarded when a combined estimator was used, because in most
underdispersed samples the MQLE is selected, and it yields a valid estimate
in most such cases (Clark and Perry, 1989). The performance
of the combined estimator
α̂_{1}^{C} depends on the value of the constant b. The optimal
choice of b remains an open question for future research; in our simulation
experiment we consider the choice b = 2. Following Piegorsch
(1990), a restriction involving y_{max}, the largest value of y, was applied to all estimates
to guarantee the existence of the maximum number of estimates.
The ML estimating equations were solved using S-Plus. The number of replications for each parameter configuration was varied initially, and 2000 simulations proved adequate, since further increases in the number of replications did not significantly change the results. Thus, in each case, 2000 samples were randomly generated, and we compared the simulated mean bias and Simulated Mean Squared Error (SMSE) of each estimator based on the 2000 simulations. Moreover, the Simulated Relative Efficiencies (SRE) of the estimators were calculated by:

where the SMSE is obtained by:

If convergence of the estimating equations was not achieved, the sample was discarded. Note that convergence of the estimating equations is sensitive to the initial values supplied to S-Plus. The number of rejected samples depends on the values of the parameters μ and α as well as on the sample size n: it decreases as μ and α increase, it increases as n increases, and it declines to zero for large values of α and μ with n fixed.
The simulated relative efficiency results are presented in Fig. 2-5 and the mean bias results are presented in Fig. 6-9.

Fig. 2: Simulated relative efficiency of the estimators of the dispersion parameter for n = 30. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 3: Simulated relative efficiency of the estimators of the dispersion parameter for n = 50. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 4: Simulated relative efficiency of the estimators of the dispersion parameter for n = 100. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 5: Simulated relative efficiency of the estimators of the dispersion parameter for n = 150. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 6: Mean bias of the combined estimators of the dispersion parameter for n = 30. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 7: Mean bias of the combined estimators of the dispersion parameter for n = 50. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 8: Mean bias of the combined estimators of the dispersion parameter for n = 100. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20

Fig. 9: Mean bias of the combined estimators of the dispersion parameter for n = 150. (a) μ = 1, (b) μ = 3, (c) μ = 5, (d) μ = 10, (e) μ = 20
RESULTS AND DISCUSSION
Based on our simulation study, we make the following observations from Fig. 6-9:
In terms of mean bias, we note that the biases of the MME estimators of α = 1/φ are negative over most of the parameter space and that their magnitude decreases as the sample size increases; this holds for all of the proposed estimators. The biases of the MQLE and of the α̂_{5}^{C} estimator have the same sign, as do the biases of the α̂_{3}^{C} and α̂_{4}^{C} estimators.
For fixed n and α, the mean bias decreases as μ increases; for
fixed μ and α, it decreases as n increases; and for fixed μ
and n, it decreases as α decreases (that is, as φ increases).
Further, Fig. 6-9 reveal that as n increases,
the biases of all the proposed estimators become more stable. For φ
> 3, all estimators have biases close to one another.
The combined estimators, especially α̂_{5}^{C}, significantly reduce the mean bias of the MQLE. The MME has smaller bias than the other estimators for small μ and large α.
In terms of relative efficiency with respect to the MME, we make the following
observations from Fig. 2-5:

The relative efficiencies of the MQLE and of all the combined estimators increase as the sample size n increases.

All combined estimators are more efficient than the MME. The first combined estimator, α̂_{1}^{C}, has a relative efficiency greater than both the MME and the MQLE over most of the parameter space and is as efficient as the MQLE over the rest of it. For small μ and small α, the MME is better than both the MQLE and α̂_{1}^{C}. For n > 50, the second combined estimator, α̂_{2}^{C}, is less efficient than α̂_{1}^{C}, but it improves on α̂_{1}^{C} in the cases where α̂_{1}^{C} outperforms α̂_{3}^{C}. The third and fourth combined estimators, α̂_{3}^{C} and α̂_{4}^{C}, perform similarly to the first. The fifth combined estimator, α̂_{5}^{C}, is the best among all the proposed estimators over the entire parameter space, except for small values of α (large φ) and very small μ (μ ≤ 1), where α̂_{3}^{C} performs very close to the MME.

As α decreases, all the proposed estimators perform very close to the MME; the differences between the estimators are clearest for large α (Fig. 2-5).

For fixed μ and n, the relative efficiency of all the combined estimators increases as α increases. Also, for fixed n and α, it increases as μ increases.
In summary, except for μ ≤ 1 and small α, the combined estimator
α̂_{5}^{C} is superior to all the other estimators studied here. For
small μ (μ ≤ 1) and small α, the MME is a sensible choice.