Abstract: Data obtained in biomedical research are often skewed. Examples include the incubation period of diseases such as HIV/AIDS and the survival times of cancer patients. Such data, especially when they are positive and skewed, are often modeled by the log-normal distribution. If this model holds, the log transformation produces normally distributed data. We consider the problem of constructing confidence intervals for the mean of the log-normal distribution. Several methods for doing this are known, including at least one that performs better than Cox's method for small sample sizes. We also construct a modified version of Cox's method. Using simulation, we show that, when the sample size exceeds 30, the modified method leads to confidence intervals that have good overall properties and are better than those from Cox's method. More precisely, the actual coverage probability of our method is closer to the nominal coverage probability than that of Cox's method. In addition, the new method is computationally much simpler than other well-known methods.
INTRODUCTION
Data obtained in biomedical research are often skewed. Examples include the incubation period of diseases such as HIV and the survival times of cancer patients. Since statistical inference based on the normal distribution is well established, a standard way to deal with non-normal data is to apply a transformation that makes them approximately normally distributed. The log transformation is one of the most commonly used for this purpose, and it is especially recommended when the data come from a population that is positive and skewed. A variable X is said to have a log-normal distribution with parameters μ and σ² if Y = log X is normally distributed with mean μ and variance σ². In this case the mean θ of X is

$$\theta = E(X) = \exp\left(\mu + \frac{\sigma^2}{2}\right).$$
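As a quick numerical check of the formula θ = exp(μ + σ²/2), the following Python sketch compares it with a Monte Carlo average of log-normal draws. The function names, seed and parameter values are illustrative choices, not part of the original study. It also shows that exp(μ), the median of X, is smaller than its mean.

```python
import math
import random


def lognormal_mean(mu: float, sigma2: float) -> float:
    """Mean theta of X when log X ~ N(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2)


def monte_carlo_mean(mu: float, sigma2: float, draws: int = 100_000, seed: int = 0) -> float:
    """Average of simulated log-normal draws.

    random.lognormvariate takes the standard deviation sigma, not the variance.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    return sum(rng.lognormvariate(mu, sigma) for _ in range(draws)) / draws


if __name__ == "__main__":
    mu, sigma2 = 0.0, 0.25
    print("theta (formula):   ", lognormal_mean(mu, sigma2))
    print("theta (simulation):", monte_carlo_mean(mu, sigma2))
    print("median exp(mu):    ", math.exp(mu))
```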
We consider the problem of constructing confidence intervals for the mean θ of the log-normal distribution.
Zhou and Gao (1997) used simulation to construct and compare confidence intervals from the four methods accepted at that time: the naive method, Angus's conservative method, Angus's parametric bootstrap (PB) method and Cox's method. Their simulation study revealed that the naive method was wholly inappropriate and, contrary to what one would expect, its coverage error increased as the sample size increased. When the sample size was fairly small (n = 11), the PB method had the smallest overall coverage error, but only when the variance was small. The PB method was also found to be negatively biased.
For moderate sample sizes (as small as 50), Cox's method yielded confidence intervals with comparatively the smallest coverage error. Moreover, unlike that of the PB method, the coverage error of Cox's method did not increase significantly as σ² increased.
Wu et al. (2003) derived a modified signed log-likelihood ratio method that, for samples of size less than 30, outperformed both Cox's method and the PB method on all the comparison criteria used by Zhou and Gao (1997).
In the current study, we revisit this classical problem and derive a modified version of Cox's method that provides a more efficient estimator. Unlike all the previous papers mentioned, our approach deals exclusively with samples of size greater than 30. The proposed estimator is compared with the existing methods using the three criteria of Zhou and Gao (1997): coverage error, interval width and relative bias. We show that, for sample sizes n > 30, the coverage error of the proposed method is smaller than that of any of the other methods, including the modified signed log-likelihood ratio method of Wu et al. (2003).
THE FIVE MAIN APPROACHES
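Here, we review the five existing methods for constructing two-sided 1 − α level confidence intervals for a log-normal mean θ. Let X₁, X₂, ..., Xₙ be a random sample from a log-normal distribution with parameters μ and σ², let Yᵢ = log Xᵢ for i = 1, 2, ..., n, let

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$$

be the sample mean and sample variance of the log-transformed data, and let

$$\theta = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$

be the mean of the log-normal distribution.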
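The naïve method: This method constructs a confidence interval for μ, the mean of the log-transformed data, using normal theory as

$$\bar{Y} \pm z_{1-\alpha/2}\,\frac{S}{\sqrt{n}},$$

where $z_{1-\alpha/2}$ is the 1 − α/2 quantile of the standard normal distribution. The antilogarithm is then applied to transform the confidence limits back to the original scale, giving the interval

$$\left(\exp\left(\bar{Y} - z_{1-\alpha/2}\,\frac{S}{\sqrt{n}}\right),\ \exp\left(\bar{Y} + z_{1-\alpha/2}\,\frac{S}{\sqrt{n}}\right)\right)$$

for θ.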
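Because exp(μ) is the median of the log-normal distribution rather than its mean θ = exp(μ + σ²/2), this interval is centered on the wrong quantity: its bias does not vanish as n grows, and for large n it increasingly fails to cover θ.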
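The failure of the naive method for large n can be seen in a short Python sketch. It assumes a standard normal quantile z for the interval on the log scale (a t quantile is also used in practice) and the parameterization μ = −σ²/2, so that θ = 1; the function name and seed are illustrative. As n grows, the interval shrinks around exp(μ) = exp(−σ²/2) < 1 and stops covering θ.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def naive_ci(x, alpha=0.10):
    """Naive interval: normal-theory CI for mu on the log scale, then exponentiated."""
    y = [math.log(v) for v in x]
    z = NormalDist().inv_cdf(1 - alpha / 2)      # assumed z quantile (a t quantile is also common)
    half_width = z * stdev(y) / math.sqrt(len(y))
    ybar = fmean(y)
    return math.exp(ybar - half_width), math.exp(ybar + half_width)


if __name__ == "__main__":
    sigma2 = 1.0
    mu = -sigma2 / 2                             # so that theta = exp(mu + sigma2/2) = 1
    rng = random.Random(42)
    for n in (51, 301, 20_000):
        x = [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(n)]
        lo, hi = naive_ci(x)
        print(n, round(lo, 3), round(hi, 3), "covers theta = 1:", lo <= 1.0 <= hi)
```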
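Cox's method: One way to estimate θ is to estimate μ and σ² and then to make use of the relationship

$$\log\theta = \mu + \frac{\sigma^2}{2}.$$

If we estimate μ and σ² by the sample mean $\bar{Y}$ and the sample variance $S^2$, respectively, we obtain the estimator $\bar{Y} + S^2/2$ of log θ.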
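Since $\bar{Y} + S^2/2$ is an unbiased estimator of $\mu + \sigma^2/2 = \log\theta$ and is a function of the complete sufficient statistic $(\bar{Y}, S^2)$, it follows from a well-known theorem (Lehmann, 1983) that $\bar{Y} + S^2/2$ is the uniformly minimum variance unbiased estimator (UMVUE) of log θ.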
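Here we note that, at this point in their discussion of the relevant point estimators, Zhou and Gao (1997) made an error in stating that

$$\frac{S^2}{n} + \frac{S^4}{2(n-1)}$$

is an unbiased estimator of

$$\operatorname{Var}\left(\bar{Y} + \frac{S^2}{2}\right) = \frac{\sigma^2}{n} + \frac{\sigma^4}{2(n-1)}.$$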
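Although $E(S^2) = \sigma^2$, $S^4$ is not an unbiased estimator of $\sigma^4$. In fact, because $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ implies

$$E(S^4) = \operatorname{Var}(S^2) + \left[E(S^2)\right]^2 = \frac{2\sigma^4}{n-1} + \sigma^4 = \frac{n+1}{n-1}\,\sigma^4,$$

it can easily be shown that

$$E\left(\frac{n-1}{n+1}\,S^4\right) = \sigma^4.$$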
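Thus the correct unbiased estimator of $\operatorname{Var}(\bar{Y} + S^2/2)$ is

$$\frac{S^2}{n} + \frac{S^4}{2(n+1)}.$$

This is also the UMVUE of $\operatorname{Var}(\bar{Y} + S^2/2)$.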
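The bias of $S^4$ and the effect of the factor (n − 1)/(n + 1) are easy to check numerically. The Python sketch below (function name, seed and settings are illustrative) averages $S^4$ over simulated normal samples; the mean is set to zero because $S^2$ does not depend on μ. With n = 5 and σ² = 1, theory gives $E(S^4) = 6/4 = 1.5$, while the rescaled version should average about 1.

```python
import random


def mc_s4_moments(n, sigma2, reps=100_000, seed=7):
    """Monte Carlo averages of S^4 and of (n-1)/(n+1) * S^4 for N(0, sigma2) samples."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, sd) for _ in range(n)]
        ybar = sum(y) / n
        s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)   # sample variance, divisor n - 1
        total += s2 * s2
    raw = total / reps
    return raw, (n - 1) / (n + 1) * raw


if __name__ == "__main__":
    n, sigma2 = 5, 1.0
    raw, corrected = mc_s4_moments(n, sigma2)
    print("mean of S^4:            ", raw, "theory:", (n + 1) / (n - 1) * sigma2 ** 2)
    print("mean of (n-1)/(n+1) S^4:", corrected, "target sigma^4:", sigma2 ** 2)
```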
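Assuming approximate normality for $\bar{Y} + S^2/2$, Cox's method gives the approximate 1 − α confidence interval for log θ

$$\bar{Y} + \frac{S^2}{2} \pm z_{1-\alpha/2}\sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}},$$

and exponentiating its limits gives a confidence interval for θ.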
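A minimal Python sketch of Cox's interval, assuming the form $\bar{Y} + S^2/2 \pm z_{1-\alpha/2}\sqrt{S^2/n + S^4/(2(n-1))}$ with exponentiated limits. The optional flag `unbiased_var` (a hypothetical name) swaps in the unbiased term $S^4/(2(n+1))$ discussed above, only to illustrate its effect; it is not claimed to be the authors' improved estimator. The data are made up for the example.

```python
import math
from statistics import NormalDist, fmean, variance


def cox_ci(x, alpha=0.10, unbiased_var=False):
    """Cox interval for theta; unbiased_var=True uses S^4 / (2(n+1)) (illustrative variant)."""
    y = [math.log(v) for v in x]
    n = len(y)
    s2 = variance(y)                       # S^2 with divisor n - 1
    est = fmean(y) + s2 / 2                # UMVUE of log(theta)
    denom = 2 * (n + 1) if unbiased_var else 2 * (n - 1)
    se = math.sqrt(s2 / n + s2 ** 2 / denom)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(est - z * se), math.exp(est + z * se)


if __name__ == "__main__":
    x = [0.5, 1.2, 0.8, 2.5, 1.0, 0.3, 1.7, 0.9]
    print("Cox:                ", cox_ci(x))
    print("with 2(n+1) divisor:", cox_ci(x, unbiased_var=True))
```

Because $2(n+1) > 2(n-1)$, the variant always yields a slightly narrower interval centered at the same point on the log scale.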
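Angus's conservative method: Angus proposed a conservative method for constructing a confidence interval for ln θ based on the approximate pivotal statistic

$$T = \frac{\bar{Y} + S^2/2 - \ln\theta}{\sqrt{\dfrac{S^2}{n}\left(1 + \dfrac{S^2}{2}\right)}}, \tag{1}$$

which is asymptotically standard normal. When the sample size is finite, however, (1) has the same distribution as

$$\frac{N + \dfrac{\sqrt{n}\,\sigma}{2}\left(\dfrac{\chi^2_{n-1}}{n-1} - 1\right)}{\sqrt{\dfrac{\chi^2_{n-1}}{n-1}\left(1 + \dfrac{\sigma^2}{2}\cdot\dfrac{\chi^2_{n-1}}{n-1}\right)}}, \tag{2}$$

where N and $\chi^2_{n-1}$ are independent, N is a standard normal variable and $\chi^2_{n-1}$ has a χ² distribution with n − 1 degrees of freedom. Note that (2) depends on the unknown σ². From (2), Zhou and Gao (1997) obtain the lower and upper limits of the (1 − α) confidence interval for ln θ.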
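The finite-sample distribution of the pivot can be explored by simulation. The Python sketch below assumes the statistic $T = (\bar{Y} + S^2/2 - \ln\theta)/\sqrt{(S^2/n)(1 + S^2/2)}$ and draws from its exact representation $[N + (\sqrt{n}\,\sigma/2)(W - 1)]/\sqrt{W(1 + \sigma^2 W/2)}$ with $W = \chi^2_{n-1}/(n-1)$ independent of N. This representation follows from $\bar{Y} - \mu = \sigma N/\sqrt{n}$ and $S^2 = \sigma^2 W$. Function names and seeds are hypothetical; comparing quantiles for different σ² shows why the unknown σ² complicates the construction.

```python
import math
import random


def simulate_angus_T(n, sigma2, reps=20_000, seed=1):
    """Draws from the finite-sample representation of T (assumed form; see lead-in)."""
    rng = random.Random(seed)
    k = n - 1
    sigma = math.sqrt(sigma2)
    draws = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)                      # N, standard normal
        w = rng.gammavariate(k / 2, 2.0) / k         # chi^2_{n-1} / (n-1), independent of N
        num = z + math.sqrt(n) * sigma / 2 * (w - 1)
        den = math.sqrt(w * (1 + sigma2 * w / 2))
        draws.append(num / den)
    return draws


def quantile(sorted_vals, p):
    """Simple empirical quantile from an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, int(p * len(sorted_vals))))
    return sorted_vals[idx]


if __name__ == "__main__":
    n = 51
    for sigma2 in (1.0, 2.0):
        t = sorted(simulate_angus_T(n, sigma2))
        print(sigma2, round(quantile(t, 0.05), 3), round(quantile(t, 0.95), 3))
```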
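A parametric bootstrap method: The bootstrap interval described by Angus applies the parametric bootstrap-t approach to the statistic T in (1). If $t_0$ and $t_1$ denote the α/2 and 1 − α/2 quantiles of T, the 1 − α confidence interval for ln θ is

$$\left(\bar{Y} + \frac{S^2}{2} - t_1\sqrt{\frac{S^2}{n}\left(1 + \frac{S^2}{2}\right)},\ \ \bar{Y} + \frac{S^2}{2} - t_0\sqrt{\frac{S^2}{n}\left(1 + \frac{S^2}{2}\right)}\right). \tag{3}$$

The unknown quantiles $t_0$ and $t_1$ can be estimated from parametric bootstrap samples.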
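A minimal Python sketch of a parametric bootstrap-t interval of this kind. It assumes the studentizing factor $\sqrt{(S^2/n)(1 + S^2/2)}$ used in T, resampling from the fitted normal $N(\bar{Y}, S^2)$ on the log scale, and simple order-statistic quantiles. Function names, the number of resamples B and the seed are illustrative, not Angus's exact settings.

```python
import math
import random


def est_and_se(y):
    """Estimate Ybar + S^2/2 of ln(theta) and the standard error used in T (assumed form)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return ybar + s2 / 2, math.sqrt(s2 / n * (1 + s2 / 2)), ybar, s2


def pb_ci(x, alpha=0.10, B=2000, seed=11):
    """Parametric bootstrap-t interval for theta (sketch)."""
    rng = random.Random(seed)
    y = [math.log(v) for v in x]
    n = len(y)
    est, se, ybar, s2 = est_and_se(y)
    sd = math.sqrt(s2)
    t_star = []
    for _ in range(B):
        yb = [rng.gauss(ybar, sd) for _ in range(n)]   # resample from the fitted N(Ybar, S^2)
        est_b, se_b, _, _ = est_and_se(yb)
        t_star.append((est_b - est) / se_b)            # bootstrap copy of T
    t_star.sort()
    k = int(alpha / 2 * B)
    t0, t1 = t_star[k], t_star[B - 1 - k]             # estimated alpha/2 and 1 - alpha/2 quantiles
    return math.exp(est - t1 * se), math.exp(est - t0 * se)


if __name__ == "__main__":
    x = [0.5, 1.2, 0.8, 2.5, 1.0, 0.3, 1.7, 0.9, 1.4, 0.6]
    print(pb_ci(x))
```

Note that the upper quantile $t_1$ determines the lower limit and vice versa, which is what makes the bootstrap-t interval asymmetric when T is skewed.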
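Modified signed log-likelihood ratio method: Wu et al. (2003) argued that Cox's method does not perform well in small samples because, for small n, the likelihood profile is nonquadratic and asymmetric. They instead considered the modified signed log-likelihood ratio introduced by Barndorff-Nielsen (1986, 1991), generally known as the r* formula:

$$r^*(\psi) = r(\psi) + \frac{1}{r(\psi)}\log\frac{u(\psi)}{r(\psi)}, \tag{4}$$

where ψ = log θ = μ + σ²/2 is the parameter of interest, $r(\psi) = \operatorname{sgn}(\hat\psi - \psi)\sqrt{2\left[\ell(\hat\mu, \hat\sigma^2) - \ell(\hat\mu_\psi, \hat\sigma^2_\psi)\right]}$ is the signed log-likelihood ratio, with $(\hat\mu_\psi, \hat\sigma^2_\psi)$ the constrained maximum likelihood estimate for fixed ψ, and u(ψ) is a correction quantity; the general form of r* is given in the Appendix. The statistic r* is asymptotically standard normal with third-order accuracy. Therefore, an approximate 100(1 − α)% confidence interval for ψ based on r* is

$$\left\{\psi : \left|r^*(\psi)\right| \le z_{1-\alpha/2}\right\}. \tag{5}$$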
Unlike Cox's interval, this r* interval calculates its confidence limits from the observed, asymmetric likelihood-based function r*(ψ), and should therefore, in theory, achieve a more accurate coverage probability than Cox's. In the simulations of Wu et al. (2003), the modified signed log-likelihood ratio method produced essentially zero coverage errors and almost negligible average biases, and both the coverage probabilities and the average biases remained nearly constant as the variance increased.
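The r* interval needs the correction term u(ψ), whose general form is deferred to the Appendix, so the Python sketch below implements only the first-order building block: the signed log-likelihood ratio r(ψ) for ψ = log θ under the normal model for Y, and the interval {ψ : |r(ψ)| ≤ z_{1−α/2}} found by bracketing and bisection. It is not the r* method of Wu et al. (2003). The closed-form constrained MLE $\hat\sigma^2_\psi = 2\left(\sqrt{1 + \hat\sigma^2 + (\bar{Y} - \psi)^2} - 1\right)$, with $\hat\sigma^2 = \sum (Y_i - \bar{Y})^2/n$, is our own derivation from setting the derivative of the constrained log-likelihood to zero; function names are hypothetical.

```python
import math
from statistics import NormalDist


def signed_lr_interval(x, alpha=0.10):
    """First-order signed log-likelihood ratio interval for theta = exp(mu + sigma^2/2).

    Omits the u(psi) correction, so this inverts r, not r*.
    """
    y = [math.log(v) for v in x]
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    v_hat = ss / n                          # MLE of sigma^2 (divisor n)
    psi_hat = ybar + v_hat / 2              # MLE of psi = log(theta)
    l_max = -0.5 * n * math.log(v_hat) - 0.5 * n

    def profile_loglik(psi):
        c = ybar - psi
        s = 2.0 * (math.sqrt(1.0 + v_hat + c * c) - 1.0)  # constrained MLE of sigma^2 given psi
        d = c + s / 2.0                                    # ybar - mu, with mu = psi - s/2
        return -0.5 * n * math.log(s) - (ss + n * d * d) / (2.0 * s)

    def r(psi):
        dev = max(0.0, 2.0 * (l_max - profile_loglik(psi)))
        return math.copysign(math.sqrt(dev), psi_hat - psi)

    def solve(target, direction):
        # Bracket the root of r(psi) = target on one side of psi_hat, then bisect.
        step = math.sqrt(v_hat / n)
        lo, hi = psi_hat, psi_hat + direction * step
        while (r(lo) - target) * (r(hi) - target) > 0:
            lo, hi = hi, hi + direction * step
            step *= 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if (r(mid) - target) * (r(lo) - target) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    z = NormalDist().inv_cdf(1 - alpha / 2)
    psi_low, psi_high = solve(z, -1.0), solve(-z, 1.0)
    return math.exp(psi_low), math.exp(psi_high)


if __name__ == "__main__":
    x = [0.5, 1.2, 0.8, 2.5, 1.0, 0.3, 1.7, 0.9, 1.4, 0.6]
    print(signed_lr_interval(x))
```

Because the limits come from inverting r(ψ) rather than a symmetric normal approximation, the resulting interval is generally asymmetric around the point estimate on the log scale.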
THE IMPROVED COX'S CONFIDENCE INTERVAL (CI) ESTIMATORS
First, we note a simple fact regarding estimation. An estimator t of a parameter θ is examined for its efficiency on the following counts:
(6)
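Efficiency comparisons are commonly based on bias, variance and mean squared error, linked by MSE = Var + bias²; this standard decomposition is stated here as background and is not claimed to be the exact content of (6). The Python sketch below (hypothetical function name, illustrative settings) estimates all three by Monte Carlo for the plug-in estimator exp(Ȳ + S²/2) of θ, which is biased upward even though Ȳ + S²/2 is unbiased for log θ.

```python
import math
import random


def mc_bias_var_mse(n, mu, sigma2, reps=20_000, seed=3):
    """Monte Carlo bias, variance and MSE of the plug-in estimator exp(Ybar + S^2/2) of theta."""
    rng = random.Random(seed)
    theta = math.exp(mu + sigma2 / 2)
    sd = math.sqrt(sigma2)
    ests = []
    for _ in range(reps):
        y = [rng.gauss(mu, sd) for _ in range(n)]      # log-scale sample
        ybar = sum(y) / n
        s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
        ests.append(math.exp(ybar + s2 / 2))
    mean_est = sum(ests) / reps
    bias = mean_est - theta
    var = sum((e - mean_est) ** 2 for e in ests) / reps
    mse = sum((e - theta) ** 2 for e in ests) / reps
    return bias, var, mse


if __name__ == "__main__":
    bias, var, mse = mc_bias_var_mse(31, -0.5, 1.0)
    print("bias:", bias, "variance:", var, "MSE:", mse, "var + bias^2:", var + bias ** 2)
```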
Since
where,
where,
Hence  = -v is an unbiased estimator of A. Further,
Hence
(7)
Hence,
(8)
and
(9)
Therefore,
(10)
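We are now ready to propose our improved CIs. Let us call Cox's CI, with lower limit CI_low and upper limit CI_high, Estimator 1, i.e.,

$$\mathrm{CI}_{\text{low}} = \exp\left(\bar{Y} + \frac{S^2}{2} - z_{1-\alpha/2}\sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}}\right), \qquad \mathrm{CI}_{\text{high}} = \exp\left(\bar{Y} + \frac{S^2}{2} + z_{1-\alpha/2}\sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}}\right).$$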
Following our proposition, the efficient CI limits (CI_low and CI_high) for the log-normal mean θ, which we call Estimator 2, are:
and
SIMULATION AND CONCLUSIONS
As mentioned at the beginning, we follow the simulation framework of Zhou and Gao (1997), for the reasons explained in their paper. Adopting the same structure as their simulation study, we carried out a simulation study whose results are reported in the tables in the Appendix.
Using 6000 samples (of illustrative sizes 51, 101, 151, 201 and 301) from the relevant log-normal distribution, with illustrative values σ² = 1.00, 1.25, 1.50, 1.75 and 2.00, and assuming, as in Land (1971), for simplicity and without any loss of generality, that μ = −σ²/2 (so that the true mean is θ = exp(μ + σ²/2) = 1), we computed the following characteristics of the CIs: coverage probability (Cvg. Prob.), coverage error (Cvg. Error), length of the CI (Length), the proportions of CIs that fail to cover the true population mean because they lie entirely to its left or entirely to its right (Left Bs. and Right Bs., respectively), and hence the relative bias (Rel. Bs.).
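A minimal Python sketch of one cell of this simulation design, applied to Cox's interval. It assumes Cox's limits in the form $\exp(\bar{Y} + S^2/2 \pm z_{1-\alpha/2}\sqrt{S^2/n + S^4/(2(n-1))})$, a nominal level of 0.90, and coverage error taken as |achieved − nominal|; relative bias is not computed because its formula is not restated here. Samples are drawn directly on the log scale, which is equivalent to taking logs of log-normal draws. Function names and the seed are illustrative.

```python
import math
import random
from statistics import NormalDist


def cox_limits(y, z):
    """Cox limits for theta from a log-scale sample y (assumed form; see lead-in)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    est = ybar + s2 / 2
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    return math.exp(est - z * se), math.exp(est + z * se)


def simulate(n, sigma2, reps=6000, alpha=0.10, seed=2024):
    """Coverage, coverage error, mean length and left/right miss rates of Cox's CI."""
    rng = random.Random(seed)
    mu = -sigma2 / 2                        # as in the text, so theta = 1
    theta = math.exp(mu + sigma2 / 2)
    sd = math.sqrt(sigma2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    cover = left = right = 0
    total_len = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu, sd) for _ in range(n)]   # log of a log-normal sample
        lo, hi = cox_limits(y, z)
        total_len += hi - lo
        if hi < theta:
            left += 1                       # CI lies entirely to the left of theta
        elif lo > theta:
            right += 1                      # CI lies entirely to the right of theta
        else:
            cover += 1
    cvg = cover / reps
    return {"cvg_prob": cvg, "cvg_error": abs(cvg - (1 - alpha)),
            "length": total_len / reps, "left": left / reps, "right": right / reps}


if __name__ == "__main__":
    for sigma2 in (1.00, 1.25, 1.50, 1.75, 2.00):
        print(sigma2, simulate(51, sigma2))
```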
The results of the simulation study are tabulated in five tables in the Appendix, one for each sample size: 51, 101, 151, 201 and 301.
Overall, Estimator 2 performs better than Cox's method, in the sense that its achieved coverage probability is closer to the nominal probability of 0.90; in other words, its coverage error is smaller.
Appendix:
For n = 51