Journal of Applied Sciences

Year: 2011 | Volume: 11 | Issue: 3 | Page No.: 528-534
DOI: 10.3923/jas.2011.528.534
Confidence Interval for Locations of Non-kurtosis and Large Kurtosis Leptokurtic Symmetric Distributions
Faris M. Al-Athari

Abstract: This study shows that the Student's t confidence interval for the normal distribution is not the right estimator for the locations of non-kurtosis and large-kurtosis leptokurtic symmetric distributions and proposes three alternative confidence intervals, based on robust estimators, which are analogous to the Student's t confidence interval. These estimators were tried on 40000 simulated samples of sizes 20(10)50 and 100 from six symmetric distributions: three non-kurtosis distributions, the normal distribution and two large-kurtosis leptokurtic distributions. The confidence intervals were studied with regard to both robustness of validity (the error rate) and robustness of efficiency (the length of the interval) and were compared with each other and with the Student's t confidence interval by calculating the coverage probability, the average width, the robustness bounds and the margin of error. The best confidence interval is the one with the smallest average width among the confidence intervals whose coverage probabilities are not less than the lower robustness bound. The MAD t confidence interval, followed by the Sps t confidence interval, is found to be the strongest, while Student's t and Downton's t are not preferred at all.

How to cite this article
Faris M. Al-Athari, 2011. Confidence Interval for Locations of Non-kurtosis and Large Kurtosis Leptokurtic Symmetric Distributions. Journal of Applied Sciences, 11: 528-534.

Keywords: leptokurtic distribution, confidence interval, median absolute deviation, coverage probability, pseudo standard deviation, sample median

INTRODUCTION

The Student's t confidence interval for one population mean is derived under the assumption that the sample was randomly selected from a normal population with unknown standard deviation. In practice, however, many researchers use Student's t statistic in testing hypotheses or constructing confidence intervals regardless of sample size and distribution shape.

They rely on the idea that the t-test or the Student's t confidence interval is robust against moderate deviations from the normality assumption, except for extremely skewed populations (Bartlett, 1935; Bradley, 1980; Pearson and Please, 1975; Pocock, 1982). Pearson and Please (1975) established that skewness has an effect on Type I error rates and hence on the nominal confidence level. Chaffin and Rhiel (1993) found that if both skewness and kurtosis are present in the distribution, they have a great impact on the distribution of the Student's t statistic. Chami et al. (2007) derived a modified version of Cox's method to provide a more efficient confidence interval for the mean of the log-normal distribution than the five estimators then available. They used the same comparative criteria as this study: coverage probability and interval width. Johnson (1978) proposed a modification of the Student's t confidence interval for skewed distributions. Since 1978, several other methods have been proposed in the literature, among them those of Baklizi (2007, 2008), Gross (1976), Hettmansperger and McKean (1998), Kibria (2006), Kleijnen et al. (1986), Meeden (1999), Shi and Kibria (2007) and Willink (2005).

Rhiel and Chaffin (1996) suggested using Student's t statistic regardless of sample size when the population distribution is symmetric, on the grounds that the coverage probability is then close to the nominal confidence level. Wardrop (1994) reported a similar study of confidence intervals for the mean. He compared the proportion of 5000 simulated intervals that contained the population mean with the nominal confidence level and suggested using the Student's t confidence interval when the population is symmetric.

Nevertheless, the Student's t confidence interval may not be an appropriate model even if the population distribution is symmetric. A case of considerable practical interest is one in which the observations follow a symmetric distribution that is long-tailed, contaminated normal, of large kurtosis, or so heavy-tailed that it has no mean or variance. Such observations might have a strong influence on the width and the coverage probability of the Student's t confidence interval. Abu-Shawiesh et al. (2009) considered four confidence intervals for the mean of a contaminated normal distribution, namely the Student's t, Sps t, Downton's t and MAD t confidence intervals. They compared the performance of these intervals in a simulation study to check how the presence of extreme values affects them. Performance was judged by the coverage probabilities and widths of the intervals. They found that the Sps t and MAD t confidence intervals are better than the others, which supports the findings of this study. Gross (1976) examined confidence interval robustness in long-tailed (Gaussian and longer) symmetric distributions and showed that long-tailedness has a great effect on the distribution of the Student's t statistic.

This study restricts itself, for the most part, to examining the impact of non-kurtosis and large-kurtosis distributions on the distribution of the Student's t statistic and to proposing three alternative interval estimators, namely the Sps t, MAD t and Downton's t confidence intervals, for estimating the location parameters of non-kurtosis and large-kurtosis leptokurtic symmetric distributions that have not been considered by the authors mentioned above.

The objective of the study was to observe the performance of the proposed confidence intervals and compare them with the Student's t confidence interval in order to determine the best method, defined as the one with the smallest average width among the confidence intervals whose coverage probabilities are not less than the lower robustness bound. These investigations are carried out by a simulation study.

The coverage probability and the average width of each confidence interval are simulated from 40000 samples drawn from each of six distributions: the standard normal; the standard double exponential; the standard double translated Pareto with infinite excess kurtosis; the double translated Pareto with no mean, variance or kurtosis defined; the standard Cauchy; and the double translated Pareto with mean zero and infinite variance. The best confidence interval is then selected.

THE PROPOSED CONFIDENCE INTERVALS

Several confidence intervals for the location parameter exist in the literature. The author considers the following four intervals.

Student's t confidence interval: Let X1, X2,...,Xn be a random sample from a normal population with mean μ and standard deviation σ. Then the statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$

where $\bar{X}$ is the sample mean and $S$ is the sample standard deviation, was given by Student (1908) and has a t-distribution with (n-1) degrees of freedom. The 100(1–α)% confidence interval for μ often has the form

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}.$$

When the population distribution is non-normal, the sample mean is approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$, provided that n is large. Hence the distribution of Student's t statistic converges to the standard normal distribution and

$$\bar{X} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}}$$

is the approximate confidence interval for μ, where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution. Rhiel and Chaffin (1996) suggested using t critical values instead of z critical values when the distribution is symmetric or slightly skewed, because t critical values give coverage probabilities closer to the nominal confidence levels than z critical values do. Thus, the 100(1–α)% confidence interval for μ is $\bar{X} \pm t_{\alpha/2,\,n-1}\,S/\sqrt{n}$, where $t_{\alpha/2,\,n-1}$ is the upper α/2 percentage point of the Student's t distribution with (n-1) degrees of freedom. That is,

$$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}.$$
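As an illustration of the Student's t interval described above, the following minimal Python sketch computes the sample mean plus or minus a t critical value times S/√n. The function name `student_t_interval` and the data are hypothetical, and the critical value is passed in by the caller (taken from a t table) because the Python standard library has no t quantile function.

```python
import math
import statistics


def student_t_interval(sample, t_crit):
    """Return (lower, upper) for mean +/- t_crit * S / sqrt(n).

    t_crit is the upper alpha/2 point of the t distribution with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical data. 2.776 is the tabulated upper 2.5% point of the
# t distribution with 4 degrees of freedom (95% two-sided interval).
data = [4.1, 5.3, 4.8, 5.9, 5.2]
lower, upper = student_t_interval(data, t_crit=2.776)
print(lower, upper)
```

Passing the critical value explicitly keeps the sketch dependency-free; a statistics library with a t quantile function could supply it instead.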

The Sps t confidence interval: This confidence interval is constructed by using the fact that the sample median (MD) of a normally distributed random variable with mean μ and standard deviation σ is asymptotically normally distributed with mean μ and standard error $\sigma\sqrt{\pi/(2n)}$, together with the fact that the pseudo standard deviation Sps, based on the interquartile range (IQR), is an unbiased estimator of σ (Mood et al., 1974; Staudte and Sheather, 1990). Therefore the Sps t confidence interval for the location parameter μ is defined by:

(1)

where,

(2)
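The pseudo standard deviation can be sketched in Python as the IQR divided by the IQR of a standard normal distribution (about 1.349). This is a sketch under stated assumptions: the scaling constant and the quartile convention (the default "exclusive" method of `statistics.quantiles`) are assumptions, since the source equation did not survive extraction, and the name `pseudo_std` is hypothetical.

```python
import statistics

# For a normal distribution, IQR = 2 * z_0.75 * sigma, roughly 1.349 * sigma.
# Dividing the sample IQR by this constant puts it on the scale of sigma.
IQR_TO_SIGMA = 2 * statistics.NormalDist().inv_cdf(0.75)


def pseudo_std(sample):
    """Pseudo standard deviation: sample IQR rescaled to estimate sigma."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartiles, default method
    return (q3 - q1) / IQR_TO_SIGMA


# Hypothetical data with one extreme value; the IQR ignores the tails.
data = [1, 2, 3, 4, 5, 6, 700]
print(pseudo_std(data), statistics.stdev(data))
```

Because only the quartiles enter the estimate, a single extreme observation leaves it unchanged, whereas the sample standard deviation is inflated.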

Downton's t confidence interval: Downton (1966) proposed σ* as an estimator for the standard deviation σ of a normal population. The Downton estimator σ* is defined in terms of Gini's mean difference and is given by:

(3)

where,

(4)

The Downton estimator σ* has been recommended as a robust scale estimator for σ by many authors, for example Iglewicz (1983), Barnett et al. (1967) and Sarhan and Greenberg (1962). Therefore, the Downton's t confidence interval can be defined by:

(5)
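The Downton scale estimator can be illustrated with a short Python sketch. It relies on the standard result that, for normal data, √π/2 times Gini's mean difference is an unbiased estimator of σ; the exact notation of the source equations is not recoverable here, so the forms below are assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def gini_mean_difference(sample):
    """Average absolute difference over all pairs of distinct observations."""
    n = len(sample)
    total = sum(abs(a - b) for a, b in combinations(sample, 2))
    return 2.0 * total / (n * (n - 1))


def downton_sigma(sample):
    """Downton-type estimate of sigma: sqrt(pi)/2 times Gini's mean difference."""
    return math.sqrt(math.pi) / 2.0 * gini_mean_difference(sample)


def downton_sigma_sorted(sample):
    """Equivalent linear form in the order statistics, O(n log n)."""
    n = len(sample)
    xs = sorted(sample)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return math.sqrt(math.pi) * weighted / (n * (n - 1))


data = [3.0, 1.0, 4.0, 2.0]  # hypothetical data
print(downton_sigma(data), downton_sigma_sorted(data))
```

The pairwise version follows the definition directly; the sorted version gives the same value as a linear combination of order statistics and scales better to large samples.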

The MAD t confidence interval: The MAD t confidence interval for μ is given as:

(6)

where,

$$\mathrm{MAD} = \operatorname{median}_{1 \le i \le n}\,\lvert X_i - MD \rvert \qquad (7)$$

is the median absolute deviation from the sample median (Hampel, 1974) and bn is a correction factor needed to make bn MAD an unbiased estimator of σ (Rousseeuw and Croux, 1993).
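A minimal Python sketch of the median absolute deviation follows. It uses the asymptotic normal-consistency constant 1/Φ⁻¹(0.75) ≈ 1.4826 in place of the finite-sample factor bn, whose values are not given in the text; that substitution is an assumption, and the function names are hypothetical.

```python
import statistics

# Asymptotic constant that makes the MAD consistent for sigma under normality.
# The paper's finite-sample factor b_n is not reproduced here.
MAD_TO_SIGMA = 1 / statistics.NormalDist().inv_cdf(0.75)


def mad(sample):
    """Median absolute deviation from the sample median."""
    md = statistics.median(sample)
    return statistics.median([abs(x - md) for x in sample])


def mad_sigma(sample):
    """MAD rescaled to estimate sigma (asymptotic constant, not b_n)."""
    return MAD_TO_SIGMA * mad(sample)


data = [1, 2, 3, 4, 100]  # hypothetical data with one outlier
print(mad(data), mad_sigma(data), statistics.stdev(data))
```

In this example the outlier changes the sample standard deviation substantially but leaves the MAD at the value it would have for 1, 2, 3, 4, 5.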

ROBUSTNESS

In this study, the author adopted the definition of robustness used by Bradley (1980). Bradley (1980) considered a test to be robust for the nominal level α=0.05 if the simulated Type I error rate (simulated α) was within 20% of the nominal level (between 0.04 and 0.06) and for α=0.01 if the simulated error rate was within 50% of the nominal level (between 0.005 and 0.015). According to this definition and the relationship between the two-sided test and the confidence interval, a confidence interval whose coverage probability (simulated confidence level) lies between 0.94 and 0.96 is considered robust for the nominal confidence level 0.95, and a confidence interval whose coverage probability lies between 0.985 and 0.995 is considered robust for the nominal confidence level 0.99.
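The robustness criterion above translates into a simple check, sketched here in Python. The function names and the example coverage values are hypothetical; the tolerances 0.2 and 0.5 are the ones quoted from Bradley (1980).

```python
def robustness_bounds(alpha, tolerance):
    """Coverage bounds 1 - alpha*(1 + tol) and 1 - alpha*(1 - tol)."""
    lower = 1.0 - alpha * (1.0 + tolerance)
    upper = 1.0 - alpha * (1.0 - tolerance)
    return lower, upper


def is_robust(coverage, alpha, tolerance):
    """True if the simulated coverage lies within the robustness bounds."""
    lower, upper = robustness_bounds(alpha, tolerance)
    return lower <= coverage <= upper


# Nominal 0.95 uses a 20% tolerance; nominal 0.99 uses a 50% tolerance.
print(robustness_bounds(0.05, 0.2))
print(robustness_bounds(0.01, 0.5))
print(is_robust(0.9512, 0.05, 0.2))  # hypothetical simulated coverage
```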

To acknowledge the sampling error and the precision with which the coverage probability estimates the nominal confidence level, 1–α, the margin of error is calculated for each of the nominal levels 0.95 and 0.99 by multiplying the standard error of the coverage probability by 1.96 and 2.58, respectively. The standard error is calculated as follows:

(8)

Table 1 gives the upper and the lower robustness bounds as well as the margins of errors for the nominal confidence levels 0.95 and 0.99.
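The margin-of-error calculation can be sketched in Python as follows. The standard error formula did not survive extraction, so this sketch assumes the usual binomial form, the square root of p(1 − p)/N, with p equal to the nominal confidence level and N = 40000 simulated samples; both are assumptions rather than the author's confirmed formula.

```python
import math


def margin_of_error(nominal, n_sim, z):
    """z times the assumed binomial standard error sqrt(p * (1 - p) / N)."""
    standard_error = math.sqrt(nominal * (1.0 - nominal) / n_sim)
    return z * standard_error


N_SIM = 40000  # number of simulated samples stated in the text

me_95 = margin_of_error(0.95, N_SIM, z=1.96)
me_99 = margin_of_error(0.99, N_SIM, z=2.58)
print(me_95, me_99)
```

Under these assumptions both margins are of the order of 0.001 to 0.002, small compared with the width of the robustness bounds.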


Table 1: Robustness bounds and margins of errors for the various nominal confidence levels

THE METHOD

This study was funded by the Deanship of Research and Graduate Studies at Zarqa University, Jordan, and conducted in the Department of Mathematics and the laboratories of the Department of Information Technology over a period of ten months starting on March 3, 2010.

Computer simulation is used to carry out the objectives of the paper. Coverage probabilities and average widths are simulated for each confidence interval for samples from the distributions defined below:

Two large-kurtosis leptokurtic symmetric distributions with p.d.f.
  Standard double exponential distribution with excess kurtosis of 3
  Standard double translated Pareto distribution

(9)

with c= 4 and infinite excess kurtosis

Three non-kurtosis symmetric distributions with p.d.f.
Double translated Pareto distribution

(10)

with c = 1 and no mean, variance or higher moments defined

Standard Cauchy distribution which has no mean, variance or higher moments defined
Double translated Pareto distribution with c=2, mean zero and infinite variance

Samples of sizes n = 20(10)50 and 100 were simulated from the uniform (0, 1) distribution and then used to generate random samples from the above distributions by the probability integral transform. Forty thousand simulations were used for each sample size and each distribution. From the simulation, the observed proportions of calculated confidence intervals that contain the location parameter μ were determined and compared with the nominal confidence level and the robustness bounds by using MATLAB, the language of technical computing, version 6.5 (Enander et al., 1996). In addition to the coverage probability, the average width was simulated for those confidence intervals whose coverage probabilities are not less than the lower robustness bound. The best confidence interval method is then selected as the one with the smallest average width for all sample sizes and all distributions.
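The simulation procedure described above can be sketched in Python (the study itself used MATLAB). This is a minimal sketch, not the author's code: it covers only the double exponential and Cauchy distributions, takes the double exponential with unit scale as an assumption, uses a small number of simulations for speed, and all function names are hypothetical.

```python
import math
import random
import statistics


def laplace_from_uniform(u):
    """Inverse CDF of the double exponential (location 0, scale 1 assumed)."""
    if u < 0.5:
        return math.log(2.0 * u)
    return -math.log(2.0 * (1.0 - u))


def cauchy_from_uniform(u):
    """Inverse CDF of the standard Cauchy distribution."""
    return math.tan(math.pi * (u - 0.5))


def open_uniform(rng):
    """Uniform draw on (0, 1); rejects 0.0 so that log(0) cannot occur."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def student_t_interval(sample, t_crit):
    """Mean +/- t_crit * S / sqrt(n), with t_crit supplied by the caller."""
    half = t_crit * statistics.stdev(sample) / math.sqrt(len(sample))
    mean = statistics.fmean(sample)
    return mean - half, mean + half


def simulate_coverage(inverse_cdf, interval_fn, n, n_sim, location=0.0, seed=1):
    """Return (coverage proportion, average width) over n_sim samples."""
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(n_sim):
        sample = [inverse_cdf(open_uniform(rng)) for _ in range(n)]
        lower, upper = interval_fn(sample)
        hits += lower <= location <= upper
        total_width += upper - lower
    return hits / n_sim, total_width / n_sim


# 2.093 is the tabulated upper 2.5% point of t with 19 degrees of freedom.
coverage, avg_width = simulate_coverage(
    laplace_from_uniform,
    lambda s: student_t_interval(s, 2.093),
    n=20,
    n_sim=2000,  # far fewer than the 40000 used in the study
)
print(coverage, avg_width)
```

Any of the other interval methods can be passed as `interval_fn`, so the same loop yields the coverage probability and average width needed for the comparison.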

RESULTS

Tables 2-7 show the simulation results of the Student's t and the proposed confidence intervals for the location parameters when the nominal confidence levels are 0.95 and 0.99.

Table 2 shows the coverage probabilities and the Average Widths (AW) of the Student's t, Sps t, MAD t and Downton's t confidence intervals for samples from the standard normal distribution at nominal confidence levels 0.95 and 0.99. As expected, the results indicate that the coverage probabilities for Student's t and the proposed confidence intervals are equal to or close to the nominal confidence levels and within the robustness bounds for all sample sizes and both nominal confidence levels (the slight deviations from the nominal levels are a result of the simulation). The results also show that the average widths of the proposed confidence intervals are slightly larger than the average width of the Student's t confidence interval. As expected, this makes the Student's t confidence interval slightly more precise than the proposed confidence intervals under the normality assumption.

Tables 3 and 4 show the coverage probabilities and the average widths of all confidence interval methods at the 0.95 and 0.99 nominal confidence levels for samples from heavy-tailed distributions with large excess kurtosis: the standard double exponential distribution with excess kurtosis of 3 and the standard double translated Pareto distribution with infinite excess kurtosis.

The results show that the coverage probability for the Student's t confidence interval is equal to or close to the nominal confidence level and within the robustness bounds for all sample sizes. The coverage probabilities for the three proposed methods are larger than the upper robustness bound, except that the Sps t and MAD t intervals in Table 3 are within the robustness bounds at the 0.99 nominal confidence level for all sample sizes.


Table 2: Coverage probability and average width for standard normal distribution


Table 3: Coverage probability and average width for standard double exponential distribution


Table 4: Coverage probability and average width for standard double translated Pareto distribution (c = 4)


Table 5: Coverage probability and average width for double translated Pareto distribution (c = 1)


Table 6: Coverage probability and average width for Cauchy (0, 1) distribution


Table 7: Coverage probability and average width for double translated Pareto distribution (c = 2)

Tables 3 and 4 also show that the average widths of the Sps t and MAD t confidence intervals are close to each other and both are much smaller than the average widths of the Student's t and Downton's t confidence intervals. Because of this and the associated increase in the coverage probability, the Sps t and MAD t confidence intervals are "stronger" than the Student's t and Downton's t intervals at both nominal confidence levels. This makes MAD t and Sps t much better methods than the others.

Tables 5 and 6 show the coverage probabilities and the average widths of all confidence interval methods for samples from the double translated Pareto (c = 1) and standard Cauchy distributions, which have no mean, variance or kurtosis defined, at the 0.95 and 0.99 nominal confidence levels. The results show that the coverage probabilities for all confidence interval methods at both nominal confidence levels are larger than the upper robustness bound and that the average width of the MAD t interval is the smallest for all sample sizes and both nominal confidence levels. Notice that Cauchy data greatly changed the coverage probabilities and increased the average widths of the Student's t and Downton's t confidence intervals. Because of this and the associated increase in coverage probabilities, the MAD t confidence interval is the strongest method.

The results of Table 7 show that the MAD t and Sps t methods are equally strong, and stronger than the other methods, under the non-kurtosis and infinite variance distribution.

DISCUSSION

The results in Tables 3-7 suggest that the MAD t confidence interval, followed by the Sps t confidence interval, is stronger and more resistant to non-kurtosis and large-kurtosis leptokurtic symmetric distributions than the Student's t and Downton's t confidence intervals, a result which is supported by Abu-Shawiesh et al. (2009). The Student's t confidence interval does not behave properly for the heavy-tailed Cauchy distribution and the double translated Pareto distribution with c = 1, for which no mean, variance or higher moments are defined. In such cases, results in probability theory that rely on expected values and variances, such as the strong law of large numbers and the central limit theorem, do not apply. Moreover, because these distributions tend to generate extreme values, a few of the sample standard deviations S are very large, so a few of the intervals are very wide and the average width is inflated, a result which is supported by Gross (1976).

The results provide some insight into whether the Student's t confidence interval should be used for all symmetric non-normal distributions. The following points suggest that the MAD t confidence interval, followed by the Sps t confidence interval, is more appropriate than the Student's t confidence interval.

As expected, when the population distribution is normal, the Student's t confidence interval is slightly more precise than the other methods. Even so, the MAD t and Sps t confidence intervals have good accuracy and precision
When the population distribution is symmetric with large or infinite kurtosis, as in Tables 3 and 4, the MAD t and Sps t confidence intervals are stronger than the Student's t confidence interval
When the population distribution has no mean, no variance and no kurtosis defined, as in Tables 5 and 6, the Student's t confidence interval is not preferred at all, because the sample mean and the sample standard deviation are not meaningful choices for constructing a confidence interval. The MAD t confidence interval, on the other hand, turned out to be better
When the population distribution has finite mean, infinite variance and no kurtosis, as in Table 7, the MAD t and Sps t confidence intervals are stronger than the Student's t confidence interval

ACKNOWLEDGMENT

The author acknowledges the support received from the Deanship of Research and Graduate Studies at Zarqa University, Jordan. The author also wishes to thank the Editorial committee and the referees for their valuable comments, which have improved the presentation of the paper.

REFERENCES

  • Abu-Shawiesh, M.O., F.M. Al-Athari and H.F. Kittani, 2009. Confidence interval for the mean of a contaminated normal distribution. J. Applied Sci., 9: 2835-2840.

  • Baklizi, A., 2007. Inference about the mean difference of two non-normal populations based on independent samples: A comparative study. J. Statist. Comput. Simul., 77: 613-624.

  • Baklizi, A., 2008. Inference about the mean of skewed population: A comparative study. J. Statist. Comput. Simul., 78: 421-435.

  • Barnett, F., K. Mullen and J.G. Saw, 1967. Linear estimates of a population scale parameter. Biometrika, 54: 551-554.

  • Bartlett, M.S., 1935. The effect of non-normality on the t distribution. Proc. Cambridge Philosophical Soc., 31: 223-231.

  • Bradley, J.V., 1980. Nonrobustness in Z, t and F tests at large sample sizes. Bull. Psychonomics Soc., 16: 333-336.

  • Chaffin, W.W. and G.S. Rhiel, 1993. The effect of skewness and kurtosis on the one-sample t test and the impact of knowledge of the population standard deviation. J. Statist. Comput. Simul., 46: 79-90.

  • Chami, P., R. Antoine and A. Sahai, 2007. On efficient confidence intervals for the log-normal mean. J. Applied Sci., 7: 1790-1794.

  • Downton, F., 1966. Linear estimates with polynomial coefficients. Biometrika, 53: 129-141.

  • Enander, E.P., A. Sjoberg, B. Melin and P. Isaksson, 1996. The Matlab, Hand Book. Addison Wesley, London.

  • Gross, A.M., 1976. Confidence interval robustness with long-tailed symmetric distributions. J. Am. Statist. Assoc., 71: 409-416.

  • Hampel, F.R., 1974. The influence curve and its role in robust estimation. J. Am. Statist. Assoc., 69: 383-393.

  • Hettmansperger, T.P. and J.W. McKean, 1998. Robust Nonparametric Statistical Methods. 1st Edn., Hodder Arnold, London, ISBN: 978-0340549377.

  • Iglewicz, B., 1983. Robust Scale Estimators and Confidence Intervals for Location. In: Understanding Robust and Exploratory Data Analysis, Hoaglin, D.C., F. Mosteller and J.W. Tukey (Eds.). John Wiley and Sons, New York, ISBN: 0-471-38491-7, pp: 405-431.

  • Johnson, N.J., 1978. Modified t-tests and confidence intervals for asymmetrical populations. J. Am. Statist. Assoc., 73: 536-544.

  • Kibria, B.M.G., 2006. Modified confidence intervals for the mean of the asymmetric distribution. Pak. J. Statist., 22: 109-120.

  • Kleijnen, J.P.C., G.L.J. Kloppenburg and F.L. Meeuwsen, 1986. Testing the mean of asymmetric population: Johnson's modified t test revisited. Commun. Statist. Simul. Comput., 15: 715-732.

  • Meeden, G., 1999. Interval estimators for the population mean for skewed distributions with a small sample size. J. Applied Statist., 26: 81-96.

  • Mood, A.M., F.A. Graybill and D.C. Boes, 1974. Introduction to the Theory of Statistics. 3rd Edn., McGraw-Hill, New York, pp: 405-431.

  • Pearson, E.S. and N.W. Please, 1975. Relation between the shape of population distribution and the robustness of four simple testing statistics. Biometrika, 62: 223-241.

  • Pocock, S.J., 1982. When not to rely on the central limit theorem: An example from absentee data. Commun. Statist. Theory Meth., 11: 2169-2179.

  • Rhiel, G.S. and W.W. Chaffin, 1996. An investigation of the large-sample/small-sample approach to the one-sample test for a mean (sigma unknown). J. Statist. Educ., 4: 1-13.

  • Rousseeuw, P.J. and C. Croux, 1993. Alternatives to the median absolute deviation. J. Am. Statist. Assoc., 88: 1273-1283.

  • Sarhan, A.E. and B.G. Greenberg, 1962. Contributions to Order Statistics. 1st Edn., John Wiley and Sons, New York, ISBN: 978-0471754206.

  • Shi, W. and B. Kibria, 2007. On some confidence intervals for estimating the mean of a skewed population. Int. J. Math. Educ. Technol., 38: 412-421.

  • Staudte, R.G. and S.J. Sheather, 1990. Robust Estimation and Testing. 2nd Edn., John Wiley and Sons, New York, ISBN: 978-0-471-85547-7.

  • Student, 1908. The probable error of a mean. Biometrika, 6: 1-25.

  • Wardrop, R.L., 1994. Statistics: Learning in the Presence of Variation. Wm. C. Brown Publishers, Dubuque, IA.

  • Willink, R., 2005. A confidence interval and test for the mean of an asymmetric distribution. Commun. Statist. Theory Meth., 34: 753-766.
