How Weibull Distribution Skewness and Chi-square Test Statistics are Related: Simulation Study

Elobaid, Rafida M.

ABSTRACT

Objective: The purpose of this study is to investigate how the values of the χ² test statistic behave as the skewness of right-skewed distributions varies. Methodology: The Weibull distribution was selected because it is particularly well suited to modeling positively skewed data. For fixed sample sizes, different levels of skewness were considered and the corresponding values of the χ² test statistic were calculated by simulation. The nonparametric Kendall τ test was also used to assess the relationship between the two quantities. Results: The values of the χ² test statistic decrease as the values of skewness increase, indicating that the data follow the assumed distribution. The Kendall τ test reported a significant negative correlation between the χ² statistics and the skewness values. Conclusion: The inverse relation between the χ² statistics and the skewness values suggests that the skewness measure can be used as an indicator of how well an assumed right-skewed distribution fits the data. This is the first study to examine the relationship between χ² test statistics and skewness values. The correlation is investigated using the nonparametric Kendall τ test, and this relationship has not been examined in previous statistical studies.


INTRODUCTION

Testing the goodness-of-fit of a distribution to a data set is a crucial aspect of data analysis. Much has been done in this respect, particularly when the data are assumed to come from continuous distributions^1-3. Numerous previous studies on goodness-of-fit tests have tried to provide a theoretical basis for studying empirical distribution functions. Such goodness-of-fit tests include the Cramer-von Mises type tests^4-7 and the Kolmogorov-Smirnov test^8,9. Mayer and Butler¹⁰ and Power¹¹ provided two overviews of model validation that review statistical approaches to goodness-of-fit. Skewness, in turn, has been discussed in several studies^12-14. However, few previous studies address the relationship between the values of χ² test statistics and skewness or kurtosis.

Goodness-of-fit indicates how well a model agrees with the data from which it was derived, or how well the model predicts when applied to an independent set of data. In this study, the behavior of the values of the χ² test statistic as the skewness of right-skewed distributions varies is investigated. Among the quantitative goodness-of-fit techniques, the χ² goodness-of-fit test is selected because it is sensitive to deviations in the tails of the distribution¹⁵. It can also be applied to any type of distribution, whether continuous or discrete.

Random numbers from several right-skewed distributions are generated using a simulation method¹⁶. The right-skewed continuous distribution considered in this study is the Weibull distribution. The Weibull distribution is characterized by shape and scale parameters and is a generalization of the exponential distribution. A continuous random variable x has a Weibull distribution if its Probability Density Function (PDF) has the form:

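$$f(x) = \frac{\lambda}{\delta}\left(\frac{x}{\delta}\right)^{\lambda-1}\exp\left[-\left(\frac{x}{\delta}\right)^{\lambda}\right], \quad x \ge 0,\ \lambda > 0,\ \delta > 0 \qquad (1)$$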

where δ is the scale parameter and λ is the shape parameter (Fig. 1). The Cumulative Distribution Function (CDF) is defined by the following equation:

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\delta}\right)^{\lambda}\right], \quad x \ge 0 \qquad (2)$$

This study considers fixed sample sizes and different levels of skewness and calculates the corresponding values of the χ² test statistic.
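The Weibull density and distribution functions described above can be sketched in a few lines of Python. This is only an illustration using the standard parametrization with shape λ and scale δ. The function names `weibull_pdf` and `weibull_cdf` are hypothetical and are not taken from the study, which used SAS.

```python
import math


def weibull_pdf(x, shape, scale):
    """Weibull density for x > 0 (returns 0.0 for x <= 0 in this sketch)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(x, shape, scale):
    """Weibull cumulative distribution function."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


# With shape = 1 the Weibull reduces to the exponential distribution.
print(weibull_pdf(1.0, shape=1.0, scale=1.0))
print(weibull_cdf(1.0, shape=1.0, scale=1.0))
```

At x equal to the scale parameter the CDF equals 1 - exp(-1) for any shape, which gives a quick sanity check on the implementation.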


Fig. 1: Weibull distribution PDFs

The Kendall τ test¹⁷ is employed as a nonparametric tool to assess whether a relationship exists between the values of the χ² goodness-of-fit test statistic and the skewness measure. It is also used to test the null hypothesis of independence. The chief advantage of Kendall τ is that its distribution approaches the normal distribution quite rapidly, so that the normal approximation is better for Kendall τ than for Spearman's ρ. Another advantage is its direct and simple interpretation in terms of the probabilities of observing concordant and discordant pairs.

MATERIALS AND METHODS

Simulation is conducted to generate artificial data from the Weibull distribution using three sample sizes: 20, 50 and 100. The samples are chosen in such a way that the corresponding skewness measures indicate various degrees of skewness.
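The sampling step can be illustrated with a minimal Python sketch. It is not the SAS program used in the study. The helper names `weibull_sample` and `sample_skewness`, the seed and the shape value are assumptions made for illustration, and the moment-based skewness coefficient is one common choice, since the text does not specify which skewness measure was computed.

```python
import random


def weibull_sample(n, shape, scale=1.0, seed=None):
    """Draw n Weibull variates (hypothetical helper, not from the study)."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def sample_skewness(data):
    """Moment-based skewness m3 / m2**1.5 (an assumed choice of measure)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# The three sample sizes named in the text; the shape value is illustrative.
for n in (20, 50, 100):
    sample = weibull_sample(n, shape=1.0, seed=1)
    print(n, round(sample_skewness(sample), 3))
```

Fixing the seed makes a run reproducible, which is convenient when samples are screened for a target degree of skewness.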

The observations in each sample are divided into intervals in order to compute the frequency of observations within each interval. The expected values are computed using the formula np, where n represents the sample size and p is the probability that an observation falls in the interval under the assumed distribution. This probability is calculated from the Cumulative Distribution Function (CDF) of the Weibull distribution. The χ² values are calculated using the following equation:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \qquad (3)$$

where o_i and e_i are the observed and expected values, respectively, and k is the number of intervals. The χ² test statistic values are obtained to test whether the data follow the assumed distribution. A table of χ² values versus skewness measures is constructed for sample sizes 20, 50 and 100. The Kendall τ statistic is used to measure the correlation between the dependent variable (values of the χ² test statistic) and the independent variable (measures of skewness). The measure of correlation proposed by Kendall is:

$$\tau = \frac{N_c - N_d}{N_c + N_d} \qquad (4)$$

where N_c is the number of concordant pairs of observations and N_d is the number of discordant pairs. If all pairs are concordant, Kendall τ equals 1.0; if all pairs are discordant, Kendall τ equals -1.0. Kendall τ therefore satisfies the requirement of a rank correlation measure, namely -1 ≤ τ ≤ 1. The sign of the computed Kendall τ indicates whether the rank correlation between the observed pairs is positive or negative.

Kendall τ is also used to test the null hypothesis of independence between the pairs of observations, where x_i are the skewness measures and y_i are the values of the χ² test statistic. The test statistic is the difference between the numbers of concordant and discordant pairs:

$$T = N_c - N_d \qquad (5)$$

At a chosen level of significance, the statistic computed from Eq. 5 is compared with a critical value read from the Kendall τ table. The null hypothesis is rejected if the absolute value of the statistic is greater than the critical value, in which case it is concluded that the χ² statistics and the skewness measure are not independent.
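The counting of concordant and discordant pairs can be sketched as follows. The sketch takes τ as (N_c - N_d)/(N_c + N_d) and the test statistic as N_c - N_d, ignores tied pairs, and uses hypothetical function names. These are assumptions made for illustration, not the exact procedure of the study.

```python
from itertools import combinations


def kendall_counts(x, y):
    """Return (N_c, N_d): numbers of concordant and discordant pairs."""
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x2 - x1) * (y2 - y1)
        if s > 0:
            n_c += 1      # both coordinates move in the same direction
        elif s < 0:
            n_d += 1      # coordinates move in opposite directions
        # s == 0 is a tie and is ignored in this sketch
    return n_c, n_d


def kendall_tau(x, y):
    """Rank correlation (N_c - N_d) / (N_c + N_d); undefined if all pairs tie."""
    n_c, n_d = kendall_counts(x, y)
    return (n_c - n_d) / (n_c + n_d)


def kendall_difference(x, y):
    """Test statistic N_c - N_d, to be compared with a tabulated quantile."""
    n_c, n_d = kendall_counts(x, y)
    return n_c - n_d


# Illustrative data only: y decreases whenever x increases.
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_difference([1, 2, 3, 4], [4, 3, 2, 1]))
```

The pairwise loop costs O(n²) comparisons, which is negligible for samples as small as the nine pairs analyzed here.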

RESULTS AND DISCUSSION

Several continuous right-skewed distributions were considered, from which the Weibull distribution was chosen for this study. Three sample sizes were used, with different levels of skewness (SK) obtained by controlling the shape parameter (λ). The resulting values of the χ² test statistic are displayed in Table 1. The results in Table 1 show that the values of the χ² test statistic decrease as the values of skewness increase^18,19. This study addressed right-skewed distributions. The trend of the relationship between χ² test statistics and skewness values may differ for left-skewed distributions, which would be of interest to examine. Moreover, the results encourage other researchers to assess this relationship using modified χ² test statistics and other goodness-of-fit tests such as the Cramer-von Mises type tests, Kolmogorov-Smirnov and Anderson-Darling.
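To illustrate how the shape parameter controls skewness, the population skewness of a Weibull distribution can be written in terms of the gamma function and depends only on λ, not on the scale. The sketch below evaluates that standard expression. The function name and the listed shape values are illustrative assumptions and are not the settings reported in Table 1.

```python
import math


def weibull_population_skewness(shape):
    """Population skewness of a Weibull distribution with the given shape.

    Uses g_k = Gamma(1 + k / shape); the scale parameter cancels out.
    """
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    g3 = math.gamma(1 + 3 / shape)
    variance = g2 - g1 ** 2
    third_central_moment = g3 - 3 * g1 * g2 + 2 * g1 ** 3
    return third_central_moment / variance ** 1.5


# Illustrative shape values: skewness falls as the shape parameter grows.
for shape in (0.5, 1.0, 2.0, 3.0):
    print(shape, round(weibull_population_skewness(shape), 4))
```

Sample skewness fluctuates around these population values, more so at n = 20 than at n = 100, so simulated samples with the same λ can still show different degrees of skewness.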

At a significance level of 0.05, the critical values for sample sizes 20, 50 and 100 are 31.41, 67.50 and 125.34, respectively.
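The χ² statistic that is compared with these critical values can be sketched as below. The sketch follows the description in Materials and Methods (observed interval counts against expected counts np from the Weibull CDF). The function names, the way intervals are specified by interior cut points and the example data are assumptions for illustration, since the interval choice used in the study is not stated.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull cumulative distribution function."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def chi_square_statistic(sample, cut_points, shape, scale):
    """Sum over intervals of (observed - expected)**2 / expected.

    cut_points are increasing interior boundaries, so the intervals are
    [0, c1), [c1, c2), ..., [c_last, infinity).
    """
    n = len(sample)
    bounds = [0.0] + list(cut_points) + [math.inf]
    statistic = 0.0
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in sample if lower <= x < upper)
        upper_prob = 1.0 if math.isinf(upper) else weibull_cdf(upper, shape, scale)
        p = upper_prob - weibull_cdf(lower, shape, scale)
        expected = n * p  # assumes every interval has positive probability
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Illustrative data: two equal-probability intervals for shape = scale = 1.
data = [0.1, 0.2, 0.3, 1.0]
print(chi_square_statistic(data, [math.log(2)], shape=1.0, scale=1.0))
```

In this toy case each interval has probability 0.5, so the expected counts are 2 and 2 against observed counts of 3 and 1.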

Table 1: Skewness (SK) and χ² test statistic values for the Weibull distribution

Table 2: Kendall τ test results for the Weibull distribution

Comparing the χ² test statistic values in Table 1 with these critical values shows that the data for the three sample sizes are from the assumed Weibull distribution, except for the samples with small values of skewness (i.e., data with shape parameter λ = 2 and data with sample size 20 and shape parameter λ = 1). This is consistent with Safdar and Ahmed²⁰, who stated that for λ = 2 the distribution is Rayleigh and that it resembles the normal distribution for λ = 3.48, and with Yavuz²¹, who used the Weibull shape parameter as a measure of reliability. Typical λ values vary from application to application, but in many situations distributions with λ in the range 0.5-3 are appropriate²². The concept of shape is less familiar in previous studies than central tendency and variability. These results are of interest to researchers in relevant fields because they introduce a shape measure, skewness, as an alternative technique for assessing the fit of a distribution to the data. The findings can also serve as a first step toward using other shape measures, such as kurtosis, to describe goodness-of-fit.

The correlation has been examined using the nonparametric Kendall τ test. The data consist of a bivariate random sample of size 9, (x_i, y_i) for i = 1, 2, …, 9. These are the same data generated using the SAS program and reported in Table 1. Using the Kendall τ method described in Materials and Methods, the results for the Weibull distribution are summarized in Table 2. Using Eq. 4, a correlation value of -0.5 indicates that there is a negative rank correlation between the skewness and χ² statistic values.

Using Eq. 5, the difference between the numbers of concordant and discordant pairs was -18. The null hypothesis is rejected if the absolute value of this difference exceeds the (1-α/2) quantile of its null distribution. At a significance level of 0.1, the quantile for a two-tailed test with sample size 9 is τ_{0.05, 9} = 16. The absolute value of the calculated statistic is 18, which exceeds this quantile and indicates that the χ² statistics and the skewness measure are not independent.
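As a consistency check on the two reported values, the pair counts can be recovered by hand. The short derivation below assumes that none of the 9 observations are tied, so that every one of the 36 possible pairs is either concordant or discordant, and that τ is defined as (N_c - N_d)/(N_c + N_d).

```latex
\begin{aligned}
N_c + N_d &= \binom{9}{2} = 36, \qquad N_c - N_d = -18 \\
N_c &= \frac{36 - 18}{2} = 9, \qquad N_d = \frac{36 + 18}{2} = 27 \\
\tau &= \frac{N_c - N_d}{N_c + N_d} = \frac{-18}{36} = -0.5
\end{aligned}
```

Under these assumptions the difference of -18 and the correlation of -0.5 agree with each other.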

CONCLUSION

The relation between the χ² test statistic and the skewness measure of the Weibull distribution was presented. The results show that, with large values of skewness, the distribution fits the data well. It is demonstrated that, for different sample sizes, the value of the χ² test statistic is inversely related to the value of skewness. This was confirmed by the Kendall τ test, which indicates a statistically significant inverse relation between the χ² test statistic and the skewness measure. This result is important because the skewness measure can be used as an indicator of how well an assumed right-skewed distribution fits the data.

REFERENCES

1. Lenart, A. and T.I. Missov, 2016. Goodness-of-fit tests for the Gompertz distribution. Commun. Statist. Theory Methods, 45: 2920-2937.
2. Mehrannia, H. and A. Pakgohar, 2014. Using easy fit software for goodness-of-fit test and data generation. Int. J. Mathe. Arch., 5: 118-124.
3. Singla, N., K. Jain and S.K. Sharma, 2016. Goodness of fit tests and power comparisons for weighted gamma distribution. REVSTAT-Statist. J., 14: 29-48.
4. Anderson, T.W., 1962. On the distribution of the two-sample Cramer-von Mises criterion. Ann. Math. Statist., 33: 1148-1159.
5. Beirlant, J., T. de Wet and Y. Goegebeur, 2006. A goodness-of-fit statistic for Pareto-type behaviour. J. Comput. Applied Math., 186: 99-116.
6. Cramer, H., 1999. Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
7. Greenwood, M., 1946. The statistical study of infectious diseases. J. Royal Statist. Soc., 109: 85-110.
8. Chakravarty, I.M., R.G. Laha and J. Roy, 1967. Handbook of Methods of Applied Statistics. John Wiley and Sons, New York.
9. Corder, G.W. and D.I. Foreman, 2014. Nonparametric Statistics: A Step-by-Step Approach. 2nd Edn., Wiley, New York, ISBN: 9781118840429, Pages: 288.
10. Mayer, D.G. and D.G. Butler, 1993. Statistical validation. Ecol. Model., 68: 21-32.
11. Power, M., 1993. The predictive validation of ecological and environmental models. Ecol. Model., 68: 33-50.
12. Arnold, B.C. and R.A. Groeneveld, 1995. Measuring skewness with respect to the mode. J. Am. Stat., 49: 34-38.
13. Doane, D.P. and L.E. Seward, 2011. Measuring skewness: A forgotten statistic? J. Stat. Educ., 19: 1-18.
14. Tabor, J., 2010. Investigating the investigative task: Testing for skewness-an investigation of different test statistics and their power to detect skewness. J. Stat. Educ., Vol. 18.
15. Snedecor, G.W. and W.G. Cochran, 1989. Statistical Methods. 8th Edn., Iowa State University Press, Ames, IA, USA, ISBN-13: 978-0813815619, Pages: 524.
16. Voit, E.O. and L.H. Schwacke, 2000. Random number generation from right-skewed, symmetric and left-skewed distributions. Risk Anal., 20: 59-72.
17. Romdhani, H., L. Lakhal-Chaieb and L.P. Rivest, 2014. An exchangeable Kendall's tau for clustered data. Can. J. Stat., 42: 384-403.
18. Lane, D., 2011. Online Statistics Education. In: International Encyclopedia of Statistical Science, Lovric, M. (Ed.). Springer, Berlin, Heidelberg, ISBN: 978-3-642-04897-5, pp: 1018-1020.
19. Lane, D.M., 2013. Introduction to Statistics: An Interactive e-Book. David Lane Publishing, Grantham, England, Pages: 641.
20. Safdar, S. and E. Ahmed, 2014. Process capability indices for shape parameter of Weibull distribution. Open J. Stat., 4: 207-219.
21. Yavuz, A.A., 2013. Estimation of the shape parameter of the Weibull distribution using linear regression methods: Non-censored samples. Qual. Reliab. Eng. Int., 29: 1207-1219.
22. Lawless, J.F., 2003. Statistical Models and Methods for Lifetime Data. 2nd Edn., John Wiley and Sons Inc., New York, USA.

Asian Journal of Applied Sciences

Research Article
