Research Article
How Weibull Distribution Skewness and Chi-square Test Statistics are Related: Simulation Study
Department of General Science, Prince Sultan University, Riyadh, KSA
LiveDNA: 249.15904
Testing the goodness-of-fit of a distribution to a data set is a crucial aspect of data analysis. Much has been done in this respect, particularly when the data are assumed to come from continuous distributions1-3. Numerous previous studies on goodness-of-fit tests try to provide a theoretical basis for studying empirical distribution functions. Examples of such goodness-of-fit tests are the Cramér-von Mises type tests4-7 and the Kolmogorov-Smirnov test8,9. Mayer and Butler10 and Power11 provided two overviews of model validation that review statistical approaches to goodness-of-fit. Skewness itself has also been discussed in earlier studies12-14. However, few previous studies address the relationship between the values of the χ2 test statistic and skewness or kurtosis.
Goodness-of-fit indicates how well a model agrees with the data from which it was derived, or how well it predicts an independent set of data. In this study, the behavior of the values of the χ2 test statistic as the skewness of right-skewed distributions varies is investigated. Of the quantitative goodness-of-fit techniques, the χ2 goodness-of-fit test is selected because it is sensitive to deviations in the tails of the distribution15. It can also be applied to any type of distribution, whether continuous or discrete.
Random numbers from several right-skewed distributions are generated using a simulation method16. The right-skewed continuous distribution considered in this study is the Weibull distribution. The Weibull distribution is characterized by its shape and scale parameters and is a generalization of the exponential distribution. A continuous random variable x has a Weibull distribution if its Probability Density Function (PDF) has the form:
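$$f(x) = \frac{\lambda}{\delta}\left(\frac{x}{\delta}\right)^{\lambda-1}\exp\left[-\left(\frac{x}{\delta}\right)^{\lambda}\right], \quad x > 0 \tag{1}$$
where δ is the scale parameter and λ is the shape parameter (Fig. 1). The Cumulative Distribution Function (CDF) is defined by the following equation:
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\delta}\right)^{\lambda}\right], \quad x > 0 \tag{2}$$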
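The density and distribution function of the two-parameter Weibull distribution can be written as a short Python sketch. It assumes the standard parameterization with scale δ and shape λ; the function names `weibull_pdf` and `weibull_cdf` are illustrative and do not come from the study, which used SAS.

```python
import math


def weibull_pdf(x, scale, shape):
    """Weibull density with scale (delta) and shape (lambda).

    The density is taken as 0 for x <= 0; the single point x = 0
    does not affect any interval probability.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(x, scale, shape):
    """Weibull cumulative distribution function F(x) = 1 - exp(-(x/scale)^shape)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))
```

With shape λ = 1 both functions reduce to the exponential distribution with mean δ, which is the sense in which the Weibull distribution generalizes the exponential.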
This study considers fixed sample sizes and different levels of skewness, and calculates the corresponding values of the χ2 test statistic.
Fig. 1: Weibull distribution PDFs
The Kendall τ test17 is employed as a nonparametric tool to assess whether a relationship exists between the values of the χ2 goodness-of-fit test statistic and the skewness measure. It is also used to test the null hypothesis of independence. The chief advantage of Kendall τ is that its distribution approaches the normal distribution quite rapidly, so the normal approximation is better for Kendall τ than for Spearman's ρ. Another advantage is its direct and simple interpretation in terms of the probabilities of observing concordant and discordant pairs.
Simulation is conducted to generate artificial data from the Weibull distribution using three sample sizes: 20, 50 and 100. The samples are chosen in such a way that the corresponding skewness measures indicate various degrees of skewness.
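The sampling step can be sketched in Python using only the standard library. This is not the SAS program used in the study: the shape values, the seed and the helper names (`weibull_skewness`, `sample_skewness`, `simulate`) are assumptions made for illustration. The sketch shows how the shape parameter controls the theoretical skewness and how a moment-based sample skewness can be computed for each simulated sample.

```python
import math
import random


def weibull_skewness(shape):
    """Theoretical skewness of a Weibull distribution (it does not depend on scale)."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    g3 = math.gamma(1 + 3 / shape)
    variance = g2 - g1 ** 2
    return (g3 - 3 * g1 * g2 + 2 * g1 ** 3) / variance ** 1.5


def sample_skewness(data):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def simulate(shape, scale=1.0, n=50, seed=0):
    """Draw n Weibull variates; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


# Illustrative shape values (an assumption), with the sample sizes named in the text.
for shape in (0.5, 1.0, 2.0):
    for n in (20, 50, 100):
        data = simulate(shape, n=n)
        print(shape, n, round(weibull_skewness(shape), 3), round(sample_skewness(data), 3))
```

Smaller shape values give larger theoretical skewness, so varying λ is one way to obtain samples with different degrees of right skewness.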
The observations in each sample are divided into intervals and the frequency of observations within each interval is counted. The expected frequencies are computed using the formula np, where n is the sample size and p is the probability that an observation falls in the interval under the assumed distribution. This probability is calculated from the Cumulative Distribution Function (CDF) of the Weibull distribution. The χ2 values are calculated using the following equation:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \tag{3}$$
where oi and ei are the observed and expected frequencies, respectively, and k is the number of intervals. The χ2 test statistic values are used to test whether the data follow the assumed distribution. A table of χ2 values versus skewness measures is constructed for sample sizes 20, 50 and 100. The Kendall τ statistic is used to measure the correlation between the dependent variable (values of the χ2 test statistic) and the independent variable (measures of skewness). The measure of correlation proposed by Kendall is:
$$\tau = \frac{N_c - N_d}{n(n-1)/2} \tag{4}$$
where Nc is the number of concordant pairs of observations, Nd is the number of discordant pairs and n(n-1)/2 is the total number of pairs that can be formed from n observations. If all pairs are concordant, Kendall τ equals 1.0; if all pairs are discordant, Kendall τ equals -1.0. Kendall τ therefore satisfies the requirement of a rank correlation measure, namely -1 ≤ τ ≤ 1. Computing Kendall τ shows whether the rank correlation between the observed pairs is positive or negative.
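Counting concordant and discordant pairs is straightforward to sketch in Python. The version below is a minimal illustration, not the study's SAS code: it ignores tied pairs (an assumption, since the text does not discuss ties) and the function names are hypothetical.

```python
def kendall_counts(x, y):
    """Return (Nc, Nd): the numbers of concordant and discordant pairs.

    A pair (i, j) is concordant when x and y move in the same direction
    and discordant when they move in opposite directions. Tied pairs
    are counted as neither (an assumption of this sketch).
    """
    nc = nd = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    return nc, nd


def kendall_tau(x, y):
    """Kendall tau as (Nc - Nd) divided by the total number of pairs n(n-1)/2."""
    n = len(x)
    nc, nd = kendall_counts(x, y)
    return (nc - nd) / (n * (n - 1) / 2)
```

For example, `kendall_tau(skewness_values, chi2_values)` would return a negative value when larger skewness tends to go with smaller χ2 statistics; both argument names are placeholders for the two columns of paired values.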
Kendall τ is also used to test the null hypothesis of independence between the pairs of observations (xi, yi), where xi is the skewness measure and yi is the value of the χ2 test statistic. The test statistic is the difference between the numbers of concordant and discordant pairs:
$$T = N_c - N_d \tag{5}$$
At a given level of significance, the computed statistic is compared with the critical value read from the Kendall τ table. The null hypothesis is rejected if the absolute value of the statistic is greater than the critical value, in which case the χ2 statistic and the skewness measure are concluded not to be independent.
Of several continuous right-skewed distributions, the Weibull distribution is chosen for this study. Three sample sizes are used, with different levels of skewness (Sk) obtained by controlling the shape parameter (λ). The resulting values of the χ2 test statistic are displayed in Table 1. The results in Table 1 show that the values of the χ2 test statistic decrease as the values of skewness increase18,19. This study covers the case of right-skewed distributions; the trend of the relationship between the χ2 test statistic and skewness may differ for left-skewed distributions, which is of interest for further examination. Moreover, the results encourage other researchers to assess this relationship using modified χ2 test statistics and other goodness-of-fit tests such as the Cramér-von Mises type tests, Kolmogorov-Smirnov and Anderson-Darling.
Using a significance level of 0.05, the critical values for sample sizes 20, 50 and 100 are 31.41, 67.50 and 125.34, respectively.
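The χ2 computation described in the methods (interval counts, expected frequencies np from the Weibull CDF, then comparison with a critical value) might be sketched as follows. The text does not say how the intervals were chosen, so the equal-probability cut points here are an assumption, as are the function names, the seed and the parameter values. The critical value is passed in rather than computed; 31.41 is the figure quoted above for sample size 20.

```python
import math
import random


def weibull_cdf(x, scale, shape):
    """Weibull CDF with scale (delta) and shape (lambda)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def equal_probability_cuts(k, scale, shape):
    """Interior cut points giving k intervals of probability 1/k each (an assumed binning)."""
    return [scale * (-math.log(1 - j / k)) ** (1 / shape) for j in range(1, k)]


def chi_square_statistic(data, cuts, scale, shape):
    """Sum of (observed - expected)^2 / expected over the intervals defined by cuts."""
    n = len(data)
    bounds = [0.0] + list(cuts) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        p = weibull_cdf(hi, scale, shape) - weibull_cdf(lo, scale, shape)
        expected = n * p  # expected frequency np for this interval
        stat += (observed - expected) ** 2 / expected
    return stat


# Illustrative run: n = 20 draws with scale 1 and shape 2 (assumed values).
rng = random.Random(1)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(20)]
cuts = equal_probability_cuts(4, 1.0, 2.0)
stat = chi_square_statistic(sample, cuts, 1.0, 2.0)
print(stat, stat <= 31.41)  # True means the assumed distribution is not rejected
```

Taking the cut points as an argument keeps the statistic independent of the binning rule, so a different interval scheme can be substituted without changing `chi_square_statistic`.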
Table 1: Results of skewness (Sk) and χ2 test statistics for Weibull distribution
Table 2: Kendall τ test results for Weibull distribution
Comparing the χ2 test statistic values in Table 1 with these critical values shows that the data for the three sample sizes are from the assumed Weibull distribution, except for the samples with small values of skewness (i.e., data with shape parameter λ = 2, and data with sample size 20 and shape parameter λ = 1). This is consistent with Safdar and Ahmed20, who stated that for λ = 2 the distribution is Rayleigh and that it resembles the normal distribution for λ = 3.48, and with Yavuz21, who used the Weibull shape parameter as a measure of reliability. Typical λ values vary from application to application, but in many situations distributions with λ in the range 0.5-3 are appropriate22. Shape has received less attention in previous studies than central tendency and variability. These results may interest researchers in related fields because they introduce a shape measure, skewness, as an alternative indicator of how well a distribution fits the data. The findings can also serve as a first step toward using other shape measures, such as kurtosis, to describe the fit of the data.
The correlation has been examined using the nonparametric Kendall τ test. The data consist of a bivariate random sample of size 9, (xi, yi) for i = 1, 2, ..., 9. These are the same data generated using the SAS program and reported in Table 1. Using the Kendall τ method described above, the results for the Weibull distribution are summarized in Table 2. Using Eq. 4, a correlation value of -0.5 indicates a negative rank correlation between skewness and the χ2 statistic values.
Using Eq. 5, the difference between the numbers of concordant and discordant pairs is -18. The null hypothesis is rejected if the absolute value of this statistic exceeds the tabulated quantile. At a significance level of 0.1, the quantile for a two-tailed test with 9 pairs of observations is τ0.05, 9 = 16. The absolute value of the calculated statistic is 18, which exceeds the quantile and indicates that the χ2 statistic and the skewness measure are not independent.
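The reported figures can be checked against each other with a short worked calculation. It assumes there are no tied pairs (the text does not mention ties), so that the concordant and discordant counts together account for all pairs formed from the n = 9 observations.

```latex
\begin{aligned}
\frac{n(n-1)}{2} &= \frac{9 \cdot 8}{2} = 36, \\
N_c + N_d &= 36, \quad N_c - N_d = -18 \;\Rightarrow\; N_c = 9,\; N_d = 27, \\
\tau &= \frac{N_c - N_d}{n(n-1)/2} = \frac{-18}{36} = -0.5, \\
|N_c - N_d| &= 18 > 16 .
\end{aligned}
```

Under that assumption, the correlation of -0.5 and the difference of -18 are consistent with each other, and the statistic exceeds the quoted quantile of 16 in absolute value.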
The relationship between the χ2 test statistic and the skewness measure of the Weibull distribution was presented. The results show that with large values of skewness, the distribution fits the data well. For the different sample sizes, the value of the χ2 test statistic decreases as the value of skewness increases. This is confirmed by the Kendall τ test, which indicates a statistically significant inverse relation between the χ2 test statistic and the skewness measure. This result is important because the skewness measure can be used as an indicator of how well an assumed right-skewed distribution fits the data.