Research Article
 

Generalization of Klotz’s Test



F. Karaman
 
ABSTRACT

In this study, Klotz’s test for heterogeneity of variance is generalized to factorial designs. Although Levene’s test, the jackknife test and Fligner and Killeen’s test for heterogeneity of variance have been generalized to factorial designs, Klotz’s test has not been studied in this setting. The performance of Klotz’s test is compared with that of the previously studied tests, and the application of these analyses to the 2 by 2 factorial design is examined in detail. A simulation study based on 10,000 data sets was performed to compare the power and robustness of the tests. The power of all tests increased as the sample size increased and as the difference in variances increased. According to these simulations, Klotz’s test can be recommended in particular for symmetric parent distributions. For skewed parent distributions such as the chi-square, Klotz’s test should not be used since it is not robust. Data from a multifactor agricultural system are used as an example to illustrate the usefulness of these tests.


 
  How to cite this article:

F. Karaman, 2009. Generalization of Klotz’s Test. Journal of Applied Sciences, 9: 2916-2924.

DOI: 10.3923/jas.2009.2916.2924

URL: https://scialert.net/abstract/?doi=jas.2009.2916.2924

INTRODUCTION

Most frequently, statistical analyses are performed to detect differences in the location or means of the treatment groups, but in some cases differences in dispersion may be of as much interest. Anderson (2006) proposed distance-based tests for homogeneity of multivariate dispersions. McGrath and Lin (2002) developed a nonparametric dispersion test for unreplicated two-level fractional factorial designs. Ozaydin et al. (1999) studied factor effects on the variance in an agricultural system involving chile peppers and two chile-pepper pests, root-knot nematodes and yellow nutsedge. Because factor effects on the variance were present, analyses to detect factor effects on differences in means failed. Hence, this system required methods for examining the structure of effects on the variance. There are many other situations in which factor effects on the variance matter; pesticide application is one example. Application methods with greater variability require higher overall amounts so that each area receives at least the minimum amount of pesticide needed to be effective.

Several researchers have studied tests for heterogeneity of variance in the one-way design, including Milliken and Johnson (1984) and Miller (1968). Among them, Conover et al. (1981) compared 56 tests in a comprehensive simulation study of tests for heterogeneity of variance in the one-way model.

Ozaydin et al. (1999) studied the Levene and jackknife tests for detecting heterogeneity of variance as a factor effect in a replicated two-way treatment structure and then considered a generalization of the test of Fligner and Killeen (1976).

While many tests for scale exist for the completely randomized design, fewer tests have been considered for more complex designs. In particular, few nonparametric rank-based tests for scale have been applied to more complex designs. In addition to Fligner and Killeen’s nonparametric rank-based test, which was studied by Ozaydin et al. (1999), it is of interest to consider the generalization of another such test, the nonparametric rank-based test of Klotz (1962), to a two-way treatment structure.

For the one-way design, Klotz’s test performed quite well in terms of power and robustness for symmetric parent distributions, and Conover et al. (1981) recommended it for such distributions.

As described by Ozaydin et al. (1999), pseudo-observations that reflect changes in the magnitude of the variance are computed first. Then either the usual ANOVA or an analogous chi-square analysis is used to analyze these pseudo-observations. Ozaydin et al. (1999) also give a review of the tests.

Simulation studies were used to examine the power and robustness of the generalized tests for the special case of a 2 by 2 treatment structure with balanced data. The performance of Klotz’s test for the balanced two-way factorial treatment structure is compared with that of the previously studied tests. These simulation studies were based on 10,000 data sets.

The simulations were performed using SAS/STAT software. Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square (skewed) parent distributions were used. Samples of 5, 10 and 20 observations were used for the different variance combinations.

THE MODEL AND HYPOTHESES

For the fixed two-way design with equal replication the cell mean model is:

$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk} \tag{1}$$

where $i = 1, \ldots, t_A$ with $t_A > 1$, $j = 1, \ldots, t_B$ with $t_B > 1$ and $k = 1, \ldots, n$ with $n > 1$. In the model, $y_{ijk}$ is the $k$th response from the $i$th level of factor A and the $j$th level of factor B. The $\mu_{ij}$ are unknown constants representing the average response at the $i$th level of factor A and the $j$th level of factor B. The $\varepsilon_{ijk}$ are the random errors associated with the observations $y_{ijk}$ and are assumed to be independently distributed with zero mean and variance $\sigma_{ij}^2$.

To investigate the behavior of the following test statistics under less restrictive distributional assumptions, the assumption that the errors are normal random variables is dropped.

For this study, I use the special case $t_A = 2$, $t_B = 2$. The null hypothesis is $H_0: \sigma_{11}^2 = \sigma_{12}^2 = \sigma_{21}^2 = \sigma_{22}^2$ (all variances are equal).

The alternative hypothesis is that at least one variance differs from the others. Instead of using the usual one-way ANOVA to test equality of variances, and following Ozaydin et al. (1999), I perform an analysis that is analogous to the usual ANOVA for a two-way replicated treatment structure. The following null hypotheses are used to detect differences in variances and, if the variances differ, the form of the difference:

There is no interaction between factor A and factor B in their effects on the variance
There is no factor A main effect on the variance
There is no factor B main effect on the variance

If there exists an interaction, main effects are not considered further. The presence of interaction suggests that a simple additive main effects structure does not exist among the factor effects on the variance so that a reduction in summarizing the variances is not possible.
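
The interaction-first rule described above can be sketched in Python. This is only an illustration, not the code used in the study: the function name `classify_effects`, its p-value arguments and the significance level `alpha = 0.05` are assumptions introduced here. The returned labels follow the mutually exclusive categories reported in the simulation tables.

```python
def classify_effects(p_int, p_a, p_b, alpha=0.05):
    """Classify the form of variance heterogeneity from three p-values.

    alpha is a hypothetical significance level chosen for illustration.
    """
    # A significant interaction ends the analysis: main effects are not examined.
    if p_int < alpha:
        return "INT"
    sig_a = p_a < alpha
    sig_b = p_b < alpha
    if sig_a and sig_b:
        return "Both A and B"
    if sig_a:
        return "Only A"
    if sig_b:
        return "Only B"
    return "None"


print(classify_effects(0.01, 0.001, 0.80))  # INT
print(classify_effects(0.40, 0.001, 0.80))  # Only A
```

Because the interaction is checked first, a data set can fall into only one category, which is what makes the tabulated proportions mutually exclusive.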

THE TEST PROCEDURES

Levene’s test and the jackknife test are classified as tests based on modifications of the F-test for means, whereas Klotz’s test and Fligner and Killeen’s test are classified as linear rank tests (or rank-like tests) (Conover et al., 1981). For these tests, pseudo-observations that in some way reflect changes in the magnitude of the variance are computed and then analyzed.

In general, the pseudo-observations are some function of the absolute deviations from either the mean or the median. For continuous pseudo-observations, the analysis is the two-way ANOVA with F-tests. For rank-based pseudo-observations, an analogous analysis using chi-square test statistics is more appropriate. Levene’s test, the jackknife test, Klotz’s test and Fligner and Killeen’s test for the one-way model are among the tests that follow this basic rationale.
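
As a rough sketch of the second step for continuous pseudo-observations, the following Python function computes the usual F statistics of a balanced two-way ANOVA. The function name and the input format (a dictionary mapping a cell index `(i, j)` to its list of pseudo-observations) are assumptions made for illustration; the study itself used SAS/STAT.

```python
def two_way_f_statistics(cells):
    """F statistics for a balanced two-way layout.

    cells maps (i, j) -> list of n pseudo-observations (hypothetical input format).
    """
    levels_a = sorted({i for i, _ in cells})
    levels_b = sorted({j for _, j in cells})
    t_a, t_b = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))

    cell_mean = {key: sum(v) / n for key, v in cells.items()}
    grand = sum(cell_mean.values()) / (t_a * t_b)
    a_mean = {i: sum(cell_mean[i, j] for j in levels_b) / t_b for i in levels_a}
    b_mean = {j: sum(cell_mean[i, j] for i in levels_a) / t_a for j in levels_b}

    # Sums of squares for the main effects and the interaction.
    ss_a = n * t_b * sum((a_mean[i] - grand) ** 2 for i in levels_a)
    ss_b = n * t_a * sum((b_mean[j] - grand) ** 2 for j in levels_b)
    ss_ab = n * sum(
        (cell_mean[i, j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in levels_a
        for j in levels_b
    )
    # Within-cell (error) mean square with t_a * t_b * (n - 1) degrees of freedom.
    ss_error = sum((z - cell_mean[key]) ** 2 for key, v in cells.items() for z in v)
    ms_error = ss_error / (t_a * t_b * (n - 1))

    return {
        "A": ss_a / (t_a - 1) / ms_error,
        "B": ss_b / (t_b - 1) / ms_error,
        "AB": ss_ab / ((t_a - 1) * (t_b - 1)) / ms_error,
    }


# Hypothetical pseudo-observations for a 2 by 2 layout with n = 2.
example = {(1, 1): [1.0, 2.0], (1, 2): [3.0, 4.0], (2, 1): [5.0, 6.0], (2, 2): [7.0, 8.0]}
print(two_way_f_statistics(example))  # {'A': 64.0, 'B': 16.0, 'AB': 0.0}
```

Each F statistic would then be referred to an F distribution with the effect degrees of freedom in the numerator and $t_A t_B (n - 1)$ in the denominator.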

The following seven tests are considered and compared for balanced data using pseudo-observations. Ozaydin et al. (1999) considered the first five of these tests and compared them on the basis of robustness and power. The last two of the seven tests are generalizations of Klotz’s test.

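(1) For the first version of Levene’s test (Lev 1), the squares of the residuals were computed and ANOVA was performed on these pseudo-observations:

$$\mathrm{Lev1}_{ijk} = (y_{ijk} - \bar{y}_{ij\cdot})^2 \tag{2}$$

where $\bar{y}_{ij\cdot}$ is the sample mean for the $ij$th group (Ozaydin et al., 1999).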

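(2) For a modified Levene’s test (Lev 2), the absolute deviations from the sample medians were computed as follows and ANOVA was performed on these pseudo-observations (Ozaydin et al., 1999):

$$\mathrm{Lev2}_{ijk} = |y_{ijk} - \tilde{y}_{ij}| \tag{3}$$

where $\tilde{y}_{ij}$ is the sample median for the $ij$th group.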
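
The two Levene-type pseudo-observations are simple to compute within a cell. The Python sketch below uses hypothetical function names and made-up data; it illustrates the squared residuals from the cell mean (Lev 1) and the absolute deviations from the cell median (Lev 2).

```python
from statistics import mean, median


def lev1(cell):
    """Squared residuals from the cell mean (Lev 1)."""
    m = mean(cell)
    return [(y - m) ** 2 for y in cell]


def lev2(cell):
    """Absolute deviations from the cell median (Lev 2)."""
    med = median(cell)
    return [abs(y - med) for y in cell]


cell = [2.0, 4.0, 9.0]  # hypothetical responses in one treatment cell
print(lev1(cell))  # [9.0, 1.0, 16.0]
print(lev2(cell))  # [2.0, 0.0, 5.0]
```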

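(3) For the jackknife test, Ozaydin et al. (1999) performed ANOVA F-tests on the following pseudo-observations (jack):

$$\mathrm{jack}_{ijk} = n \ln s_{ij}^2 - (n - 1) \ln s_{ij(k)}^2 \tag{4}$$

where $s_{ij}^2$ is the usual sample variance for the $ij$th group and $s_{ij(k)}^2$ is the sample variance for the $ij$th group computed with the $k$th observation deleted.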
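
A minimal Python sketch of jackknife pseudo-observations follows. It assumes the standard jackknife of the log sample variance from Miller (1968), in which the pseudo-value for observation k is n ln(s^2) minus (n - 1) ln(s^2 with observation k deleted); the function name and data are hypothetical.

```python
from math import log
from statistics import variance


def jackknife_pseudo(cell):
    """Jackknife pseudo-values of the log sample variance for one cell.

    Assumes n >= 3 and that no leave-one-out sample has zero variance.
    """
    n = len(cell)
    log_s2 = log(variance(cell))
    pseudo = []
    for k in range(n):
        rest = cell[:k] + cell[k + 1:]  # cell with observation k deleted
        pseudo.append(n * log_s2 - (n - 1) * log(variance(rest)))
    return pseudo


print([round(v, 3) for v in jackknife_pseudo([1.0, 2.0, 3.0])])  # [1.386, -1.386, 1.386]
```

Deleting an extreme observation lowers the leave-one-out variance and gives a large positive pseudo-value, whereas deleting the central observation raises it and gives a negative one.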


(4) The first generalization of Fligner and Killeen’s test applies an F-test analysis to rank-based pseudo-observations. Following Conover et al. (1981), Ozaydin et al. (1999) used absolute deviations from the median:

$$Z_{ijk} = |y_{ijk} - \tilde{y}_{ij}|$$

where $\tilde{y}_{ij}$ is the sample median for the $ij$th group. The $Z_{ijk}$ were then ranked from smallest to largest (i.e., from 1 to $N = n t_A t_B$). The ranks are denoted by $R_{ijk}$. Following Conover (1980), tied $Z_{ijk}$ values were each assigned the average of the ranks of the tied values as their $R_{ijk}$.

A score function was then used to modify the ranks. Fligner and Killeen (1976) used the score function (denoted FK):

$$a(R_{ijk}) = \left[\Phi^{-1}\left(\frac{1}{2} + \frac{R_{ijk}}{2(N + 1)}\right)\right]^2 \tag{5}$$

where $\Phi$ is the standard normal cumulative distribution function. Following Ozaydin et al. (1999), I used Fligner and Killeen’s original score function and denote the values $a(R_{ijk})$ by $a_{ijk}$.

Conover et al. (1981) used the positive square root of this score function. Ozaydin et al. (1999) used Fligner and Killeen’s original score function because it gave better power and robustness than its positive square root. The ANOVA F-test analysis was applied to the scores $a_{ijk}$. The procedure obtained by applying F-tests to the pseudo-observations FK is denoted FK-F, as in Ozaydin et al. (1999).
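
The ranking and scoring steps can be sketched in Python as follows. The function names are hypothetical. The score is assumed here to be the square of the inverse normal value at 1/2 + R/(2(N + 1)), which is the form implied by the statement that Conover et al. (1981) used its positive square root; this should be checked against Fligner and Killeen (1976) before use.

```python
from statistics import NormalDist, median


def average_ranks(values):
    """Ranks from 1 to N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def fk_scores(cells):
    """FK-type scores from a list of cells (each a list of responses).

    Assumed score: [Phi^{-1}(1/2 + R / (2 (N + 1)))]^2.
    """
    z = [abs(y - median(cell)) for cell in cells for y in cell]
    n_total = len(z)
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(0.5 + r / (2 * (n_total + 1))) ** 2 for r in average_ranks(z)]


print(average_ranks([3, 1, 3, 2]))  # [3.5, 1.0, 3.5, 2.0]
scores = fk_scores([[1.0, 2.0, 4.0], [3.0, 3.5, 9.0]])  # hypothetical data
print([round(a, 3) for a in scores])
```

The scores are returned in the order of the flattened cells, so they can be regrouped by cell before the ANOVA or chi-square step.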

(5) The second generalization of Fligner and Killeen’s test performs a chi-square analysis on the pseudo-observations FK. For the overall test of equality of variances, the one-way test statistic presented by Conover et al. (1981) was used:

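$$X = \frac{n \sum_{i=1}^{t_A} \sum_{j=1}^{t_B} (\bar{a}_{ij\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2} \tag{6}$$

where

$$D^2 = \frac{1}{N - 1} \sum_{i=1}^{t_A} \sum_{j=1}^{t_B} \sum_{k=1}^{n} (a_{ijk} - \bar{a}_{\cdot\cdot\cdot})^2$$

$\bar{a}_{ij\cdot}$ is the mean score in the $ij$th sample and $\bar{a}_{\cdot\cdot\cdot}$ is the overall mean score. Under the null hypothesis of equal variances, $D^2$ is the known variance of the scores. Ozaydin et al. (1999) partitioned $X$ into three pieces, corresponding to the three hypotheses mentioned earlier:

$$X_{AB} = \frac{n \sum_{i=1}^{t_A} \sum_{j=1}^{t_B} (\bar{a}_{ij\cdot} - \bar{a}_{i\cdot\cdot} - \bar{a}_{\cdot j\cdot} + \bar{a}_{\cdot\cdot\cdot})^2}{D^2} \tag{7}$$

$$X_A = \frac{n t_B \sum_{i=1}^{t_A} (\bar{a}_{i\cdot\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2} \tag{8}$$

where $\bar{a}_{i\cdot\cdot}$ is the mean score in the $i$th sample (the $i$th level of factor A)

$$X_B = \frac{n t_A \sum_{j=1}^{t_B} (\bar{a}_{\cdot j\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2} \tag{9}$$

where $\bar{a}_{\cdot j\cdot}$ is the mean score in the $j$th sample (the $j$th level of factor B)

Under the null hypothesis of no differences, each of the above test statistics is approximately distributed as a chi-square random variable with degrees of freedom corresponding to the numerator sum of squares: $(t_A - 1)(t_B - 1)$ for $X_{AB}$, $t_A - 1$ for $X_A$ and $t_B - 1$ for $X_B$.

I denote the procedure obtained by applying the chi-square tests to the pseudo-observations FK by FK-Chi.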
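
The chi-square analysis of the scores can be sketched in Python under stated assumptions. The partition is assumed to follow the usual balanced two-way sum-of-squares decomposition, with each sum of squares divided by the variance of all N scores (divisor N - 1); the dictionary keys `model`, `A`, `B` and `AB` and the function names are labels introduced here. The p-value helper covers only the 1 degree of freedom case, which applies to each component in a 2 by 2 layout.

```python
from math import erfc, sqrt


def chi_square_partition(scores):
    """Overall statistic and its A, B and AB components for a balanced layout.

    scores maps (i, j) -> list of n rank-based scores (hypothetical input format).
    """
    levels_a = sorted({i for i, _ in scores})
    levels_b = sorted({j for _, j in scores})
    t_a, t_b = len(levels_a), len(levels_b)
    n = len(next(iter(scores.values())))

    all_scores = [a for v in scores.values() for a in v]
    n_total = len(all_scores)
    grand = sum(all_scores) / n_total
    d2 = sum((a - grand) ** 2 for a in all_scores) / (n_total - 1)

    cell_mean = {key: sum(v) / n for key, v in scores.items()}
    a_mean = {i: sum(cell_mean[i, j] for j in levels_b) / t_b for i in levels_a}
    b_mean = {j: sum(cell_mean[i, j] for i in levels_a) / t_a for j in levels_b}

    return {
        "model": n * sum((m - grand) ** 2 for m in cell_mean.values()) / d2,
        "A": n * t_b * sum((a_mean[i] - grand) ** 2 for i in levels_a) / d2,
        "B": n * t_a * sum((b_mean[j] - grand) ** 2 for j in levels_b) / d2,
        "AB": n * sum(
            (cell_mean[i, j] - a_mean[i] - b_mean[j] + grand) ** 2
            for i in levels_a
            for j in levels_b
        ) / d2,
    }


def p_value_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


# Hypothetical scores for a 2 by 2 layout with n = 3.
example = {
    (1, 1): [0.1, 0.4, 0.2],
    (1, 2): [0.3, 0.9, 0.5],
    (2, 1): [1.2, 0.7, 1.5],
    (2, 2): [2.0, 1.1, 2.6],
}
parts = chi_square_partition(example)
print({k: round(v, 3) for k, v in parts.items()})
print(round(p_value_1df(parts["A"]), 4))
```

For balanced data the three components add up to the overall statistic, which gives a quick numerical check of the partition.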

(6) The first generalization of Klotz’s test applies an F-test analysis to rank-based pseudo-observations. Following the same procedure as in (4), the ranks $R_{ijk}$ were calculated.

A score function was then used to modify the ranks. Klotz (1962) used the score function (denoted Klotzs):

$$a(R_{ijk}) = \left[\Phi^{-1}\left(\frac{R_{ijk}}{N + 1}\right)\right]^2 \tag{10}$$

where $\Phi$ is the standard normal cumulative distribution function. The values $a(R_{ijk})$ are denoted by $a_{ijk}$. The ANOVA F-test analysis was applied to the scores $a_{ijk}$. The procedure obtained by applying F-tests to the pseudo-observations Klotzs is denoted Klotz-F.
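
As a small illustration, Klotz-type scores can be computed from a list of ranks in Python. The sketch assumes the squared inverse normal score of R/(N + 1), the form usually attributed to Klotz (1962); the function name and the example ranks are hypothetical.

```python
from statistics import NormalDist


def klotz_scores(ranks):
    """Squared normal scores [Phi^{-1}(R / (N + 1))]^2 for a list of ranks."""
    n_total = len(ranks)
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(r / (n_total + 1)) ** 2 for r in ranks]


ranks = [1, 2, 3, 4, 5]  # hypothetical ranks without ties
print([round(a, 3) for a in klotz_scores(ranks)])
```

Average ranks from ties can be passed in unchanged, since the score function only requires values strictly between 0 and N + 1.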

(7) The second generalization of Klotz’s test follows the usual rank-based nonparametric approach, i.e., a chi-square analysis was performed on the pseudo-observations Klotzs. The chi-square tests were performed following the same steps as in (5). The procedure obtained by applying the chi-square tests to the pseudo-observations Klotzs is denoted Klotz-Chi.

The last four tests are variations of Fligner and Killeen’s and Klotz’s nonparametric rank-based approach. Chi-square tests are more appropriate and have traditionally been used for rank-based tests. Following Conover and Iman (1981), rank transformation procedures applying the usual ANOVA F-tests to the ranks are easier to obtain. The fourth and sixth tests use this notion but apply the usual ANOVA F-tests to scores based on ranks. The fifth and seventh tests use a traditional chi-square analysis of the scores.

All of the above tests use pseudo-observations in different ways. For example, the pseudo-observations used in Lev 1 are on the scale of the variance, those used in Lev 2 are on the scale of the standard deviation and those used in Jack are on a log-transformed scale. The FK and Klotz pseudo-observations are based on ranks.

RESULTS

Conover et al. (1981) showed that, for the one-way design, Klotz’s test performed quite well in terms of power and robustness for symmetric parent distributions. Therefore, the performance of Klotz’s test statistics was studied here for the two-way design. Lev 1 and Lev 2 are modifications of Levene’s test, in which ANOVA is performed on a monotonically increasing function of the absolute deviations from the sample means. To obtain more robust tests, Levene’s idea can be applied to deviations from the median (Miller, 1968). The third test is the jackknife test. The fourth and fifth tests are variations of Fligner and Killeen’s nonparametric rank-based approach and the sixth and seventh tests are variations of Klotz’s. For rank-based tests, chi-square tests are more appropriate than ANOVA. In this study, the usual ANOVA F-tests applied to rank-based scores were also studied since they are easier to obtain (Conover and Iman, 1981). The fifth (FK-Chi) and seventh (Klotz-Chi) tests use the traditional chi-square analysis of the scores. The fourth (FK-F) and sixth (Klotz-F) tests apply ANOVA F-tests to scores based on the ranks.

For the special case of a 2 by 2 treatment structure, simulations using SAS/STAT software compared the power and robustness of the seven test procedures over a range of variance configurations, parent distributions and sample sizes.

Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square (skewed) parent distributions were used for the simulation study. When sampling from a chi-square distribution, 1 d.f. chi-square random variables were generated and multiplied by the appropriate constant to attain the indicated variability, so that the level of skewness was maintained. Samples of 5, 10 and 20 observations were used for each of the following variance combinations (VC), listed in parentheses:

(1) (1,1,1,1)
(2) (1,1,2,2), (1,1,4,4), (1,1,8,8)
(3a) (1,2,2,1), (1,4,4,1), (1,8,8,1)
(3b) (1,1,1,2), (1,1,1,4), (1,1,1,8)
(4) (1,2,2,3), (1,4,4,7), (1,8,8,15)

Variance combination (1) was used to approximate the true type I error rate (i.e., the observed significance level). The set of variance combinations (2) was used to investigate the power of the tests when only a factor A main effect is present. The set of variance combinations (3a) was used to investigate the power of the tests when an interaction effect is present. The set of variance combinations (3b) was used to investigate the power of the tests when an interaction effect and both main effects are present. The set of variance combinations (4) was used to investigate the power of the tests when both A and B main effects exist.
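
The data-generation step can be sketched in Python with the standard library. This is not the SAS code used in the study: the function names, the distribution labels and the assumed cell order (11, 12, 21, 22) for the four variances are illustrative choices. Each parent distribution is scaled so that a cell has the requested variance; the chi-square case multiplies a 1 d.f. chi-square variable by a constant, as described above.

```python
import random
from math import sqrt


def draw(dist, variance, rng):
    """Draw one value with the requested variance from the named parent distribution."""
    if dist == "uniform":
        half_width = sqrt(3.0 * variance)  # Var of U(-a, a) is a^2 / 3
        return rng.uniform(-half_width, half_width)
    if dist == "normal":
        return rng.gauss(0.0, sqrt(variance))
    if dist == "double_exponential":
        scale = sqrt(variance / 2.0)  # Var of Laplace(0, b) is 2 b^2
        # The difference of two independent exponentials with mean b is Laplace(0, b).
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    if dist == "chi_square":
        # A 1 d.f. chi-square has variance 2, so scale by sqrt(variance / 2).
        return sqrt(variance / 2.0) * rng.gauss(0.0, 1.0) ** 2
    raise ValueError(f"unknown distribution: {dist}")


def simulate_cells(dist, variances, n, rng):
    """One balanced 2 by 2 data set; variances are assumed ordered as cells 11, 12, 21, 22."""
    keys = [(1, 1), (1, 2), (2, 1), (2, 2)]
    return {key: [draw(dist, v, rng) for _ in range(n)] for key, v in zip(keys, variances)}


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
data = simulate_cells("chi_square", (1, 1, 4, 4), 5, rng)
for key, values in data.items():
    print(key, [round(v, 2) for v in values])
```

The chi-square cells are not centered, but every test considered here works with deviations from the cell mean or median, so the differing cell means do not affect the variance comparison.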

The simulation results are shown in Tables 1-3. The results for the first five tests are similar to those of Ozaydin et al. (1999). The power of all tests increased as the sample size increased and as the difference in variances increased.
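
To gauge the precision of a simulated rejection proportion, the binomial standard error can be computed directly. The short Python sketch below is illustrative only; the nominal level of 0.05 is an assumption, since the significance level is not stated here.

```python
from math import sqrt


def mc_standard_error(p, n_sim):
    """Standard error of a proportion estimated from n_sim independent data sets."""
    return sqrt(p * (1.0 - p) / n_sim)


for n_sim in (1000, 10000):
    print(n_sim, round(mc_standard_error(0.05, n_sim), 4))
# 1000 0.0069
# 10000 0.0022
```

Increasing the number of data sets tenfold reduces the standard error by a factor of about 3.16, the square root of 10.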

All of the tests are robust with the exception of Jack, Lev 1, Klotz-F and Klotz-Chi. Jack, Klotz-F and Klotz-Chi are never robust when applied to samples from a chi-square parent distribution. Lev 1 is not robust when used with small samples (n = 5) from a chi-square parent distribution. Lev 2, FK-F and FK-Chi suffer from being too conservative and also lack power when applied to small samples (n = 5).

Table 1: Results for small samples (n = 5)
For VC, see the variance combinations. The proportions in the columns Model, Only A, Only B, Both A and B and INT are based on 10,000 simulations for VC (1) and 30,000 for all other VCs, since these combine the three subcombinations of combinations (2), (3a), (3b) and (4). VC (1) explores robustness, VC (2) explores power in the presence of an A main effect, VC (3a) and (3b) explore power in the presence of interaction and VC (4) explores power in the presence of A and B main effects. The column Model corresponds to the overall test of homogeneity of variance. The categories Only A, Only B, Both A and B and INT are mutually exclusive and correspond to a factorial ANOVA.

Table 2: Results for moderate samples (n = 10)
For VC and the column definitions, see the note to Table 1.

Table 3: Results for large samples (n = 20)
For VC and the column definitions, see the note to Table 1.

For skewed parent distributions such as the chi-square, the Klotz-F and Klotz-Chi tests are not robust and it is strongly recommended that they not be used. For symmetric parent distributions, Klotz-F and Klotz-Chi perform quite well: both tests show the most power and are also robust. Klotz-F and Klotz-Chi can therefore be strongly recommended for samples from symmetric parent distributions.

NUMERICAL EXAMPLE

A greenhouse experiment from Ozaydin et al. (1999) is used as an example. The treatment structure of the experiment is a 2 by 2 factorial. These data are used to illustrate the application of the seven test procedures to a two-way design. The first factor was chile (absent, present) and the second factor was the source of nematodes (from chile or from tomato) (Table 4). The response variable was germination of the yellow nutsedge tubers produced during the experiment.

Regardless of nematode source, the variance in yellow nutsedge tuber germination was higher when chile was present (Table 5). All seven analyses for two-way structures were used to test for the presence and form of heterogeneity of variance (Table 6). All the analyses led to the same conclusion: the absence or presence of chile affects the variance of nutsedge tuber germination (i.e., there is a statistically significant chile main effect).

The variances of tuber germination were pooled within the chile-present and chile-absent levels, since the data suggested two distinct variances, one when chile is present and one when it is absent. A two-sample t-test comparing the means for the treatments (chile absent, nematode from chile) and (chile absent, nematode from tomato) showed no significant difference in mean tuber germination (p = 0.4609). Another two-sample t-test comparing the means for the treatments (chile present, nematode from chile) and (chile present, nematode from tomato) also showed no significant difference (p = 0.7856). Therefore, the means within chile levels were also pooled (Table 7).

Although in practice the most appropriate test should be chosen and applied, all seven tests were applied to the data for purposes of illustration.

Table 4: Nutsedge tuber germination by chile and nematode source
Data from Ozaydin et al. (1999)

Table 5: Means and variances of nutsedge tuber germination by chile and nematode source
Data from Ozaydin et al. (1999)

Table 6: Summary of analyses of the effect of chile and nematode (N) source on nutsedge tuber germination variability

Table 7: Means and variances of nutsedge tuber germination by chile
Data from Ozaydin et al. (1999)

CONCLUSION

Although Klotz’s test performed quite well in terms of power and robustness for symmetric distributions in a one-way design, it had not been studied in a two-way design. For this reason, the performance of Klotz’s test statistics was studied here and compared with that of the other five tests. In addition, the simulation study was based on 10,000 data sets instead of 1,000, so the performance of the previously studied tests was re-examined on the basis of 10,000 simulations. In future studies, other tests examined by Conover et al. (1981) for the one-way design could be generalized to more complex designs and compared to evaluate their performance.

REFERENCES
Anderson, M.J., 2006. Distance-based tests for homogeneity of multivariate dispersions. Biometrics, 62: 245-253.

Conover, W.J. and R.L. Iman, 1981. Rank transformations as a bridge between parametric and nonparametric statistics. Am. Statistician, 35: 124-129.

Conover, W.J., 1980. Practical Nonparametric Statistics. 2nd Edn., John Wiley and Sons, New York, ISBN-10: 0471028673.

Conover, W.J., M.E. Johnson and M.M. Johnson, 1981. A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23: 351-361.

Fligner, M.A. and T.J. Killeen, 1976. Distribution-free two sample tests for scale. J. Am. Statistical Assoc., 71: 210-213.

Klotz, J., 1962. Nonparametric tests for scale. Ann. Math. Stat., 33: 498-512.

McGrath, R. and D. Lin, 2002. A nonparametric dispersion test for unreplicated two-level fractional factorial designs. J. Nonparamet. Statist., 14: 699-714.

Miller, R.G., 1968. Jackknifing variances. Ann. Math. Stat., 39: 567-582.

Milliken, G.A. and D.E. Johnson, 1984. Analysis of Messy Data. 1st Edn., Chapman and Hall, New York, ISBN-10: 158488083X.

Ozaydin, F., D.M. VanLeeuwen, C.S. Miller and J. Schroeder, 1999. Factor effects on the variance in a replicated two-way treatment structure in an agricultural system. J. Agric. Biol. Environ. Stat., 4: 166-184.
