INTRODUCTION
Most frequently, statistical analyses are performed to detect differences
in the location or means of the treatment groups, but in some cases differences
in dispersion may be of as much interest. Distance-based tests were proposed
by Anderson (2006) to test homogeneity of multivariate
dispersions. For unreplicated two-level fractional factorial designs, McGrath
and Lin (2002) developed nonparametric dispersion tests. Ozaydin
et al. (1999) studied factor effects on the variance in studies of
an agricultural system involving chile peppers and two chile-pepper pests: root-knot
nematodes and yellow nutsedge. Because factor effects on the variance were present,
analyses to detect factor effects on differences in means failed. Hence, this
system required methods for examining the structure of effects on the variance.
There are many other situations in which factor effects on the variance need to
be understood; pesticide application is one example. Application methods with
greater variability require higher overall amounts to be applied, so that each
area receives the minimum amount of pesticide needed to be effective.
Several researchers have tested heterogeneity of variance in the one-way design,
among them Milliken and Johnson (1984) and Miller (1968).
Conover et al. (1981) compared 56 tests in a comprehensive simulation
study of tests for heterogeneity of variance in the one-way model.
Ozaydin et al. (1999) studied the Levene and the jackknife tests for detecting
heterogeneity of variance as a factor effect in a replicated two-way treatment
structure and then considered a generalization of Fligner and Killeen's (1976)
test.
While many tests for scale exist for the completely randomized design, fewer
tests have been considered for more complex designs. In particular, nonparametric
rank-based tests for scale have not been applied to more complex designs. In
addition to Fligner and Killeen's nonparametric rank-based test, which
was studied by Ozaydin et al. (1999), it is of
interest to consider the generalization of one such test, Klotz's
(1962) nonparametric rank-based test, to a two-way treatment structure.
For the one-way design, Klotz's test performed quite well in terms of power
and robustness for symmetric parent distributions, and Conover et al. (1981)
recommended it for such distributions.
As mentioned in Ozaydin et al. (1999), pseudo-observations are computed that
reflect changes in the magnitude of the variance. Either the usual ANOVA or
an analogous chi-square analysis is then used to analyze these
pseudo-observations. Ozaydin et al. (1999) give a review of the tests.
Simulation studies were used to examine the power and the robustness of the
generalized tests for the special case of a 2 by 2 treatment structure with
balanced data. The performance of Klotz's test for the balanced two-way factorial
treatment structure is compared with that of other previously studied tests. These
simulation studies were performed with 10,000 data sets.
The simulations were performed using SAS/STAT software. Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square (skewed) parent distributions were used. Samples of size 5, 10 and 20 observations were used for the different variance combinations.
THE MODEL AND HYPOTHESES
For the fixed two-way design with equal replication, the cell mean model is:
$$y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$$
where $i = 1, \ldots, t_A$ ($t_A > 1$), $j = 1, \ldots, t_B$ ($t_B > 1$) and
$k = 1, \ldots, n$ ($n > 1$). In the model, $y_{ijk}$ is the kth response from the ith
level of factor A and the jth level of factor B. The $\mu_{ij}$
are unknown constants representing the average response due to the ith
level of factor A and the jth level of factor B. The $\varepsilon_{ijk}$
are the random errors associated with the observations $y_{ijk}$ and are
assumed to be independently distributed with zero mean and variance
$\sigma^2_{ij}$.
To investigate the behavior of the following test statistics under less restrictive distributional assumptions, the assumption that the errors are normal random variables is suspended.
For this study, I use the special case $t_A = 2$, $t_B = 2$.
The null hypothesis is
$$H_0: \sigma^2_{11} = \sigma^2_{12} = \sigma^2_{21} = \sigma^2_{22}$$
(all variances are equal).
The alternative hypothesis is that at least one variance differs from the others.
Instead of using the usual one-way ANOVA to test equality of variance,
I follow Ozaydin et al. (1999) and perform
an analysis that is analogous to the usual ANOVA for a two-way replicated treatment
structure. The following null hypotheses are used to find the differences in
variances and, if they differ, the form of the difference:
• Interaction between factor A and factor B does not exist.
• There is no factor A main effect on the variance.
• There is no factor B main effect on the variance.
If there exists an interaction, main effects are not considered further. The
presence of interaction suggests that a simple additive main effects structure
does not exist among the factor effects on the variance so that a reduction
in summarizing the variances is not possible.
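The decision rule above, together with the mutually exclusive outcome categories used to report the simulations (Only A, Only B, Both A and B, INT), can be sketched as a small Python function. The function name, the extra label "None" and the 0.05 significance level are assumptions made for illustration; the text does not state the level that was used.

```python
def classify_effects(p_int, p_a, p_b, alpha=0.05):
    """Map three p-values to one mutually exclusive outcome category.

    alpha=0.05 is an assumed significance level, not taken from the source.
    """
    # A significant interaction is reported on its own:
    # main effects are not considered further.
    if p_int < alpha:
        return "INT"
    a_significant = p_a < alpha
    b_significant = p_b < alpha
    if a_significant and b_significant:
        return "Both A and B"
    if a_significant:
        return "Only A"
    if b_significant:
        return "Only B"
    return "None"  # hypothetical label for "no effect on the variance detected"
```

Checking the interaction first is what makes the categories mutually exclusive: a data set with a significant interaction is counted under INT even when both main-effect tests are also significant.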
THE TEST PROCEDURES
Levene's test and the jackknife test are classified as tests based on
modifications of the F-test for means, while Klotz's test and Fligner and
Killeen's test are classified as linear rank tests (or rank-like tests)
(Conover et al., 1981). For these tests, pseudo-observations
that in some way reflect changes in the magnitude of the variance are computed.
Then, the pseudo-observations are analyzed.
In general, the pseudo-observations are some function of the absolute deviations from either the mean or the median. For (1) continuous pseudo-observations, the analysis is the two-way ANOVA with F-tests. For (2) rank-based pseudo-observations, an analogous analysis using chi-square test statistics is more appropriate. The tests considered in this study (Levene's test, the jackknife test, Klotz's test and Fligner and Killeen's test for the one-way model) are among the tests that follow this basic rationale.
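As a minimal sketch of the first kind of analysis, the following Python function computes the three F statistics of a balanced two-way ANOVA from a table of pseudo-observations. The function name and the dictionary layout (cells keyed by the pair of factor levels) are assumptions chosen for illustration; the study itself used SAS/STAT, and p-values would still have to be taken from an F distribution.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for A, B and A x B in a balanced two-way layout.

    cells maps (level_of_A, level_of_B) -> list of n pseudo-observations.
    """
    levels_a = sorted({i for i, _ in cells})
    levels_b = sorted({j for _, j in cells})
    t_a, t_b = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(values) != n for values in cells.values()):
        raise ValueError("balanced data with n > 1 per cell are required")

    cell_mean = {key: mean(values) for key, values in cells.items()}
    a_mean = {i: mean(cell_mean[(i, j)] for j in levels_b) for i in levels_a}
    b_mean = {j: mean(cell_mean[(i, j)] for i in levels_a) for j in levels_b}
    grand = mean(cell_mean.values())

    # Sums of squares for the balanced case.
    ss_a = n * t_b * sum((a_mean[i] - grand) ** 2 for i in levels_a)
    ss_b = n * t_a * sum((b_mean[j] - grand) ** 2 for j in levels_b)
    ss_ab = n * sum(
        (cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in levels_a for j in levels_b
    )
    ss_error = sum(
        (y - cell_mean[key]) ** 2 for key, values in cells.items() for y in values
    )
    # Raises ZeroDivisionError if every cell is constant (ms_error == 0).
    ms_error = ss_error / (t_a * t_b * (n - 1))

    return {
        "A": (ss_a / (t_a - 1)) / ms_error,
        "B": (ss_b / (t_b - 1)) / ms_error,
        "AB": (ss_ab / ((t_a - 1) * (t_b - 1))) / ms_error,
    }
```

Passing pseudo-observations instead of the raw responses is what turns this ordinary ANOVA for means into a test for effects on the variance.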
The following seven tests are considered and compared for balanced data using
the pseudo-observations in this study. Ozaydin et al. (1999) considered the
first five of these tests and compared them, as the best-performing candidates,
on the basis of robustness and power.
The last two of the seven tests are generalizations of Klotz's test.
(1) For Lev 1, the squares of the residuals were computed, then
ANOVA was performed on these pseudo-observations:
$$\mathrm{Lev1}_{ijk} = (y_{ijk} - \bar{y}_{ij\cdot})^2$$
where $\bar{y}_{ij\cdot}$ is the sample mean for the ijth group
(Ozaydin et al., 1999).
(2) For a modified Levene's test (Lev 2), the absolute deviations
from the sample medians $\tilde{y}_{ij}$
were computed as follows, then ANOVA was performed on these pseudo-observations
(Ozaydin et al., 1999):
$$\mathrm{Lev2}_{ijk} = |y_{ijk} - \tilde{y}_{ij}|$$
(3) For the jackknife test, Ozaydin et al. (1999) performed ANOVA
F-tests on the following pseudo-observations (jack):
where,
is the usual sample variance for the ijth group.
was calculated and then:
(4) The first generalization of Fligner and Killeen's test
applies an F-test analysis to rank-based pseudo-observations. Following
Conover et al. (1981), Ozaydin et al. (1999) used absolute deviations
from the median:
$$Z_{ijk} = |y_{ijk} - \tilde{y}_{ij}|$$
The $Z_{ijk}$ were then ranked from smallest to largest (i.e.,
from 1 to $N = n t_A t_B$). The ranks are denoted by $R_{ijk}$.
Following Conover (1980), if there were ties in the sample,
I calculated the average of the ranks of the tied values and used that average
as the $R_{ijk}$ value for the tied $Z_{ijk}$.
A score function was then used to modify the ranks. The score function (denoted
FK) used by Fligner and Killeen (1976) was:
where Φ is the standard normal cumulative distribution function. Following
Ozaydin et al. (1999), I used Fligner and Killeen's
original score function and denote the values $a(R_{ijk})$ by $a_{ijk}$.
Conover et al. (1981) used the positive square
root of this score function. Ozaydin et al. (1999)
used Fligner and Killeen's original score function since it has better
power and is more robust than its positive square root.
The ANOVA F-test analysis was applied to the scores $a_{ijk}$. The procedure
obtained by applying F-tests to the pseudo-observations FK is denoted FK-F,
as in Ozaydin et al. (1999).
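(5) The second generalization of Fligner and Killeen's test
performs a chi-square analysis on the pseudo-observations FK. For the overall
test of equality of variances, the one-way test statistic presented by Conover
et al. (1981) was used:
$$X = \frac{n \sum_{i}\sum_{j} (\bar{a}_{ij\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2}$$
where
$$D^2 = \frac{1}{N-1} \sum_{i}\sum_{j}\sum_{k} (a_{ijk} - \bar{a}_{\cdot\cdot\cdot})^2,$$
$\bar{a}_{ij\cdot}$ is the mean score in the ijth sample and
$\bar{a}_{\cdot\cdot\cdot}$ is the overall mean score. Under the null hypothesis
of equal variances, $D^2$ is the known variance of the scores. Ozaydin et al.
(1999) partitioned X into three pieces, corresponding to the three
hypotheses mentioned earlier:
$$X_A = \frac{n t_B \sum_{i} (\bar{a}_{i\cdot\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2}$$
where $\bar{a}_{i\cdot\cdot}$ is the mean score in the ith sample (ith level of factor A),
$$X_B = \frac{n t_A \sum_{j} (\bar{a}_{\cdot j\cdot} - \bar{a}_{\cdot\cdot\cdot})^2}{D^2}$$
where $\bar{a}_{\cdot j\cdot}$ is the mean score in the jth sample (jth level of factor B), and
$$X_{AB} = X - X_A - X_B.$$
Under the null hypothesis of no differences, each of the above test statistics is approximately distributed as a chi-square random variable with degrees of freedom corresponding to the numerator sum of squares: $t_A t_B - 1$ for $X$, $t_A - 1$ for $X_A$, $t_B - 1$ for $X_B$ and $(t_A - 1)(t_B - 1)$ for $X_{AB}$.
I denote the procedure obtained by applying the chi-square tests to
the pseudo-observations FK by FK-Chi.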
(6) The first generalization of Klotz's test applies an F-test
analysis to rank-based pseudo-observations. Following the same procedure
as in (4), the values of $R_{ijk}$ were calculated.
A score function was then used to modify the ranks. Klotz
(1962) used the score function (denoted Klotz)
where Φ is the standard normal cumulative distribution function. The values
$a(R_{ijk})$ are denoted by $a_{ijk}$. The ANOVA F-test analysis
was applied to the scores $a_{ijk}$. The procedure obtained by applying
F-tests to the pseudo-observations Klotz is denoted Klotz-F.
(7) A second variant of Klotz's test is a generalization
of the usual rank-based nonparametric approach; i.e., a chi-square analysis
was performed on the pseudo-observations Klotz. The chi-square tests were
performed following the same steps as in (5). The procedure
obtained by applying the chi-square tests to the pseudo-observations Klotz
is denoted Klotz-Chi.
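The ranking and scoring steps shared by tests (4) to (7) can be sketched as follows. The exact score functions are not shown in this text, so both functions below are assumptions: `half_normal_score` is the form commonly reported for the Fligner and Killeen procedure in Conover et al. (1981), and `squared_normal_score` is the classical squared normal score associated with Klotz (1962). All names are hypothetical, and the functions actually used in the study may differ (for example by a square or square root, as discussed in (4)).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def midranks(values):
    """Ranks from 1 to N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def half_normal_score(rank, n_total):
    """Assumed FK-type score: inverse normal CDF of 1/2 + R / (2(N + 1))."""
    return _STD_NORMAL.inv_cdf(0.5 + rank / (2 * (n_total + 1)))


def squared_normal_score(rank, n_total):
    """Assumed Klotz-type score: squared inverse normal CDF of R / (N + 1)."""
    return _STD_NORMAL.inv_cdf(rank / (n_total + 1)) ** 2


def score_cells(z_cells, score_fn):
    """Rank all Z values jointly, then replace each by its score, cell by cell."""
    keys = list(z_cells)
    pooled = [z for key in keys for z in z_cells[key]]
    ranks = midranks(pooled)
    n_total = len(pooled)
    scored, pos = {}, 0
    for key in keys:
        size = len(z_cells[key])
        scored[key] = [score_fn(r, n_total) for r in ranks[pos:pos + size]]
        pos += size
    return scored
```

The ranking is done over all N values at once rather than within cells, so that a cell with larger deviations receives systematically larger ranks and scores.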
The last four tests are variations of Fligner and Killeen's and Klotz's
nonparametric rank-based approaches. Chi-square tests are more appropriate and
have traditionally been used for rank-based tests. Following Conover
and Iman (1981), rank transformation procedures that apply the usual ANOVA
F-tests to the ranks are easier to obtain. The fourth and sixth tests use this
notion but apply the usual ANOVA F-tests to scores based on ranks. The fifth
and seventh tests use a traditional chi-square analysis of the scores.
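A chi-square analysis of rank-based scores for balanced data can be sketched as below. This is an illustration under stated assumptions rather than the study's SAS code: the score variance uses an N − 1 divisor, the interaction statistic is obtained by subtraction (valid for balanced data), and all function names are hypothetical. The closed-form tail probabilities cover only 1 and 3 degrees of freedom, which is all that the 2 by 2 case needs.

```python
from math import erfc, exp, pi, sqrt
from statistics import mean


def chi_square_partition(score_cells):
    """Overall, A, B and A x B chi-square statistics for balanced scores.

    score_cells maps (level_of_A, level_of_B) -> list of n rank-based scores.
    """
    levels_a = sorted({i for i, _ in score_cells})
    levels_b = sorted({j for _, j in score_cells})
    t_a, t_b = len(levels_a), len(levels_b)
    n = len(next(iter(score_cells.values())))

    scores = [a for values in score_cells.values() for a in values]
    grand = mean(scores)
    # Variance of the scores (assumed N - 1 divisor).
    d2 = sum((a - grand) ** 2 for a in scores) / (len(scores) - 1)

    cell_mean = {key: mean(values) for key, values in score_cells.items()}
    a_mean = {i: mean(cell_mean[(i, j)] for j in levels_b) for i in levels_a}
    b_mean = {j: mean(cell_mean[(i, j)] for i in levels_a) for j in levels_b}

    x_model = n * sum((m - grand) ** 2 for m in cell_mean.values()) / d2
    x_a = n * t_b * sum((a_mean[i] - grand) ** 2 for i in levels_a) / d2
    x_b = n * t_a * sum((b_mean[j] - grand) ** 2 for j in levels_b) / d2
    x_ab = max(0.0, x_model - x_a - x_b)  # guard against rounding below zero
    return {"Model": x_model, "A": x_a, "B": x_b, "AB": x_ab}


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return erfc(sqrt(x / 2))


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 d.f."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
```

For a 2 by 2 structure, the statistics for A, B and the interaction would each be referred to `chi2_sf_1df`, and the overall statistic to `chi2_sf_3df`.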
All of the above tests use pseudo-observations in different ways. For example, the pseudo-observations used in Lev 1 are on the scale of the variance, those used in Lev 2 are on the scale of the standard deviation and those used in Jack are on a log-transformed scale. The FK and Klotz pseudo-observations are based on ranks.
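The three continuous pseudo-observations can be sketched for a single cell as follows. The Lev 1 and Lev 2 forms follow the verbal definitions in (1) and (2). The jackknife form is an assumption: it is the standard pseudo-value of Miller (1968) on the log-variance scale, n ln s² − (n − 1) ln s²(k), where s²(k) is the sample variance with the kth observation left out; the formula used in the study is not shown in this text. Function names are hypothetical.

```python
from math import log
from statistics import mean, median, variance


def lev1(group):
    """Squared residuals from the cell mean (variance scale)."""
    center = mean(group)
    return [(y - center) ** 2 for y in group]


def lev2(group):
    """Absolute deviations from the cell median (standard deviation scale)."""
    center = median(group)
    return [abs(y - center) for y in group]


def jack(group):
    """Assumed Miller-type jackknife pseudo-values of the log sample variance.

    Needs at least 3 observations and non-zero variances in the full cell
    and in every leave-one-out subset.
    """
    n = len(group)
    log_full = log(variance(group))
    pseudo = []
    for k in range(n):
        leave_one_out = group[:k] + group[k + 1:]
        pseudo.append(n * log_full - (n - 1) * log(variance(leave_one_out)))
    return pseudo
```

Each function would be applied cell by cell, and the resulting values analyzed with the usual two-way ANOVA F-tests.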
RESULTS
Conover et al. (1981) showed that Klotz's test performs quite well in terms
of power and robustness for symmetric parent distributions in the one-way
design. Therefore, this study examined the performance of Klotz's test
statistics for the two-way design. Lev 1 and Lev 2 are modifications of
Levene's test, in which ANOVA is performed
on any monotonically increasing function of the absolute value of deviations
from the sample means. To obtain more robust tests, Levene's idea can be
applied to deviations from the median (Miller, 1968).
The third test is the jackknife test. The fourth and fifth tests are variations
of Fligner and Killeen's and the sixth and seventh tests are variations
of Klotz's nonparametric rank-based approach. For rank-based tests, chi-square
tests are more appropriate than ANOVA. In this study, the usual ANOVA F-tests applied
to the ranks were also studied since they are easier to obtain (Conover and Iman,
1981). The fifth (FK-Chi) and seventh (Klotz-Chi) tests use a traditional chi-square
analysis of the scores. The fourth (FK-F) and sixth (Klotz-F) tests apply ANOVA F-tests
to scores based on the ranks.
For the special case of a 2 by 2 treatment structure, simulations using SAS/STAT software and procedures compared the power and robustness of the seven proposed test procedures over a range of variance configurations, parent distributions and sample sizes.
Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square
(skewed) parent distributions were used for the simulation study. When sampling
from a chi-square distribution, to maintain the level of skewness, 1 d.f. chi-square
random variables were generated and multiplied by the appropriate constant to
attain the indicated variability. Samples of size 5, 10 and 20 observations
were used for each of the following variance combinations, indicated by
($\sigma^2_{11}, \sigma^2_{12}, \sigma^2_{21}, \sigma^2_{22}$):
(1) (1,1,1,1)
(2) (1,1,2,2), (1,1,4,4), (1,1,8,8)
(3a) (1,2,2,1), (1,4,4,1), (1,8,8,1)
(3b) (1,1,1,2), (1,1,1,4), (1,1,1,8)
(4) (1,2,2,3), (1,4,4,7), (1,8,8,15)
Variance combination (1) was used to approximate the true type I error rate (i.e., the observed significance level). The set of variance combinations (2) was used to investigate the power of the tests when only a factor main effect is present. The set of variance combinations (3a) was used to investigate the power of the tests when an interaction effect is present. The set of variance combinations (3b) was used to investigate the power of the tests when an interaction effect and both main effects are present. The set of variance combinations (4) was used to investigate the power of the tests when both A and B main effects exist.
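The data-generation step can be sketched in Python as below. The study used SAS/STAT, so this is only an illustration; the function names, the distribution labels and the assignment of the four variances to cells in the order (1,1), (1,2), (2,1), (2,2) are assumptions. Each parent distribution is scaled to a target variance: a uniform on (−a, a) has variance a²/3, a double exponential with scale b has variance 2b² and a 1 d.f. chi-square has variance 2.

```python
import random
from math import sqrt


def draw(dist, variance, rng):
    """One draw from the named parent distribution with the given variance."""
    if dist == "uniform":
        half_width = sqrt(3 * variance)
        return rng.uniform(-half_width, half_width)
    if dist == "normal":
        return rng.gauss(0.0, sqrt(variance))
    if dist == "double_exponential":
        scale = sqrt(variance / 2)
        # The difference of two independent exponentials with mean `scale`
        # follows a double exponential (Laplace) distribution.
        return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    if dist == "chi_square":
        # 1 d.f. chi-square (squared standard normal) times a constant;
        # the mean is not zero, only the variance is controlled.
        return sqrt(variance / 2) * rng.gauss(0.0, 1.0) ** 2
    raise ValueError(f"unknown parent distribution: {dist}")


def simulate_2x2(dist, variances, n, rng):
    """One balanced 2 by 2 data set with n observations per cell."""
    keys = [(1, 1), (1, 2), (2, 1), (2, 2)]
    return {
        key: [draw(dist, var, rng) for _ in range(n)]
        for key, var in zip(keys, variances)
    }
```

For example, `simulate_2x2("chi_square", (1, 1, 2, 2), 5, random.Random(1))` would give one small-sample data set under the first variance combination of set (2); repeating this 10,000 times and recording how often each test rejects would estimate its power.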
The simulation results are shown in Tables 1-3.
The simulation results for the first five tests are similar to the results of
Ozaydin et al. (1999). As the sample size increased
and as the difference in variances increased, all tests had higher power.
All of the tests are robust with the exception of Jack, Lev 1, Klotz-F and
Klotz-Chi. Jack, Klotz-F and Klotz-Chi are never robust when applied to samples
from a chi-square parent distribution. Lev 1 is not robust when used with small
samples (n = 5) from a chi-square parent distribution. Lev 2, FK-F and FK-Chi
suffer from being too conservative and also lack power when applied to small
samples (n = 5).
Table 1: Results of small samples (n = 5)
For VC, see the variance combinations. The proportions
in the columns Model, Only A, Only B, Both A and B and INT are based on 10,000
simulations for VC (1) and 30,000 for all other VCs, since these combine
all three subcombinations for combinations (2), (3a), (3b) and (4). VC
(1) explores robustness, VC (2) explores power in the presence of
an A main effect, VC (3a) and (3b) explore power in the presence of interaction
and VC (4) explores power in the presence of A and B main effects. The
column Model corresponds to the overall test of homogeneity of variance.
The categories Only A, Only B, Both A and B and INT are mutually exclusive
and correspond to a factorial ANOVA.
Table 2: Results of moderate samples (n = 10)
For VC, see the variance combinations. The proportions
in the columns Model, Only A, Only B, Both A and B and INT are based on 10,000
simulations for VC (1) and 30,000 for all other VCs, since these combine
all three subcombinations for combinations (2), (3a), (3b) and (4). VC
(1) explores robustness, VC (2) explores power in the presence of
an A main effect, VC (3a) and (3b) explore power in the presence of interaction
and VC (4) explores power in the presence of A and B main effects. The
column Model corresponds to the overall test of homogeneity of variance.
The categories Only A, Only B, Both A and B and INT are mutually exclusive
and correspond to a factorial ANOVA.
Table 3: Results of large samples (n = 20)
For VC, see the variance combinations. The proportions
in the columns Model, Only A, Only B, Both A and B and INT are based on 10,000
simulations for VC (1) and 30,000 for all other VCs, since these combine
all three subcombinations for combinations (2), (3a), (3b) and (4). VC
(1) explores robustness, VC (2) explores power in the presence of
an A main effect, VC (3a) and (3b) explore power in the presence of interaction
and VC (4) explores power in the presence of A and B main effects. The
column Model corresponds to the overall test of homogeneity of variance.
The categories Only A, Only B, Both A and B and INT are mutually exclusive
and correspond to a factorial ANOVA.
For skewed parent distributions such as the chi-square, the Klotz-F and Klotz-Chi tests are not robust, and it is strongly recommended that these two tests not be used for skewed parent distributions. For symmetric parent distributions, Klotz-F and Klotz-Chi perform quite well: both tests show the most power and are also robust. They can therefore be strongly recommended when applied to samples from symmetric parent distributions.
NUMERICAL EXAMPLE
An example of a greenhouse experiment from Ozaydin
et al. (1999) is used. The treatment structure for the experiment is a 2 by 2
factorial. These data are used to illustrate the application of the seven test
procedures to a two-way design. The first factor was chile (absent, present) and the
other factor was source of nematodes (nematodes from chile or from tomato) (Table
4). Germination of the yellow nutsedge tubers produced during the experiment
was the response variable.
Regardless of nematode source, variance in yellow nutsedge tuber germination
was higher when chile was present (Table 5). To test for the
presence and form of heterogeneity of variance, all seven
of the analyses for two-way structures were used (Table 6).
The conclusion of all the analyses was the same: the absence or presence of chile
affects the variance of nutsedge tuber germination (that is, there is a
statistically significant chile main effect).
Because the data suggested two distinct variances, one when chile is present and one when it is absent, the variance of tuber germination was pooled within each chile level. A two-sample t-test comparing the means for the treatments (chile absent, nematode from chile) and (chile absent, nematode from tomato) showed no significant difference in mean tuber germination (p = 0.4609). Another two-sample t-test comparing the means for the treatments (chile present, nematode from chile) and (chile present, nematode from tomato) also showed no significant difference in mean tuber germination (p = 0.7856). Therefore, the means within chile levels were also pooled (Table 7).
Although in practice the most appropriate test should be chosen and applied, for purposes of illustration all seven tests were applied to the data.
Table 5: Means and variances of nutsedge tuber germination by chile
and nematode source (a)
(a) Data from Ozaydin et al. (1999)
Table 6: Summary of analyses of the effect of chile and nematode (N) source
on nutsedge tuber germination variability
CONCLUSION
Although Klotz's test performed quite well in terms of power and robustness
for symmetric distributions in a one-way design, its test statistics had not
been studied in a two-way design. For this reason, this study examined the
performance of the Klotz test statistics and compared it with that of the
other five tests. In addition, the simulation study used 10,000 data sets
instead of 1,000, so the performance of the previously
studied tests was re-examined on the basis of 10,000 simulations. For future studies,
other tests examined by Conover et al. (1981)
for the one-way design could be tested and compared in more complex designs to
evaluate their performance.