INTRODUCTION
Most frequently, statistical analyses are performed to detect differences in the location or means of treatment groups, but in some cases differences in dispersion may be of as much interest. Anderson (2006) proposed distance-based tests of the homogeneity of multivariate dispersions. For unreplicated two-level fractional factorial designs, McGrath and Lin (2002) developed nonparametric dispersion tests. Ozaydin et al. (1999) studied factor effects on the variance in an agricultural system involving chile peppers and two chile-pepper pests: root-knot nematodes and yellow nutsedge. Because the factors in that system affected the variance, analyses designed to detect factor effects on the means failed, so the system required methods for examining the structure of effects on the variance.
Pesticide application is another situation in which factor effects on the variance matter: application methods with greater variability require higher overall amounts of pesticide so that each area receives the minimum amount needed to be effective.
Several researchers have studied tests for heterogeneity of variance in the one-way design, among them Miller (1968) and Milliken and Johnson (1984). In particular, Conover et al. (1981) compared 56 tests in a comprehensive simulation study of tests for heterogeneity of variance in the one-way model. For a replicated two-way treatment structure, Ozaydin et al. (1999) studied the Levene and jackknife tests for detecting heterogeneity of variance as a factor effect, and then considered a generalization of Fligner and Killeen’s (1976) test.
While many tests for scale exist for the completely randomized design, fewer have been considered for more complex designs; in particular, nonparametric rank-based tests for scale have seldom been applied to them. In addition to Fligner and Killeen’s nonparametric rank-based test, which was studied by Ozaydin et al. (1999), it is of interest to generalize another such test, Klotz’s (1962) nonparametric rank-based test, to a two-way treatment structure. In the one-way design, Klotz’s test performed quite well in terms of power and robustness for symmetric parent distributions, and Conover et al. (1981) recommended it for such distributions.
As described by Ozaydin et al. (1999), these tests compute pseudo-observations that reflect changes in the magnitude of the variance; either the usual ANOVA or an analogous chi-square analysis is then used to analyze the pseudo-observations. A review of the tests is given by Ozaydin et al. (1999).
Simulation studies were used to examine the power and robustness of the generalized tests for the special case of a 2 by 2 treatment structure with balanced data. The performance of Klotz’s test for the balanced two-way factorial treatment structure is compared with that of the previously studied tests. The simulations used 10,000 data sets for each variance combination.
The simulation was performed with SAS/STAT software. Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square (skewed) parent distributions were used, with samples of 5, 10 and 20 observations per cell for the different variance combinations.
THE MODEL AND HYPOTHESES
For the fixed two-way design with equal replication, the cell means model is:
y_{ijk} = μ_{ij} + ε_{ijk},
where i = 1, ..., t_{A} (t_{A} > 1), j = 1, ..., t_{B} (t_{B} > 1) and k = 1, ..., n (n > 1). In the model, y_{ijk} is the k’th response from the i’th level of factor A and the j’th level of factor B. The μ_{ij} are unknown constants representing the average response at the i’th level of factor A and the j’th level of factor B. The ε_{ijk} are the random errors associated with the observations y_{ijk} and are assumed to be independently distributed with zero mean and variance σ^{2}_{ij}.
To investigate the behavior of the following test statistics under less restrictive distributional assumptions, the usual assumption that the errors are normal random variables is not imposed.
For this study, I use the special case t_{A} = 2, t_{B} = 2. The null hypothesis is
H_{0}: σ^{2}_{11} = σ^{2}_{12} = σ^{2}_{21} = σ^{2}_{22} (all variances are equal),
and the alternative hypothesis is that at least one variance differs from the others.
Instead of using the usual one-way analysis to test equality of variances, I follow Ozaydin et al. (1999) and perform an analysis that is analogous to the usual ANOVA for a two-way replicated treatment structure. The following null hypotheses are used to detect differences in variances and, if the variances differ, to identify the form of the difference:
• H_{0}: there is no interaction between factor A and factor B on the variance
• H_{0}: there is no factor A main effect on the variance
• H_{0}: there is no factor B main effect on the variance
If an interaction exists, the main effects are not considered further. The presence of interaction suggests that the factor effects on the variance do not have a simple additive main-effects structure, so the variances cannot be summarized by main effects alone.
THE TEST PROCEDURES
Levene’s test and the jackknife test are classified as tests based on modifications of the F-test for means, while Klotz’s test and Fligner and Killeen’s test are classified as linear rank tests (or rank-like tests) (Conover et al., 1981). For each of these tests, pseudo-observations that in some way reflect changes in the magnitude of the variance are computed and then analyzed.
In general, the pseudo-observations are some function of the absolute deviations from either the mean or the median. For continuous pseudo-observations, the analysis is the two-way ANOVA with F-tests; for rank-based pseudo-observations, an analogous analysis using chi-square test statistics is more appropriate. The one-way versions of the tests considered in this study (Levene’s test, the jackknife test, Klotz’s test and Fligner and Killeen’s test) all follow this basic rationale.
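Every F-test procedure below (Lev 1, Lev 2, Jack, FKF and KlotzF) ends with the same step: a balanced two-way ANOVA on the pseudo-observations. The following is a minimal Python sketch of that step, not the SAS/STAT code used in the study. It assumes NumPy and SciPy and assumes the pseudo-observations are stored in an array of shape (t_A, t_B, n); the function name `twoway_anova_balanced` is hypothetical.

```python
import numpy as np
from scipy import stats


def twoway_anova_balanced(z):
    """Balanced two-way ANOVA F-tests on pseudo-observations z of shape (tA, tB, n).

    Returns {effect: (F statistic, p-value)} for the overall Model test,
    the A and B main effects, and the A x B interaction (AB).
    """
    z = np.asarray(z, dtype=float)
    tA, tB, n = z.shape
    grand = z.mean()                  # overall mean
    cell = z.mean(axis=2)             # cell means, shape (tA, tB)
    a_means = cell.mean(axis=1)       # factor A level means
    b_means = cell.mean(axis=0)       # factor B level means

    ss = {
        "A": tB * n * np.sum((a_means - grand) ** 2),
        "B": tA * n * np.sum((b_means - grand) ** 2),
        "AB": n * np.sum((cell - a_means[:, None] - b_means[None, :] + grand) ** 2),
    }
    ss["Model"] = ss["A"] + ss["B"] + ss["AB"]   # between-cells SS (balanced data)
    df = {"A": tA - 1, "B": tB - 1, "AB": (tA - 1) * (tB - 1), "Model": tA * tB - 1}

    df_error = tA * tB * (n - 1)
    mse = np.sum((z - cell[:, :, None]) ** 2) / df_error   # within-cell mean square

    result = {}
    for effect in ("Model", "A", "B", "AB"):
        f = (ss[effect] / df[effect]) / mse
        result[effect] = (f, stats.f.sf(f, df[effect], df_error))
    return result
```

The overall Model test is the one-way ANOVA across the t_A t_B cells, so it can be cross-checked against `scipy.stats.f_oneway` applied to the cells.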
The following seven tests, all based on pseudo-observations, are considered and compared for balanced data in this study. Ozaydin et al. (1999) considered the first five of these tests, which they found to perform best on the basis of robustness and power. The last two of the seven tests are generalizations of Klotz’s test.
(1) 
For Lev 1, the squared residuals were computed and ANOVA was performed on these pseudo-observations:
Z_{ijk} = (y_{ijk} - ȳ_{ij.})^{2},
where ȳ_{ij.} is the sample mean for the ij’th group (Ozaydin et al., 1999).
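A minimal NumPy sketch of the Lev 1 pseudo-observations, assuming the data are held in an array `y` of shape (t_A, t_B, n); the function name `lev1_pseudo` is hypothetical. The returned array is what would be passed to the two-way ANOVA.

```python
import numpy as np


def lev1_pseudo(y):
    """Lev 1 pseudo-observations: squared residuals from each cell mean.

    y has shape (tA, tB, n); the returned array has the same shape.
    """
    y = np.asarray(y, dtype=float)
    cell_means = y.mean(axis=2, keepdims=True)   # ybar_ij. broadcast over k
    return (y - cell_means) ** 2


# One cell with observations 1, 2, 3: mean 2, squared residuals 1, 0, 1.
print(lev1_pseudo([[[1.0, 2.0, 3.0]]]).ravel())
```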
(2) 
For the modified Levene’s test (Lev 2), the absolute deviations from the sample medians were computed as
Z_{ijk} = |y_{ijk} - ỹ_{ij}|,
where ỹ_{ij} is the sample median for the ij’th group, and ANOVA was then performed on these pseudo-observations (Ozaydin et al., 1999).
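The Lev 2 pseudo-observations differ from Lev 1 only in centering on the cell median and taking absolute values. A minimal NumPy sketch, under the same assumed array layout (t_A, t_B, n) and with a hypothetical function name:

```python
import numpy as np


def lev2_pseudo(y):
    """Lev 2 pseudo-observations: absolute deviations from each cell median.

    y has shape (tA, tB, n); the returned array has the same shape.
    """
    y = np.asarray(y, dtype=float)
    cell_medians = np.median(y, axis=2, keepdims=True)   # ytilde_ij broadcast over k
    return np.abs(y - cell_medians)


# A single outlying value (10) moves the median far less than it would move the mean.
print(lev2_pseudo([[[1.0, 2.0, 10.0]]]).ravel())
```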
(3) 
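For the jackknife test (jack), Ozaydin et al. (1999) performed ANOVA F-tests on the pseudo-observations
U_{ijk} = n ln s^{2}_{ij} - (n - 1) ln s^{2}_{ij(k)},
where s^{2}_{ij} is the usual sample variance for the ij’th group and s^{2}_{ij(k)} is the sample variance for the ij’th group calculated with the k’th observation deleted.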
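A minimal NumPy sketch of the jackknife pseudo-observations, assuming the log-variance form U_{ijk} = n ln s^{2}_{ij} - (n - 1) ln s^{2}_{ij(k)} (Miller, 1968) and an array `y` of shape (t_A, t_B, n) with n ≥ 3; the function name `jack_pseudo` is hypothetical. The leave-one-out variances are computed by deleting each observation in turn, which is adequate for the small n used here.

```python
import numpy as np


def jack_pseudo(y):
    """Jackknife pseudo-observations on the log-variance scale, shape (tA, tB, n)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[2]
    if n < 3:
        raise ValueError("need n >= 3 so each leave-one-out variance is defined")
    s2 = y.var(axis=2, ddof=1, keepdims=True)   # usual sample variance s^2_ij
    # s^2_ij(k): sample variance of the cell with the k'th observation deleted
    s2_loo = np.stack(
        [np.delete(y, k, axis=2).var(axis=2, ddof=1) for k in range(n)], axis=2
    )
    # A zero variance (e.g. from tied values) would give log(0) = -inf.
    return n * np.log(s2) - (n - 1) * np.log(s2_loo)
```

Working on the log scale is what places the Jack pseudo-observations on a log-transformed scale, in contrast to Lev 1 (variance scale) and Lev 2 (standard-deviation scale).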
(4) 
The first generalization of Fligner and Killeen’s test applies an F-test analysis to rank-based pseudo-observations. Following Conover et al. (1981), Ozaydin et al. (1999) used the absolute deviations from the median:
Z_{ijk} = |y_{ijk} - ỹ_{ij}|.
The Z_{ijk}’s were then ranked from smallest to largest (i.e., from 1 to N = nt_{A}t_{B}), and the ranks are denoted by R_{ijk}. Following Conover (1980), tied values were assigned the average of the ranks they occupy, and this average was used as the R_{ijk} value for each of the tied Z_{ijk}’s.
The ranks were then modified by a score function. The score function used by Fligner and Killeen (1976) (denoted FK) is
a(R_{ijk}) = [Φ^{-1}(1/2 + R_{ijk}/(2(N + 1)))]^{2},
where Φ is the standard normal cumulative distribution function. Following Ozaydin et al. (1999), I used Fligner and Killeen’s original score function and denote the values a(R_{ijk}) by a_{ijk}. Conover et al. (1981) used the positive square root of this score function; Ozaydin et al. (1999) used the original score function because it has better power and is more robust. The ANOVA F-test analysis was applied to the scores a_{ijk}. As in Ozaydin et al. (1999), the procedure obtained by applying F-tests to the pseudo-observations FK is denoted FKF.
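A minimal NumPy/SciPy sketch of the FK scores, assuming (as the remark about the positive square root suggests) that Fligner and Killeen’s original score is the square of the normal quantile used by Conover et al. (1981). `scipy.stats.rankdata` with `method="average"` assigns tied values the average of their ranks, matching the tie rule above. The function name `fk_scores` and its `squared` flag are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def fk_scores(y, squared=True):
    """FK scores a_ijk for data y of shape (tA, tB, n).

    squared=True gives the (assumed) original Fligner-Killeen score;
    squared=False gives the positive square root used by Conover et al. (1981).
    """
    y = np.asarray(y, dtype=float)
    z = np.abs(y - np.median(y, axis=2, keepdims=True))   # Z_ijk
    N = z.size                                            # N = n * tA * tB
    # Joint ranks 1..N over all cells (rankdata flattens); ties get the average rank.
    r = rankdata(z, method="average").reshape(z.shape)
    q = norm.ppf(0.5 + r / (2 * (N + 1)))
    return q ** 2 if squared else q
```

The returned scores would then be analyzed with the usual two-way ANOVA (FKF) or with the chi-square statistics of (5) (FKChi).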
(5) 
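The second generalization of Fligner and Killeen’s test performs a chi-square analysis on the pseudo-observations FK. For the overall test of equality of variances, the one-way test statistic presented by Conover et al. (1981) was used:
X = Σ_{i}Σ_{j} n(ā_{ij.} - ā_{...})^{2} / D^{2},
where
D^{2} = Σ_{i}Σ_{j}Σ_{k}(a_{ijk} - ā_{...})^{2} / (N - 1),
ā_{ij.} = Σ_{k} a_{ijk} / n is the mean score in the ij’th sample, and ā_{...} = Σ_{i}Σ_{j}Σ_{k} a_{ijk} / N is the overall mean score. Under the null hypothesis of equal variances, D^{2} is the known variance of the scores. Ozaydin et al. (1999) partitioned X into three pieces, corresponding to the three hypotheses mentioned earlier:
X_{A} = t_{B}n Σ_{i}(ā_{i..} - ā_{...})^{2} / D^{2},
X_{B} = t_{A}n Σ_{j}(ā_{.j.} - ā_{...})^{2} / D^{2},
X_{AB} = n Σ_{i}Σ_{j}(ā_{ij.} - ā_{i..} - ā_{.j.} + ā_{...})^{2} / D^{2},
where ā_{i..} is the mean score at the i’th level of factor A and ā_{.j.} is the mean score at the j’th level of factor B, so that X = X_{A} + X_{B} + X_{AB}. Under the null hypothesis of no differences, each of these statistics is approximately distributed as a chi-square random variable with degrees of freedom equal to those of the corresponding numerator sum of squares: t_{A}t_{B} - 1 for X, t_{A} - 1 for X_{A}, t_{B} - 1 for X_{B} and (t_{A} - 1)(t_{B} - 1) for X_{AB}. I denote the procedure obtained by applying these chi-square tests to the pseudo-observations FK by FKChi.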
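The chi-square statistics and their partition can be sketched in NumPy/SciPy as follows, assuming the scores a_{ijk} are held in an array of shape (t_A, t_B, n); the function name `chi_square_partition` is hypothetical. For balanced data the three pieces add up exactly to X.

```python
import numpy as np
from scipy.stats import chi2


def chi_square_partition(a):
    """Chi-square statistics for balanced scores a of shape (tA, tB, n).

    Returns {effect: (statistic, p-value)} for the overall test X ("Model")
    and its pieces X_A, X_B and X_AB.
    """
    a = np.asarray(a, dtype=float)
    tA, tB, n = a.shape
    N = a.size
    abar = a.mean()                          # overall mean score
    d2 = np.sum((a - abar) ** 2) / (N - 1)   # D^2, variance of the scores
    cell = a.mean(axis=2)                    # mean score in each ij cell
    a_i = cell.mean(axis=1)                  # mean score at each level of A
    b_j = cell.mean(axis=0)                  # mean score at each level of B

    x = {
        "Model": n * np.sum((cell - abar) ** 2) / d2,
        "A": tB * n * np.sum((a_i - abar) ** 2) / d2,
        "B": tA * n * np.sum((b_j - abar) ** 2) / d2,
        "AB": n * np.sum((cell - a_i[:, None] - b_j[None, :] + abar) ** 2) / d2,
    }
    df = {"Model": tA * tB - 1, "A": tA - 1, "B": tB - 1, "AB": (tA - 1) * (tB - 1)}
    return {k: (x[k], chi2.sf(x[k], df[k])) for k in x}
```

With the unsquared scores of Conover et al. (1981), the overall statistic should agree with `scipy.stats.fligner(..., center="median")` applied to the four cells, which gives a convenient check of the implementation.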
(6) 
The first generalization of Klotz’s test applies an F-test analysis to rank-based pseudo-observations. The values of R_{ijk} were calculated following the same procedure as in (4), and a score function was then used to modify the ranks. Klotz (1962) used the score function (denoted Klotzs)
a(R_{ijk}) = [Φ^{-1}(R_{ijk}/(N + 1))]^{2},
where Φ is the standard normal cumulative distribution function. The values a(R_{ijk}) are denoted by a_{ijk}, and the ANOVA F-test analysis was applied to the scores a_{ijk}. The procedure obtained by applying F-tests to the pseudo-observations Klotzs is denoted KlotzF.
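A minimal sketch of the Klotz score step, assuming the ranks R_{ijk} have already been obtained as in (4) and are passed in as an array covering all N observations; the function name `klotz_scores` is hypothetical. The resulting scores would then go to the two-way ANOVA (KlotzF) or to the chi-square analysis of (5) (KlotzChi).

```python
import numpy as np
from scipy.stats import norm


def klotz_scores(ranks):
    """Klotz (1962) scores [Phi^{-1}(R / (N + 1))]^2 for an array of ranks.

    N is taken to be the total number of ranked observations (ranks.size).
    """
    r = np.asarray(ranks, dtype=float)
    N = r.size
    return norm.ppf(r / (N + 1)) ** 2


# Ranks 1..5: the scores are symmetric and smallest at the middle rank.
print(klotz_scores([1, 2, 3, 4, 5]))
```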
(7) 
The second variant of Klotz’s test is a generalization of the usual rank-based nonparametric approach; i.e., a chi-square analysis was performed on the pseudo-observations Klotzs. The chi-square tests were performed following the same steps as in (5). The procedure obtained by applying the chi-square tests to the pseudo-observations Klotzs is denoted KlotzChi.
The last four tests are variations of Fligner and Killeen’s and Klotz’s nonparametric rank-based approach. Chi-square tests are more appropriate for rank-based tests and have traditionally been used for them. However, as noted by Conover and Iman (1981), rank transformation procedures that apply the usual ANOVA F-tests to the ranks are easier to obtain. The fourth and sixth tests use this idea but apply the usual ANOVA F-tests to scores based on ranks; the fifth and seventh tests use a traditional chi-square analysis of the scores.
All of the above tests use pseudo-observations, but in different ways. For example, the pseudo-observations used in Lev 1 are on the scale of the variance, those used in Lev 2 are on the scale of the standard deviation, and those used in Jack are on a log-transformed scale. The FK and Klotz pseudo-observations are based on ranks.
RESULTS
Conover et al. (1981) showed that, in the one-way design, Klotz’s test performed quite well for symmetric parent distributions in terms of power and robustness. Therefore, in this study the performance of Klotz’s test statistics was studied for the two-way design. Lev 1 and Lev 2 are modifications of Levene’s test, which performs ANOVA on a monotonically increasing function of the absolute deviations from the sample means. To obtain more robust tests, Levene’s idea can be applied to deviations from the median (Miller, 1968).
The third test is the jackknife test. The fourth and fifth tests are variations of Fligner and Killeen’s approach, and the sixth and seventh tests are variations of Klotz’s nonparametric rank-based approach. For rank-based tests, chi-square tests are more appropriate than ANOVA; in this study, the usual ANOVA F-tests applied to rank-based scores were also studied because they are easier to obtain (Conover and Iman, 1981). The fifth (FKChi) and seventh (KlotzChi) tests use a traditional chi-square analysis of the scores, while the fourth (FKF) and sixth (KlotzF) tests apply ANOVA F-tests to scores based on the ranks.
For the special case of a 2 by 2 treatment structure, simulations using SAS/STAT software compared the power and robustness of the seven proposed test procedures over a range of variance configurations, parent distributions and sample sizes.
Uniform (short-tailed), normal, double exponential (long-tailed) and chi-square (skewed) parent distributions were used. When sampling from a chi-square distribution, 1 d.f. chi-square random variables were generated and multiplied by the appropriate constant to attain the indicated variability while maintaining the level of skewness. Samples of 5, 10 and 20 observations were used for each of the following variance combinations, given in parentheses:
(1) (1,1,1,1)
(2) (1,1,2,2), (1,1,4,4), (1,1,8,8)
(3a) (1,2,2,1), (1,4,4,1), (1,8,8,1)
(3b) (1,1,1,2), (1,1,1,4), (1,1,1,8)
(4) (1,2,2,3), (1,4,4,7), (1,8,8,15)
Variance combination (1) was used to approximate the true type I error rate (i.e., the observed significance level). The set of variance combinations (2) was used to investigate the power of the tests when only a factor A main effect is present, (3a) when only an interaction effect is present, (3b) when an interaction effect and both main effects are present, and (4) when both A and B main effects are present.
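The simulation design can be sketched in NumPy as below. This is not the SAS/STAT code used in the study: the unit-variance parameterizations of the four parent distributions and the ordering of each variance combination as cells (11, 12, 21, 22) are assumptions, the latter chosen so that (1,1,2,2) is a pure factor A effect as described above. Following the text, the chi-square draws are only rescaled (not recentered), which does not affect tests built on within-cell deviations. The function names are hypothetical.

```python
import numpy as np


def unit_variance_draws(dist, size, rng):
    """Draws with variance 1 from an (assumed) parameterization of each parent distribution."""
    if dist == "uniform":                 # short-tailed
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size)   # variance 12/12 = 1
    if dist == "normal":
        return rng.standard_normal(size)
    if dist == "double_exponential":      # long-tailed
        return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size)       # variance 2 b^2 = 1
    if dist == "chisquare":               # skewed; 1 d.f. keeps the skewness fixed
        return rng.chisquare(1, size) / np.sqrt(2.0)            # variance 2 / 2 = 1
    raise ValueError(f"unknown distribution: {dist}")


def simulate_2x2(vc, n, dist, rng):
    """One data set of shape (2, 2, n) with cell variances vc = (s11, s12, s21, s22)."""
    sd = np.sqrt(np.asarray(vc, dtype=float)).reshape(2, 2, 1)
    return sd * unit_variance_draws(dist, (2, 2, n), rng)


rng = np.random.default_rng(2024)
data = simulate_2x2((1, 1, 8, 8), n=5, dist="chisquare", rng=rng)
```

Repeating this 10,000 times per variance combination and recording how often each test rejects at the chosen level gives the estimated type I error rate (VC (1)) or power (the other VCs).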
The simulation results are shown in Tables 1-3.
The simulation results for the first five tests are similar to those of Ozaydin et al. (1999). All tests have higher power as the sample size increases and as the difference in variances increases. All of the tests are robust except Jack, Lev 1, KlotzF and KlotzChi. Jack, KlotzF and KlotzChi are never robust when applied to samples from a chi-square parent distribution, and Lev 1 is not robust when used with small samples (n = 5) from a chi-square parent distribution. Lev 2, FKF and FKChi are too conservative and lack power when applied to small samples (n = 5).
Table 1: Results for small samples (n = 5)

For VC, see the variance combinations. The proportions in the columns Model, Only A, Only B, Both A and B, and INT are based on 10,000 simulations for VC (1) and 30,000 for all other VCs, since these combine all three subcombinations for combinations (2), (3a), (3b) and (4). VC (1) explores robustness, VC (2) explores power in the presence of an A main effect, VC (3a) and (3b) explore power in the presence of interaction, and VC (4) explores power in the presence of A and B main effects. The column Model corresponds to the overall test of homogeneity of variance. The categories Only A, Only B, Both A and B, and INT are mutually exclusive and correspond to a factorial ANOVA.
Table 2: Results for moderate samples (n = 10)

For VC, see the variance combinations. The proportions in the columns Model, Only A, Only B, Both A and B, and INT are based on 10,000 simulations for VC (1) and 30,000 for all other VCs, since these combine all three subcombinations for combinations (2), (3a), (3b) and (4). VC (1) explores robustness, VC (2) explores power in the presence of an A main effect, VC (3a) and (3b) explore power in the presence of interaction, and VC (4) explores power in the presence of A and B main effects. The column Model corresponds to the overall test of homogeneity of variance. The categories Only A, Only B, Both A and B, and INT are mutually exclusive and correspond to a factorial ANOVA.
Table 3: Results for large samples (n = 20)

For VC, see the variance combinations. The proportions in the columns Model, Only A, Only B, Both A and B, and INT are based on 10,000 simulations for VC (1) and 30,000 for all other VCs, since these combine all three subcombinations for combinations (2), (3a), (3b) and (4). VC (1) explores robustness, VC (2) explores power in the presence of an A main effect, VC (3a) and (3b) explore power in the presence of interaction, and VC (4) explores power in the presence of A and B main effects. The column Model corresponds to the overall test of homogeneity of variance. The categories Only A, Only B, Both A and B, and INT are mutually exclusive and correspond to a factorial ANOVA.
For skewed parent distributions such as the chi-square, the KlotzF and KlotzChi tests are not robust, and it is strongly recommended that these two tests not be used for skewed parent distributions. For symmetric parent distributions, KlotzF and KlotzChi perform quite well: they show the most power and are also robust. KlotzF and KlotzChi can therefore be strongly recommended for samples from symmetric parent distributions.
NUMERICAL EXAMPLE
A greenhouse experiment from Ozaydin et al. (1999) is used to illustrate the application of the seven test procedures to a two-way design. The treatment structure for the experiment is a 2 by 2 factorial: the first factor was chile (absent, present) and the other factor was the source of nematodes (nematodes from chile or from tomato) (Table 4). The response variable was the germination of the yellow nutsedge tubers produced during the experiment.
Regardless of nematode source, the variance in yellow nutsedge tuber germination was higher when chile was present (Table 5). To test for the presence and form of heterogeneity of variance, all seven of the analyses for two-way structures were used (Table 6).
All of the analyses led to the same conclusion: the absence or presence of chile affects the variance of nutsedge tuber germination (a statistically significant chile main effect).
Since the data suggested that there are two distinct variances, one when chile is present and one when it is absent, the variances of tuber germination were pooled within each level of chile. A two-sample t-test comparing the means for the treatments (chile absent, nematode from chile) and (chile absent, nematode from tomato) showed no significant difference in mean tuber germination (p = 0.4609). Another two-sample t-test comparing the means for the treatments (chile present, nematode from chile) and (chile present, nematode from tomato) also showed no significant difference in mean tuber germination (p = 0.7856). Therefore, the means within chile levels were also pooled (Table 7).
Although in practice the most appropriate test should be chosen and applied, all seven tests were applied to the data for the purposes of this example.
Table 4: Nutsedge tuber germination by chile and nematode source^{a}

^{a}Data from Ozaydin et al. (1999)
Table 5: Means and variances of nutsedge tuber germination by chile and nematode source^{a}

^{a}Data from Ozaydin et al.
(1999) 
Table 6: Summary of analyses of the effects of chile and nematode (N) source on nutsedge tuber germination variability

Table 7: Means and variances of nutsedge tuber germination by chile^{a}

^{a}Data from Ozaydin et al. (1999)
CONCLUSION
Although Klotz’s tests performed quite well for symmetric distributions in terms of power and robustness in the one-way design, Klotz’s test statistics had not been studied in a two-way design. This is why the performance of Klotz’s test statistics was studied here and compared with that of the other five tests. In addition, the simulation study used 10,000 rather than 1,000 replications, so the performance of the previously studied tests was re-examined based on 10,000 simulations. In future studies, other tests examined by Conover et al. (1981) for the one-way design could be generalized to more complex designs and compared to evaluate their performance.