Many studies have shown that the F test for equal means among k (k≥2) independent groups is not robust to violations of the usual equal variance and normality assumptions. One strategy for comparing the means of groups with heterogeneous variances is to use an alternative test. Among the most frequently cited parametric alternatives to ANOVA are the procedures proposed by Welch, James, Marascuilo, Brown and Forsythe, Wilcox, and Alexander and Govern. These tests have been the subject of a number of simulation studies and reviews in which their performance has been compared with each other and with ANOVA[4,1,3,7,8,9,10,21,5,22,6,11,12]. Several other alternatives to the ANOVA F test have been proposed, such as the Cochran test, the Brown-Forsythe test, the Modified Brown-Forsythe test and the Approximate ANOVA F test[13,4,14,15].
The main purpose of this study is to compare some of these alternatives to ANOVA in terms of their Type I error rates.
Description of the procedures
Let X_ik be the ith observation in the kth group, where i=1, ..., n_k and k=1, ..., K, and let Σn_k=N. The X_ik are assumed to be independent and normally distributed with expected values μ_k and variances σ²_k. The best linear unbiased estimates of μ_k and σ²_k are the sample mean and the sample variance of the kth group.
When the population variances are equal, the test statistic of the ANOVA F test, F, is distributed as a central F variable with (K-1) and (N-K) degrees of freedom.
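Written out in the notation above, the estimates and the classical statistic take the following standard textbook form. This is a sketch of the usual definitions rather than a transcription of the original display; the symbols $\bar{X}_k$ (group mean), $s_k^2$ (group variance) and $\bar{X}$ (grand mean) are assumed names.

```latex
\[
\bar{X}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} X_{ik}, \qquad
s_k^2 = \frac{1}{n_k-1}\sum_{i=1}^{n_k}\left(X_{ik}-\bar{X}_k\right)^2, \qquad
\bar{X} = \frac{1}{N}\sum_{k=1}^{K} n_k\,\bar{X}_k
\]
\[
F = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k-\bar{X}\right)^2 / (K-1)}
         {\sum_{k=1}^{K} (n_k-1)\,s_k^2 / (N-K)}
\]
```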
Brown-Forsythe Test (BF):
The BF test statistic is approximately distributed as an F variable with (K-1) and f degrees of freedom, where f is obtained with the Satterthwaite approximation.
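In the literature the Brown-Forsythe statistic and its Satterthwaite degrees of freedom are usually given as below. This commonly published form is assumed here; $\bar{X}_k$ and $s_k^2$ denote the mean and unbiased variance of group k, and $\bar{X}$ the grand mean of all N observations.

```latex
\[
F_{BF} = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k-\bar{X}\right)^2}
              {\sum_{k=1}^{K}\left(1-\frac{n_k}{N}\right)s_k^2},
\qquad
f = \frac{\left[\sum_{k=1}^{K}\left(1-\frac{n_k}{N}\right)s_k^2\right]^2}
         {\sum_{k=1}^{K}\left(1-\frac{n_k}{N}\right)^2 s_k^4/(n_k-1)}
\]
```

With balanced data the denominator reduces to $(K-1)/K$ times the sum of the group variances, so $F_{BF}$ coincides with the ANOVA F statistic and only the reference degrees of freedom differ.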
Modified Brown-Forsythe Test (MBF): Mehrotra modified the original Brown-Forsythe test by changing its numerator degrees of freedom.
The Brown-Forsythe test uses K-1 numerator degrees of freedom, whereas Mehrotra used a Box approximation to obtain the numerator degrees of freedom, v1, alongside the denominator degrees of freedom, v. Under the null hypothesis, the MBF test statistic is distributed approximately as an F variable with v1 and v degrees of freedom[15,18].
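The Box-type numerator degrees of freedom are commonly published in the form below, with the sample variances $s_k^2$ substituted for the unknown population variances. This form is an assumption, not a transcription of the original display; $\nu_1$ corresponds to v1 in the text, and the denominator degrees of freedom v are taken to be the same Satterthwaite quantity used by the BF test.

```latex
\[
\nu_1 = \frac{\left[\sum_{k=1}^{K}\left(1-\frac{n_k}{N}\right)s_k^2\right]^2}
             {\sum_{k=1}^{K} s_k^4
              + \left(\frac{1}{N}\sum_{k=1}^{K} n_k s_k^2\right)^2
              - \frac{2}{N}\sum_{k=1}^{K} n_k s_k^4}
\]
```

As a check, when all $n_k$ are equal and all $s_k^2$ are equal, the numerator is proportional to $(K-1)^2$ and the denominator to $K+1-2=K-1$, so $\nu_1 = K-1$ and the BF numerator degrees of freedom are recovered.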
Cochran test (COC):
Under the null hypothesis, the COC test statistic is distributed asymptotically as a central χ² variable with K-1 degrees of freedom.
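The Cochran statistic is usually written as a weighted sum of squared deviations from a variance-weighted mean. The form below is the commonly published one and is assumed here; the weights $w_k$ and $h_k$ are assumed names, and $\bar{X}_k$ and $s_k^2$ are the mean and unbiased variance of group k.

```latex
\[
w_k = \frac{n_k}{s_k^2}, \qquad
h_k = \frac{w_k}{\sum_{j=1}^{K} w_j}, \qquad
COC = \sum_{k=1}^{K} w_k\left(\bar{X}_k - \sum_{j=1}^{K} h_j\,\bar{X}_j\right)^2
\]
```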
Approximate ANOVA F test (AAF): The test statistic of this test is the same as that of the Brown-Forsythe and the Modified Brown-Forsythe tests: AAF = F_BF. Under the null hypothesis, the test statistic AAF is distributed approximately as an F variable with v1 and v2 degrees of freedom.
The numerator degrees of freedom of the AAF and MBF tests are equal. The two alternatives differ in their denominator degrees of freedom in the unbalanced case.
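The four procedures can be collected in one small routine. The sketch below uses only the Python standard library and the commonly published formulas, which are assumptions here; in particular, the AAF denominator degrees of freedom are assumed to be $\nu_2 = \left[\sum_k (n_k-1)s_k^2\right]^2 / \sum_k (n_k-1)s_k^4$. The function name and dictionary keys are hypothetical.

```python
from statistics import mean, variance


def heteroscedastic_anova_stats(groups):
    """Statistics and degrees of freedom for the BF, MBF, AAF and COC tests.

    `groups` is a list of lists of numbers, one inner list per group.
    The formulas are the commonly published forms (an assumption).
    """
    n = [len(g) for g in groups]
    N = sum(n)
    m = [mean(g) for g in groups]        # group means
    s2 = [variance(g) for g in groups]   # unbiased group variances
    grand = sum(nk * mk for nk, mk in zip(n, m)) / N

    # Statistic shared by the BF, MBF and AAF tests.
    between = sum(nk * (mk - grand) ** 2 for nk, mk in zip(n, m))
    denom = sum((1 - nk / N) * v for nk, v in zip(n, s2))
    f_bf = between / denom

    # Satterthwaite denominator df, used by BF and MBF.
    f = denom ** 2 / sum(
        (1 - nk / N) ** 2 * v ** 2 / (nk - 1) for nk, v in zip(n, s2)
    )

    # Box-type numerator df, used by MBF and AAF.
    v1 = denom ** 2 / (
        sum(v ** 2 for v in s2)
        + (sum(nk * v for nk, v in zip(n, s2)) / N) ** 2
        - 2 * sum(nk * v ** 2 for nk, v in zip(n, s2)) / N
    )

    # Denominator df of the AAF test.
    v2 = sum((nk - 1) * v for nk, v in zip(n, s2)) ** 2 / sum(
        (nk - 1) * v ** 2 for nk, v in zip(n, s2)
    )

    # Cochran statistic: squared deviations from the variance-weighted mean.
    w = [nk / v for nk, v in zip(n, s2)]
    wmean = sum(wk * mk for wk, mk in zip(w, m)) / sum(w)
    coc = sum(wk * (mk - wmean) ** 2 for wk, mk in zip(w, m))

    return {"F_BF": f_bf, "f": f, "v1": v1, "v2": v2, "COC": coc}
```

For balanced data the values of `f` and `v2` coincide, which matches the statement that the MBF and AAF tests differ only in the unbalanced case. P-values would come from the upper tails of the F and chi-square distributions, for example via a statistics library.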
MATERIALS AND METHODS
A simulation study was conducted to compare the performance of the Cochran, original Brown-Forsythe, Modified Brown-Forsythe and approximate ANOVA F tests. Type I error rates were used to evaluate the tests of equality of several independent group means. For each set of parameter values, samples were generated from the Normal (0, 1), t (5), Chi-square (3) and exponential (0.75) distributions using the subroutines RNNOA, RNSTT, RNCHI and RNEXP, available in the IMSL library[19,20].
The parameter values were taken for k=3 and k=6 groups. For the three-group simulation, the parameters were equal group sizes (n1=n2=n3=3, 6, 9, 15, 24, 30) and variance ratios σ²_1:σ²_2:σ²_3 = 1:1:1, 1:2:4 and 1:1:8. For the six-group simulation, the equal group sizes included 6, 9, 15, 24 and 30, and the variance ratios were 1:1:1:1:1:1, 1:1:1:2:2:2 and 1:1:2:2:4:4. To investigate the relationship between sample sizes and group variances, unequal sample sizes were chosen as (3:4:5, 5:10:15, 10:20:30, 5:4:3, 15:10:5, 30:20:10) for k=3 and as (3:4:5:6:7:8, 5:8:11:14:17:20, 5:10:15:20:25:30, 8:7:6:5:4:3, 20:17:14:11:8:5 and 30:25:20:15:10:5) for k=6.
The populations were standardized because they have different means and variances. The shapes of the distributions were not changed, while the means were set to 0 and the standard deviations to 1. For each set of samples, the Cochran (COC), original Brown-Forsythe (BF), modified Brown-Forsythe (MBF) and approximate ANOVA F (AAF) test statistics were calculated and a check was made to see whether the null hypothesis, which was true, was rejected at α=0.05. The experiment was repeated 50,000 times and the proportion of test statistics falling in the critical region was recorded for each combination of k, n, variance pattern and distribution. This proportion estimates the Type I error rate when the population means do not differ (μ1=μ2=...=μk). The predetermined alpha level was 0.05 in all simulations. Both homogeneous and heterogeneous variance conditions were considered. To create heterogeneity among the population variances, the random numbers in the samples were multiplied by specific constants.
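The simulation design can be illustrated with a small Monte Carlo sketch in Python. It is not the FORTRAN/IMSL program used in the study. It handles only the Cochran test with k=3, because the chi-square distribution with 2 degrees of freedom has the closed-form upper tail exp(-x/2), so no statistical library is needed. The multiplying constants are assumed to be the square roots of the variance ratios, the Cochran statistic is taken in its commonly published form, and all function names are hypothetical.

```python
import math
import random


def standardized_draw(rng, dist):
    """One draw with mean 0 and standard deviation 1 from the named distribution."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "t5":
        # t(5) = Z / sqrt(chi2(5) / 5); its variance is 5/3.
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(2.5, 2.0)
        return (z / math.sqrt(chi2 / 5.0)) / math.sqrt(5.0 / 3.0)
    if dist == "chi2_3":
        # chi-square(3) is gamma(shape 1.5, scale 2): mean 3, variance 6.
        return (rng.gammavariate(1.5, 2.0) - 3.0) / math.sqrt(6.0)
    if dist == "exponential":
        # Any exponential variable standardizes to Exp(1) - 1.
        return rng.expovariate(1.0) - 1.0
    raise ValueError(f"unknown distribution: {dist}")


def cochran_statistic(groups):
    """Cochran statistic in its commonly published form (an assumption)."""
    n = [len(g) for g in groups]
    m = [sum(g) / len(g) for g in groups]
    s2 = [sum((x - mk) ** 2 for x in g) / (len(g) - 1) for g, mk in zip(groups, m)]
    w = [nk / v for nk, v in zip(n, s2)]
    wmean = sum(wk * mk for wk, mk in zip(w, m)) / sum(w)
    return sum(wk * (mk - wmean) ** 2 for wk, mk in zip(w, m))


def type1_rate_cochran_k3(sizes, variance_ratio, dist, reps=5000, alpha=0.05, seed=1):
    """Estimate the Type I error rate of the Cochran test for three groups."""
    rng = random.Random(seed)
    # Chi-square with 2 df: P(X > x) = exp(-x / 2), so the critical value is exact.
    critical = -2.0 * math.log(alpha)
    # Assumed constants: scaling by sqrt(ratio) gives the requested variance ratio.
    scales = [math.sqrt(r) for r in variance_ratio]
    rejections = 0
    for _ in range(reps):
        # All population means are 0, so the null hypothesis is true.
        groups = [
            [c * standardized_draw(rng, dist) for _ in range(nk)]
            for nk, c in zip(sizes, scales)
        ]
        if cochran_statistic(groups) > critical:
            rejections += 1
    return rejections / reps
```

A call such as `type1_rate_cochran_k3([15, 10, 5], [1, 2, 4], "chi2_3")` would correspond to one inverse-pairing cell of the design, with fewer replications than the 50,000 used in the study.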
A FORTRAN program was written for an Intel Pentium III processor to carry out all computations.
Three groups: Empirical Type I error rates from 50,000 simulation runs are given in Tables 1 and 2. When k=3, the distribution was normal and the sample sizes were equal, the BF, MBF and AAF tests had similar and acceptable Type I error rates.
Table 1: Empirical Type I error probabilities (α) (%) when k=3
The MBF and AAF tests gave the same results under all distributions and variance ratios when the data were balanced. When the data were unbalanced, on the other hand, the results of these two tests differed slightly. These results are consistent with Hartung and Argac.
The other tests also attained similar and acceptable alpha levels. When the distributions deviated from normal, the data were balanced and the variances were homogeneous, the Type I error rates of the BF, MBF and AAF tests were smaller than 5.0%, while the Type I error rate of the COC test was greater than 5.0%. It may therefore be suggested that any one of these tests (BF, MBF and AAF) can be used to test the equality of means.
Under moderate heterogeneity of variances (1:2:4), balanced data and a normal distribution, the COC test approached the predetermined alpha level only when the sample size was greater than 24 (Table 1). The other tests, especially the BF test, preserved the alpha level except for sample sizes of 3:3:3. The MBF and AAF tests also gave acceptable alpha levels except for sample sizes of 3:3:3. Under the same variance conditions and with balanced data, the COC test deviated greatly from the predetermined alpha level when the distributions were chi-square (3) and exponential (0.75), even for sample sizes as large as 30 (Table 1). Under these conditions, the BF, MBF and AAF tests may be used to test the equality of means.
Under moderate heterogeneity of variances (1:2:4), unbalanced data (direct pairing) and a normal distribution, the MBF and AAF tests were closest to the 0.05 alpha level. When inverse pairing was applied, all the tests deviated greatly from the predetermined alpha level, especially the COC test. This was more pronounced when the variances were strongly heterogeneous (1:1:8).
Six groups: When there were six groups, the variances were homogeneous and the data were balanced, the Type I error rates of all tests under all distributions were similar to those for three groups, except at small sample sizes (Table 2).
Table 2: Empirical Type I error probabilities (α) (%) when k=6
Under small deviations from homogeneity of variances (1:1:1:2:2:2), balanced data and a normal distribution, the most reliable results were obtained from the BF, MBF and AAF tests. As the distribution deviated from normal, the Type I error rates of the BF test were closer to the predetermined alpha level than those of the other tests. All the tests were affected by small sample sizes (Table 2).
Under the same variance conditions, all tests were adversely affected by the unbalanced data structure. This adverse effect increased for all tests as the inequality of the sample sizes increased. As with three groups, inverse pairing increased the deviation from the 0.05 alpha level.
When the heterogeneity of variances increased (1:1:2:2:4:4), the most reliable results were obtained from the BF test (Table 2).