Abstract: The present Monte Carlo study compared the Type I error rates of four tests under nonnormality and heterogeneity of variance. The Cochran test, Brown-Forsythe test, modified Brown-Forsythe test and approximate ANOVA F test were evaluated for three and six groups. Over 50,000 simulation trials, the Type I error rates of the four tests were affected by the sample size, the variance ratio, the number of groups and the relationship between sample sizes and group variances. Type I error rates of the Brown-Forsythe test, the modified Brown-Forsythe test and the approximate ANOVA F test were close to the predetermined alpha level (0.05), while those of the Cochran test deviated greatly from it.
INTRODUCTION
Many studies have shown that the F test for equal means among k (k≥2) independent groups is not robust to violations of the usual equal variance and normality assumptions. One strategy for comparing means from groups with heterogeneous variances is to use an alternative test. Among the most frequently cited parametric alternatives to ANOVA are the procedures proposed by Welch[1], James[2], Marascuilo[3], Brown and Forsythe[4], Wilcox[5] and Alexander and Govern[6]. These tests have been the subject of a number of simulation studies and reviews in which their performance has been compared with each other and with ANOVA[4,1,3,7,8,9,10,21,5,22,6,11,12]. Several other alternatives to the ANOVA F test have been proposed, such as the Cochran test, the Brown-Forsythe test, the modified Brown-Forsythe test and the approximate ANOVA F test[13,4,14,15].
The main purpose of this study is to compare the Type I error rates of some of these alternatives to the ANOVA F test.
Description of the procedures
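Let $X_{ik}$ be the ith observation in the kth group, where $i = 1, \dots, n_k$ and $k = 1, \dots, K$, and let $\sum_{k} n_k = N$. The $X_{ik}$ are assumed to be independent and normally distributed with expected values $\mu_k$ and variances $\sigma_k^2$. The best linear unbiased estimates of $\mu_k$ and $\sigma_k^2$ are:
$$\bar{X}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} X_{ik}, \qquad s_k^2 = \frac{1}{n_k - 1}\sum_{i=1}^{n_k}\left(X_{ik} - \bar{X}_k\right)^2$$
The test statistic of the ANOVA F test is
$$F = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k - \bar{X}\right)^2 / (K-1)}{\sum_{k=1}^{K} (n_k - 1)\, s_k^2 / (N-K)} \tag{1}$$
where $\bar{X} = \sum_{k=1}^{K} n_k \bar{X}_k / N$ is the grand mean.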
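Brown-Forsythe Test (BF):
$$F_{BF} = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k - \bar{X}\right)^2}{\sum_{k=1}^{K} \left(1 - n_k/N\right) s_k^2} \tag{2}$$
Here $\bar{X}_k$ and $s_k^2$ are the sample mean and variance of group k and $\bar{X}$ is the grand mean. The statistic $F_{BF}$ is approximately distributed as an F variable with (K-1) and f degrees of freedom, where f is obtained with the Satterthwaite approximation[16] as
$$f = \frac{\left[\sum_{k=1}^{K} \left(1 - n_k/N\right) s_k^2\right]^2}{\sum_{k=1}^{K} \left(1 - n_k/N\right)^2 s_k^4 / (n_k - 1)}$$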
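As an illustration, the Brown-Forsythe statistic and its Satterthwaite denominator degrees of freedom might be computed as in the sketch below. It assumes the usual form of the test (between-group sum of squares divided by the sum of (1 - n_k/N) s_k^2); the function name `brown_forsythe` and the data are hypothetical and not taken from the study.

```python
from statistics import mean, variance


def brown_forsythe(groups):
    """Return (F_BF, numerator df K-1, Satterthwaite denominator df f)."""
    n = [len(g) for g in groups]
    N = sum(n)
    K = len(groups)
    means = [mean(g) for g in groups]
    s2 = [variance(g) for g in groups]  # unbiased variance, divisor n_k - 1
    grand = sum(nk * m for nk, m in zip(n, means)) / N
    between = sum(nk * (m - grand) ** 2 for nk, m in zip(n, means))
    # c_k = (1 - n_k/N) * s_k^2, the terms of the denominator
    c = [(1 - nk / N) * v for nk, v in zip(n, s2)]
    f_bf = between / sum(c)
    f = sum(c) ** 2 / sum(ck ** 2 / (nk - 1) for ck, nk in zip(c, n))
    return f_bf, K - 1, f


# Hypothetical data with unequal variances (illustration only)
groups = [[4.1, 5.0, 5.9], [3.0, 6.0, 9.0], [1.0, 7.0, 13.0]]
stat, df1, df2 = brown_forsythe(groups)
# A p-value would need an F upper-tail function, for example
# scipy.stats.f.sf(stat, df1, df2); it is not called here.
print(stat, df1, df2)
```

Note that f is generally not an integer, so the reference F distribution has fractional denominator degrees of freedom.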
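Modified Brown-Forsythe Test (MBF): Mehrotra[15] modified the original Brown-Forsythe test as follows:
$$F_{MBF} = \frac{\sum_{k=1}^{K} n_k\left(\bar{X}_k - \bar{X}\right)^2}{\sum_{k=1}^{K} \left(1 - n_k/N\right) s_k^2} \tag{3}$$
The Brown-Forsythe test uses K-1 numerator degrees of freedom, while Mehrotra[15] used a Box[17] approximation to obtain the numerator degrees of freedom, $v_1$, where
$$v_1 = \frac{\left[\sum_{k=1}^{K} \left(1 - n_k/N\right) s_k^2\right]^2}{\sum_{k=1}^{K} s_k^4 + \left[\sum_{k=1}^{K} n_k s_k^2 / N\right]^2 - 2\sum_{k=1}^{K} n_k s_k^4 / N} \tag{4}$$
and the denominator degrees of freedom are
$$v = \frac{\left[\sum_{k=1}^{K} \left(1 - n_k/N\right) s_k^2\right]^2}{\sum_{k=1}^{K} \left(1 - n_k/N\right)^2 s_k^4 / (n_k - 1)} \tag{5}$$
Under the null hypothesis, $F_{MBF}$ is distributed approximately as an F variable with $v_1$ and $v$ degrees of freedom[15,18].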
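To show how the Box-type numerator degrees of freedom behave, the following sketch evaluates v1 from group sizes and sample variances. It assumes the form v1 = [Σ(1 - n_k/N)s_k²]² / [Σs_k⁴ + (Σn_k s_k²/N)² - 2Σn_k s_k⁴/N]; the function name `box_numerator_df` and the input values are hypothetical.

```python
def box_numerator_df(n, s2):
    """Box-type numerator df v1 from group sizes n and sample variances s2."""
    N = sum(n)
    top = sum((1 - nk / N) * v for nk, v in zip(n, s2)) ** 2
    bottom = (
        sum(v ** 2 for v in s2)
        + (sum(nk * v for nk, v in zip(n, s2)) / N) ** 2
        - 2 * sum(nk * v ** 2 for nk, v in zip(n, s2)) / N
    )
    return top / bottom


# Hypothetical inputs: three balanced groups of size 10
v1_equal = box_numerator_df([10, 10, 10], [1.0, 1.0, 1.0])   # equal variances
v1_hetero = box_numerator_df([10, 10, 10], [1.0, 1.0, 8.0])  # ratio 1:1:8
print(v1_equal, v1_hetero)
```

With equal sample variances and balanced data, v1 reduces to K-1, the numerator degrees of freedom of the original Brown-Forsythe test; with unequal variances it falls below K-1.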
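Cochran test (COC): The test statistic is
$$C = \sum_{k=1}^{K} w_k\left(\bar{X}_k - \sum_{j=1}^{K} h_j \bar{X}_j\right)^2 \tag{6}$$
where $w_k = n_k / s_k^2$ and $h_k = w_k / \sum_{j=1}^{K} w_j$. Under the null hypothesis, the statistic is distributed asymptotically as a central χ2-variable with K-1 degrees of freedom[18].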
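The Cochran statistic can be sketched as below, assuming the usual weights w_k = n_k/s_k² and a variance-weighted grand mean. The function name `cochran_test` and the data are hypothetical. The p-value line relies on the fact that the upper tail of a chi-square distribution with 2 degrees of freedom is exp(-x/2), so it applies to three groups only.

```python
import math
from statistics import mean, variance


def cochran_test(groups):
    """Return (Cochran statistic, chi-square degrees of freedom K-1)."""
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [nk / variance(g) for nk, g in zip(n, groups)]  # w_k = n_k / s_k^2
    weighted_mean = sum(wk * m for wk, m in zip(w, means)) / sum(w)
    stat = sum(wk * (m - weighted_mean) ** 2 for wk, m in zip(w, means))
    return stat, len(groups) - 1


# Hypothetical data (illustration only)
groups = [[4.0, 5.0, 6.0], [3.0, 6.0, 9.0], [1.0, 7.0, 13.0]]
stat, df = cochran_test(groups)
# Exact chi-square upper tail for df == 2 only; other df need a
# general chi-square survival function.
p_value = math.exp(-stat / 2)
print(stat, df, p_value)
```

Because the reference distribution is asymptotic, the p-value can be too small when the groups are small.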
Approximate ANOVA F test (AAF): The test statistic of this test is the same as that of the Brown-Forsythe and the modified Brown-Forsythe tests: $AAF = F_{MBF} = F_{BF}$. Under the null hypothesis, the test statistic AAF is distributed approximately as an F variable with $v_1$ and $v_2$ degrees of freedom, where
$$v_2 = \frac{\left[\sum_{k=1}^{K} (n_k - 1)\, s_k^2\right]^2}{\sum_{k=1}^{K} (n_k - 1)\, s_k^4} \tag{7}$$
The numerator degrees of freedom for the AAF and MBF tests are equal. The difference between the two alternatives is in the denominator degrees of freedom in the unbalanced case[14].
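The difference between the two denominator degrees of freedom can be checked numerically. The sketch below assumes the Satterthwaite-type forms v = [Σ(1 - n_k/N)s_k²]² / Σ[(1 - n_k/N)² s_k⁴/(n_k - 1)] for the MBF test and v2 = [Σ(n_k - 1)s_k²]² / Σ(n_k - 1)s_k⁴ for the AAF test; the function names and input values are hypothetical.

```python
def mbf_denominator_df(n, s2):
    """Denominator df of the modified Brown-Forsythe test."""
    N = sum(n)
    c = [(1 - nk / N) * v for nk, v in zip(n, s2)]
    return sum(c) ** 2 / sum(ck ** 2 / (nk - 1) for ck, nk in zip(c, n))


def aaf_denominator_df(n, s2):
    """Denominator df of the approximate ANOVA F test."""
    top = sum((nk - 1) * v for nk, v in zip(n, s2)) ** 2
    return top / sum((nk - 1) * v ** 2 for nk, v in zip(n, s2))


s2 = [1.0, 2.0, 4.0]  # hypothetical sample variances, ratio 1:2:4
balanced = (mbf_denominator_df([10, 10, 10], s2),
            aaf_denominator_df([10, 10, 10], s2))
unbalanced = (mbf_denominator_df([5, 10, 15], s2),
              aaf_denominator_df([5, 10, 15], s2))
print(balanced, unbalanced)
```

For equal group sizes both expressions reduce to (n - 1)(Σs_k²)²/Σs_k⁴, so the two tests coincide; for unequal group sizes the two values differ.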
MATERIALS AND METHODS
A simulation study was conducted to compare the performance of the Cochran, original Brown-Forsythe, modified Brown-Forsythe and approximate ANOVA F tests. Type I error rates were used to assess each test of the equality of several independent group means. For each given set of parameter values, samples from the Normal (0, 1), t (5), chi-square (3) and exponential (0.75) distributions were generated with the subroutines RNNOA, RNSTT, RNCHI and RNEXP of the IMSL library[19,20].
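The general shape of such a Monte Carlo experiment can be sketched as follows. This is a minimal illustration, not the code used in the study: it uses Python's `random` module in place of the IMSL subroutines, normal data only, the Cochran test only, three groups only (so that the chi-square critical value with 2 degrees of freedom is exactly -2 ln α), and far fewer trials than 50,000. All function names and parameter values are hypothetical.

```python
import math
import random


def mean_and_variance(x):
    """Sample mean and unbiased sample variance of a list."""
    m = sum(x) / len(x)
    return m, sum((xi - m) ** 2 for xi in x) / (len(x) - 1)


def cochran_statistic(groups):
    """Cochran statistic with weights w_k = n_k / s_k^2."""
    stats = [mean_and_variance(g) for g in groups]
    w = [len(g) / s2 for g, (_, s2) in zip(groups, stats)]
    centre = sum(wk * m for wk, (m, _) in zip(w, stats)) / sum(w)
    return sum(wk * (m - centre) ** 2 for wk, (m, _) in zip(w, stats))


def type1_error_rate(sizes, sigmas, trials, alpha=0.05, seed=1):
    """Share of rejections when all three population means are equal."""
    rng = random.Random(seed)
    critical = -2 * math.log(alpha)  # chi-square critical value, df = 2 only
    rejections = 0
    for _ in range(trials):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)]
                  for n, sd in zip(sizes, sigmas)]
        if cochran_statistic(groups) > critical:
            rejections += 1
    return rejections / trials


rate_small = type1_error_rate([3, 3, 3], [1.0, 1.0, 1.0], trials=5000)
rate_large = type1_error_rate([30, 30, 30], [1.0, 1.0, 1.0], trials=5000)
print(rate_small, rate_large)
```

Since the null hypothesis is true in every trial, the rejection share estimates the Type I error rate; comparing it with the nominal 0.05 shows how far a test departs from the predetermined level for a given design.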
The parameter values were taken for k=3 and 6 groups. For the three group simulation, the parameters were (n1=n2=n3=3, 6, 9, 15, 24, 30) and $\sigma_1^2:\sigma_2^2:\sigma_3^2$ = 1:1:1, 1:2:4 and 1:1:8. For the six group simulation, the parameters were (n1=n2=n3=n4=n5=n6=3, 6, 9, 15, 24, 30) and
RESULTS
Three groups: Empirical Type I error rates from 50,000 simulation runs are given in Tables 1 and 2. When k=3, the distribution was normal and the sample sizes were equal, the BF, MBF and AAF tests had similar and acceptable Type I error rates.
Table 1: Empirical Type I error probabilities (α) (%) when k=3
MBF and AAF tests gave the same results under all distributions and variance ratios when the data was balanced. On the other hand, when the data was unbalanced, results from these two tests were slightly different. These results are consistent with Hartung and Argac[18].
The other tests also attained similar and acceptable alpha levels. When the distributions deviated from normal, the data were balanced and the variances were homogeneous, the Type I error rates of the BF, MBF and AAF tests were smaller than 5.0%, while that of the COC test was greater than 5.0%. This suggests that any one of the BF, MBF and AAF tests can be used to test the equality of means under these conditions.
Under moderate heterogeneity of variances (1:2:4), balanced data and a normal distribution, the COC test approached the predetermined alpha level only when the sample size was greater than 24 (Table 1). The other tests, especially the BF test, preserved the alpha level except for sample sizes 3:3:3. Under the same variance conditions and with balanced data, the COC test deviated greatly from the predetermined alpha level when the distributions were chi-square (3) and exponential (0.75), even for sample sizes as large as 30 (Table 1). Under these conditions, the BF, MBF and AAF tests may be used to test the equality of means.
Under moderate heterogeneity of variances (1:2:4), unbalanced data (direct pairing) and a normal distribution, the MBF and AAF tests were closest to the 0.05 alpha level. When inverse pairing was applied, all the tests deviated greatly from the predetermined alpha level, especially the COC test. The deviation was more pronounced when the variances were strongly heterogeneous (1:1:8).
Six groups: When there were six groups, the variances were homogeneous and the data were balanced, the Type I error rates were similar to those for three groups for all tests and all distributions, except for the small sample sizes (Table 2).
Table 2: Empirical Type I error probabilities (α) (%) when k=6
Under small deviations from homogeneity of variances (1:1:1:2:2:2:2), balanced data and a normal distribution, the most reliable results were obtained from the BF, MBF and AAF tests. As the distribution deviated from normal, the Type I error rates of the BF test were closer to the predetermined alpha level than those of the other tests. All the tests were affected by small sample sizes (Table 2).
Simulation Results: Under the same variance conditions, all tests were adversely affected by the unbalanced data structure. This adverse effect increased for all tests as the inequality of sample sizes increased. As with three groups, inverse pairing increased the deviation from the 0.05 alpha level.
When the heterogeneity of variances increased (1:1:2:2:4:4), the most reliable results were obtained from the BF test (Table 2).