Evaluation of Four Tests When Normality and Homogeneity of Variance Assumptions are Violated

Abstract: The present Monte Carlo study compared the Type I error rates of four tests when the normality and homogeneity of variance assumptions were violated. The Cochran test, the Brown-Forsythe test, the modified Brown-Forsythe test and the approximate ANOVA F test were evaluated for three and six groups. Across 50,000 simulation trials, the Type I error rates of the four tests were affected by the sample size, the variance ratio, the number of groups and the relationship between sample sizes and group variances. Type I error rates of the Brown-Forsythe test, the modified Brown-Forsythe test and the approximate ANOVA F test were close to the predetermined alpha level (0.05), while those of the Cochran test deviated greatly from it.

How to cite this article

Mehmet Mendes and Akin Pala, 2004. Evaluation of Four Tests When Normality and Homogeneity of Variance Assumptions are Violated. Journal of Applied Sciences, 4: 38-42.

Keywords: Type I error rates, homogeneity of variance, analysis of variance and normality

INTRODUCTION

Many studies have shown that the F test for equal means among k (k≥2) independent groups is not robust to violations of the usual equal variance and normality assumptions. One strategy for comparing the means of groups with heterogeneous variances is to use an alternative test. Among the most frequently cited parametric alternatives to ANOVA are the procedures proposed by Welch^[1], James^[2], Marascuilo^[3], Brown and Forsythe^[4], Wilcox^[5] and Alexander and Govern^[6]. These tests have been the subject of a number of simulation studies and reviews in which their performance has been compared with each other and with ANOVA^{[4,1,3,7,8,9,10,21,5,22,6,11,12]}. Several other alternatives to the ANOVA F test have been proposed, such as the Cochran test, the Brown-Forsythe test, the modified Brown-Forsythe test and the approximate ANOVA F test^[13,4,14,15].

The main purpose of this study is to compare the Type I error rates of some of these alternatives to the ANOVA F test.

Description of the procedures

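Let X_ik be the ith observation in the kth group, where i=1,…,n_k and k=1,…,K, and let Σn_k=N. The X_ik are assumed to be independent and normally distributed with expected values μ_k and variances σ²_k. The best linear unbiased estimates of μ_k and σ²_k are

\bar{X}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} X_{ik}   and   S_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} (X_{ik} - \bar{X}_k)^2,

respectively.

The test statistic for the ANOVA F test is

F = \frac{\sum_{k=1}^{K} n_k (\bar{X}_k - \bar{X})^2 / (K-1)}{\sum_{k=1}^{K} (n_k - 1) S_k^2 / (N-K)}   (1)

where \bar{X} = \sum_{k=1}^{K} n_k \bar{X}_k / N is the grand mean. When the population variances are equal, F is distributed under the null hypothesis as a central F variable with (K-1) and (N-K) degrees of freedom.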
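As a minimal sketch of Eq. 1, assuming the usual sample means and variances as the group estimates, the statistic and its degrees of freedom can be computed as follows. The function name `anova_f` is hypothetical and the sketch returns no p-value.

```python
from statistics import mean, variance


def anova_f(groups):
    """Return (F, df1, df2) for the classical one-way ANOVA F test."""
    K = len(groups)
    N = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    # Grand mean: group means weighted by group sizes.
    grand = sum(len(g) * m for g, m in zip(groups, means)) / N
    # Between-group and pooled within-group sums of squares.
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((len(g) - 1) * variance(g) for g in groups)
    return (between / (K - 1)) / (within / (N - K)), K - 1, N - K
```

For three balanced groups such as `[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]`, the between and within sums of squares are both 6, so F is 3 with 2 and 6 degrees of freedom.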

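Brown-Forsythe Test (BF):

F_{BF} = \frac{\sum_{k=1}^{K} n_k (\bar{X}_k - \bar{X})^2}{\sum_{k=1}^{K} (1 - n_k/N) S_k^2}   (2)

with \bar{X}_k, S_k^2 and \bar{X} the group means, group variances and grand mean. F_BF is approximately distributed as an F variable with (K-1) and f degrees of freedom, where f is obtained with the Satterthwaite approximation^[16] as

\frac{1}{f} = \sum_{k=1}^{K} \frac{c_k^2}{n_k - 1},   c_k = \frac{(1 - n_k/N) S_k^2}{\sum_{j=1}^{K} (1 - n_j/N) S_j^2}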
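The following is a minimal sketch of the Brown-Forsythe statistic and its Satterthwaite denominator degrees of freedom. It assumes the standard form of the test from Brown and Forsythe^[4]; the function name `brown_forsythe` is hypothetical.

```python
from statistics import mean, variance


def brown_forsythe(groups):
    """Return (F_BF, K - 1, f) for the Brown-Forsythe test."""
    n = [len(g) for g in groups]
    N = sum(n)
    m = [mean(g) for g in groups]
    s2 = [variance(g) for g in groups]
    grand = sum(nk * mk for nk, mk in zip(n, m)) / N
    numerator = sum(nk * (mk - grand) ** 2 for nk, mk in zip(n, m))
    # Each group contributes (1 - n_k/N) * S_k^2 to the denominator.
    parts = [(1 - nk / N) * vk for nk, vk in zip(n, s2)]
    denominator = sum(parts)
    # Satterthwaite approximation: 1/f = sum of c_k^2 / (n_k - 1).
    inv_f = sum((p / denominator) ** 2 / (nk - 1) for p, nk in zip(parts, n))
    return numerator / denominator, len(groups) - 1, 1 / inv_f
```

With balanced data and equal sample variances, f reduces to N-K, so the test coincides with the ANOVA F test in that case.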

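Modified Brown-Forsythe Test (MBF): Mehrotra^[15] modified the original Brown-Forsythe test as follows:

MBF = \frac{\sum_{k=1}^{K} n_k (\bar{X}_k - \bar{X})^2}{\sum_{k=1}^{K} (1 - n_k/N) S_k^2}   (3)

The Brown-Forsythe test uses K-1 numerator degrees of freedom, while Mehrotra^[15] used a Box^[17] approximation to obtain the numerator degrees of freedom, v₁, where

v_1 = \frac{\left[ \sum_{k=1}^{K} (1 - n_k/N) S_k^2 \right]^2}{\sum_{k=1}^{K} S_k^4 + \left[ \sum_{k=1}^{K} n_k S_k^2 / N \right]^2 - 2 \sum_{k=1}^{K} n_k S_k^4 / N}   (4)

and the denominator degrees of freedom

v = \frac{\left[ \sum_{k=1}^{K} (1 - n_k/N) S_k^2 \right]^2}{\sum_{k=1}^{K} (1 - n_k/N)^2 S_k^4 / (n_k - 1)}   (5)

Under the null hypothesis, MBF is distributed approximately as an F variable with v₁ and v degrees of freedom^[15,18].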
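A minimal sketch of the two degrees of freedom of the modified test follows. It assumes the Box-type numerator approximation and the Satterthwaite-type denominator approximation as they are usually stated for Mehrotra's modification; the function name `mbf_degrees_of_freedom` and its argument layout are hypothetical.

```python
def mbf_degrees_of_freedom(n, s2):
    """Return (v1, v) from group sizes n and group sample variances s2."""
    N = sum(n)
    # Common numerator: [sum (1 - n_k/N) S_k^2]^2.
    e = sum((1 - nk / N) * vk for nk, vk in zip(n, s2))
    # Box-type approximation for the numerator degrees of freedom.
    v1_den = (
        sum(vk ** 2 for vk in s2)
        + (sum(nk * vk for nk, vk in zip(n, s2)) / N) ** 2
        - 2 * sum(nk * vk ** 2 for nk, vk in zip(n, s2)) / N
    )
    # Satterthwaite-type approximation for the denominator degrees of freedom.
    v_den = sum(((1 - nk / N) * vk) ** 2 / (nk - 1) for nk, vk in zip(n, s2))
    return e ** 2 / v1_den, e ** 2 / v_den
```

For balanced groups with equal variances, v₁ reduces to K-1 and v to N-K. With unequal variances v₁ falls below K-1, which is where the modified test departs from the original Brown-Forsythe test.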

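Cochran test (COC):

The test statistic is

COC = \sum_{k=1}^{K} w_k (\bar{X}_k - \bar{X}_w)^2   (6)

where w_k = n_k / S_k^2 and \bar{X}_w = \sum_{k=1}^{K} w_k \bar{X}_k / \sum_{k=1}^{K} w_k. Under the null hypothesis, the COC statistic is distributed asymptotically as a central χ²-variable with K-1 degrees of freedom^[18].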
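A minimal sketch of the Cochran statistic, assuming the usual weights n_k/S_k² and the correspondingly weighted grand mean, is given below. The function name `cochran_statistic` is hypothetical.

```python
from statistics import mean, variance


def cochran_statistic(groups):
    """Return (COC, K - 1): the weighted statistic and its chi-square df."""
    # Weight each group by its size divided by its sample variance.
    w = [len(g) / variance(g) for g in groups]
    m = [mean(g) for g in groups]
    # Variance-weighted grand mean.
    xw = sum(wk * mk for wk, mk in zip(w, m)) / sum(w)
    coc = sum(wk * (mk - xw) ** 2 for wk, mk in zip(w, m))
    return coc, len(groups) - 1
```

For K=3 the statistic is compared with the 5% critical value of the χ² distribution with 2 degrees of freedom, 5.991. Because this reference distribution holds only asymptotically, the weights can be very unstable when the n_k are small.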

Approximate ANOVA F test (AAF): The test statistic of this test is the same as that of the Brown-Forsythe and the modified Brown-Forsythe tests: AAF=MBF=F_BF. Under the null hypothesis, AAF is distributed approximately as an F variable with v₁ and v₂ degrees of freedom, where

v_2 = \frac{\left[ \sum_{k=1}^{K} (n_k - 1) S_k^2 \right]^2}{\sum_{k=1}^{K} (n_k - 1) S_k^4}   (7)

The numerator degrees of freedom for AAF and MBF are equal. The two tests differ only in the denominator degrees of freedom, and only in the unbalanced case^[14].
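The denominator degrees of freedom of the approximate ANOVA F test can be sketched as below. This assumes a Satterthwaite-type approximation applied to the pooled within-group sum of squares, which is the form usually attributed to Asiribo and Gurland^[14]; both that form and the function name `aaf_denominator_df` are assumptions here.

```python
def aaf_denominator_df(n, s2):
    """Return v2 from group sizes n and group sample variances s2."""
    # [sum (n_k - 1) S_k^2]^2 divided by sum (n_k - 1) S_k^4.
    num = sum((nk - 1) * vk for nk, vk in zip(n, s2)) ** 2
    den = sum((nk - 1) * vk ** 2 for nk, vk in zip(n, s2))
    return num / den
```

Under this assumption, balanced data give the same value as the denominator degrees of freedom of the modified Brown-Forsythe test, for example 28/3 for n_k=5 and sample variances 1, 2 and 4. The two values separate only when the n_k differ.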

MATERIALS AND METHODS

A simulation study was conducted to compare the performance of the Cochran, original Brown-Forsythe, modified Brown-Forsythe and approximate ANOVA F tests. Type I error rates were used to evaluate these tests of the equality of several independent group means. For each given set of parameter values, samples were generated from the Normal (0, 1), t (5), Chi-square (3) and exponential (0.75) distributions with the subroutines RNNOA, RNSTT, RNCHI and RNEXP of the IMSL library^[19,20].

The parameter values were taken for k=3 and 6 groups. For the three group simulation, the parameters were (n₁=n₂=n₃=3, 6, 9, 15, 24, 30) and variance ratios σ²₁:σ²₂:σ²₃=1:1:1, 1:2:4 and 1:1:8. For the six group simulation, the parameters were (n₁=n₂=n₃=n₄=n₅=n₆=3, 6, 9, 15, 24, 30) and variance ratios 1:1:1:1:1:1, 1:1:1:2:2:2 and 1:1:2:2:4:4. To investigate the relationship between sample sizes and group variances, the sample sizes were chosen as (3:4:5, 5:10:15, 10:20:30, 5:4:3, 15:10:5, 30:20:10) for k=3 and as (3:4:5:6:7:8, 5:8:11:14:17:20, 5:10:15:20:25:30, 8:7:6:5:4:3, 20:17:14:11:8:5 and 30:25:20:15:10:5) for k=6.

The populations were standardized because they have different means and variances. The shape of each distribution was left unchanged, while its mean was set to 0 and its standard deviation to 1. Both homogeneous and heterogeneous variance conditions were considered. To create heterogeneity among the population variances, the random numbers in the samples were multiplied by specific constants.

For each set of samples, the Cochran (COC), original Brown-Forsythe (BF), modified Brown-Forsythe (MBF) and approximate ANOVA F (AAF) test statistics were calculated and a check was made to see whether the null hypothesis, which was true, was rejected at α=0.05. The experiment was repeated 50,000 times and the proportion of statistics falling in the critical region was recorded for each combination of k, n, variance pattern and distribution. This proportion estimates the Type I error rate because the population means do not differ (μ₁=μ₂=…=μ_k). The predetermined alpha level was 0.05 in all simulations. A FORTRAN program was written for an Intel Pentium III processor to compute all tests.
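The simulation loop described above can be sketched in Python rather than the FORTRAN and IMSL routines used in the study. The sketch makes several assumptions that are not in the original: only the Cochran test is shown, K is fixed at 3 so that the χ² upper tail with 2 degrees of freedom has the closed form exp(-x/2), heterogeneity is created by multiplying each group by the square root of its variance ratio, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def cochran_rejects(groups, alpha=0.05):
    """Cochran test for K = 3 groups; chi-square(2) upper tail is exp(-x/2)."""
    w = [len(g) / variance(g) for g in groups]
    m = [mean(g) for g in groups]
    xw = sum(wk * mk for wk, mk in zip(w, m)) / sum(w)
    stat = sum(wk * (mk - xw) ** 2 for wk, mk in zip(w, m))
    return math.exp(-stat / 2) < alpha


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def standardized_exponential(rng):
    # A rate-1 exponential has mean 1 and sd 1; standardizing any other
    # rate gives the same distribution, so the rate does not matter here.
    return rng.expovariate(1.0) - 1.0


def type_one_error(sizes, var_ratio, draw, reps=2000, seed=1):
    """Estimate the rejection rate when all population means equal 0."""
    rng = random.Random(seed)
    # Assumed scaling: sd multipliers are square roots of the variance ratio.
    scales = [math.sqrt(v) for v in var_ratio]
    rejections = 0
    for _ in range(reps):
        groups = [
            [c * draw(rng) for _ in range(nk)]
            for nk, c in zip(sizes, scales)
        ]
        rejections += cochran_rejects(groups)
    return rejections / reps
```

A call such as `type_one_error([30, 20, 10], [1, 2, 4], standardized_exponential)` pairs the largest sample with the smallest variance. With 50,000 replications, as in the study, the Monte Carlo standard error of an estimated rate near 0.05 is about 0.001, so differences of a few tenths of a percentage point between tests are larger than simulation noise.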

RESULTS

Three groups: Empirical Type I error rates from 50,000 simulation runs are given in Tables 1 and 2. When k=3, the distribution was normal and the sample sizes were equal, the BF, MBF and AAF tests had similar and acceptable Type I error rates.

Table 1:	Empirical Type I error probabilities (α) (%) when k=3

MBF and AAF tests gave the same results under all distributions and variance ratios when the data was balanced. On the other hand, when the data was unbalanced, results from these two tests were slightly different. These results are consistent with Hartung and Argac^[18].

The other tests also attained similar and acceptable alpha levels. When the distributions deviated from normal, the data were balanced and the variances were homogeneous, Type I error rates for the BF, MBF and AAF tests were smaller than 5.0%, while the Type I error rate for the COC test was greater than 5.0%. This suggests that any one of the BF, MBF and AAF tests can be used to compare means under these conditions.

Under moderate heterogeneity of variances (1:2:4), balanced data and normal distribution, the COC test approached the predetermined alpha level only when the sample size was greater than 24 (Table 1). The other tests, especially the BF test, preserved the alpha level except for sample sizes 3:3:3. Under the same variance conditions and with balanced data, the COC test deviated greatly from the predetermined alpha level when the distributions were chi-square (3) and exponential (0.75), even for sample sizes as large as 30 (Table 1). Under these conditions, the BF, MBF and AAF tests may be used to compare means.

Under moderate heterogeneity of variances (1:2:4), unbalanced data (direct pairing) and normal distribution, the MBF and AAF tests were closest to the 0.05 alpha level. When inverse pairing was applied, all the tests deviated greatly from the predetermined alpha level, the COC test most of all. This was more pronounced when the variances were strongly heterogeneous (1:1:8).

Six groups: When there were six groups, the variances were homogeneous and the data were balanced, Type I error rates were similar to those for three groups for all tests and all distributions, except for the small sample sizes (Table 2).

Table 2:	Empirical Type I error probabilities (α) (%) when k=6

Under small deviations from homogeneity of variances (1:1:1:2:2:2), balanced data and normal distribution, the most reliable results were obtained from the BF, MBF and AAF tests. As the distribution deviated from normal, Type I error rates obtained from the BF test were closer to the predetermined alpha level than those of the other tests. All the tests were affected by small sample sizes (Table 2).

Under the same variance conditions, all tests were adversely affected by the unbalanced data structure. This adverse effect increased for all tests as the inequality of sample sizes increased. As with three groups, inverse pairing increased the deviation from the 0.05 alpha level.

When the heterogeneity of variances increased (1:1:2:2:4:4), the most reliable results were obtained from the BF test (Table 2).

REFERENCES

1. Welch, B.L., 1951. On the comparison of several mean values: An alternative approach. Biometrika, 38: 330-336.

2. James, G.S., 1951. The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika, 38: 324-329.

3. Marascuilo, L.A., 1971. Statistical Methods for Behavioral Science Research. McGraw-Hill, New York, USA.

4. Brown, M.B. and A.B. Forsythe, 1974. The small sample behavior of some statistics which test the equality of several means. Technometrics, 16: 129-132.

5. Wilcox, R.R., 1989. Adjusting for unequal variances when comparing means in one-way and two-way effects ANOVA models. J. Educ. Stat., 14: 269-278.

6. Alexander, R.A. and D.M. Govern, 1994. A new and simpler approximation for ANOVA under variance heterogeneity. J. Educ. Stat., 19: 91-101.

7. Levy, K.J., 1978. Some empirical power results associated with Welch's robust analysis of variance technique. J. Stat. Comput. Simul., 8: 49-57.

8. Dijkstra, J.B. and P.S. Werter, 1981. Testing the equality of several means when the population variances are unequal. Commun. Stat. Simul. Comput., 10: 557-569.

9. Tomarken, A.J. and R.C. Serlin, 1986. Comparison of ANOVA alternatives under variance. Psychol. Bull., 99: 90-99.

10. Tabatabia, M.A. and W.Y. Tan, 1986. Some Monte Carlo studies on the comparison of several means under heteroscedasticity and robustness with respect to departure from normality. J. Biometry, 7: 801-814.

11. Lix, L.M., J.C. Keselman and H.J. Keselman, 1996. Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Rev. Educ. Res., 66: 579-619.

12. Mendes, M., 2002. The comparison of some parametric alternative test to one-way analysis of variance in term of type I error rates and power of test under non-normality and heterogeneity of variance. Ph.D. Thesis, Ankara University Graduates School of Natural and Applied Sciences Department of Animal Science.

13. Cochran, W.G., 1954. Some methods for strengthening the common χ² tests. Biometrics, 10: 417-451.

14. Asiribo, O. and J. Gurland, 1990. Coping with variance heterogeneity. Commun. Stat. Ser. A, 19: 4029-4048.

15. Mehrotra, D.V., 1997. Improving the Brown-Forsythe solution to the generalized Behrens-Fisher problem. Commun. Stat. Ser. B, 26: 1139-1145.

16. Satterthwaite, F.E., 1941. Synthesis of variance. Psychometrika, 6: 309-316.

17. Box, G.E.P., 1954. Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Ann. Math. Stat., 25: 290-302.

18. Hartung, J. and D. Argac, 2001. Testing for homogeneity in combining of two-armed trials with normally distributed responses. Sankhya: Indian J. Stat. Ser. B, 63: 298-310.

19. Anonymous, 1994. FORTRAN Subroutines for Mathematical Applications. Vol. 1-2, IMSL Publisher, Houston, USA.

20. Mendes, M. and A. Pala, 2003. Type I error rate and power of three normality tests. Inform. Technol. J., 2: 135-139.

21. Krutchkoff, R.G., 1988. One-way fixed effects analysis of variance when the error variances may be unequal. J. Stat. Comput. Simul., 30: 259-271.

22. Oshima, T.C. and J. Algina, 1992. Type I error rates for James's second-order test and Wilcox's Hm test under heteroscedasticity and non-normality. Br. J. Math. Stat. Psychol., 45: 255-263.

Journal of Applied Sciences

Year: 2004 | Volume: 4 | Issue: 1 | Page No.: 38-42
DOI: 10.3923/jas.2004.38.42
