Where:

y_{i} = the ith observed response
x_{i} = a p×1 vector of predictors
β = a p×1 vector of unknown finite parameters
ε_{i} = uncorrelated random errors with mean 0 and variance σ^{2}
Writing Y = (y_{1}, y_{2}, …, y_{n})^{T}, X = (x_{1}, x_{2}, …, x_{n})^{T} and ε = (ε_{1}, ε_{2}, …, ε_{n})^{T}, the model is:

Y = Xβ + ε
where E(ε) = 0, V(ε) = σ^{2}I and I is an identity matrix of order n.

Now we establish general expressions for the moments and the coefficients of skewness and kurtosis of the true errors, assuming only that the errors are uncorrelated and identically distributed with zero mean, and that their first four moments exist. Let us define the kth moment about the origin of the ith error by:

μ_{k} = E(ε_{i}^{k})

As E(ε) = 0 under the null model, we need not distinguish here between moments about the origin and those about the mean. Simple forms of the coefficients of skewness and kurtosis are given by:

γ_{1} = μ_{3}/μ_{2}^{3/2} and γ_{2} = μ_{4}/μ_{2}^{2}

In practice, population moments are often estimated by sample moments. We generally define the kth sample moment by:

m_{k} = (1/n) Σ_{i=1}^{n} ε_{i}^{k}

and hence the raw coefficients of skewness and kurtosis can be defined as:

S = m_{3}/m_{2}^{3/2} and K = m_{4}/m_{2}^{2}

Much work has been done on producing omnibus tests for normality that combine S and K in one test statistic, so that a large deviation either of S from 0 or of K from 3 gives a significant result, regardless of which one deviates from its normal value. D'Agostino and Pearson^{[6]} first suggested this kind of test. However, it requires an assumption of independence between S and K which is only asymptotically correct. Bowman and Shenton^{[2]} suggested a normality test, popularly known as the Jarque-Bera test, with the test statistic:

JB = n[S^{2}/6 + (K − 3)^{2}/24]
where:

σ^{2}(S) = 6/n and σ^{2}(K) = 24/n are the asymptotic variances of S and K respectively.

Under normality, the JB statistic is asymptotically distributed as χ^{2} with 2 degrees of freedom.
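As a concrete illustration (my own sketch, not from the source), the raw coefficients S and K and the JB statistic can be computed in a few lines; the data below are simulated for demonstration only:

```python
import numpy as np

def jarque_bera(e):
    """Classical Jarque-Bera statistic from a vector of errors/residuals.

    Uses the sample moments m_k = (1/n) * sum((e_i - mean)^k), so that
    S = m3 / m2^(3/2), K = m4 / m2^2 and JB = n*(S^2/6 + (K-3)^2/24).
    """
    e = np.asarray(e, dtype=float)
    n = e.size
    d = e - e.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    S = m3 / m2**1.5          # sample skewness (normal value: 0)
    K = m4 / m2**2            # sample kurtosis (normal value: 3)
    return n * (S**2 / 6.0 + (K - 3.0)**2 / 24.0)

rng = np.random.default_rng(0)
jb_normal = jarque_bera(rng.normal(size=500))       # small: no evidence against normality
jb_skewed = jarque_bera(rng.exponential(size=500))  # large: skewness inflates JB
```

A value exceeding the 5% χ² critical value of 5.99 rejects normality.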
Gel and Gastwirth^{[8]} give a robust version of the Jarque-Bera test, using in the denominators of the sample estimates of skewness and kurtosis a robust estimate of spread which is less influenced by outliers. They consider the average absolute deviation from the sample median (MAAD) proposed by Gastwirth^{[7]}, defined by:

J_{n} = (√(π/2)/n) Σ_{i=1}^{n} |ε_{i} − M|

where M is the sample median.
The robust sample estimates of skewness and kurtosis are m_{3}/J_{n}^{3} and m_{4}/J_{n}^{4}, where m_{3} and m_{4} are the 3rd and 4th order estimated sample moments respectively, which lead to the Gel and Gastwirth Robust Jarque-Bera (RJB) test statistic:

RJB = (n/B_{1})(m_{3}/J_{n}^{3})^{2} + (n/B_{2})(m_{4}/J_{n}^{4} − 3)^{2}
To obtain the constants B_{1} and B_{2}, we need expressions for the variances of the robust estimates of skewness and kurtosis for a finite sample size n, which can be calculated as suggested by Geary^{[10]}. However, such calculations are quite tedious and are not of practical use, since the convergence of estimators of kurtosis to the asymptotic normal distribution is very slow. We therefore take B_{1} and B_{2} from the Monte Carlo simulation results given by Gel and Gastwirth^{[8]}. In particular, if one desires to preserve the nominal level of 0.05, they recommend B_{1} = 6 and B_{2} = 64.
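A minimal sketch of the RJB computation (my own illustration, using the MAAD spread estimate and the recommended constants B_{1} = 6, B_{2} = 64):

```python
import numpy as np

def rjb(e, B1=6.0, B2=64.0):
    """Robust Jarque-Bera statistic (sketch, after Gel and Gastwirth).

    The denominators of the skewness/kurtosis estimates use the MAAD-based
    spread J_n = sqrt(pi/2) * mean(|e_i - median(e)|) instead of the sample
    standard deviation, which makes the test less sensitive to outliers.
    """
    e = np.asarray(e, dtype=float)
    n = e.size
    Jn = np.sqrt(np.pi / 2.0) * np.mean(np.abs(e - np.median(e)))
    d = e - e.mean()
    m3, m4 = np.mean(d**3), np.mean(d**4)
    return (n / B1) * (m3 / Jn**3) ** 2 + (n / B2) * (m4 / Jn**4 - 3.0) ** 2

rng = np.random.default_rng(1)
rjb_normal = rjb(rng.normal(size=500))        # modest under normality
rjb_heavy = rjb(rng.standard_t(2, size=500))  # inflated for heavy tails
```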
The above test procedures are designed for situations where the value of each observation is known, so that the test statistic can be computed directly. It has been common practice over the years to use the Ordinary Least Squares (OLS) residuals as substitutes for the true errors, but there is evidence that residuals can have very different skewness and kurtosis than the corresponding true errors, and hence normality tests on OLS residuals may perform poorly unless the test statistics are modified.
Let us assume that X is an n×p matrix of full column rank p. The OLS estimator of β is:

β̂ = (X^{T}X)^{−1}X^{T}Y

The OLS residuals are given by:

ε̂_{i} = y_{i} − x_{i}^{T}β̂, i = 1, …, n

In matrix notation, the residual vector is:

ε̂ = Y − Xβ̂ = (I − H)Y, where H = X(X^{T}X)^{−1}X^{T}

The residual vector can also be expressed in terms of the unobservable errors as:

ε̂ = (I − H)ε
We shall follow Chatterjee and Hadi^{[3]} by referring to the matrix H = X(X^{T}X)^{−1}X^{T} as the residual hat matrix. The elements h_{ij} of this matrix will be termed residual hat elements, which play a very important role in linear regression. The diagonal quantities h_{ii} are often referred to as leverage values, which measure how far the input vectors x_{i} are from the rest of the data. Let the kth moment of the ith OLS residual be defined as:

μ_{k}(ε̂_{i}) = E(ε̂_{i}^{k})

for i = 1, …, n, where we note that E(ε̂_{i}) = 0.
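These hat-matrix quantities are easy to compute directly. The following self-contained sketch (with made-up data) verifies two standard identities that underpin the derivations below, H² = H and Σ h_{ii} = p:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
# design matrix with an intercept column (hypothetical data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # residual hat matrix
leverage = np.diag(H)                  # h_ii, the leverage values
resid = (np.eye(n) - H) @ y            # residual vector (I - H)Y

idempotent = np.allclose(H @ H, H)          # H is idempotent
trace_is_p = np.isclose(leverage.sum(), p)  # leverages sum to p
```

With an intercept in X, the residuals also sum to zero, since 1^{T}(I − H) = 0.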
Using Eq. 12, the ith residual can be expressed in terms of the true errors as:

ε̂_{i} = ε_{i} − Σ_{j=1}^{n} h_{ij}ε_{j}

Hence the second, third and fourth order moments of the OLS residuals can be expressed^{[11]} in terms of the h_{ij} and the moments of the true errors; in particular, E(ε̂_{i}^{2}) = σ^{2}(1 − h_{ii}). The coefficients of skewness and kurtosis of ε̂_{i} follow from these moments.
In the case of OLS residuals, the sample mean is zero and therefore the kth sample moment is defined as:

m̂_{k} = (1/n) Σ_{i=1}^{n} ε̂_{i}^{k}

The sample coefficients of skewness and kurtosis of the OLS residuals are then obtained directly from (5) by substituting ε̂ for ε, so that:

Ŝ = m̂_{3}/m̂_{2}^{3/2} and K̂ = m̂_{4}/m̂_{2}^{2}
Imon^{[13]} shows that the sample moments as defined in (19) are biased and suggests unbiased estimates of the second, third and fourth order moments. He also suggests some simple approximations for the scaling factors which avoid the necessity of additional computation with the h_{ij}'s after the regression. He shows that an expression such as Σ_{j} h_{ij}^{k}, where k > 1, is often dominated by the term h_{ii}^{k}. When also summing over i, replacing each h_{ii} by the average value of the leverages, p/n, gives the possible approximation:

Σ_{i} Σ_{j} h_{ij}^{k} ≈ n(p/n)^{k}

Substituting the approximate values (24) and (25) into (21-23), Imon^{[13]} suggests the Rescaled Moments (RM) of the OLS residuals.
where c = n/(n − p). Using the above approximations, the rescaled coefficients of skewness and kurtosis are obtained, and the Rescaled Moment (RM) normality test statistic is defined from them in the same way as the JB statistic. Like the JB statistic, the RM statistic follows a chi-square distribution with 2 degrees of freedom. Now, using the average absolute deviation from the sample median (MAAD) as a robust estimate of spread, we can define the Robust Rescaled Moment (RRM) test statistic. Under the null hypothesis of normality, the RRM test statistic asymptotically follows a chi-square distribution with 2 degrees of freedom. B_{1} and B_{2} are computed as for the RJB test statistic in Eq. 9, as suggested by Geary^{[10]}.

RESULTS

Numerical examples: We consider a few real-life data sets for testing the normality assumption in the presence of outliers.

Belgian road accident data: Our first example presents the number of road accidents in Belgium recorded between 1975 and 1981, taken from Rousseeuw and Leroy^{[19]} and shown in Table 1. It has been reported by many authors that the Belgian road accident data contain a single outlier (the record of 1979), which must cause non-normality of residuals. Table 2 exemplifies the power of the normality tests on this data.

Shelf-stocking data: Next we consider the shelf-stocking data given by Montgomery et al.^{[17]}.
Table 1: Belgian road accident data

Table 2: Power of normality tests for Belgian road accident data

Table 3: Shelf-stocking data (original and modified)

Table 4: Power of normality tests for the original and modified shelf-stocking data
This data set presents the time required for a merchandiser to stock a grocery store shelf with a soft drink product, as well as the number of cases of product stocked. We deliberately change one data point (shown in parentheses) to create an outlier; both the original and modified data are shown in Table 3. Table 4 shows the power of the normality tests on the original and modified data.
Power simulations: We carry out a simulation experiment to compare the performance of the newly proposed RRM test with the existing tests of normality, in particular the Jarque-Bera, the Rescaled Moment (RM) and the Robust Jarque-Bera (RJB) tests. The estimated powers of these tests under various distributions for different sample sizes are shown in Table 5-7. We consider simulated power for ten different distributions: the normal, the exponential, the t-distribution with 3 and 5 degrees of freedom, the logistic, the Cauchy, the lognormal and the contaminated normal distributions with shifts in location, scale and containing outliers.
Table 5: Simulated power of different tests for normality for n = 20

Table 6: Simulated power of different tests for normality for n = 50

Table 7: Simulated power of different tests for normality for n = 100
For the contaminated normal distributions, 90% of the observations are generated from the standard normal distribution and the remaining 10% come from a normal distribution with either a shifted mean or variance, or containing outliers. All our results are given at the 5% level of significance and are based on 10,000 simulations.
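The power-estimation loop itself is straightforward. A minimal sketch (my own, with the JB statistic standing in for any of the tests, fewer replications than the paper, and an illustrative skewed alternative) looks like:

```python
import numpy as np

CHI2_2DF_05 = 5.991  # 5% critical value of chi-square with 2 df

def jb_stat(e):
    """Jarque-Bera statistic, as defined earlier."""
    n = e.size
    d = e - e.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    return n * ((m3 / m2**1.5) ** 2 / 6 + (m4 / m2**2 - 3) ** 2 / 24)

def estimated_power(sampler, n=50, reps=2000, seed=0):
    """Proportion of Monte Carlo replications in which the test rejects."""
    rng = np.random.default_rng(seed)
    rejected = sum(jb_stat(sampler(rng, n)) > CHI2_2DF_05 for _ in range(reps))
    return rejected / reps

size_under_null = estimated_power(lambda r, n: r.normal(size=n))    # near nominal 0.05
power_exponential = estimated_power(lambda r, n: r.exponential(size=n))
```

Under the null the rejection rate estimates the size of the test; under an alternative it estimates the power.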
DISCUSSION

Here we discuss the results obtained in the previous section using the real data sets and the Monte Carlo simulation experiments. Figure 1 gives a clear picture that the residuals for the Belgian road accident data do not follow a normal pattern. We compute the coefficients of skewness (1.764) and kurtosis (4.601) for this data, which are also far from their normal values.
Fig. 1: Normal probability plot of the residuals of the Belgian road accident data
It is interesting to observe from the results shown in Table 2 that the classical Jarque-Bera test fails to detect the non-normality of the residuals for this data even at the 10% level of significance. The rescaled moments test suggested by Imon^{[13]} successfully detects non-normality at the 5% level. Both the robust RJB and RRM reject normality in a highly significant way, but the RRM possesses higher power than the RJB for this data.

The normal probability plots of the original and modified shelf-stocking data are shown in Fig. 2. It is not clear from these plots whether the residuals are normally distributed or not. We apply the classical and robust methods to check normality and the results are shown in Table 4. For the original data, all the methods show that the residuals are normally distributed. Since we have inserted an outlier in the modified data, it is expected that the normality pattern of the residuals will be affected. It is worth mentioning that all commonly used outlier detection techniques, such as the Least Median of Squares (LMS), the Least Trimmed Squares (LTS) and the Blocked Adaptive Computationally Efficient Outlier Nominators (BACON)^{[4]}, can easily identify this observation as an outlier. Standard theory tells us that normality should break down in the presence of outliers. But it is interesting to observe that both the Jarque-Bera and the RM test fail to detect non-normality here. The RJB test also fails to detect non-normality at the 5% level of significance. The performance of the RRM test, however, is quite satisfactory on this occasion: it detects the non-normality even at the 1.6% level of significance.
Fig. 2: Normal probability plots of the residuals of (a) the original and (b) the modified shelf-stocking data
The simulation results in Table 5-7 show that when the data come from normal distributions, the performance of the classical normality tests is good. Both the RJB and the RRM show a slightly inflated size, but the values are not large and tend to decrease as the sample size increases. The classical tests, however, perform poorly when the errors come from either heavier-tailed or contaminated normal distributions. The newly proposed RRM test performs best throughout and hence may be considered the most powerful of these tests for normality. Figure 3 shows the merit of using the RRM test: it possesses much higher power than the Jarque-Bera, the RM and the robust Jarque-Bera tests under a variety of error distributions.
Fig. 3: Power of the normality tests for different distributions
CONCLUSION

In this study we develop a new test for normality in the presence of outliers, called the RRM test, which is a robust modification of the rescaled moments test for normality. The real data sets and the Monte Carlo simulations show that the modified rescaled moments test offers substantial improvements over the existing tests and performs very well in testing the normality assumption of regression residuals.