An outlier is an observation that differs noticeably from the other
observations in the data. Outliers are wild observations that do not
appear to be consistent with the rest of the data. Grubbs (1969) remarks
that an outlying observation, or outlier, is one that appears to deviate
markedly from other members of the sample in which it occurs. More recently,
Battaglia (2006) defined an outlier not only as an anomalous observation
arising from anomalous events, but as an observation that is incoherent
with the surrounding observations. It is nevertheless difficult to give exact
criteria for deciding when a value is too big, too small or, in general, too
extreme.
Many authors have studied the occurrence of outliers in univariate and
multivariate time series. Fox (1972) proposed two parametric models for
studying outliers; Abraham and Box (1979) used the Bayesian method; Chang
(1982) adopted Fox's model and proposed an iterative procedure to detect
multiple outliers; Chang and Tiao (1983), Hoaglin and Iglewicz (1987)
and Martin and Yohai (1988) treated outliers as contamination generated
from a given probability distribution; and Tsay (1988) and Tsay et al. (2000),
to mention a few, investigated outliers, level shifts and variance changes
in a unified manner. Galiano et al. (2004) extended the study of outlier
detection and investigated the effect of outliers in univariate
and multivariate time series. Despite these efforts, researchers have
not come up with an efficient method to deal with detected outliers.
The study of outliers in a data set is often an informal screening
process preceding a fuller and more formal analysis of the data. Since
the presence of one or more outliers in a data set can bias the estimates
of the model parameters and greatly inflate the estimate of the variance
(σ²), there is a serious need to consider methods of removing outliers
from time series data.
It may also be possible to apply outlier-robust methods to make valid
inferences and reliable forecasts. Collett and Lewis (1976) recall an
earlier study on the accommodation of outliers, carried out in the middle
of the eighteenth century, about the combination of astronomical observations:
"... Is it right to hold that the several observations
are of the same weight or moment, or equally prone to any or every error?
Is every other outlier with the same probability? Such an assertion would
be quite absurd. I see no way of drawing a dividing line between those
that are to be utterly rejected and those that are to be wholly retained;
it may even happen that the rejected observation is the one that would
have supplied the best correction to the others. Nevertheless, I do not
condemn in every case the principle of rejecting one or other of the
observations; indeed I approve it, whenever in the course of observation an
accident occurs which in itself raises an immediate scruple in the mind of
the observer. If there is no such reason for dissatisfaction, I think each
and every observation should be admitted whatever its quality, as long as
the observer is conscious that he has taken every care."
For the reasons given in the views quoted above, researchers some 200
years later are still investigating the timing and occurrence of outliers
in statistical data. Many outlier generating techniques have been developed:
Rosner (1975) suggested the Extreme Studentized Deviate (ESD); Chang and
Tiao (1983) proposed the innovative outlier model (IO) and the additive
outlier model (AO), which Shangodoyin (1994) improved on; and Shittu (2000)
proposed the multiplicative outlier generating model (MO) and the condition
of the innovative and additive outlier generating model (CO), to mention
a few.
This study therefore aims at developing an alternative technique that
uses robust trigonometric regression. The method is expected to improve
the precision of the estimates and to increase the accuracy of forecasts
of time series data in the frequency domain.
TREATMENT OF OUTLIERS
Having enumerated some of the efforts that have been made to identify
or label aberrant observations, the pertinent questions to be asked are:
What action are we to take when one or more observations in a set of data
are adjudged to be outliers? How should we react to outliers, and what
principles and methods can be used to support rejecting them, adjusting
their values or leaving them unaltered prior to processing the principal
mass of data?
Treatment of outliers depends on the form of the population
and the technique will be conditioned by and specific to the postulated
basic model for the population. However, the method of processing outliers
takes a relative form.
Rejection of Outliers
According to Hawkins (1980) early approaches to processing of outliers
involve testing an outlier with a view to determining whether it should
be retained or rejected since an outlying observation could represent
one of the most important pieces of data, perhaps pointing to some special,
as yet undiscovered, feature of the relationship between related variables.
However, care must be taken in deciding whether to delete an
observation. On this, Kruskal (1960) said:
"As to practice, I suggest that it is of great importance
to preach the doctrine that apparent outliers should always be reported,
even when one feels that their causes are known or when one rejects them
for whatever good rule or reason. The immediate pressures of practical
statistical analysis are almost uniformly in the direction of suppressing
announcements of observations that do not fit the pattern; we must maintain
a strong sea-wall against these pressures."
Thus outright rejection of suspected outliers has statistical
consequences: it reduces the number of observations in the sample, so
that further analysis is carried out on the reduced (or censored) sample,
which may affect inferences about the related population. This matters
all the more because rejection was often not carried out according to any
formal procedure, but was purely a matter of the observer's judgment.
Weighting is an alternative to outright rejection of extreme values.
Glaisher (1872) was perhaps the first to publish a paper on a weighting
procedure. Rider (1933) wrote in his study:
"Since the object of combining observations is to obtain
the best possible estimate of the true value of a magnitude, the principle
underlying (weighting) methods is that an observation which differs widely
from the rest should be retained, but assigned a smaller weight than the
others in computing a weighted average. Of course retention with an
exceedingly small weight amounts to virtual rejection."
Glaisher's method was concerned with n observations from normal
distributions with a common mean, which is to be estimated, and
with unknown and unequal variances. He proposed estimating the mean μ
iteratively by a weighted combination of the $X_i$, with weights
determined from the squared deviations of the observations.
Stone (1968) criticized Glaisher's method and proposed an alternative
weighting procedure based on maximum likelihood. This leads to a weighted
mean μ given by the (n-1)th degree equation
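The idea of iterative down-weighting described above for Glaisher's procedure can be illustrated with a minimal sketch. The weight function used here, 1 / (1 + (d / (c * scale))^2), the tuning constant `c` and the function name `reweighted_mean` are assumptions chosen for illustration only; they are neither Glaisher's weights nor Stone's maximum likelihood equation, neither of which is reproduced in the text.

```python
from statistics import median


def reweighted_mean(xs, c=1.0, tol=1e-10, max_iter=100):
    """Iteratively reweighted mean; weights shrink with squared deviation.

    The weight function is a hypothetical choice for illustration, not the
    formula of Glaisher (1872) or Stone (1968).
    """
    mu = median(xs)  # robust starting value
    # Median absolute deviation as a fixed scale; fall back to 1.0 if it is 0.
    scale = median(abs(x - mu) for x in xs) or 1.0
    for _ in range(max_iter):
        weights = [1.0 / (1.0 + ((x - mu) / (c * scale)) ** 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, weights


if __name__ == "__main__":
    data = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]  # last value is a gross error
    estimate, w = reweighted_mean(data)
    print(estimate, w[-1])
```

The extreme value is retained but receives a very small weight, which is the "virtual rejection" Rider describes.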
Another alternative to outright rejection is trimming, a procedure
in which a fixed fraction α of the lower and upper extreme sample values
is discarded before the sample is processed. To illustrate this
procedure, suppose we are estimating a location parameter μ from
n ordered observations $X_1, X_2, \ldots, X_n$. Since
outliers manifest themselves as extreme values, it is possible to control
the variability due to the r lowest sample values $X_1, X_2, \ldots, X_r$
and the s highest ones $X_{n-s+1}, \ldots, X_n$, where the (r + s)
observations are adjudged outliers. If these (r + s) observations are
omitted, so that we confine ourselves to a censored sample of size
(n - r - s), we get the (r, s)-fold trimmed mean, that is, the average of
the retained values $X_{r+1}, \ldots, X_{n-s}$.
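The trimming procedure just described can be sketched in a few lines. The function name `trimmed_mean` is introduced here for illustration; the sample is sorted inside the function, so the input need not be ordered.

```python
def trimmed_mean(xs, r, s):
    """(r, s)-fold trimmed mean: drop the r smallest and s largest values."""
    n = len(xs)
    if r < 0 or s < 0 or r + s >= n:
        raise ValueError("need r >= 0, s >= 0 and r + s < n")
    ordered = sorted(xs)
    kept = ordered[r:n - s]  # censored sample of size n - r - s
    return sum(kept) / len(kept)


if __name__ == "__main__":
    sample = [-50, 2, 4, 6, 8, 100]
    print(trimmed_mean(sample, 1, 1))  # average of 2, 4, 6, 8
```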
This procedure is not very different from the rejection technique,
even though Barnett and Lewis (1985) believed that the trimmed mean
does not 'throw out' outliers in the sense of ignoring them completely.
They claim that it 'brings them in' toward the bulk of the sample.
If, on the other hand, the r lowest and s largest sample values
are each replaced by the value of the nearest observation that is
retained unchanged, then we have the (r, s)-fold Winsorized mean.
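A minimal sketch of Winsorizing as described above follows. The function name `winsorized_mean` is introduced for illustration; unlike trimming, the transformed sample keeps its original size n.

```python
def winsorized_mean(xs, r, s):
    """(r, s)-fold Winsorized mean.

    The r smallest values are replaced by the (r + 1)-th smallest value and
    the s largest values by the (n - s)-th smallest value.
    """
    n = len(xs)
    if r < 0 or s < 0 or r + s >= n:
        raise ValueError("need r >= 0, s >= 0 and r + s < n")
    ordered = sorted(xs)
    low = ordered[r]           # nearest retained value on the lower side
    high = ordered[n - s - 1]  # nearest retained value on the upper side
    adjusted = [low] * r + ordered[r:n - s] + [high] * s
    return sum(adjusted) / n


if __name__ == "__main__":
    sample = [-50, 2, 4, 6, 8, 100]
    print(winsorized_mean(sample, 1, 1))  # mean of 2, 2, 4, 6, 8, 8
```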
Thus, given n ordered observations $X_1, X_2, \ldots, X_n$ where it is
known a priori, or detected through some statistical procedure, that
$X_1, \ldots, X_r$ and $X_{n-s+1}, \ldots, X_n$ are lower and upper
outliers respectively, we replace the r lower extreme sample values by
$X_{r+1}$ and the s upper (largest) extreme sample values by $X_{n-s}$,
so that we work with a transformed sample of size n. The (r, s)-fold
Winsorized mean is given by:
$$\bar{X}_{W} = \frac{1}{n}\left[(r+1)X_{r+1} + X_{r+2} + \cdots + X_{n-s-1} + (s+1)X_{n-s}\right]$$
This makes each of the later values appear twice in the
This is a method considered by Xie (1993), where the underlying series
is assumed to be linear and parametric. He assumed an ARMA (p, q) model
for the contaminated series, treated the outlying observations as
missing data, and then obtained substitutes for the values by using well known
ALTERNATIVE APPROACH-FILTERING METHOD
Considering the performance and the limitations of the various
procedures for the treatment of outliers in the earlier section, coupled
with the fact that a censored data set causes the investigator a great loss
of confidence in the specified model, we propose a procedure for
filtering suspected outliers.
This approach uses robust trigonometric regression
to obtain the robustified discrete Fourier transform, such that at each
frequency we fit a sine and a cosine coefficient by using either the
repeated median technique of Siegel (1982) or the biweight of Tukey
(Andrews et al., 1972).
The filtered value of the suspected outlier is then substituted
back into the data set before further analysis is carried out.
To apply the filtering method, the observed series is
subjected to an outlier test, with a view to detecting aberrant
observations, using the following algorithms.
Algorithm 1 (Detection of Outliers)
Given a time series $X_1, X_2, X_3, \ldots, X_n$:
1. Compute the median of the series.
2. Compute the Fourier frequencies for i = 1, 2, ..., k.
3. Obtain the estimate of [...].
4. Compute the periodogram in the range [...]. If [...] is very close to
its true value, then [...] will also be close to [...] respectively; hence
the squared amplitude will be non-zero and there will be a large peak.
This corresponds to the frequency with the greatest contribution
to the variance. However, if [...] is substantially far from its expected
value, the periodogram will be close to zero.
5. Determine the Fourier frequencies, for i = 1, 2, ..., k, whose squared
amplitude is significantly greater than zero.
6. Obtain the residual variance of the series.
7. Compute the test statistic [...] for i = 1, 2, ..., k. The median is
used since it is more resistant to outliers and hence a robust measure of
central tendency (Atkinson, 1981). If [...], where C is the critical value
simulated as 1.00 or 1.10, then [...] is declared an outlier.
This procedure was applied in Shittu and Shangodoyin (2007).
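The frequency-domain quantities used in the detection steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the standard Fourier frequencies ω_i = 2πi/n for i = 1, ..., k with k = (n - 1) // 2, ordinary (non-robust) sine and cosine coefficients, and the periodogram scaling I(ω_i) = (n/2)(a_i² + b_i²). The test statistic and critical value of step 7 are not recoverable from the text and are deliberately left out; the name `periodogram` is introduced here.

```python
import math
from statistics import median


def periodogram(xs):
    """Periodogram of a median-centred series at the Fourier frequencies.

    Assumed definitions (not taken from the paper): omega_i = 2*pi*i/n for
    i = 1..k with k = (n - 1) // 2, and I(omega_i) = (n / 2) * (a_i^2 + b_i^2).
    Returns a list of (omega_i, a_i, b_i, I(omega_i)) tuples.
    """
    n = len(xs)
    med = median(xs)
    centred = [x - med for x in xs]
    rows = []
    for i in range(1, (n - 1) // 2 + 1):
        omega = 2.0 * math.pi * i / n
        # Cosine and sine coefficients at this frequency (t runs from 1 to n).
        a = 2.0 / n * sum(c * math.cos(omega * t) for t, c in enumerate(centred, 1))
        b = 2.0 / n * sum(c * math.sin(omega * t) for t, c in enumerate(centred, 1))
        rows.append((omega, a, b, n / 2.0 * (a * a + b * b)))
    return rows


if __name__ == "__main__":
    series = [3 * math.cos(2 * math.pi * 2 * t / 16) for t in range(1, 17)]
    peak = max(periodogram(series), key=lambda row: row[3])
    print(peak)  # the largest ordinate sits at the second Fourier frequency
```

Frequencies whose ordinate is clearly larger than the rest are the ones that contribute most to the variance of the series.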
Algorithm 2 (Accommodation of Outliers)
When outliers are detected using Algorithm 1 above, the median rather
than the mean is employed as a measure of location (Atkinson, 1981).
1. Obtain the median of the observed series $X_1, X_2, X_3, \ldots, X_n$.
2. Determine the Fourier frequencies, for i = 1, 2, ..., k, whose squared
amplitude is non-zero.
3. Using the biweight filter of Tatum and Hurvich (1993) and the repeated
median filter of Siegel (1982), compute the discrete Fourier transform
for all such frequencies. This gives the filtered data set, from which the
contamination/outliers have been cleaned.
4. Each outlier detected using Algorithm 1 is then replaced by its
filtered value before further analysis is carried out.
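The accommodation steps can be sketched as below. This is a minimal illustration under explicit assumptions, not the published filter: at each retained Fourier frequency a sine and a cosine coefficient are fitted by iteratively reweighted least squares with Tukey biweight weights (standing in for the biweight filter of Tatum and Hurvich, 1993), the residual scale is the median absolute deviation divided by 0.6745, and the tuning constant 4.685 is a conventional choice. The names `biweight`, `robust_sinusoid_fit` and `fourier_filter`, and the arguments `harmonics` and `outlier_idx`, are hypothetical.

```python
import math
from statistics import median


def biweight(u, c=4.685):
    """Tukey biweight weight for a standardised residual u."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def robust_sinusoid_fit(y, omega, n_iter=20):
    """Fit y_t ~ a*cos(omega*t) + b*sin(omega*t), t = 1..n, by biweight IRLS."""
    n = len(y)
    cos_t = [math.cos(omega * t) for t in range(1, n + 1)]
    sin_t = [math.sin(omega * t) for t in range(1, n + 1)]
    w = [1.0] * n  # the first pass is ordinary least squares
    a = b = 0.0
    for _ in range(n_iter):
        scc = sum(wi * c * c for wi, c in zip(w, cos_t))
        sss = sum(wi * s * s for wi, s in zip(w, sin_t))
        scs = sum(wi * c * s for wi, c, s in zip(w, cos_t, sin_t))
        scy = sum(wi * c * yi for wi, c, yi in zip(w, cos_t, y))
        ssy = sum(wi * s * yi for wi, s, yi in zip(w, sin_t, y))
        det = scc * sss - scs * scs
        if abs(det) < 1e-12:
            raise ValueError("degenerate weighted fit at this frequency")
        a = (scy * sss - ssy * scs) / det
        b = (ssy * scc - scy * scs) / det
        resid = [yi - a * c - b * s for yi, c, s in zip(y, cos_t, sin_t)]
        scale = median(abs(r) for r in resid) / 0.6745
        if scale < 1e-12:
            break  # essentially exact fit; nothing left to down-weight
        w = [biweight(r / scale) for r in resid]
    return a, b


def fourier_filter(xs, harmonics, outlier_idx):
    """Replace flagged observations by robust trigonometric-regression fits.

    harmonics: indices j (0 < j < n/2) of the retained frequencies 2*pi*j/n.
    outlier_idx: zero-based positions flagged by a separate detection step.
    """
    n = len(xs)
    med = median(xs)
    y = [x - med for x in xs]
    fitted = [med] * n
    for j in harmonics:
        if not 0 < j < n / 2:
            raise ValueError("harmonic index must satisfy 0 < j < n/2")
        omega = 2.0 * math.pi * j / n
        a, b = robust_sinusoid_fit(y, omega)
        for t in range(n):
            fitted[t] += a * math.cos(omega * (t + 1)) + b * math.sin(omega * (t + 1))
    cleaned = list(xs)
    for i in outlier_idx:
        cleaned[i] = fitted[i]
    return cleaned
```

Only the flagged positions are changed; all other observations are returned as observed, so the series keeps its original length, in contrast to rejection or trimming.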
This approach uses robust trigonometric regression
to obtain the robustified discrete Fourier transform, such that at each
frequency we fit a sine and a cosine coefficient by using either the
biweight filter of Tatum and Hurvich (1993) or the repeated median
technique of Siegel (1982).
In the final analysis, the filtered value of the suspected
outlier is substituted back into the data set, with a view to comparing
the performance of this method with that of the existing methods of
treating outliers.
In this study, five real and well-analysed data sets are used to illustrate
the use of the above algorithms. They are: series A, Zadakat data on daily
offerings in a local mosque in Ibadan, Oyo State, Nigeria, between 18th
February, 2001 and 13th July, 2001; series B, Wolfer's sunspot data, the yearly
record of the activities in the solar system from 1749 to 1924, a well-analysed
data set from Anderson (1970); series C, batch chemical data, a well-analysed
data set obtainable from Box and Jenkins (1976); series D, monthly Consumer
Price Index data obtained from the annual abstract of statistics of the Federal
Office of Statistics (FOS), Lagos, Nigeria (FOS, 1998); and series E, monthly
diabetic disease data collected from the University College Hospital (UCH),
Ibadan, Nigeria, between January 1974 and February 1986 and used in the
detection of outbreaks of epidemics (Osanaiye and Talabi, 1989).
The collected data for fixed model (series A) using different
The collected data for fixed model (series B) using different
The collected data for fixed model (series C) using different
The proposed algorithm is used to diagnose the collected
data for outliers using the spectral method (Shittu and Shangodoyin, 2007).
We found that 2 and 8 observations were identified as outliers in
series A and B respectively, while 4 observations were identified in
series E (Table 1-3). No observations were identified in series
C and D. This is not to say that the algorithm cannot work for small
sample sizes, as studies have shown that the procedure performs efficiently
in any series where contamination is suspected.
The suspected outliers were either rejected, as if they
were not part of the series, or excluded (i.e., trimmed). The labeled
data were also winsorized by replacing the suspected data with its largest
The proposed accommodation technique (filtering method)
was also applied. This is a method whereby the value of the suspected
outlier is replaced with the corresponding value obtained after Fourier
transformation, an analogue of the biweight filter of Tatum and Hurvich
(1993).
Relative Performance of the Accommodation Techniques
Here, the various treatment techniques are applied with a view to
comparing the performance of the existing techniques with that of the
proposed technique. To do this, fixed and dynamic models
were fitted to the resulting series in order to measure the relative
performance of the various outlier accommodation methods.
In this attempt, our interest is not to determine the
appropriate model for the data, but to examine the variance and standard
error of the parameter estimates before and after treatment. The issue
of the appropriate modeling technique will be addressed in the next section.
The simple least squares regression model for a univariate sample was used
to fit regression lines to all the series.
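The quantities being compared before and after treatment can be computed as in the sketch below. It assumes the series is regressed on the time index t = 1, ..., n, which the text does not state explicitly; the function name `ols_line` and the returned keys are introduced here for illustration.

```python
import math


def ols_line(ys):
    """Least squares fit of y_t = b0 + b1 * t (t = 1..n) with standard errors.

    Regressing on the time index is an assumption made for this sketch.
    """
    n = len(ys)
    ts = range(1, n + 1)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    rss = sum((y - b0 - b1 * t) ** 2 for t, y in zip(ts, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard deviation
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1.0 / n + t_bar ** 2 / sxx)
    return {"b0": b0, "b1": b1, "s": s, "se_b0": se_b0, "se_b1": se_b1}


if __name__ == "__main__":
    base = [2.0 + 0.5 * t + (0.1 if t % 2 else -0.1) for t in range(1, 21)]
    dirty = list(base)
    dirty[10] += 15.0  # a single additive outlier
    print(ols_line(base)["se_b1"], ols_line(dirty)["se_b1"])
```

A single large outlier inflates both the residual standard deviation and the standard error of the slope, which is the effect the treatments are meant to reduce.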
The collected data for fixed model (series D) using different
The collected data for fixed model (series E) using different
Least squares fitting is not resistant to outliers, and neither are the
fitted slope estimates (http://www.basc.nwu.edu/statguidefiles/ancovaassviol.html).
The results of our findings are given in Table 1-5
in the Appendix.
It can be seen from Table 1 that the appropriate
model for the original data was the simple regression model, as indicated
by the p-value (p = 0.0145). After treatment for outliers, it was found
that the underlying structure was no longer described by the
simple linear regression model, as indicated by the p-values.
It should be noted that the largest reduction was observed
with the filtering method.
This shows that the occurrence of outliers inflated
the error structure of the model as well as the standard error of the
parameter estimates, which led to model mis-specification.
It is well known that the simple regression model is not appropriate for
the sunspot data (series B) (Hathaway and Wilson, 2004). However,
there was a significant reduction in the standard error of the estimates
and in that of the fitted model, as shown in Table 2. It
is expected that the same or a greater percentage reduction in the residual
error can be achieved even if the most appropriate model is fitted to the
original data set.
Series: C and D
No treatment was carried out on series C and D, as none of their observations
was labeled as an outlier. Incidentally, the models fitted to the two series
were good, as indicated by their p-values (Table 3, 4).
In Table 5 in the Appendix, reductions in the standard
error of the parameter estimates and in the standard deviation of the model
were noticed, with the winsorizing and filtering methods in a tie, i.e.,
Again, the performance of the various outlier treatment techniques
was examined under dynamic systems. The model identification tools (ACF
and PACF) were used to fit appropriate ARMA (p, q) models to all the series
in this study. The contaminated series A, B, E and F were then modeled
after being treated with the different treatment techniques.
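For the simplest dynamic model considered, AR(1), the standard errors being compared can be sketched with a conditional least squares fit. This is a simplified stand-in, not the estimation routine used in the study, and it does not cover the ARMA(1, 1) case; the function name `fit_ar1` is introduced here for illustration.

```python
import math


def fit_ar1(xs):
    """Conditional least squares fit of a mean-centred AR(1) model.

    Returns (phi, residual standard deviation, standard error of phi).
    A simplified sketch; a full ARMA(p, q) fit needs a dedicated routine.
    """
    n = len(xs)
    mean = sum(xs) / n
    z = [x - mean for x in xs]
    num = sum(z[t] * z[t - 1] for t in range(1, n))
    den = sum(z[t - 1] ** 2 for t in range(1, n))
    phi = num / den
    resid = [z[t] - phi * z[t - 1] for t in range(1, n)]
    # n - 1 residuals and one estimated slope give n - 2 degrees of freedom.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    return phi, math.sqrt(sigma2), math.sqrt(sigma2 / den)


if __name__ == "__main__":
    clean = [math.sin(0.7 * t) for t in range(50)]
    dirty = list(clean)
    dirty[25] += 10.0  # a single additive outlier
    print(fit_ar1(clean))
    print(fit_ar1(dirty))
```

Fitting the same model to the untreated series and to each treated series, and comparing the residual standard deviation and the standard error of the coefficient, mirrors the comparison reported for the dynamic models.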
A summary of the results of the analysis is given in
Table 7-10 in the Appendix.
The collected data for dynamic model (series A) using different
The collected data for dynamic model (series B) using different
The collected data for dynamic model (series C) using different
The collected data for dynamic model (series D) using different
The collected data for dynamic model (series E) using different
In series A, the appropriate model is ARMA (1, 1). As in the fixed
model, a substantial reduction in the standard error of the estimates was
achieved after treatment, with the filtering method performing best among others
An autoregressive model of order one [AR(1)] was found most suitable
for series B. As in series A, reductions in the standard errors were observed,
with the winsorizing method performing better than all other techniques.
Series: C and D
ARMA(1, 1) and AR(1) were found to be most appropriate for series C
and D respectively. Since neither series was contaminated with outliers,
no treatment was required.
The model that best describes the underlying structure of series E
was found to be ARMA(1, 1). As in the fixed model, reductions were noticed
in the standard error of the model as well as in the error of the estimates,
with the filtering method performing better than all other traditional
Having noticed that the filtering method performs better
in 3 out of 4 series in the dynamic modeling and about the same
in the fixed model, we regard the filtering method as the best
method of treatment for outlier-contaminated series.
The procedures are based on simple techniques; they can
be used as a data cleaning device in spectral estimation and robust time
The performance of the various accommodation techniques was
determined in respect of the fixed and dynamic models. It was found
that the new method of accommodating outliers (the filtering method) is best
in terms of the residual error of the filtered data as well as the standard
error of the estimates (Table 1-5 for the fixed model and Table 7-10 for the
dynamic model in the Appendix). The filtering method performs best in all
the series except series E, where the performance of the winsorizing
and filtering methods is almost the same.
Based on our findings, we conclude that the proposed
filtering method of accommodating outliers is the best among the
existing methods in terms of the residual error of the filtered data as
well as the precision of the estimates.
This implies that forecast values obtained from the filtered
data will be more accurate and reliable, and thus can be used for meaningful
planning and control of events and programmes.
There is no need to go into lengthy computation if there
is sufficient information on the existence of outliers. However, more
often than not, little or no information is provided on the occurrence
of outliers, and the number of outliers present in a set of data
cannot be determined a priori. It is therefore recommended that every data
set, especially time series data, be diagnosed for outliers using the
proposed algorithms, which have been found to be more efficient than other
traditional techniques, before further analysis is carried out.
Detected outlier(s) should be accommodated by the filtering method, which
has been established to be the most efficient technique and is capable
of guaranteeing the integrity of the data.