INTRODUCTION
Air pollution has a noticeable impact, especially on human health. Several
health effects are associated with air pollution, including asthma, chronic
bronchitis and increased respiratory symptoms such as sinusitis, sore throat,
dry and wet cough and hay fever (WHO, 1998). The detrimental effect of poor
air quality on human health, whether chronic or acute, is an issue of great
concern. Because air pollution can be a major problem, especially for human
health, air quality should be monitored continuously.
Air pollution comes from many sources, which can be grouped into mobile sources,
stationary sources and open burning sources (Afroz et al., 2003).
Mobile sources include personal vehicles, commercial vehicles and motorcycles.
Stationary sources include factories and industry, power stations, industrial
fuel burning processes and domestic fuel burning, while open burning sources
include the burning of solid wastes and forest fires. In Malaysia, the Department
of Environment operates 52 monitoring locations throughout the country
(Department of Environment Malaysia, 2010).
The parameters monitored include Particulate Matter (PM_{10}), Sulphur
Dioxide (SO_{2}) and several airborne heavy metals. Of the three major source
categories in Malaysia (mobile, stationary and open burning), emissions from
mobile sources have been the largest contributor over the past few years,
accounting for around 70 to 75% of total air pollution in Malaysia (Afroz et al.,
2003). The Malaysia Ambient Air Quality Guidelines state that the 24 h mean
PM_{10} concentration should not exceed 150 μg m^{-3}.
The probability density function of concentration in an atmospheric plume is
an important quantity used to describe and discuss environmental diffusion (Yee
and Chan, 1997). The concentrations of air pollutants are usually correlated
with the emission levels and meteorological conditions. When the parent probability
distribution of air pollutants is correctly chosen, the specific distribution
can be used to predict the mean concentration and probability of exceeding a
critical concentration (Lu and Fang, 2003). Selecting
appropriate probability models for the data is an important step in environmental
data analysis. These probability models may become the basis for estimating
the parameters to meet the evolving information needs of environmental quality
management. The developed models can also be readily implemented for public health
protection by providing early warning to the affected population (Ul-Saufie
et al., 2012).
The lognormal distribution has been used extensively in atmospheric sciences to
describe phenomena that take on non-negative values, such as storm, daily and
longer-period rain, snow and hail amounts, particle size distributions, pollutant
concentrations, cloud dimensions, air velocity fluctuations, flood frequencies
and radio wave amplitude fluctuations. In most cases these are empirically observed
and tested fits, and other models such as the gamma distribution are often also
fitted, or at least cannot be excluded as being consistent with the data (Lopez,
1977).
In Malaysia, the lognormal distribution has been found to be the best distribution
to represent the PM_{10} concentration in a residential area (Sansuddin
et al., 2011). The lognormal distribution was also fitted to the PM_{10}
concentration in Seberang Perai, one of the industrial areas in Malaysia, and
the results show that the lognormal distribution agrees with the data in several
years (Yusof et al., 2010). In environmental
engineering, the return period is of great concern. In this study, the return
period is an estimate of the interval of time between occurrences of a high
particulate event.
Particulate matter enters the body through breathing. Large particles can be
trapped in the nose and throat and are removed by coughing or sneezing. In some
areas, particulate matter concentrations can be very high because of high levels
of industrial activity. This study is concerned with particles that are 10 micrometers
or smaller in aerodynamic diameter (PM_{10}) because those are the particles
that generally pass through the throat and nose and enter the lungs. Once inhaled,
these particles can affect the heart and lungs and cause serious health effects.
This research compared the performance of parameter estimators for the two-parameter
and three-parameter lognormal distributions using PM_{10} concentrations
in Nilai, Negeri Sembilan, Malaysia, in order to find the best distribution to
represent the PM_{10} concentration in Nilai, Negeri Sembilan.
MATERIALS AND METHODS
Study area: Nilai is a town located in Negeri Sembilan and can be classified
as an industrial area. Geographically located at latitude 2°45'N of the
equator and longitude 102°15'E of the prime meridian, Nilai is a rapidly
growing town due to its proximity and easy connection to Kuala Lumpur. The data
used in this study are hourly PM_{10} concentrations in Nilai, Negeri
Sembilan, from 2003 to 2009. Missing values were replaced using the
mean top bottom method, in which each missing value is filled with the average
of the data available above and below it (Yahaya et al.,
2005).
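The mean top bottom filling step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `mean_top_bottom`, the use of `None` to mark a missing hour, and the fallback to a single neighbour at the start or end of the record are all assumptions.

```python
from typing import List, Optional


def mean_top_bottom(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value with the mean of the nearest observed
    values above (earlier) and below (later) it in the series.

    Assumption: missing hours are marked with None. If only one
    neighbour exists (gap at the start or end), that neighbour is used;
    if no observed value exists at all, the gap is left as None.
    """
    n = len(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed value before position i (search upwards).
        above = next((values[j] for j in range(i - 1, -1, -1)
                      if values[j] is not None), None)
        # Nearest observed value after position i (search downwards).
        below = next((values[j] for j in range(i + 1, n)
                      if values[j] is not None), None)
        neighbours = [x for x in (above, below) if x is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

Note that the search always uses the original observations, so consecutive missing hours all receive the same fill value rather than building on earlier fills.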
Lognormal distribution: Equation 1 shows the probability
density function for the two-parameter lognormal distribution (Evans
et al., 2000):
where, x≥0, λ represents a scale parameter and σ represents a location
parameter for annual measurements at a particular site.
For the three-parameter lognormal distribution, the probability density function
given by Johnson et al. (1994) is as follows:
where, x>δ; −∞<σ<∞; λ>0. Here λ represents
the scale parameter, σ represents the location parameter and δ represents
the threshold parameter.
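As a point of reference, the standard textbook forms of the two densities are sketched below. The symbols μ and σ, taken here as the mean and standard deviation of ln(x − δ), are an assumption of this sketch and need not coincide with the roles assigned to λ and σ in Eq. 1 and 2.

```latex
% Two-parameter lognormal density (standard form, assumed notation)
f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}
       \exp\!\left[-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right],
       \qquad x>0

% Three-parameter lognormal density with threshold \delta
f(x) = \frac{1}{(x-\delta)\,\sigma\sqrt{2\pi}}
       \exp\!\left[-\frac{\bigl(\ln(x-\delta)-\mu\bigr)^{2}}{2\sigma^{2}}\right],
       \qquad x>\delta
```

Setting δ = 0 in the second form recovers the first, which is why the two-parameter model can be treated as a special case of the three-parameter one.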
Parameter estimation: The lognormal distribution model was fitted to
the hourly PM_{10} concentrations observed in Nilai. Several methods
can be used to estimate the parameters of the lognormal distribution.
This study focuses only on the method of moments and the method of probability
weighted moments. To estimate the parameters λ and σ
of the lognormal distribution using the method of moments, the formula given
by Evans et al. (2000) is as follows:
The following equations are given by Hosking (1990)
where erf(x) is the error function:
Where:
and F(.) is the normal distribution function which can be evaluated using the
following equations:
Equations 5 and 6 can be solved for μ
and σ by replacing λ_{1} and λ_{2} by the sample
L-moments l_{1} and l_{2} to give the following equations:
For the three-parameter lognormal distribution, the first two moments of the
lognormal distribution are given by Kite (1977) as follows:
The coefficient of variation of (x − α), z_{2}, is also given by
Kite (1977) as below:
where, w is defined by:
where, γ_{1} in Eq. 12 is the coefficient of
skewness of the original variable x. The values of the parameters after replacing
μ'_{1}, μ_{2} and γ_{1} by their sample
estimates m'_{1}, m_{2}, and g_{1} are given by:
An approximate solution for the parameter estimates using probability weighted
moments, given by Hosking (1990), is as follows:
Where:
λ_{r} is the rth L-moment of the random variable X, and l_{1}
and l_{2} are the first and second sample L-moments.
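The estimators described in this section can be sketched in Python using only the standard library. This is a hedged illustration built from widely used closed forms, not a transcription of the numbered equations: the function names are hypothetical, `mu` and `sigma` denote the mean and standard deviation of ln(x − delta) (an assumed notation), the L-moment estimator uses the lognormal relation l_{2}/l_{1} = erf(σ/2), and the three-parameter estimator solves the skewness relation through the auxiliary variable w.

```python
import math
from statistics import NormalDist, fmean


def mom_two_param(x):
    """Method of moments for the two-parameter lognormal.
    Matches the sample mean and variance of x."""
    m = fmean(x)
    var = sum((v - m) ** 2 for v in x) / (len(x) - 1)
    sigma2 = math.log(1.0 + var / m ** 2)
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)


def sample_l_moments(x):
    """First two sample L-moments (l1, l2) from unbiased
    probability weighted moments b0 and b1."""
    xs = sorted(x)
    n = len(xs)
    b0 = fmean(xs)
    # enumerate starts at 0, which gives the weight (rank - 1) / (n - 1).
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0


def pwm_two_param(x):
    """L-moment (probability weighted moment) estimator for the
    two-parameter lognormal: l2 / l1 = erf(sigma / 2)."""
    l1, l2 = sample_l_moments(x)
    tau = l2 / l1
    # erf(s / 2) = 2 * Phi(s / sqrt(2)) - 1, inverted with the normal quantile.
    sigma = math.sqrt(2.0) * NormalDist().inv_cdf((1.0 + tau) / 2.0)
    return math.log(l1) - sigma ** 2 / 2.0, sigma


def mom_three_param(x):
    """Method of moments for the three-parameter lognormal.
    Requires positive sample skewness g1."""
    n = len(x)
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    g1 = m3 / m2 ** 1.5
    w = (-g1 + math.sqrt(g1 ** 2 + 4.0)) / 2.0
    # z is the coefficient of variation of (x - delta); it solves z^3 + 3z = g1.
    z = (1.0 - w ** (2.0 / 3.0)) / w ** (1.0 / 3.0)
    s = math.sqrt(m2)
    sigma = math.sqrt(math.log(z ** 2 + 1.0))
    mu = math.log(s / z) - 0.5 * math.log(z ** 2 + 1.0)
    delta = m - s / z
    return mu, sigma, delta
```

Each function returns parameters on the log scale, so the fitted mean on the original scale is `delta + exp(mu + sigma**2 / 2)`, with `delta = 0` for the two-parameter fits.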
Performance indicator: Five performance indicators are used to determine
the best estimator. The root mean square error (RMSE) summarizes the difference
between the observed and predicted concentrations and is used to provide the average
error (Junninen et al., 2004). It is defined
as:
where, N is the number of observations, O_{i} is the observed data point
and P_{i} is the predicted data point. The root mean square error
is the most common indicator. For a good estimator, the RMSE value must approach
zero. Therefore, a smaller RMSE value means that the model is more appropriate
(Junninen et al., 2004).
The Normalized Absolute Error (NAE) is a more sensitive measure of residual
error than RMSE (Junninen et al., 2004). It is
defined as:
where, N is the number of observations, O_{i} is the observed data point and
P_{i} is the predicted data point. A small value for the normalized absolute
error means that the model is appropriate (Junninen et
al., 2004).
The index of agreement (IA) is defined as follows (Junninen et
al., 2004):
where, N is the number of observations, O_{i} is the observed data point and
P_{i} is the predicted data point. For a good estimator, the IA value
must approach one (Junninen et al., 2004).
The coefficient of determination (R^{2}) is defined as follows (Junninen
et al., 2004):
where, N is the number of observations, O_{i} is the observed data point and
P_{i} is the predicted data point. For a good estimator, the R^{2} value
must approach one (Junninen et al., 2004).
The prediction accuracy (PA) is defined as:
where, N is the number of observations, O_{i} is the observed data point and
P_{i} is the predicted data point. For a good estimator, the PA value
must approach one (Junninen et al., 2004).
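Four of these indicators can be sketched in Python as below. The definitions are commonly used forms and are assumptions of this sketch: RMSE is normalized by N (some sources use N − 1), and R^{2} is taken as the squared Pearson correlation between observed and predicted values. Prediction accuracy is left out because its normalization varies between sources. All function names are hypothetical.

```python
import math
from statistics import fmean


def rmse(obs, pred):
    """Root mean square error; approaches 0 for a good fit.
    Assumption: the sum of squares is divided by N."""
    n = len(obs)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)


def nae(obs, pred):
    """Normalized absolute error; approaches 0 for a good fit."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)


def index_of_agreement(obs, pred):
    """Index of agreement; approaches 1 for a good fit."""
    o_bar = fmean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def r_squared(obs, pred):
    """Coefficient of determination, taken here as the squared
    Pearson correlation; approaches 1 for a good fit."""
    o_bar, p_bar = fmean(obs), fmean(pred)
    cov = sum((p - p_bar) * (o - o_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)
```

In a distribution-fitting setting, `obs` would typically hold the sorted observed concentrations and `pred` the corresponding quantiles of the fitted distribution.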
RESULTS AND DISCUSSION
Box plots for the hourly PM_{10} concentration are shown in Fig.
1. As a simple graphical display, the box plot is well suited to comparisons.
The maximum PM_{10} concentrations exceed the limit (150 μg m^{-3})
in every year, with a maximum reading of 542 μg m^{-3} in 2005.
Table 1 shows the descriptive statistics
for the hourly PM_{10} concentration in Nilai, Negeri Sembilan from 2003
to 2009. The mean for each year is higher than the median, which
indicates that the pollutant data are skewed to the right. The highest skewness,
3.59, shows that 2005 experienced high particulate events. This is likely due
to the haze event that occurred that year as an effect of the transboundary
movement of air pollutants emitted from forest fires and open burning activities
in Indonesia (Department of Environment Malaysia, 2005).
Smoke from biomass burning at regional sources also contributes to the
PM_{10} concentration readings, especially during the dry season (Juneng
et al., 2009). In addition, since Nilai is an industrial area, neighbouring
precursory emissions that result from local societal and industrial
development also contribute to the variations in PM_{10} concentration
(Juneng et al., 2011).
The two-parameter and three-parameter lognormal distributions
were used to fit the PM_{10} concentration in this study. The parameters
for the distributions were estimated using the method of moments and the method of
probability weighted moments. The best estimator and distribution were selected
according to five goodness-of-fit criteria: Normalized Absolute
Error (NAE), Prediction Accuracy (PA), coefficient of determination (R^{2}),
Root Mean Square Error (RMSE) and index of agreement (IA). These goodness-of-fit
criteria describe how well the distribution fits a set of observations.

Fig. 1: Characteristics of the PM_{10} data in μg m^{-3} for the monitoring site in Nilai, Negeri Sembilan
Table 1: Descriptive statistics for PM_{10} concentration for Nilai
Table 2: Performance indicator values for Nilai, Negeri Sembilan

Table 2 shows the value of each performance indicator for
each estimator. The best distribution representing each year can be identified
based on these goodness-of-fit criteria. According to these criteria, the
three-parameter lognormal distribution fits the PM_{10} concentration better than
the two-parameter lognormal distribution for every year from 2003 to 2009. This
finding is supported by Taylor et al. (1986), who found that the
lognormal distribution is appropriate for particulate data. Norazian
et al. (2011) also found the lognormal distribution to be well suited
to air pollutant data, performing better
for PM_{10} concentration in an industrial area in Malaysia. Previous research
by Sansuddin et al. (2011) found that the gamma distribution
was the best distribution to represent the PM_{10} concentration in Nilai
for 2002, but that work considered only two-parameter distributions.
The lognormal distribution has nevertheless been widely used to model many kinds of
environmental contaminant data, including air quality data (Gilbert,
1987).
The probability density function graphs were plotted using the parameter values
for the best distribution for each year, as given in Table
3. Figure 2 shows the probability density function plot
of PM_{10} concentration using the best estimator and distribution selected
through the performance indicator criteria. The plot shows that the distribution
for each year is positively skewed, with the highest skewness in 2005.
This is probably caused by the high particulate event in 2005 (Department
of Environment Malaysia, 2005). The probability density function plots also
show that PM_{10} concentrations exceeded the limit set
by the Malaysia Ambient Air Quality Guidelines (150 μg m^{-3}) in
each year, perhaps because the monitoring station in Nilai is located in an
industrial area. In 2008, the pdf plot shows that the PM_{10} concentrations were
lower than in other years.
Table 3: Parameters for the lognormal distribution using the best method
Fig. 2: The probability density function plot of PM_{10} concentration
Fig. 3: The cumulative distribution function plot of PM_{10} concentration
Table 4: The predicted and actual exceedances
Figure 3 shows the cumulative distribution function plot for the
best distribution that represents the observed monitoring record from 2003 to
2009. The cumulative distribution function plot was used to determine the probability
of the PM_{10} concentration exceeding the Malaysia Ambient Air Quality
Guideline (MAAQG). The probability of exceeding 150 μg m^{-3}
was used to obtain the return period, as shown in Table 4.
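The step from a fitted distribution to an exceedance probability and a return period can be sketched as follows. The sketch assumes that `mu` and `sigma` are the mean and standard deviation of ln(x − delta), and that the return period is taken as the reciprocal of the exceedance probability, expressed in observation intervals (hours for hourly data). The function names are hypothetical and the sketch is not tied to the values in Table 4.

```python
import math
from statistics import NormalDist


def exceedance_probability(limit, mu, sigma, delta=0.0):
    """P(X > limit) for a lognormal variable with threshold delta.
    With delta = 0 this is the two-parameter case."""
    if limit <= delta:
        # Every value of X lies above the threshold delta.
        return 1.0
    z = (math.log(limit - delta) - mu) / sigma
    return 1.0 - NormalDist().cdf(z)


def return_period(p_exceed):
    """Average number of observation intervals between exceedances,
    taken as the reciprocal of the exceedance probability."""
    return math.inf if p_exceed == 0.0 else 1.0 / p_exceed


def expected_exceedances(p_exceed, n_obs):
    """Predicted number of exceedances among n_obs observations."""
    return p_exceed * n_obs
```

For example, a fitted model whose median equals the limit gives an exceedance probability of 0.5 and a return period of two observation intervals. Multiplying the probability by the number of hourly records in a year gives the predicted count that can be compared with the actual number of exceedances.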
CONCLUSION
Based on the analysis of PM_{10} concentrations in
Nilai, Negeri Sembilan from 2003 to 2009, every year experienced high particulate
events. The two-parameter and three-parameter lognormal distributions were
fitted to the PM_{10} concentrations, with the parameters estimated by the method
of moments and the method of probability weighted moments. The results show that the
three-parameter lognormal distribution best represents the PM_{10} concentration
in the industrial area of Nilai, Negeri Sembilan. Predictions of the PM_{10} exceedances
were estimated using the best estimator and the best distribution for each year.
There are differences between the predicted and actual numbers of exceedances, but
the errors are small. The results of this study provide useful information on
the air quality status in Nilai, Negeri Sembilan and can be used for air quality
management.