Journal of Environmental Science and Technology

Year: 2015 | Volume: 8 | Issue: 3 | Page No.: 102-112
DOI: 10.3923/jest.2015.102.112
Three Days Ahead Prediction of Daily 12 Hour Ozone (O3) Concentrations for Urban Area in Malaysia
Muqhlisah Muhamad, Ahmad Zia Ul-Saufie and Sayang Mohd Deni

Abstract: Ground-level ozone (O3) is a secondary pollutant that has adverse effects on human health, agriculture and ecosystems. The aim of this study is to develop models to predict future O3 concentration levels in Shah Alam for the next day (D+1), the next two days (D+2) and the next three days (D+3) using the traditional method of Multiple Linear Regression (MLR) based on Ordinary Least Squares (OLS) estimation. This study uses daily average data of air pollutants (O3, NOx, NO, SO2, NO2, CO) and meteorological variables (WS, T, RH) from 2002 until 2013 as independent variables. Model performance is measured by accuracy measures (prediction accuracy, PA; index of agreement, IA; coefficient of determination, R2) and error measures (root mean square error, RMSE; normalized absolute error, NAE). The average of the accuracy measures (IA, PA and R2) for D+1, D+2 and D+3 is 0.4492, 0.3797 and 0.304, respectively. Meanwhile, the average of the error measures (RMSE, NAE) for D+1, D+2 and D+3 is 0.1453, 0.1374 and 0.1302, respectively.

How to cite this article
Muqhlisah Muhamad, Ahmad Zia Ul-Saufie and Sayang Mohd Deni, 2015. Three Days Ahead Prediction of Daily 12 Hour Ozone (O3) Concentrations for Urban Area in Malaysia. Journal of Environmental Science and Technology, 8: 102-112.

Keywords: ordinary least square, O3 concentrations, Multiple linear regression and performance indicator

INTRODUCTION

Ground-level ozone (O3) in urban areas has become a serious air pollution problem. Based on the air quality status for 2013, 31.51% of unclean air was recorded in Shah Alam. According to the Department of Environment Malaysia (DoE), the Air Pollution Index (API) was dominated by O3 concentrations from the afternoon until the evening. O3 is a secondary pollutant: it is not emitted directly at the earth's surface but is formed by chemical reactions between nitrogen oxides (NOx) and Volatile Organic Compounds (VOCs) under the influence of sunlight (DoE., 2006). VOCs are often referred to as nonmethane hydrocarbons (NmHC) (Ghazali et al., 2010). O3 also contributes to photochemical smog and can trigger human respiratory problems (Sanna, 2009).

Various methods have been used in previous studies of O3. One of the most popular methods for predicting O3 concentration levels is Multiple Linear Regression (MLR). Regression analysis is a commonly used statistical tool for analyzing data. MLR is a traditional method based on Ordinary Least Squares (OLS) estimation (Ul-Saufie et al., 2012) and is used to describe the relationship between a dependent variable and several independent variables (Chatterjee and Hadi, 2006).

Previous studies have shown that MLR is a standard method that is easy to apply (Ul-Saufie et al., 2013). According to Barrero et al. (2006), MLR makes the processes involved in O3 formation easier to understand. MLR can be used to predict O3 concentrations several hours ahead (Ramli et al., 2010). It has also been used to predict O3 concentrations while explaining the increasing pattern of O3 and the decreasing pattern of NO2 under the influence of weather parameters (Ghazali et al., 2010).

The aim of this study is to present the results of multiple linear regression for predicting O3 concentration levels for the next day (D+1), the next two days (D+2) and the next three days (D+3) as a function of meteorological variables (WS, T, RH) and pollutant concentrations (NOx, NO, SO2, NO2, O3, CO).

MATERIALS AND METHODS

Study area: The monitoring station in Shah Alam is located at Taman Tun Dr. Ismail (TTDI) Jaya Primary School (N03°06.287′, E101°33.368′), near a residential area. The station is also close to major transportation infrastructure, such as major roads, highways and an airport, and is surrounded by a light industrial area (Azmi et al., 2010). Shah Alam city lies between Petaling Jaya city (to the east) and Klang town (to the west) (Leh et al., 2014).

Monitoring record: The variables used in this study are ozone (O3, ppm), wind speed (WS, km h-1), ambient temperature (T, °C), relative humidity (RH, %), nitrogen oxides (NOx, ppm), nitric oxide (NO, ppm), sulphur dioxide (SO2, ppm), nitrogen dioxide (NO2, ppm) and carbon monoxide (CO, ppm). The primary data were managed by Alam Sekitar Malaysia Sendirian Berhad (ASMA), a private company under the supervision of the Department of Environment Malaysia (DoE).

According to Ahamad et al. (2014), Ghazali et al. (2010) and Banan et al. (2013), air pollutants and meteorological variables were measured using a Teledyne Ozone Analyzer Model 400A UV Absorption (O3), Teledyne Model 200A (NOx, NO, NO2), Teledyne Model 100A (SO2), Teledyne Model 300 (CO), Met One 010C Sensor (WS), Met One 062 Sensor (T) and Met One 083D Sensor (RH). These instruments automatically record the air pollutant concentrations and meteorological variables hourly. The monitoring instruments and procedures follow the standard methods set by the United States Environmental Protection Agency (EPA) (Ghazali et al., 2010). The secondary data, covering 1 January 2002 until 31 December 2013, were obtained from the Department of Environment Malaysia (DoE).

In this study, the hourly concentrations of each variable were transformed into daily average concentrations. Eighty percent of the monitoring records were randomly selected for model development and the remaining twenty percent were used for validation of the models. The statistical software used in the data analysis was SPSS Version 20, MATLAB R2012a and Microsoft Excel 2013.
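The data preparation described above can be sketched in a few lines. This is only an illustration in Python (the study itself used MATLAB R2012a and SPSS); the function names `daily_means` and `split_80_20`, the fixed seed and the rule of skipping missing hourly readings are assumptions made here, not details given in the text.

```python
import random
from collections import defaultdict
from statistics import mean

def daily_means(hourly):
    """Average hourly (date, value) readings into one value per date."""
    by_day = defaultdict(list)
    for day, value in hourly:
        if value is not None:              # assumption: skip missing readings
            by_day[day].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

def split_80_20(records, seed=0):
    """Randomly assign 80% of the records to training, the rest to validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy hourly O3 readings in ppm (illustrative values, not study data).
hourly_o3 = [("2002-01-01", 0.020), ("2002-01-01", 0.040),
             ("2002-01-02", 0.030), ("2002-01-02", None)]
daily = daily_means(hourly_o3)
train, valid = split_80_20(range(10))
```

Seeding the generator is a design choice that makes the random 80/20 selection repeatable, which helps when the same split has to be reused for the D+1, D+2 and D+3 models.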

Variable selection: The variables selected in this study are based on previous studies of O3 concentration levels (Table 1). O3 is formed from the emission and combination of other air pollutants through chemical processes. The main precursors of O3 are VOCs and NOx.

Table 1: Summarization of selected variables by previous researchers
SO2: Sulphur dioxides, PM10: Particulate matter, WD: Wind direction, WV: Wind violation, SR: Solar radiation, NMHC: Nonmethane hydrocarbon

According to DoE. (2006), VOCs are emitted from factory chimneys, motor vehicles, industrial activities and consumer and commercial products. Meanwhile, NOx is released by motor vehicles, power plants and combustion. Most previous studies state that meteorological conditions also contribute to the formation of O3.

Meteorological variable: Khiem et al. (2010) found that low wind speed, in association with other meteorological conditions, contributes strongly to O3 concentrations. During high wind speeds, O3 concentration levels in urban areas differ very little from those in rural areas (Husar and Renard, 1997). Temperature is also one of the main factors in O3 formation: O3 concentration levels tend to increase at high temperatures (Banja et al., 2012). Relative humidity is a further contributing factor. High relative humidity reduces the efficiency of photochemical processes and is therefore associated with low O3 concentration levels (Lelieveld and Crutzen, 1990).

Air pollutant variable: According to DoE. (2014), the sources of SO2 are power plants (50%), industrial activities (9%), motor vehicles (7%) and others (34%). The contributors of NO2 are power plants (61%), motor vehicles (26%), industrial activities (6%) and others (7%). Meanwhile, CO is emitted by motor vehicles (95.3%), power plants (3.8%), industrial activities (0.4%) and others (0.5%). These emissions contribute to the formation of O3 in Malaysia.

Regression analysis: The regression analysis used in this study is Multiple Linear Regression (MLR) based on the traditional Ordinary Least Squares (OLS) estimation approach. MLR is an extension of simple linear regression. In MLR, there is one dependent variable (response variable) and several independent variables (explanatory variables or predictors). Chatterjee and Hadi (2006) defined the general equation of MLR as follows:

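$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \qquad (1)$$
Where:
y_i = Dependent variable (response variable) for the ith unit
x_i1, ..., x_ip = Values of the p independent variables (predictors) for the ith unit
β0 = Regression constant
β1, ..., βp = Regression coefficients
ε_i = Random error (residual)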
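To make the least-squares idea concrete, the following is a minimal sketch of estimating the regression constant and coefficients by solving the normal equations. It is written in plain Python for illustration only; the helper names `solve`, `fit_ols` and `predict` and the toy data are hypothetical and are not the routines or data used in the study, which relied on SPSS and MATLAB.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                        # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(X, y):
    """Return [b0, b1, ..., bp] minimising the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]                 # column of 1s for b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, x):
    """Evaluate the fitted linear model for one row of predictor values."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Toy data generated from y = 1 + 2*x1 - 3*x2 with no noise (illustrative only).
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a - 3 * b for a, b in X]
beta = fit_ols(X, y)
```

Because the toy response is an exact linear function of the two predictors, the fitted coefficients recover the generating values up to floating-point rounding. With real monitoring data the residuals would not vanish.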

Fig. 1: Procedure for development of multiple linear regression model

Table 2: Ordinary least square assumption
Source: Chatterjee and Hadi (2006), Residual = error = ε

Montgomery et al. (2012) noted that the method of least squares is used to estimate the parameters and that MLR is mostly used as an empirical model. Several stages are required to obtain the MLR model (Fig. 1).

In Stage 1, 80% of the data for each variable was randomly selected using MATLAB R2012a. In Stage 2, the Variance Inflation Factor (VIF) was used to test for multicollinearity. The model is considered free of multicollinearity if the VIF is less than 10 (Field, 2005). In Stage 3, the OLS assumptions were checked (Table 2). In Stage 4, the models were validated with the performance indicators (RMSE, NAE, PA, IA, R2) using the remaining 20% of the complete monitoring record. Finally, in Stage 5, the MLR model was obtained.

Performance indicator: Performance indicators are used to evaluate the performance of the models for the next day (D+1), next two days (D+2) and next three days (D+3) predictions. The performance indicators (Table 3) consist of accuracy measures (PA, IA and R2) and error measures (RMSE and NAE).

Table 3: Performance indicators
Source: Gervasi (2008), N: No. of sampled daily measurements at a particular station, Pi: Predicted values, Oi: Observed values, P̄: Mean of the predicted values, Ō: Mean of the observed values of one set of daily monitoring records
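The body of Table 3 is not reproduced in this text, so the exact expressions used in the study cannot be confirmed here. As a hedged illustration, the sketch below computes the five indicators using commonly cited textbook forms; the function name `indicators` is hypothetical, and the normalisation (for example dividing by N rather than N-1 in the RMSE) is an assumption that may differ from Table 3.

```python
from math import sqrt

def indicators(pred, obs):
    """Compute RMSE, NAE, PA, IA and R2 for paired predictions and observations."""
    n = len(obs)
    p_bar = sum(pred) / n
    o_bar = sum(obs) / n
    sq_err = sum((p - o) ** 2 for p, o in zip(pred, obs))
    # Error measures: closer to 0 is better.
    rmse = sqrt(sq_err / n)                              # assumed 1/N form
    nae = sum(abs(p - o) for p, o in zip(pred, obs)) / sum(obs)
    # Accuracy measures: closer to 1 is better.
    pa = (sum((p - o_bar) ** 2 for p in pred)
          / sum((o - o_bar) ** 2 for o in obs))
    ia = 1 - sq_err / sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                          for p, o in zip(pred, obs))
    cov = sum((p - p_bar) * (o - o_bar) for p, o in zip(pred, obs))
    var_p = sum((p - p_bar) ** 2 for p in pred)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    r2 = cov ** 2 / (var_p * var_o)                      # squared Pearson correlation
    return {"RMSE": rmse, "NAE": nae, "PA": pa, "IA": ia, "R2": r2}

# Toy O3 values in ppm (illustrative only, not study data).
observed = [0.02, 0.03, 0.05]
perfect = indicators(observed, observed)
rough = indicators([0.03, 0.03, 0.04], observed)
```

With a perfect prediction the error measures are 0 and the accuracy measures are 1, which is the reference point used when the text describes values as "close to zero" or "close to one".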

RESULTS AND DISCUSSION

Descriptive statistics are used to describe a situation (Bluman, 2009). The Shah Alam monitoring station is located at TTDI Jaya Primary School. The mean O3 concentration for Shah Alam is 0.032 ppm and the monitoring record is moderately skewed, with a skewness of 0.717. The maximum O3 concentration recorded was 0.097 ppm (Table 4 and Fig. 2). This is due to open burning and smoke from vehicles (DoE., 2004). According to the Department of Environment, Malaysia (2011), the unhealthy days from 2001 to 2012 in the Klang Valley were mainly due to high O3 concentration levels. Shah Alam recorded the highest number of unhealthy days in every year except 2005, 2010, 2011 and 2012 (Fig. 3).

To investigate the relationship between O3,D+1 (next-day O3) and each independent variable, the correlation coefficient (R) and scatter plots were examined (Table 5 and Fig. 4). The correlations with O3,D+1 are WS (R = -0.036), T (R = 0.155), RH (R = -0.212), NOx (R = 0.056), NO (R = -0.018), SO2 (R = 0.121), NO2 (R = 0.136), O3 (R = 0.445) and CO (R = 0.177). WS, RH and NO are negatively correlated with O3,D+1, whereas the remaining variables are positively correlated. Table 6 summarizes the correlation coefficients (R) and coefficients of determination (R2) between O3 and its predictors reported in previous studies. In Malaysia, Ghazali et al. (2010) reported R2 = 89.90% for Shah Alam and Gombak; Banan et al. (2014) reported, for Putrajaya, NOx (R = 0.681), NO (R = -0.537) and NO2 (R = -0.499), for Petaling Jaya, NOx (R = 0.515), NO (R = -0.678) and NO2 (R = -0.102) and, for Jerantut, NOx (R = 0.416), NO (R = -0.557) and NO2 (R = -0.079); and Ramli et al. (2010) reported R2 values of 89.7 and 89.0% for Shah Alam and Nilai, respectively. Elsewhere, Wang et al. (2003) reported R2 = 76.16% for Hong Kong, Agirre-Basurko et al. (2006) reported R = 0.88 for NO2 and Banja et al. (2012) reported T (R = 0.72), RH (R = -0.40) and R2 = 76%.

According to Field (2005) and Montgomery et al. (2012), a model is considered to have a multicollinearity problem if the VIF is larger than 10. Since the VIF values for NOx, NO and NO2 are larger than 10, the O3 model for next day prediction (D+1) has a multicollinearity problem (Table 7).

Fig. 2: Box and whisker plot for O3 concentrations

Fig. 3: Number of unhealthy days in Klang Valley from year 2001 until 2012 (Source: DoE., 2014)

Table 4: Descriptive statistics of O3 in Shah Alam
SD: Standard deviation

Table 5: Correlation coefficient between O3, D+1 and each variables


Fig. 4(a-i): Scatter plot of O3, D+1 versus independent variables

Table 6: Correlation (R) and coefficient of determination (R2) between O3 concentrations level and its predictor variables

Table 7: Multicollinearity test of O3, D+1

Table 8: Multicollinearity test of O3, D+1 (Without NOx)

This multicollinearity is due to the presence of NOx, which is the sum of NO and NO2 (NOx = NO+NO2) (Ghazali et al., 2010). NOx was therefore excluded from the model. After NOx was removed, the VIF values showed that the model was free from multicollinearity (Table 8).
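For reference, the VIF threshold of 10 can be read in terms of a coefficient of determination. The expression below is the standard textbook definition rather than one stated in this study; the symbol R_j^2, introduced here for illustration, denotes the R2 obtained when predictor x_j is regressed on all the remaining predictors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9
```

Because NOx is the sum of NO and NO2, regressing any one of these three variables on the other two gives an R_j^2 close to 1, which pushes the VIF well above 10. Dropping one of them removes this near-exact linear dependence.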

Fig. 5: Histogram and table of O3 residual: D+1

Fig. 6: Scatter plot of residual versus fitted values

Table 9: Multiple linear regression and performance indicator

The histogram of the O3 residuals for the next day (D+1) model in Shah Alam is bell-shaped, which indicates that the residuals are approximately normally distributed with zero mean (Fig. 5). The assumption that the residuals have constant variance (homoscedasticity) is satisfied because the scatter plot of residuals versus fitted values (Fig. 6) shows an even spread. In addition, the assumption that the residuals are uncorrelated (no autocorrelation) is satisfied because the Durbin-Watson statistic (1.945) is close to 2.
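As a brief illustration of why a value near 2 is the reference point, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses the standard definition; the function name `durbin_watson` and the toy residuals are chosen here for illustration and are not taken from the study.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 means no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Toy residual sequences (illustrative only).
dw_alternating = durbin_watson([1, -1, 1, -1])   # strong negative autocorrelation
dw_constant = durbin_watson([1, 1, 1, 1])        # strong positive autocorrelation
```

The statistic ranges from 0 (strong positive autocorrelation) to 4 (strong negative autocorrelation), so the reported value of 1.945 sits close to the no-autocorrelation midpoint.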

The procedures illustrated in Table 8 and Fig. 5 and 6 were repeated to obtain the MLR models of Shah Alam for the next two days (O3,D+2) and the next three days (O3,D+3).
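One way to set up the three prediction horizons is to pair the predictors observed on day D with the O3 concentration observed on day D+k. The sketch below is a hypothetical illustration: the function name `make_lagged_pairs`, the dictionary keys `x` and `o3`, and the assumption of consecutive daily records with no gaps are all introduced here and are not described in the study.

```python
def make_lagged_pairs(days, horizon):
    """Pair predictors from day D with the O3 value from day D + horizon."""
    pairs = []
    for i in range(len(days) - horizon):
        predictors = days[i]["x"]            # values measured on day D
        target = days[i + horizon]["o3"]     # O3 measured on day D + horizon
        pairs.append((predictors, target))
    return pairs

# Five consecutive toy daily records (illustrative values only).
days = [{"x": (28.0 + d, 70.0 - d), "o3": 0.030 + 0.001 * d} for d in range(5)]
pairs_d1 = make_lagged_pairs(days, 1)
pairs_d2 = make_lagged_pairs(days, 2)
pairs_d3 = make_lagged_pairs(days, 3)
```

Each longer horizon loses one more record at the end of the series, so the D+3 data set is slightly smaller than the D+1 data set built from the same monitoring record.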

CONCLUSION

Table 9 shows the performance indicators for the next day (D+1), next two days (D+2) and next three days (D+3) predictions in Shah Alam obtained from the multiple linear regression models. The average of the accuracy measures (IA, PA and R2) is 0.4492 for D+1, followed by 0.3797 for D+2 and 0.304 for D+3. The average of the error measures (RMSE, NAE) for D+1, D+2 and D+3 is 0.1453, 0.1374 and 0.1302, respectively. Because VOC and UVB data were not available, the values of PA, IA and R2 are not close to one, but the models are still appropriate for predicting O3 concentration levels since the values of RMSE and NAE are close to zero. This is supported by the previous study of Yousef et al. (2008), in which the values for the best linear regression models of particulate matter (PM10) in the dry season and wet season are 0.262 and 0.240, respectively. Therefore, these three models could be implemented for public health protection to provide early warnings to the respective populations.

ACKNOWLEDGMENT

This study was funded by Universiti Teknologi MARA, Malaysia under Grant 600-RMI/FRGS 5/3 (40/2014). Special appreciation goes to the Department of Environment Malaysia (DoE) and Alam Sekitar Malaysia (ASMA) for providing the air quality data for this research, and special thanks to Universiti Teknologi MARA, Malaysia.

REFERENCES

  • Agirre-Basurko, E., G. Ibarra-Berastegi and I. Madariaga, 2006. Regression and multilayer Perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area. Environ. Modell. Software, 21: 430-446.


  • Ahamad, F., M.T. Latif, R. Tang, L. Juneng, D. Dominick and H. Juahir, 2014. Variation of surface ozone exceedance around Klang Valley, Malaysia. Atmos. Res., 139: 116-127.


  • Azmi, S.Z., M.T. Latif, A.S. Ismail, L. Juneng and A.A. Jemain, 2010. Trend and status of air quality at three different monitoring stations in the Klang Valley, Malaysia. Air Qual. Atmos. Health, 3: 53-64.


  • Banan, N., M.L. Latif, L. Juneng and F. Ahamad, 2013. Characteristics of surface ozone concentrations at stations with different backgrounds in the Malaysian Peninsula. Aerosol Air Qual. Res., 13: 1090-1106.


  • Banan, N., M. Latif, L. Juneng and M.F. Khan, 2014. An Application of Artificial Neural Networks for the Prediction of Surface Ozone Concentrations in Malaysia. In: From Sources to Solution, Aris, A.Z., T.H.T. Ismail, R. Harun, A.M. Abdullah and M.Y. Ishak (Eds.). Springer, Singapore, pp: 7-12


  • Banja, M., D. Papanastasiou, A. Poupkou and D. Melas, 2012. Development of a Short-term ozone prediction tool in Tirana area based on meteorological variables. Atmos. Pollut. Res., 3: 32-38.


  • Barrero, M.A., J.O. Grimalt and L. Canton, 2006. Prediction of daily ozone concentration maxima in the urban atmosphere. Chemom. Intell. Lab. Syst., 80: 67-76.


  • Bluman, A., 2009. Elementary Statistics a Step by Step Approach. McGraw Hill, New York


  • Chatterjee, S. and A.S. Hadi, 2006. Regression Analysis by Example. 4th Edn., John Wiley, New Jersey, Pages: 416


  • Delcloo, A.W. and H. de Backer, 2005. Modelling planetary boundary layer ozone, using meteorological parameters at Uccle and Payerne. Atmos. Environ., 39: 5067-5077.


  • DoE., 2004. Malaysia environmental quality report 2003. Department of Environment, Ministry of Natural Resources and Environment, Kuala Lumpur, Malaysia.


  • DoE., 2006. Malaysia environmental quality report 2005. Department of Environment, Ministry of Natural Resources and Environment, Putrajaya, Malaysia.


  • DoE., 2014. Malaysia environmental quality report 2013. Department of Environment, Ministry of Natural Resources and Environment, Kuala Lumpur, Malaysia.


  • Field, A.P., 2005. Discovering Statistics Using SPSS. 2nd Edn., SAGE Publication Ltd., London, ISBN-13: 978-0761944522, Pages: 816


  • Gervasi, O., 2008. Computational Science and its Applications. Springer, New York, USA


  • Ghazali, N.A., N. Ramli, A. Yahaya, N.F. Yusof, N. Sansuddin and W. Al Madhoun, 2010. Transformation of nitrogen dioxide into ozone and prediction of ozone concentrations using multiple linear regression. Environ. Monit. Assess., 165: 475-489.


  • Heo, J.S., K.H. Kim and D.S. Kim, 2004. Pattern recognition of high episodes in forecasting daily maximum ozone levels. Terrestrial Atmospheric Oceanic Sci., 15: 199-220.


  • Husar, R.B. and W.P. Renard, 1997. Ozone as a function of local wind direction and wind speed evidence of local and regional transport. Center for Air Pollution Impact and Trend Analysis (CAPITA), May 7, 1997.


  • Jaioun, K., K. Saithanu and J. Mekparyup, 2014. Multiple linear regression model to estimate ozone concentration in chonburi, Thailand. Int. J. Applied Environ. Sci., 9: 1305-1308.


  • Khiem, M., R. Ooka, H. Huang, H. Hayami, H. Yoshikado and Y. Kawamoto, 2010. Analysis of the relationship between changes in meteorological conditions and the variation in summer ozone levels over the Central Kanto area. Adv. Meteorol.


  • Leh, O.L.H., S.N.A.M. Musthafa and A.R.A. Rasam, 2014. Urban environmental heath: Respiratory infection and urban factors in urban growth corridor of Petaling Jaya, Shah Alam and Klang, Malaysia. Sains Malaysiana, 43: 1405-1414.


  • Lelieveld, J. and P.J. Crutzen, 1990. Influences of cloud photochemical processes on tropospheric ozone. Nature, 343: 227-233.


  • Montgomery, D.C., E. Peck and G. Vining, 2012. Introduction to Linear Regression Analysis. 5th Edn., John Wiley, New Jersey, ISBN-13: 9780470542811, Pages: 672


  • Musa, M., A. Jemain and W.W. Zin, 2013. Scaling and persistence of ozone concentrations in Klang Valley. J. Qual. Measur. Anal., 9: 9-20.


  • Ramli, N.A., N.A. Ghazali and A.S. Yahaya, 2010. Diurnal fluctuations of ozone concentrations and its precursors and prediction of ozone using multiple linear regressions. Malaysian J. Environ. Manage., 11: 57-69.


  • Sanna, E., 2009. Air Pollution and Health. Health and the Environment, New York


  • Schlink, U., O. Herbarth, M. Richter, S. Dorling, G. Nunnari, G. Cawley and E. Pelikan, 2006. Statistical models to assess the health effects and to forecast ground-level ozone. Environ. Modell. Software, 21: 547-558.


  • Ul-Saufie, A.Z., A.S. Yahaya, N.A. Ramli and H.A. Hamid, 2012. Performance of multiple linear regression model for long-term PM10 concentration prediction based on gaseous and meteorological parameters. J. Applied Sci., 12: 1488-1494.


  • Ul-Saufie, A.Z., A.S. Yahaya, N.A. Ramli, N. Rosaida and H.A. Hamid, 2013. Future daily PM10 concentrations prediction by combining regression models and feedforward backpropagation models with Principle Component Analysis (PCA). Atmos. Environ., 77: 621-630.


  • Wang, W., W. Lu, X. Wang and A.Y. Leung, 2003. Prediction of maximum daily ozone level using combined neural network and statistical characteristics. Environ. Int., 29: 555-562.


  • Yousef, N.F., N.A. Ghazli, N.A. Ramli, A.S. Yahia, N. Sansuddin and W.A. Al Madhoun, 2008. Correlation of PM10 concentration and weather parameters in conjunction with haze event in Seberang Perai, Penang. Proceeding of the International Conference on Construction and Building Technology, June 16-20, 2008, Kuala Lumpur, Malaysia.
