INTRODUCTION
PM_{10}, particulate matter with an aerodynamic diameter of less than 10 μm, is identified as the type of air pollution that causes the greatest concern for public health and the environment. It has also been the main air pollutant during haze events in Malaysia since 1980^{1,2}. This pollutant can have both short and long term health impacts. The presence of PM_{10} in ambient air may cause severe health effects such as asthma, throat irritation and respiratory problems and may even lead to hospital admission^{3}. The major sources of PM_{10} emissions include power plants and heat, motor vehicle exhausts, industrial sources and open burning^{4}. However, the predominant sources of PM_{10} emissions in Malaysia are heavy traffic and industrial emissions^{5}.
Statistical modelling has allowed environmental authorities to carry out daily air pollution forecasts, since such models provide good insight into short term predictions of future air pollution levels. Regression models and Artificial Neural Networks (ANN) have commonly been used to predict PM_{10} concentrations in previous studies^{6-8}. Besides that, Central Fitting Distributions (CFD) and Extreme Value Distributions (EVD) can also yield good fits to the mean concentrations of PM_{10}^{9}. Many efforts have been made to monitor this air pollutant. However, the sequence of polluted and nonpolluted days affected by PM_{10} has received less attention among researchers. In a Markov chain model, the current state depends on the previous state, which makes the model well suited to such patterns of observations. Hence, once the patterns are identified, it is possible to predict the probability of future events based on the information from the previous day's event.
Markov chain models are intended to be simple models that require only two parameters and fit various aspects of occurrence patterns. Simple Markov chain models are widely used to describe sequences of daily rainfall occurrences all over the world, including Malaysia, as reported by Chin^{10}, Deni et al.^{11} and Gabriel and Neumann^{12}. This method is also useful for describing sequences of daily PM_{10} occurrences because this pollutant is controlled by weather conditions and shows similar persistence^{13}. Rahimi et al.^{14} used Markov chain models to study the persistence of days affected by PM_{10} in Tehran and found that the first order two-state Markov chain model fitted the data of five selected stations well. Furthermore, this model has been applied to other air pollution data, such as in the studies by Lin^{13} and Lin and Huang^{15}.
Although few studies on PM_{10} concentration have applied this model, its ability to predict future events from previous events makes it useful for this study. This model is also frequently used to forecast the weather at some future time given the current state, as reported in previous studies such as Mangaraj et al.^{16}, Deni et al.^{11} and Chin^{10}.
Although the first order Markov chain model is simple and its calculation is easier than that of higher orders, according to Chin^{10} the order of a Markov chain model cannot always be assumed to be one, because a first order model is sometimes inadequate. Besides that, since the effect of exposure to PM_{10} lasts more than 24 h, as reported by the World Health Organization (WHO)^{17}, there is a need to use higher order Markov chain models (which consider more than one previous event) to describe the sequence of PM_{10} concentration, in order to obtain better predictions of PM_{10} occurrences and improve the quality of reports generated from the data. Thus, in this study, both simple and higher orders are considered. The aim of this study was to determine the optimum order of the Markov chain model for describing the sequence of polluted (nonpolluted) days of PM_{10} concentration in Shah Alam, Malaysia.
MATERIALS AND METHODS
The air quality in Malaysia is monitored by the Department of Environment (DoE) through 52 continuous monitoring stations. These stations are strategically located to detect any significant changes in air quality in their areas. This study considers the PM_{10} concentration data from Shah Alam, an urban area. The monitoring station is located at Sekolah Kebangsaan Raja Muda, Shah Alam, at 3.08° North latitude and 101.51° East longitude. The main contributor to PM_{10} concentration in this area is emissions from motor vehicles, since Shah Alam is the state capital of Selangor, Malaysia, with an increasing number of vehicles and rapid urban development^{18}.
Twelve years of PM_{10} concentration data provided by DoE, from 2002 until 2013, were used to achieve the objective of this study. In this study, a polluted day is defined as a day when the PM_{10} concentration exceeds the threshold value, while a nonpolluted day is defined as a day when the PM_{10} concentration is less than the threshold value. For example, if the threshold value is 50 μg m^{–3}, a day with a PM_{10} concentration of more than 50 μg m^{–3} is a polluted day and a day with a concentration of less than 50 μg m^{–3} is a nonpolluted day. The threshold values considered in this study are 50 μg m^{–3} (WHO guideline), 52 μg m^{–3} (background concentration of PM_{10} at this station) and 100, 120 and 150 μg m^{–3} (New Malaysia ambient air quality standard^{19}). The purpose of using various threshold values is to investigate their effect on the optimum order of the Markov chain model.
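The classification of days described above can be sketched in a few lines of Python. This is a minimal illustration only: the readings in `sample` are hypothetical values (not the Shah Alam data), the function name `classify_days` is introduced here for illustration and treating a day exactly equal to the threshold as nonpolluted is an assumption, since the definition above does not cover that case.

```python
def classify_days(concentrations, threshold):
    """Return 1 for a polluted day (concentration > threshold), else 0.

    Assumption: a reading exactly equal to the threshold counts as nonpolluted.
    """
    return [1 if c > threshold else 0 for c in concentrations]


# Hypothetical daily PM10 readings in ug/m3 (illustrative, not the study data)
sample = [42.0, 55.3, 61.0, 48.9, 120.5, 151.2, 50.0]

# Threshold values named in the text (ug/m3)
thresholds = [50, 52, 100, 120, 150]

# One 0/1 sequence per threshold value
states_by_threshold = {t: classify_days(sample, t) for t in thresholds}
```

Each threshold produces its own 0/1 sequence, which is why the optimum order of the chain can differ from one threshold to another.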
Table 1: Transition probabilities of the occurrence of PM_{10} concentration for Shah Alam station with threshold value of 50 μg m^{–3}

An example calculation is given in each section to illustrate how the aim of this paper is achieved. All values used in these examples are based on the 12 years of PM_{10} concentration data at the Shah Alam monitoring station with a threshold value of 50 μg m^{–3}; the corresponding transition probabilities of the occurrence of PM_{10} concentration are shown in Table 1.
Markov chain property: The purpose of checking the Markov chain property is to test statistically whether or not successive events are independent. According to Moon et al.^{20}, successive events can form a Markov chain model when the events are dependent on each other. The test statistic, α, is defined as in Eq. 1 under the hypothesis that the successive events are independent:
where, P_{ij} denotes the conditional probability of the jth day event given the ith day event and P_{.j} is the probability of the jth day event. The statistic in Eq. 1 is asymptotically distributed as χ^{2} with (m-1)^{2} degrees of freedom, where m is the total number of states (in this case, m = 2). The marginal probabilities for the jth column of the transition probabilities (P_{.j}) are given by:
For example, the value of the statistic α for a threshold value of 50 μg m^{–3} is calculated as shown below, with all values taken from the transition probabilities:
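As a hedged illustration of this test, the sketch below assumes the commonly used likelihood-ratio form of the statistic, α = 2 Σ_{i} Σ_{j} n_{ij} ln(P_{ij}/P_{.j}), where n_{ij} is the number of transitions from state i to state j. The symbol n_{ij}, the function name `independence_statistic` and the transition counts are assumptions made for illustration; the counts are hypothetical and are not the values of Table 1.

```python
import math


def independence_statistic(counts):
    """Likelihood-ratio statistic for independence of successive events.

    counts[i][j] is the number of transitions from state i to state j.
    Assumed form: alpha = 2 * sum_ij n_ij * ln(P_ij / P_.j).
    """
    m = len(counts)
    total = sum(sum(row) for row in counts)
    # Marginal (column) probabilities P_.j
    col_prob = [sum(counts[i][j] for i in range(m)) / total for j in range(m)]
    alpha = 0.0
    for i in range(m):
        row_total = sum(counts[i])
        for j in range(m):
            n_ij = counts[i][j]
            if n_ij > 0:  # empty cells contribute nothing to the sum
                p_ij = n_ij / row_total
                alpha += 2.0 * n_ij * math.log(p_ij / col_prob[j])
    return alpha


# Hypothetical transition counts (rows: previous day, columns: current day)
persistent = [[80, 20], [20, 80]]
no_memory = [[50, 50], [50, 50]]

alpha_persistent = independence_statistic(persistent)
alpha_no_memory = independence_statistic(no_memory)

# 3.84 is the chi-square critical value at the 5% level with (m-1)^2 = 1
# degree of freedom for m = 2 states
is_dependent = alpha_persistent > 3.84
```

When the rows of the transition matrix are identical, every ratio P_{ij}/P_{.j} equals one and the statistic is zero, which corresponds to independent successive events.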
Determination of the optimum order of Markov chain models for occurrence of PM_{10} concentration: The sequence of polluted (nonpolluted) days of daily PM_{10} concentration is denoted as X_{1}, X_{2}, X_{3},..., X_{t},..., X_{n}, for n arbitrary days. A two-state Markov chain model was considered in this study, where 1 denotes a polluted day and 0 denotes a nonpolluted day. The sequence of polluted (nonpolluted) days is assumed to follow a first order Markov chain at time t when X_{t} depends only on the previous event, X_{t-1}. Thus, the two conditional probabilities of the first order can be given by P_{10} = P(X_{t} = 0∣X_{t-1} = 1) and P_{11} = P(X_{t} = 1∣X_{t-1} = 1). The transition probability matrix P, which describes the two-state Markov chain model, is as in Eq. 3^{16}:
P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix} (3)
where, P_{ij} = P(X_{1} = j∣X_{0} = i), i, j = 0, 1.
Note that P_{00}+P_{01} = 1 and P_{11}+P_{10} = 1. The definition of the conditional probabilities is as follows:
P_{00}: The probability of a day being nonpolluted given that the previous day was also a nonpolluted day
P_{01}: The probability of a day being polluted given that the previous day was a nonpolluted day
P_{10}: The probability of a day being nonpolluted given that the previous day was a polluted day
P_{11}: The probability of a day being polluted given that the previous day was also a polluted day
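The four conditional probabilities defined above can be estimated from a 0/1 sequence by counting transitions between consecutive days. The following is a minimal sketch; the sequence `states` is a short hypothetical example and the function name `transition_matrix` is introduced only for illustration.

```python
def transition_matrix(states):
    """Count first order transitions and convert each row to probabilities.

    states is a list of 0 (nonpolluted) and 1 (polluted) values.
    Returns (counts, probs) where probs[i][j] estimates P_ij.
    """
    counts = [[0, 0], [0, 0]]
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # A row with no observations is left as zeros
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs


# Hypothetical sequence of nonpolluted (0) and polluted (1) days
states = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
counts, P = transition_matrix(states)

# Each row sums to one: P_00 + P_01 = 1 and P_10 + P_11 = 1
row_sums = [sum(row) for row in P]
```

Because each row sums to one, a two-state first order chain is fully described by two parameters, for example P_{01} and P_{11}.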
Under the assumption that the Markov chain is stationary, the transition probabilities of the k^{th} order are as in Eq. 4 and the joint probability distribution for X_{1}, X_{2}, X_{3},..., X_{t},..., X_{n} is as in Eq. 5^{10}:
Akaike’s Information Criteria (AIC) and Bayesian Information Criteria (BIC) are two decision criteria that are commonly used by researchers to determine the optimum order of Markov chain models. The optimum order for the PM_{10} concentration sequence is the one at which the minimum loss function is obtained. For instance, Berchtold and Raftery^{21}, Singh and Kripalani^{22}, Deni et al.^{11} and Dastidar et al.^{23} applied these two loss functions in their studies. Both criteria are based on the likelihood functions for the transition probabilities of the fitted Markov chain models. The maximum likelihood function for the k^{th} order chain can be written as:
where, the estimated transition probabilities are those of each sequence going from state s_{1} to s_{2}, from state s_{2} to s_{3} and so on up to from state s_{k-1} to s_{k} (s_{k} is the state of the most recent observation), each paired with its associated transition count. The maximum likelihood estimator of the transition probabilities used in Eq. 7 is given by:
The computed maximum likelihood is used to decide between two Markov chain models of different orders, say the k^{th} and m^{th} orders, where k<m and k = 0, 1,..., m-1. Thus, the maximized likelihood ratio statistic is given by:
Where:
Table 2: Loss function, AIC and BIC values of Shah Alam monitoring station

AIC: Akaike’s information criteria, BIC: Bayesian information criteria, *Minimum loss function 
For example, the maximized likelihood ratio statistic _{0}H_{1} is calculated with k = 0 and m = 1. The parameter estimates (P_{00}, P_{11}, P_{10} and P_{01}) are then substituted into the equation and the value of _{0}H_{1} is given by:
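The likelihood-based comparison of two orders can be sketched as follows. This is an illustration under stated assumptions, not the exact computation of this study: the transition probabilities are estimated as relative frequencies of transition counts, the ratio statistic is taken in its usual form as twice the difference of the maximized log-likelihoods and both orders are scored on the same days (from day m onward) so that the two likelihoods are comparable. The function names and the sequence `example_states` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(states, k, start):
    """Maximized log-likelihood of a k-th order chain over days start..end.

    Each day is conditioned on its k preceding days (none when k = 0) and
    the transition probabilities are estimated by relative frequencies.
    Requires start >= k.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for t in range(start, len(states)):
        context = tuple(states[t - k:t])  # empty tuple for k = 0
        context_counts[context] += 1
        transition_counts[(context, states[t])] += 1
    return sum(
        n * math.log(n / context_counts[context])
        for (context, _), n in transition_counts.items()
    )


def likelihood_ratio(states, k, m):
    """Ratio statistic for order k against order m (k < m)."""
    return 2.0 * (log_likelihood(states, m, m) - log_likelihood(states, k, m))


# Hypothetical sequence of nonpolluted (0) and polluted (1) days
example_states = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

# Order 0 (independence) against order 1
eta_0_1 = likelihood_ratio(example_states, 0, 1)
```

Because the lower order model is nested in the higher order one and both are scored on the same days, the statistic cannot be negative; large values indicate that the additional preceding days carry information.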
As stated earlier, two loss functions, namely AIC and BIC, are used to determine the optimum order of the Markov chain model. Tong^{24} proposed a loss function based on the AIC, while the corresponding loss function for the BIC was introduced by Schwarz^{25}. The only difference between these two criteria is the form of the penalty function. Both criteria seek the value of k that minimizes the loss function. The equations of AIC and BIC are as in Eq. 9 and 10, respectively:
where, v = (s^{m} - s^{k})(s - 1) is the degree of freedom, s represents the number of states, which in this case is 2 (polluted and nonpolluted) and n is the sample size. For example, the values of AIC and BIC when k = 0 and m = 1 are given as follows:
All the AIC and BIC loss function values used to determine the optimum order of the Markov chain model for a threshold value of 50 μg m^{–3} are presented in Table 2. The optimum order is chosen by comparing the loss function values across orders and selecting the minimum. For example, based on Table 2, the minimum values of both functions occur at order three, which means that the optimum order for this threshold is the third order Markov chain model. Many studies have also used these two loss functions to determine the optimum order of Markov chain models^{11,21-23}.
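The selection step can be illustrated with a short sketch. It assumes the widely used forms of the two loss functions, AIC(k) = η − 2v and BIC(k) = η − v ln n, where η is the likelihood ratio statistic of order k against the maximum order m and v = (s^{m} − s^{k})(s − 1). The symbol η, the function names and the values in `eta` are hypothetical and chosen only to show how the two penalties can lead to different orders; they are not the values of Table 2.

```python
import math


def degrees_of_freedom(k, m, s=2):
    """v = (s^m - s^k)(s - 1) for s states, order k tested against order m."""
    return (s ** m - s ** k) * (s - 1)


def select_order(eta, m, n, s=2):
    """Return the orders minimizing AIC and BIC, plus both loss tables.

    eta maps each order k (0..m) to its ratio statistic against order m.
    Assumed forms: AIC(k) = eta_k - 2v and BIC(k) = eta_k - v * ln(n).
    """
    aic = {k: eta[k] - 2 * degrees_of_freedom(k, m, s) for k in eta}
    bic = {k: eta[k] - degrees_of_freedom(k, m, s) * math.log(n) for k in eta}
    return min(aic, key=aic.get), min(bic, key=bic.get), aic, bic


# Hypothetical ratio statistics against a maximum order of m = 4
eta = {0: 900.0, 1: 60.0, 2: 30.0, 3: 8.0, 4: 0.0}

# n = 4000 is a hypothetical sample size
aic_order, bic_order, aic, bic = select_order(eta, m=4, n=4000)
```

With these illustrative numbers the AIC selects order three while the BIC, whose penalty grows with ln n, selects order two, which mirrors the general observation that the heavier BIC penalty favours more parsimonious models.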
Fitting the higher order of Markov chain model: The information obtained from the transition probabilities was also used to calculate the frequency distribution for each order of the Markov chain model, which was used to assess the performance of the higher orders. In the first order Markov chain model, the observed day depends on only one preceding day; in the second order, it depends on two preceding days and so on for the other orders. The joint probabilities of the k^{th} order Markov chain model are as follows^{11}:
n > k+1; k = 0, 1, ...
The conditional probabilities of events of n polluted days under the k^{th} order Markov chain can be written as in Eq. 12^{11}. In this equation, [n] represents n times. For example, the conditional probability of two consecutive polluted days can be written as P(011∣0). The expected number of polluted days is computed by multiplying the conditional probabilities obtained from Eq. 12 by the total number of nonpolluted days. For instance, the number of nonpolluted days at this station for a threshold value of 50 μg m^{–3} is 1981 days (Table 1). The chi-square test with d = v-1 degrees of freedom was employed in this study to compare the observed and expected distributions of polluted events:
In assessing the best fitted higher order Markov chain model, the expected distribution that was closest to the observed distribution of polluted events was selected. The information from the transition probabilities was used to calculate the frequencies. For example, the conditional probabilities of the first order Markov chain model for polluted events in Shah Alam with a threshold value of 50 μg m^{–3} are given by:
The calculation of the conditional probabilities was continued until the maximum duration of a sequence of polluted days was reached. For example, the maximum number of consecutive polluted days for the Shah Alam station was 25 days in the 12 years of data used. Then, to get the expected frequencies of the first and higher orders, the conditional probabilities obtained from Eq. 12 were multiplied by the number of nonpolluted days, as mentioned earlier.
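For the first order case, this procedure can be sketched as below. The sketch assumes that, under a first order chain, the probability that a nonpolluted day is followed by exactly n polluted days and then a nonpolluted day is P_{01} P_{11}^{n-1} P_{10}. The number of nonpolluted days (1981) and the maximum spell length (25) are taken from the text, whereas the probabilities `p01` and `p11` and the function names are hypothetical, since the values of Table 1 are not reproduced here.

```python
def expected_spell_frequencies(p01, p11, n_nonpolluted, max_length):
    """Expected number of polluted spells of each length, first order chain.

    A spell of length n starts after a nonpolluted day (probability p01),
    continues for n - 1 more polluted days (p11 each) and ends with a
    nonpolluted day (p10 = 1 - p11).
    """
    p10 = 1.0 - p11
    return {
        n: n_nonpolluted * p01 * p11 ** (n - 1) * p10
        for n in range(1, max_length + 1)
    }


def chi_square(observed, expected):
    """Chi-square statistic comparing observed and expected frequencies."""
    return sum((observed[n] - expected[n]) ** 2 / expected[n] for n in expected)


# Hypothetical first order transition probabilities (not the Table 1 values)
p01, p11 = 0.3, 0.8

# 1981 nonpolluted days and a longest spell of 25 days, as stated in the text
expected = expected_spell_frequencies(p01, p11, n_nonpolluted=1981, max_length=25)

# Sanity check: observed counts equal to the expected ones give a statistic of 0
perfect_fit = chi_square(expected, expected)
```

The same comparison would be repeated with the conditional probabilities of the second and third orders, and the order with the smallest chi-square value would be taken as the best fit.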
RESULTS AND DISCUSSION
Characteristics of PM_{10} concentration: The descriptive statistics of the PM_{10} concentration data at the Shah Alam monitoring station are shown in Table 3. Based on Table 3, the maximum PM_{10} concentration of 587 μg m^{–3} occurred at the Shah Alam monitoring station in August (11/8/2005), which may be due to the haze event caused by transboundary pollution from Kalimantan and Sumatera in Indonesia^{26}. The background concentration of PM_{10} is based on the median value; thus, the value of 52 μg m^{–3} was used as the threshold value based on background concentration. The average daily PM_{10} concentration from 2002 until 2013 is illustrated in Fig. 1. Figure 1 shows that the higher values were recorded between the 161st and 231st days of the year, due to the dry season (Southwest monsoon) in Malaysia, which occurs from June until August^{9}.
Table 3: Descriptive statistics of PM_{10} concentration data in Shah Alam monitoring station


Fig. 1: Average daily PM_{10} concentration from year 2002 until 2013
Table 4: Conditional probabilities of the sequence of polluted and nonpolluted days of PM_{10} concentration and the values of α for all threshold values

The conditional probabilities of the sequence of polluted and nonpolluted days of PM_{10} concentration and the values of α obtained from Eq. 1 for all threshold values are shown in Table 4. These values were used to check whether the successive events (polluted or nonpolluted) are independent of each other. The events can form a Markov chain model, or possess the Markov chain property, if they are dependent on each other. Table 4 shows that, for all threshold values used, the value of α is larger than the χ^{2} critical value of 3.84 at the 5% level with 1 degree of freedom, so the successive events are dependent on each other and possess the Markov chain property. Therefore, further analysis can be carried out, since the events of PM_{10} concentration occurrence are dependent on each other. The conditional probabilities used to analyse the persistence of the events in the study area are also shown in Table 4. Based on Table 4, the conditional probabilities for both events show increased values, which indicate a strong relationship between the observed and the previous event; that is, the probability of a polluted or nonpolluted day is strongly influenced by the previous event regardless of the threshold value used. For example, the probability of a polluted day given that the previous day was also polluted (P_{11}) is greater than the unconditional probability (P_{1}). Indirectly, this higher persistence of polluted days indicates the occurrence of two or more consecutive polluted days for a given threshold value. Besides, the fact that the probability of two consecutive polluted days (P_{11}) is higher than the probability of a polluted day (P_{1}) may be due to the dependency in PM_{10} concentration, since according to WHO the effect of PM_{10} lasts more than 24 h.
Optimum order of Markov chain model in describing the sequence of PM_{10} concentration at Shah Alam monitoring station: The AIC and BIC values for the Shah Alam monitoring station with different threshold values of PM_{10} concentration are shown in Table 5. Table 5 shows that, for threshold values of less than 120 μg m^{–3}, a higher order (an order of more than one) is optimum for both AIC and BIC, which means that the occurrence of polluted (nonpolluted) events at this station depends on the events of the two or three days before the observed day. However, according to Katz^{27}, the AIC has a tendency to overestimate the optimum order. It was also found that the BIC estimate of the order for the Tel Aviv rainfall data is unity. Besides that, Dastidar et al.^{23} stated that the BIC also gives a mathematical formulation of the principle of parsimony in model building. Thus, this study considers the optimum order obtained from the minimum loss function of the BIC. The table also shows that an order of three is optimal for threshold values of less than 100 μg m^{–3}, while for a threshold value of 120 μg m^{–3}, the optimum order is two. For a threshold value of 150 μg m^{–3}, a simple (first) order is the optimum order that best describes the sequence of polluted (nonpolluted) days of PM_{10} concentration at this station. Thus, it can be concluded that a higher order is more appropriate for describing the sequence of polluted (nonpolluted) days of PM_{10} concentration at this station for threshold values of less than 120 μg m^{–3}.
Besides that, the results also show that there is less dependency on previous events as the threshold value increases, which indicates that occurrences of PM_{10} concentration cannot be predicted accurately when the threshold value used is more than 120 μg m^{–3} for this station. Indirectly, this study also suggests why the limit or threshold value of PM_{10} should be revised to suit present-day Malaysia, as suggested by DoE^{28}, so that better predictions based on previous events can be made as an early precaution for the public and the environment.
Table 5: AIC and BIC values for Shah Alam monitoring station with different threshold values of PM_{10} concentration

*Minimum loss function 
Appropriate order for polluted events: Since the third order is found to be the optimum order for the sequence of PM_{10} concentrations, the fit is investigated further by considering the distribution of polluted events at this monitoring station. The observed and expected frequency distributions are computed as shown in Table 6. The chi-square goodness of fit test is used to select the best fitted model for each threshold. All the expected frequencies are more than 5 days and therefore meet the assumption of the chi-square test. Table 6 shows that, at the α = 0.05 level of significance, the first and second order Markov chain models do not satisfactorily represent the distribution of polluted events at this station for threshold values of 50 and 52 μg m^{–3}, since the null hypothesis is rejected. However, for threshold values of 50 and 52 μg m^{–3}, the third order produces a better fit, since its chi-square statistic is lower than that of the other orders and the expected days of polluted events describe the observed distribution well. This study can also conclude that the higher order (order three) produces a better fit than the other orders, since its chi-square value is lower than that of the other orders at all threshold values used.
Figure 2 provides the observed and expected frequencies for the threshold values used in this study, based on the appropriate order among the first three orders of the Markov chain models, namely order one (MCM1), order two (MCM2) and order three (MCM3), for the distribution of the polluted events at this station. The results show that orders of more than one (MCM2 and MCM3) are the most appropriate orders for describing the distribution of polluted events at the Shah Alam monitoring station. The observed and expected frequencies of polluted events based on the best fitted order of Markov chain model at threshold values of 50, 52, 100 and 120 μg m^{–3} are shown in Fig. 2a-d, respectively. The expected frequencies of polluted events obtained from these orders describe the observed distributions well: based on the χ^{2} values, order three (MCM3) is the best fitted order for threshold values of 50, 52 and 100 μg m^{–3}, while for a threshold value of 120 μg m^{–3}, the distribution of polluted events is best fitted by order two (MCM2). Besides that, Fig. 2 also shows that the number of polluted events decreases as the threshold value of PM_{10} increases. In conclusion, the threshold value used is also important in determining the optimum order and in describing the distribution of polluted events at this monitoring station.
Table 6: Chi-square test, degree of freedom and p-value of observed and expected length of polluted days for Shah Alam monitoring station

*p-value <α = 0.05, **Smallest value of χ^{2} for each threshold value

Fig. 2(a-d): Observed and expected frequencies of polluted events based on the best fitted order of Markov chain model at threshold values of (a) 50 μg m^{–3}, (b) 52 μg m^{–3}, (c) 100 μg m^{–3} and (d) 120 μg m^{–3}
CONCLUSION
This study has described the occurrence of PM_{10} concentration at the Shah Alam monitoring station by using Markov chain models. For threshold values of less than 120 μg m^{–3}, the optimum order for this station is two or three. These results indicate that the occurrence of polluted (nonpolluted) days depends on the two or three days before the observed day, so the responsible authorities can make early predictions based on the events of the two or three days prior. For a threshold value of 150 μg m^{–3}, an order of one is the optimum order, which indicates that prediction can be made by referring to the day before the observed day. In conclusion, based on the minimum loss function, the higher order Markov chain model is appropriate for predicting PM_{10} concentrations at this monitoring station.
SIGNIFICANCE STATEMENTS
This study discovers that future PM_{10} concentration events can be predicted by considering the events of previous days, which is beneficial for monitoring the effect of PM_{10} concentrations in the study area. This study will help researchers and authorities to provide the necessary information and early prediction of PM_{10} occurrences in a particular area. Therefore, the effect of PM_{10} concentrations may be reduced by taking early precautions, especially for high risk people such as children and the elderly.
ACKNOWLEDGMENT
The authors are greatly thankful to Universiti Teknologi Mara (UiTM) for support under Grant 600RMI/IRAGS 5/3 (36/2015). Special recognition goes to the Department of Environment (DoE) and Alam Sekitar Malaysia Sdn. Bhd (ASMA) for providing the air quality data for this study.