A Comparison of Parametric and Nonparametric Density Functions for Estimating Annual Precipitation in Iran

Haghighat Jou, P.; Akhoond-Ali, A.M.; Behnia, A.; Chinipardaz, R.

Research Article

P. Haghighat Jou
Department of Hydrology and Water Resources, Faculty of Water Sciences Engineering, Shahid Chamran University, Ahwaz, Iran

A.M. Akhoond-Ali
Faculty of Water Sciences Engineering, Shahid Chamran University, Ahwaz, Iran

A. Behnia
Faculty of Water Sciences Engineering, Shahid Chamran University, Ahwaz, Iran

R. Chinipardaz
Department of Statistics, Faculty of Mathematical Sciences and Computer, Shahid Chamran University, Ahwaz, Iran

ABSTRACT

The main purpose of this study is to compare parametric density functions with the nonparametric Fourier series method for estimating annual precipitation at five old rain gauge stations (Bushehr, Isfahan, Meshed, Tehran and Jask) in Iran. The parametric density functions include the normal, two- and three-parameter log-normal, two-parameter gamma, Pearson and log-Pearson type 3 and Gumbel extreme value type 1 distributions. The nonparametric approach is the Fourier series method. Annual precipitation data from these stations were fitted to all of the density functions, including the Fourier series. The results showed that the Fourier series estimates annual precipitation much better than the parametric methods. Thus, the Fourier series can be used as a better alternative approach for precipitation frequency analysis.


INTRODUCTION

Designing the structure size and reservoir capacity of a single- or multiple-dam system for flood control, water storage or release and power production requires the volume of the annual flood. Since rivers in many areas are not gauged to monitor annual runoff, this volume is usually calculated from the annual precipitation. Estimation of annual precipitation is also essential in many other fields, such as watershed management, water resources management, estimation of water requirements for macro-scale irrigation design, flood and drought studies and monitoring of climate change. All of these depend fully or partly on annual precipitation.

Precipitation in Iran occurs mostly during the 6 months between November and April, with an annual mean of 250 mm. The climate of Iran is categorized as arid and semi-arid. Precipitation frequency analysis is generally carried out using parametric methods, in which a statistical distribution such as the normal, two- and three-parameter log-normal, two-parameter gamma, Pearson and log-Pearson type 3 or Gumbel extreme value type 1 is fitted to an annual series of data. These methods have been applied successfully in many cases, but they have disadvantages: they may not fit the observed data very well and may deviate in the extreme tails. Other problems with these methods are the difficulty of choosing the best distribution function and of estimating its parameters (particularly for skewed data).

In contrast to parametric approaches, several nonparametric methods, such as the variable kernel method (Lall et al., 1993), have been introduced in recent years to estimate the probability density function and distribution function of hydrologic events. Guo (1991) proposed a nonparametric variable kernel estimation model that provides an alternative for flood quantile estimation when historical flood data are available. The nonparametric kernel estimator was shown to fit the real data points more closely than its parametric counterparts. Gingras and Adamowski (1992) applied both L-moments and nonparametric frequency analysis to annual maximum floods. By coupling nonparametric frequency analysis with L-moment analysis, it is possible to confirm the L-moment selection of a unimodal distribution or to determine that the sample actually comes from a mixed distribution. Thus, the nonparametric method helps to identify the underlying probability distribution, particularly when samples arise from a mixed distribution. Moon et al. (1993) compared selected techniques for estimating exceedance frequencies of annual maximum flood events at a gauged site. They applied four tail probability estimators and a variable kernel distribution function estimator and concluded that the variable kernel estimator appears useful because it automatically gives stable and accurate flood frequency estimates without requiring a distributional assumption. Adamowski (1996) developed a nonparametric method for low-flow frequency analysis and compared it with two commonly used parametric methods, namely the log-Pearson type 3 and Weibull distributions. The numerical analysis indicated that the nonparametric method fits the data better and gives more accurate results than the parametric methods in current use. Adamowski (2000) applied a Gaussian (normal) kernel function to the regional analysis of Annual Maximum (AM) and Partial Duration (PD) flood data by nonparametric and L-moment methods. The results pointed out deficiencies in currently used parametric approaches for both AM and PD series: traditional regional flood frequency analysis procedures assume that all floods within a homogeneous region are generated by the same, often unimodal, distribution, whereas this is not always true and the data series may be multimodal. Faucher et al. (2002) compared the performance of parametric and nonparametric methods in the estimation of flood quantiles. The log-Pearson type 3, two-parameter log-normal and generalized extreme value distributions were fitted to the simulated samples. It was found that nonparametric methods perform quite similarly to the parametric methods. They compared six different kernel functions, namely the biweight, normal, Epanechnikov, extreme value type 1, rectangular and Cauchy kernels, and found no major differences among the first four.

Ghayour and Asakereh (2005) used the Fourier series method to estimate the monthly temperature of Meshed. The results showed that this method fits both continuous and discrete time series well and can reveal their trend. Behnia and Haghighat Jou (2007) applied the Fourier series to estimate the annual flood probability of the Great Karoun river in southwestern Iran. The results predicted by this method were then compared with those of seven parametric methods: normal, two- and three-parameter log-normal, two-parameter gamma, Pearson and log-Pearson type 3 and Gumbel extreme value type 1. This comparison showed that the Fourier series method performed better. Karmakar and Simonovic (2008) used a 70-year data set of flood peak flow, volume and duration and applied nonparametric methods based on kernel density estimation and orthonormal series to determine the nonparametric distribution functions for the Red River in the US. They selected the subset of the Fourier series consisting of cosine functions as the orthonormal series. They found that the nonparametric method based on orthonormal series is more appropriate than kernel estimation for determining the marginal distributions of flood characteristics, as it can estimate the probability distribution function over the whole range of possible values.

The Fourier series is also used in this study, but for annual precipitation frequency analysis. The results of this nonparametric method are compared with those of the parametric methods mentioned above, using the same annual precipitation data sets from five rain gauge stations in Iran, in order to identify the more accurate method.

MATERIALS AND METHODS

This study has been conducted since October 2007 at the Department of Hydrology and Water Resources, Faculty of Water Sciences Engineering, Shahid Chamran University, Ahwaz, Iran, and is ongoing.

Climate of Iran
Iran is located in the Middle East, between the Caspian Sea to the north and the Persian Gulf and the Oman Sea to the south. It lies in the northern hemisphere between longitudes 44 and 63° E and latitudes 25 and 40° N. Iran has a variable climate. In the northwest, winters are cold, with heavy snowfall and subfreezing temperatures from November to February. Spring and fall are relatively mild, while summers are dry and hot. In the south, winters are mild and summers are very hot. Temperatures range widely, from -36°C in the north to 54°C in the south. On the Khuzestan plain in the southwest, heat is accompanied by high humidity. In general, the climate of Iran is categorized as arid or semi-arid, with most precipitation falling from November through April. In most of the country, yearly precipitation averages 240 mm or less. The major exceptions are the higher mountain valleys of the Zagros and the Caspian coastal plain, where precipitation averages at least 500 mm annually. In the western part of the Caspian region, rainfall exceeds 1000 mm annually and is distributed relatively evenly throughout the year. This contrasts with some basins of the Central Plateau that receive 100 mm or less of precipitation annually. Overall, precipitation varies widely, from less than 100 mm in the southeast to about 2000 mm in the Caspian region. The northern and western parts of Iran have four distinct seasons. Toward the south and east, spring and autumn become increasingly short and ultimately merge in an area of mild winters and hot summers.

Sources of Precipitation
There are five distinct sources of precipitation over Iran. Westerly winds blowing from the Mediterranean Sea, southwesterly winds from the Horn of Africa and northern winds from Siberia produce rainfall in the northwestern, western and southwestern parts of the country in winter. Southeasterly monsoon winds blowing from the Indian Ocean produce scanty and scattered rainfall in the southeast in summer. Northerly winds blowing from the Caspian Sea produce relatively heavy rainfall throughout the year, but only in the littoral provinces, i.e., Gilan, Mazandaran and Golestan.

Rain Gauge Stations Selection
Annual precipitation data from five old rain gauge stations in Iran were selected for analysis: Bushehr, Isfahan, Meshed, Tehran and Jask. Figure 1 shows the geographical location of the stations on the map of Iran. There are many synoptic and meteorological stations in Iran, but these stations were selected because they have long records, ranging from 84 to 113 years. The data were collected from two sources: World Weather Records and the Meteorological Yearbooks of Iran, which are published by the Iranian Meteorological Organization. Data up to 1960 were taken from the first source and the remainder, up to 2004, from the second. The sample sizes and dates of establishment of the stations are given in Table 1 and their geographical characteristics are presented in Table 2. The statistical characteristics of the annual precipitation data for the five stations, which are used in the proposed methods, are listed in Table 3.


Fig. 1:	Geographical location of rain gauge stations on the map of Iran

Table 1:	The sample sizes and date of establishment of stations

Table 2:	Geographical characteristics of the stations

Table 3:	Basic statistics of annual precipitation for the old stations
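
The kind of basic statistics summarized in Table 3 can be sketched as follows. This is a minimal illustration only: the choice of measures (mean, standard deviation, coefficient of variation and bias-adjusted skewness) is an assumption, the function name `basic_statistics` is hypothetical, and the sample values are invented for demonstration and are not station data.

```python
from statistics import mean, stdev


def basic_statistics(series):
    """Summary statistics of an annual series (assumed set of measures)."""
    n = len(series)
    m = mean(series)
    s = stdev(series)  # sample standard deviation (n - 1 in the denominator)
    cv = s / m         # coefficient of variation
    # Bias-adjusted sample skewness coefficient
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in series)
    return {"n": n, "mean": m, "std": s, "cv": cv, "skewness": skew}


# Hypothetical annual totals in mm, for illustration only
annual_mm = [212.0, 305.5, 148.2, 260.9, 401.3, 189.7, 233.4]
print(basic_statistics(annual_mm))
```

The skewness coefficient is the quantity referred to when the choice of the best-fitting parametric distribution is related to the skewness of each station's data.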

Fourier Series Method
The distributions and their parameter estimation methods are not described in this study because they are available in other publications, such as Kite (1988), Haan (1977), Rao and Hamed (1999), Yue et al. (1999), Yue (2000) and El Adlouni et al. (2008). Therefore, only the Fourier series method is described here. Kronmal and Tarter (1968) proposed the Fourier series method as a feasible nonparametric approach to estimating probability density and distribution functions. Orthogonal functions for designing density estimators were developed by Devroye and Gyorfi (1985). For an orthogonal system, the probability density function and the cumulative distribution function can be written as follows (Kronmal and Tarter, 1968):

(1)

and

(2)

where â_k and Â_k are computed from a random sample of size n and have the following property: E(â_k) = a_k and E(Â_k) = A_k. Specifically, for Fourier estimators of the density and distribution functions, a_k and A_k are the kth Fourier coefficients with respect to the orthogonal functions ψ_k(x). Kronmal and Tarter (1968) expressed the coefficients a_k and A_k in terms of the trigonometrical moments, for all k ≥ 0:

(3)

(4)

where, and zero elsewhere. Then, the estimators are:

(5)

(6)

where, is the mean of n samples and [a, b] is the interval of interest. To determine the optimal number of Fourier terms, Kronmal and Tarter (1968) suggested that the mth term should be included if:

(7)
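
As an illustration of the method described above, the following is a minimal sketch of a cosine-series estimator on an interval [a, b]. It assumes the commonly quoted cosine form, in which the density is 1/(b - a) plus a weighted sum of cosines of kπ(x - a)/(b - a), the weights are the sample means of those cosines, and a term is retained when its squared moment exceeds 2/(n + 1). These forms, the function names and the sample values are assumptions for illustration and are not taken from the equations of this study.

```python
import math


def cosine_moments(sample, a, b, m_max):
    """Sample cosine moments c_k for k = 1..m_max on the interval [a, b]."""
    n = len(sample)
    width = b - a
    return [
        sum(math.cos(k * math.pi * (x - a) / width) for x in sample) / n
        for k in range(1, m_max + 1)
    ]


def select_terms(moments, n):
    """Keep (k, c_k) only if c_k**2 > 2 / (n + 1) (assumed inclusion rule)."""
    threshold = 2.0 / (n + 1)
    return [(k, c) for k, c in enumerate(moments, start=1) if c * c > threshold]


def fourier_pdf(x, terms, a, b):
    """Density estimate: (1 + 2 * sum c_k cos(k pi u)) / (b - a), zero outside [a, b]."""
    if x < a or x > b:
        return 0.0
    width = b - a
    u = (x - a) / width
    series = sum(c * math.cos(k * math.pi * u) for k, c in terms)
    return (1.0 + 2.0 * series) / width


def fourier_cdf(x, terms, a, b):
    """Distribution estimate: u + (2 / pi) * sum (c_k / k) sin(k pi u)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    u = (x - a) / (b - a)
    series = sum(c / k * math.sin(k * math.pi * u) for k, c in terms)
    return u + (2.0 / math.pi) * series


# Hypothetical annual totals in mm and an assumed interval of interest
sample = [120.0, 180.0, 200.0, 240.0, 260.0, 310.0, 400.0]
a, b = 0.0, 600.0
terms = select_terms(cosine_moments(sample, a, b, 10), len(sample))
print(fourier_cdf(250.0, terms, a, b))
```

A truncated series of this kind is not guaranteed to be non-negative everywhere, so the estimated density should be inspected before quantiles are read from the estimated distribution function.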

In this study, because incorporating higher Fourier terms improves the goodness of fit to the observed data, all of the data were used in the numerical analysis. To compare the parametric and nonparametric methods, the Mean Relative Deviation (MRD) and the Mean Square Relative Deviation (MSRD) were used to measure the goodness of fit of the methods mentioned above. These statistics are defined as follows:

(8)

(9)

where x and x̂ are the observed and estimated annual precipitation, respectively, and n is the sample size. In addition, the observed and estimated data are also compared graphically.
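
The two goodness-of-fit measures can be sketched in code as follows. The percentage form used here, with the relative deviation multiplied by 100 before averaging or squaring, is an assumption that is consistent with the magnitude of the values reported in this study; the function names are hypothetical.

```python
def mrd(observed, estimated):
    """Mean Relative Deviation in percent (assumed form):
    (100 / n) * sum(|x - x_hat| / x)."""
    n = len(observed)
    return 100.0 / n * sum(abs(x - xh) / x for x, xh in zip(observed, estimated))


def msrd(observed, estimated):
    """Mean Square Relative Deviation (assumed form):
    (1 / n) * sum((100 * (x - x_hat) / x) ** 2)."""
    n = len(observed)
    return sum((100.0 * (x - xh) / x) ** 2 for x, xh in zip(observed, estimated)) / n


# Two observations, each estimated with a 10 % relative deviation
print(mrd([100.0, 200.0], [110.0, 180.0]))
print(msrd([100.0, 200.0], [110.0, 180.0]))
```

Because MSRD squares the percentage deviations, a few poorly fitted extreme values can dominate it, which is why it can be several orders of magnitude larger than MRD for the same fit.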

Data Analysis
Annual precipitation data from the five rain gauge stations were fitted to the seven parametric functions and to the Fourier series, as a nonparametric approach, to compare their goodness of fit. For the parametric methods, the parameters were estimated by both the method of moments and the maximum likelihood procedure, and the MRD and MSRD values were then calculated using Eq. 8 and 9. Tables 4 and 5 list these values, which range from 2.233 to 65.968 and from 6.998 to 122493.250, respectively. For the nonparametric method, the estimated values were calculated using Eq. 6 and then used in Eq. 8 and 9 to calculate MRD and MSRD. These values range from 0.350 to 2.630 and from 0.280 to 84.004, respectively, and are given in the last columns of Tables 4 and 5 under the abbreviation FS (Fourier series). Quantiles at selected return periods were estimated for each data set from the best-fitting parametric distribution and from the Fourier series. These values were obtained for return periods from 1.0101 through 100 years and are presented in Table 6.
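
The step from a fitted distribution function to a quantile at a given return period can be sketched as follows, assuming the usual convention that a return period T corresponds to a non-exceedance probability of 1 - 1/T (so that T = 1.0101 years corresponds to about 0.01 and T = 100 years to 0.99). The normal distribution with a mean of 250 mm and a standard deviation of 80 mm is a hypothetical stand-in for a fitted model, and the helper names are illustrative.

```python
from statistics import NormalDist


def non_exceedance(return_period):
    """Non-exceedance probability for a return period T in years (assumed: 1 - 1/T)."""
    return 1.0 - 1.0 / return_period


def quantile_by_bisection(cdf, p, lo, hi, tol=1e-8):
    """Invert a non-decreasing distribution function on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical fitted model: normal distribution, parameters by method of moments
dist = NormalDist(mu=250.0, sigma=80.0)
for T in (1.0101, 2.0, 10.0, 100.0):
    q = quantile_by_bisection(dist.cdf, non_exceedance(T), 0.0, 1000.0)
    print(T, round(q, 1))
```

Bisection is used here because it needs only the distribution function itself, so the same routine could be applied to a series-based estimate of the distribution function, provided that estimate is non-decreasing over the search interval.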

Table 4:	Values of MRD using parametric methods and Fourier series

¹Parameters were estimated by the method of moments. ²Parameters were estimated by the maximum likelihood procedure. N = Normal, P3 = Pearson type 3, LN2 = 2 par. log-normal, LP3 = log-Pearson type 3, LN3 = 3 par. log-normal, G = Gumbel extreme value type 1, G2 = 2 par. Gamma, FS = Fourier series method

Table 5:	Values of MSRD using parametric methods and Fourier series

Table 6:	Annual precipitation quantiles estimated by parametric distributions and the Fourier series method

RESULTS AND DISCUSSION

Comparing the MRD and MSRD values in Tables 4 and 5, it is impossible to choose a single parametric distribution function that fits the data sets of all stations. For example, the best-fitting distributions are Pearson type 3 for the Bushehr data (due to their high skewness), log-Pearson type 3 for the Isfahan and Jask data (due to their lower skewness) and three-parameter log-normal for the Meshed and Tehran data (due to their low skewness). However, considering the MRD and MSRD values for all eight methods, it is apparent that the Fourier series method fits every data set much better than the parametric distributions, and its very low MRD and MSRD values make it easy to select among the methods. Hence, the Fourier series method is easy to apply and can be used as a robust approach both for fitting the data and for estimating quantiles. Figures 2 and 3 show the fits to annual precipitation at Jask station; the relative goodness of fit is similar for the other stations.

Unlike parametric distributions, which assume that the data follow certain models, the Fourier series has no such restriction. Therefore, the Fourier series has two advantages over parametric approaches, a better fit and no need to select a distribution, and the aims of the current study are achieved. The results of this study are similar to those of Behnia and Haghighat Jou (2007), who concluded that the Fourier series fits the data better than the parametric methods, as mentioned in the introduction. However, they used the Fourier series for annual flood probability analysis of the Great Karoun river in Iran. Furthermore, Karmakar and Simonovic (2008) used a 70-year data set of flood peak flow, volume and duration for the Red River in the US and selected the subset of the Fourier series consisting of cosine functions as the orthonormal series. They found that the method based on orthonormal series is more appropriate than kernel estimation for determining the marginal distributions of flood characteristics, as it can estimate the probability distribution function over the whole range of possible values. In the present study, the Fourier series was applied as a nonparametric approach to annual precipitation probability analysis in Iran. Although our results confirm those of Behnia and Haghighat Jou (2007) and Karmakar and Simonovic (2008), our findings concern annual precipitation over Iran rather than floods. From this point of view, the results are significant and unique findings of the current research in the country.


Fig. 2:	Observed and fitted Fourier series for Jask station


Fig. 3:	Observed and fitted log-Pearson type 3 for Jask station

CONCLUSION

Annual precipitation data sets from five old rain gauge stations (Bushehr, Isfahan, Meshed, Tehran and Jask) in Iran were fitted to seven parametric density functions (normal, two- and three-parameter log-normal, two-parameter gamma, Pearson and log-Pearson type 3 and Gumbel extreme value type 1) and to the Fourier series. The aim was to compare parametric density functions with the nonparametric Fourier series for estimating annual precipitation at these stations. To do this, the MRD and MSRD values of both approaches were calculated and compared. Since the minimum values belonged to the Fourier series approach, it is concluded that this method estimates the annual precipitation at the stations much better than the other methods. Also, users do not face the problem of selecting a single approach applicable to all data sets. These are two advantages of the Fourier series over parametric approaches for estimating annual precipitation in Iran.

REFERENCES

Adamowski, K., 1996. Nonparametric estimation of low-flow frequencies. J. Hydraulic Eng., 122: 46-50.
Adamowski, K., 2000. Regional analysis of annual maximum and partial duration flood data by nonparametric and L-moment methods. J. Hydrol., 229: 219-231.
Behnia, A.K. and P.H. Jou, 2007. Estimating Karoun annual flood probabilities using Fourier series method. Proceeding of the 7th International River Engineering Conference, February 13-15, 2007, Shahid Chamran University, Ahwaz, Iran, pp: 1-7.
Devroye, L. and L. Gyorfi, 1985. Nonparametric Density Estimation: The L₁ view. 1st Edn., John Wiley and Sons, New York, ISBN:0471816469.
El Adlouni, S., B. Bobée and T.B.M.J. Ouarda, 2008. On the tails of extreme event distributions in hydrology. J. Hydrol., 355: 16-33.
Faucher, D., P.F. Rasmussen and B. Bobee, 2002. Nonparametric estimation of quantiles by the kernel method (In French). J. Water Sci., 15: 515-541.
Ghayour, H.A. and H. Asakereh, 2005. Application of the Fourier models in estimating monthly temperature and its predictability. Case study: Meshed temperature. Geographical Research Quarterly, No. 77, pp: 83-98 (In Persian).
Gingras, D. and K. Adamowski, 1992. Coupling of nonparametric frequency and L-moment analyses for mixed distribution identification. J. Am. Water Resourc. Associat., 28: 263-272.
Guo, S.L., 1991. Nonparametric variable kernel estimation with historical floods and paleoflood information. Water Resourc. Res., 27: 91-98.
Haan, C.T., 1977. Statistical Methods in Hydrology. 1st Edn., Iowa State University Press, Ames. Iowa, ISBN: 0-8138-1510-X, pp: 378.
Karmakar, S. and S.P. Simonovic, 2008. Bivariate flood frequency analysis using copula with parametric and nonparametric marginals. Proceedings of the 4th International Symposium on Flood Defence: Managing Flood Risk, Reliability and Vulnerability, May 6-8, 2008, Toronto, Ontario, Canada, pp: 1-50.
Kim, K.D. and J.H. Heo, 2002. Comparative study of flood quantiles estimation by nonparametric models. J. Hydrol., 260: 176-193.
Kite, G.W., 1988. Frequency and Risk Analysis in Hydrology. 4th Edn., Water Resources Publications Littleton, Colorado 80161-2841, USA., ISBN: 0-918334-64-0, pp: 257.
Kronmal, R. and M. Tarter, 1968. The estimation of probability densities and cumulatives by Fourier series methods. J. Am. Statist. Assoc., 63: 925-952.
Lall, U., Y.I. Moon and K. Bosworth, 1993. Kernel flood frequency estimators: Bandwidth selection and Kernel choice. Water Resourc. Res., 29: 1003-1016.
Moon, Y.I., U. Lall and K. Bosworth, 1993. A comparison of tail probability estimators for flood frequency analysis. J. Hydrol., 151: 343-363.
Park, B.U. and J.S. Marron, 1990. Comparison of data driven bandwidth selectors. J. Am. Statist. Assoc., 85: 66-72.
Rao, A.R. and K.H. Hamed, 1999. Flood Frequency Analysis. 1st Edn., CRC Press, Boca Raton, FL., ISBN-10: 0849300835, Pages: 376.
Yue, S., T.B.M.J. Ouarda, B. Bobee, P. Legendre and P. Bruneau, 1999. The Gumbel mixed model for flood frequency analysis. J. Hydrol., 226: 88-100.
Yue, S., 2000. The Gumbel logistic model for representing a multivariate storm event. Adv. Water Resourc., 24: 179-185.

Research Journal of Environmental Sciences