A Comparison of Parametric and Nonparametric Density Functions for
Estimating Annual Precipitation in Iran
P. Haghighat Jou,
The main purpose of this study is to compare parametric density
functions with the nonparametric Fourier series method for estimating annual
precipitation at five old rain gauge stations (Bushehr, Isfahan, Meshed,
Tehran and Jask) in Iran. The parametric density functions are the normal,
two- and three-parameter log-normal, two-parameter gamma, Pearson and
log-Pearson type 3 and Gumbel extreme value type 1 distributions. The
nonparametric approach is the Fourier series method. Annual precipitation
data from these stations were fitted to all of the density functions,
including the Fourier series. The results showed that the Fourier series
estimates annual precipitation much better than the parametric methods.
Thus, the Fourier series can be used as a better alternative approach for
precipitation frequency analysis.
To design the structure size and reservoir capacity of a single- or
multiple-dam system for flood control, water storage or release and power
production, the annual flood volume is needed. Since rivers in many areas
are not gauged to monitor annual runoff, this volume is usually calculated
from the annual precipitation. Estimation of annual precipitation is also
essential in many other fields, such as watershed management, water
resources management, water requirements for macro-scale irrigation design,
flood and drought studies and monitoring of climate change, all of which
depend fully or partly on annual precipitation.
Precipitation in Iran occurs mostly during the six months from November to
April, with an annual mean of 250 mm. The climate of Iran is categorized as
arid and semi-arid. Precipitation frequency analysis is generally carried
out using parametric methods, in which a statistical distribution such as
the normal, two- and three-parameter log-normal, two-parameter gamma,
Pearson and log-Pearson type 3 or Gumbel extreme value type 1 is fitted to
an annual series of data. These methods have been applied successfully in
many cases, but they have some disadvantages: they may not fit the observed
data very well, or they may deviate from the data in the extreme tails.
Other problems in using these methods are the difficulty of choosing the
best distribution function and of estimating its parameters (particularly
for skewed data).
In contrast to parametric approaches, several nonparametric methods, such as
the variable kernel method (Lall et al., 1993), have been introduced
in recent years to estimate the probability density and distribution functions
of hydrologic events. Guo (1991) proposed a nonparametric
variable kernel estimation model which provides an alternative for flood
quantile estimation when historical flood data are available. It was shown that
the nonparametric kernel estimator fitted the real data points more closely than
its parametric counterparts. Gingras and Adamowski (1992)
applied both L-moments and nonparametric frequency analysis on the annual maximum
floods. By coupling nonparametric frequency analysis with L-moment analysis,
it is possible to confirm the L-moment selection of unimodal distribution, or
to determine that the sample is actually from a mixed distribution. Thus, the
nonparametric method helps to identify the underlying probability distribution,
particularly when samples arise from a mixed distribution. Moon
et al. (1993) compared selected techniques for estimating exceedance
frequencies of annual maximum flood events at a gauged site. They applied four
tail probability estimators and a variable kernel distribution function
estimator and concluded that the variable kernel estimator appears useful
because it automatically gives stable and accurate flood frequency estimates
without requiring a distributional assumption. Adamowski (1996) developed a
nonparametric method for low-flow frequency analysis and compared it with two
commonly used parametric methods, namely the log-Pearson type 3 and Weibull
distributions. The numerical analysis indicated that the nonparametric method
fits the data better and gives more accurate results than the currently used
parametric methods. Adamowski
(2000) applied a Gaussian (normal) kernel function for regional analysis
of Annual Maximum (AM) and Partial Duration (PD) flood data by nonparametric
and L-moment methods. The results pointed out deficiencies in currently used
parametric approaches for both AM and PD series, since traditional regional
flood frequency analysis procedures assume that all floods within a homogeneous
region are generated by the same, often unimodal distribution, while this is
not always true and the data series may be multimodal. Faucher
et al. (2002) compared the performance of parametric and nonparametric
methods in estimation of flood quantiles. The log-Pearson type 3, two parameter
lognormal and generalized extreme value distributions were used to fit the simulated
samples. It was found that the nonparametric methods perform quite similarly to
the parametric methods. They compared six different kernel functions (biweight,
normal, Epanechnikov, extreme value type 1, rectangular and Cauchy) and
found no major differences between the first four of these kernels.
Ghayour and Asakereh (2005) used the Fourier series method
to estimate the monthly temperature of Meshed. The results showed that this
method can properly fit both continuous and discrete time series and can
reveal their trends. Behnia and Haghighat Jou (2007)
applied the Fourier series to estimate the annual flood probability of the Great
Karoun river, which flows in the southwest of Iran. The results of this method
were then compared with the results of seven parametric methods: normal, two-
and three-parameter log-normal, two-parameter gamma, Pearson and
log-Pearson type 3 and Gumbel extreme value type 1. This comparison
showed that the Fourier series method performed better. Karmakar
and Simonovic (2008) used a 70-year data set of flood peak flow, volume and
duration and applied nonparametric methods based on kernel density estimation
and orthonormal series to determine the nonparametric distribution functions
for the Red River in the US. They selected the subset of the Fourier series
consisting of cosine functions as the orthonormal series. They found that the
nonparametric method based on orthonormal series is more appropriate than kernel
estimation for determining the marginal distributions of flood characteristics,
as it can estimate the probability distribution function over the whole range
of possible values.
The Fourier series is also used in this study, but for annual precipitation
frequency analysis. The results of this nonparametric method are compared
with the results of the above-mentioned parametric methods using the same
annual precipitation data sets from five rain gauge stations in Iran, and
the more accurate method is identified.
MATERIALS AND METHODS
This study has been under way since October 2007 at the Department of
Hydrology and Water Resources, Faculty of Water Sciences Engineering,
Shahid Chamran University, Ahwaz, Iran.
Climate of Iran
Iran is located in the Middle East, between the Caspian Sea to the north
and the Persian Gulf and Oman Sea to the south. It lies in the northern
hemisphere, between longitudes 44 and 63°E and latitudes 25 and 40°N. Iran
has a variable climate. In the northwest, winters are cold, with heavy
snowfall and subfreezing temperatures from November to February. Spring and
fall are relatively mild, while summers are dry and hot. In the south,
winters are mild and summers are very hot. Temperatures range widely, from
-36°C in the north to 54°C in the south. On the Khuzestan plain in the
southwest, heat is accompanied by high humidity. In general, the climate of
Iran is categorized as arid or semi-arid, with most precipitation falling
from November through April. In most of the country, yearly precipitation
averages 240 mm or less. The major exceptions are the higher mountain
valleys of the Zagros and the Caspian coastal plain, where precipitation
averages at least 500 mm annually. In the western part of the Caspian Sea,
rainfall exceeds 1000 mm annually and is distributed relatively evenly
throughout the year. This contrasts with some basins of the Central Plateau
that receive 100 mm or less of precipitation annually. Precipitation varies
widely, from less than 100 mm in the southeast to about 2000 mm in the
Caspian region. The northern and western parts of Iran have four distinct
seasons. Toward the south and east, spring and autumn become increasingly
short and ultimately merge in an area of mild winters and hot summers.
Sources of Precipitation
There are five distinct sources of precipitation over Iran. Westerly winds
blowing from the Mediterranean Sea, southwesterly winds from the Horn of
Africa and northern winds from Siberia produce rainfall in the
northwestern, western and southwestern parts of the country in winter.
Southeasterly monsoon winds blowing from the Indian Ocean produce scanty
and scattered rainfall in the southeast in summer. Northerly winds blowing
from the Caspian Sea produce relatively heavy rainfall, but only in the
littoral provinces (Gilan, Mazandaran and Golestan), throughout the year.
Rain Gauge Stations Selection
Annual precipitation data from five old rain gauge stations in Iran were
selected for analysis: Bushehr, Isfahan, Meshed, Tehran and Jask. Figure 1
shows the geographical location of the stations on the map of Iran. There
are many synoptic and meteorological stations in Iran, but these stations
were selected because they have long records, ranging from 84 to 113 years.
The data were collected from two references: World Weather Records and the
Meteorological Yearbooks of Iran, which are published by the Iranian
Meteorological Organization. Data up to 1960 were collected from the first
reference and the remaining data, up to 2004, from the second. The sample
size and date of establishment of each station are given in Table 1 and the
geographical characteristics of the stations are presented in Table 2. The
statistical characteristics of the annual precipitation data for the five
stations, which are used in the proposed methods, are listed in Table 3.
Fig. 1: Geographical location of rain gauge stations on the map of Iran
Table 1: The sample sizes and date of establishment of stations
Table 2: Geographical characteristics of the stations
Table 3: Basic statistics of annual precipitation for the old rain gauge stations
Fourier Series Method
Descriptions of the parametric distributions and their parameter estimation
methods are not presented in this study because they are available in other
publications such as Kite (1988), Haan (1977), Rao and Hamed (1999),
Yue et al. (1999), Yue (2000) and El Adlouni et al. (2008). Therefore, only
the Fourier series method is described here.
Kronmal and Tarter (1968) proposed the Fourier series
method as a feasible nonparametric approach to estimating probability density
and distribution functions. Orthogonal functions for designing density
estimators were developed by Devroye and Gyorfi (1985). For an orthogonal
system, the probability density function and the cumulative distribution
function can be considered as follows (Kronmal and Tarter, 1968):
where $\hat{a}_k$ and $\hat{A}_k$ are computed from a random sample
of size n and have the following property: $E(\hat{a}_k) = a_k$
and $E(\hat{A}_k) = A_k$. Specifically, for Fourier estimators of the density
and distribution functions, $a_k$ and $A_k$ are the kth Fourier
coefficients with respect to the orthogonal functions $\psi_k(x)$. Kronmal
and Tarter (1968) expressed the coefficients $a_k$ and $A_k$
in terms of the trigonometric moments, for all $k \ge 0$:
and zero elsewhere. Then, the estimators
where $\bar{x}$ is the mean of the n samples and [a, b] is the interval of
interest. To determine the optimal number of Fourier terms, Kronmal and
Tarter (1968) suggested that the mth term should be included if:
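The kind of estimator described in this section can be illustrated with a minimal Python sketch. It is not the authors' code and makes two labeled assumptions: only the cosine subset of the Fourier series is used (as in Karmakar and Simonovic, 2008), with sample cosine moments $\bar{c}_k = \frac{1}{n}\sum_j \cos\left(k\pi (x_j - a)/(b - a)\right)$, and the kth term is kept only when $\bar{c}_k^2 > 2/(n + 1)$, which is assumed here as the form of the inclusion rule. The function name, the `max_terms` cap and the sample values are hypothetical.

```python
import math


def cosine_series_estimator(sample, a, b, max_terms=20):
    """Build cosine-series estimates of the density and distribution on [a, b].

    Assumptions (not taken from the paper): cosine basis only, and the
    term-inclusion threshold c_k**2 > 2 / (n + 1).
    """
    n = len(sample)
    width = b - a
    kept = []  # (k, c_k) pairs for the terms that pass the threshold
    for k in range(1, max_terms + 1):
        # k-th sample cosine moment, with the data mapped so that [a, b] -> [0, pi]
        c_k = sum(math.cos(k * math.pi * (x - a) / width) for x in sample) / n
        if c_k * c_k > 2.0 / (n + 1):
            kept.append((k, c_k))

    def pdf(x):
        # Density estimate; zero outside the interval of interest.
        if x < a or x > b:
            return 0.0
        s = sum(c * math.cos(k * math.pi * (x - a) / width) for k, c in kept)
        return (1.0 + 2.0 * s) / width

    def cdf(x):
        # Term-by-term integral of pdf from a to x.
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        s = sum(c / k * math.sin(k * math.pi * (x - a) / width) for k, c in kept)
        return (x - a) / width + 2.0 * s / math.pi

    return pdf, cdf


# Hypothetical annual totals in mm, for illustration only (not station data).
annual_mm = [120.0, 95.0, 210.0, 160.0, 140.0, 180.0, 75.0, 130.0, 155.0, 100.0]
pdf, cdf = cosine_series_estimator(annual_mm, a=0.0, b=300.0)
```

A truncated series of this kind integrates to one over [a, b] but is not guaranteed to be non-negative everywhere, so the choice of interval and of the number of terms matters in practice.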
Because incorporating higher Fourier terms improves the goodness of fit to
the observed data, all of the data were used in the numerical analysis. To
compare the parametric and nonparametric methods, the Mean Relative
Deviation (MRD) and the Mean Square Relative Deviation (MSRD) were used to
measure the goodness of fit of the above-mentioned methods. These
statistics are defined as
where x and $\hat{x}$ are the observed and estimated annual precipitation,
respectively, and n is the sample size. In addition, the observed and
estimated data are compared graphically.
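As a hedged illustration of the two goodness-of-fit statistics, the sketch below assumes the forms commonly used in the hydrological literature, expressed in percent: $\mathrm{MRD} = \frac{100}{n}\sum_i |x_i - \hat{x}_i| / x_i$ and $\mathrm{MSRD} = \frac{1}{n}\sum_i \left(100\,(x_i - \hat{x}_i)/x_i\right)^2$. Whether these match the paper's own definitions term for term is an assumption; the function names and the sample values are hypothetical.

```python
def mean_relative_deviation(observed, estimated):
    """MRD in percent (assumed form): mean of |x - x_hat| / x, times 100."""
    n = len(observed)
    return 100.0 * sum(abs(x - xh) / x for x, xh in zip(observed, estimated)) / n


def mean_square_relative_deviation(observed, estimated):
    """MSRD (assumed form): mean of the squared percentage relative deviations."""
    n = len(observed)
    return sum((100.0 * (x - xh) / x) ** 2 for x, xh in zip(observed, estimated)) / n


# Hypothetical observed and estimated annual totals in mm, for illustration only.
observed = [100.0, 200.0]
estimated = [110.0, 190.0]
mrd = mean_relative_deviation(observed, estimated)
msrd = mean_square_relative_deviation(observed, estimated)
```

For these two hypothetical pairs the relative deviations are 10% and 5%, so MRD is about 7.5 and MSRD about 62.5. Both functions assume strictly positive observed values, since each deviation is divided by the observation.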
Annual precipitation data from the five rain gauge stations in Iran were
fitted to seven parametric functions and to the Fourier series, as a
nonparametric approach, to compare their goodness of fit. For the
parametric methods, the parameters were estimated by both the method of
moments and the maximum likelihood procedure, and the MRD and MSRD values
were then calculated using Eq. 8 and 9. Tables 4 and 5 list these values,
which range from 2.233 to 65.968 and from 6.998 to 122493.250,
respectively. For the nonparametric method, the estimated values were
calculated using Eq. 6 and then used as the estimates in Eq. 8 and 9 to
calculate MRD and MSRD. These values range from 0.350 to 2.630 and from
0.280 to 84.004, respectively, and are given in the last columns of Tables
4 and 5 under the abbreviation FS (Fourier series). The quantiles at
selected return periods were estimated for each data set by the best-fitting
parametric distribution and by the Fourier series. These values were
obtained for return periods from 1.0101 to 100 years and are presented in
the table of estimated quantiles.
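The difference between the two parameter estimation procedures mentioned above can be sketched for one of the seven distributions. The example below is a minimal illustration, assuming the two-parameter log-normal distribution and its standard estimators: maximum likelihood uses the mean and standard deviation (divisor n) of the logarithms, while the method of moments matches the sample mean and variance in real space. It is not the authors' implementation, and the function names and data are hypothetical.

```python
import math
import statistics


def lognormal_mle(sample):
    """Maximum likelihood estimates (mu, sigma) of a two-parameter log-normal."""
    logs = [math.log(x) for x in sample]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs)  # population form (divisor n) is the MLE
    return mu, sigma


def lognormal_moments(sample):
    """Method-of-moments estimates (mu, sigma) matching sample mean and variance."""
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance (divisor n - 1)
    sigma_sq = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


# Hypothetical annual totals in mm, for illustration only (not station data).
annual_mm = [120.0, 95.0, 210.0, 160.0, 140.0, 180.0, 75.0, 130.0, 155.0, 100.0]
mu_mle, sigma_mle = lognormal_mle(annual_mm)
mu_mom, sigma_mom = lognormal_moments(annual_mm)
```

The two procedures generally give different parameter values for the same sample, which is one reason the text reports fit statistics for both.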
Table 4: Values of MRD using parametric methods and Fourier series
(1) Parameters were estimated by the method of moments. (2) Parameters
were estimated by the maximum likelihood procedure. N = Normal,
P3 = Pearson type 3, LN2 = two-parameter log-normal, LP3 = log-Pearson
type 3, LN3 = three-parameter log-normal, G = Gumbel extreme value type 1,
G2 = two-parameter gamma, FS = Fourier series method
Table 5: Values of MSRD using parametric methods and Fourier series
Table: Annual precipitation quantiles estimated by parametric
distributions and the Fourier series method
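The return periods quoted above map to non-exceedance probabilities through p = 1 - 1/T, so T = 1.0101 years corresponds to p of about 0.01 and T = 100 years to p = 0.99. The sketch below shows this mapping together with a quantile calculation for the Gumbel extreme value type 1 distribution fitted by the method of moments. Gumbel is chosen here only because its quantile function has a closed form; it is an assumption for illustration, not the best-fitting distribution reported for any station, and the names and data are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def non_exceedance_probability(return_period):
    """Probability that the annual value is not exceeded, p = 1 - 1/T."""
    return 1.0 - 1.0 / return_period


def gumbel_fit_moments(sample):
    """Method-of-moments location and scale of the Gumbel (EV1) distribution."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    location = statistics.mean(sample) - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Value with return period T under the fitted Gumbel distribution."""
    p = non_exceedance_probability(return_period)
    return location - scale * math.log(-math.log(p))


# Hypothetical annual totals in mm, for illustration only (not station data).
annual_mm = [120.0, 95.0, 210.0, 160.0, 140.0, 180.0, 75.0, 130.0, 155.0, 100.0]
location, scale = gumbel_fit_moments(annual_mm)
quantiles = {T: gumbel_quantile(location, scale, T) for T in (1.0101, 2, 10, 50, 100)}
```

For a nonparametric estimate the same probabilities would instead be inverted numerically through the estimated distribution function, for example by bisection on the interval of interest.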
RESULTS AND DISCUSSION
According to Tables 4 and 5, a comparison of the MRD and MSRD values shows
that no single parametric distribution function gives the best fit for the
data sets of all stations. For example, Pearson type 3 fits the Bushehr
data best (due to their high skewness), log-Pearson type 3 fits the Isfahan
and Jask data best (due to their lower skewness) and three-parameter
log-normal fits the Meshed and Tehran data best (due to their low
skewness). However, considering the MRD and MSRD values for the eight
methods, it is clear that the Fourier series method fits all data sets much
better than the parametric distributions, and its very low MRD and MSRD
values make it an easy choice among the methods. Hence, the Fourier series
method is easy to apply and can be used as a robust approach both for
fitting data and for estimating quantiles. Figures 2 and 3 show the fits to
the annual precipitation at Jask station; the relative goodness of fit is
similar for the other stations.
Unlike parametric distributions, which assume that the data follow certain
models, the Fourier series has no such restriction. Therefore, the Fourier
series has two advantages over the parametric approaches and the aims of the
current study are achieved. The results of this study are also similar to those
of Behnia and Haghighat Jou (2007), who concluded that the Fourier series fits
the data better than the parametric methods, as mentioned in the introduction.
However, they used the Fourier series for annual flood probability analysis of
the Great Karoun river in Iran. Furthermore, Karmakar and Simonovic (2008) used
a 70-year data set of flood peak flow, volume and duration for the Red River in
the US and selected the subset of the Fourier series consisting of cosine
functions as the orthonormal series. They found that the method based on
orthonormal series is more appropriate than kernel estimation for determining
the marginal distributions of flood characteristics, as it can estimate the
probability distribution function over the whole range of possible values. In
this study, the Fourier series was applied as a nonparametric approach to
annual precipitation probability analysis in Iran. Although our results agree
with those of Behnia and Haghighat Jou (2007) and Karmakar and Simonovic (2008),
our findings are for annual precipitation over Iran, not for floods. From this
point of view, the results are significant and unique findings of the current
research in the country.
Fig. 2: Observed and fitted Fourier series for Jask station
Fig. 3: Observed and fitted log-Pearson type 3 for Jask station
CONCLUSION
Annual precipitation data sets from five old rain gauge stations (Bushehr,
Isfahan, Meshed, Tehran and Jask) in Iran were fitted to seven parametric
density functions (normal, two- and three-parameter log-normal,
two-parameter gamma, Pearson and log-Pearson type 3 and Gumbel extreme
value type 1) and to the Fourier series. The aim was to compare parametric
density functions with the nonparametric Fourier series for estimating
annual precipitation at these stations. To do this, the MRD and MSRD values
for both approaches were calculated and compared. Since the minimum values
belonged to the Fourier series approach, it is concluded that this method
estimates the annual precipitation for the stations much better than the
other methods. In addition, users do not face the problem of selecting a
single approach that is applicable to all data sets. These are two
advantages of the Fourier series over parametric approaches for estimating
annual precipitation in Iran.
REFERENCES
Adamowski, K., 1996. Nonparametric estimation of low-flow frequencies. J. Hydraulic Eng., 122: 46-50.
Adamowski, K., 2000. Regional analysis of annual maximum and partial duration flood data by nonparametric and L-moment methods. J. Hydrol., 229: 219-231.
Behnia, A.K. and P.H. Jou, 2007. Estimating Karoun annual flood probabilities using Fourier series method. Proceedings of the 7th International River Engineering Conference, February 13-15, 2007, Shahid Chamran University, Ahwaz, Iran, pp: 1-7.
Devroye, L. and L. Gyorfi, 1985. Nonparametric Density Estimation: The L1 View. 1st Edn., John Wiley and Sons, New York, ISBN: 0471816469.
El Adlouni, S., B. Bobée and T.B.M.J. Ouarda, 2008. On the tails of extreme event distributions in hydrology. J. Hydrol., 355: 16-33.
Faucher, D., P.F. Rasmussen and B. Bobee, 2002. Nonparametric estimation of quantiles by the kernel method (In French). J. Water Sci., 15: 515-541.
Ghayour, H.A. and H. Asakereh, 2005. Application of the Fourier models in estimating monthly temperature and its predictability. Case study: Meshed temperature. Geographical Research Quarterly, No. 77, pp: 83-98 (In Persian).
Gingras, D. and K. Adamowski, 1992. Coupling of nonparametric frequency and L-moment analyses for mixed distribution identification. J. Am. Water Resourc. Associat., 28: 263-272.
Guo, S.L., 1991. Nonparametric variable kernel estimation with historical floods and paleoflood information. Water Resourc. Res., 27: 91-98.
Haan, C.T., 1977. Statistical Methods in Hydrology. 1st Edn., Iowa State University Press, Ames, Iowa, ISBN: 0-8138-1510-X, pp: 378.
Karmakar, S. and S.P. Simonovic, 2008. Bivariate flood frequency analysis using copula with parametric and nonparametric marginals. Proceedings of the 4th International Symposium on Flood Defence: Managing Flood Risk, Reliability and Vulnerability, May 6-8, 2008, Toronto, Ontario, Canada, pp: 1-50.
Kim, K.D. and J.H. Heo, 2002. Comparative study of flood quantiles estimation by nonparametric models. J. Hydrol., 260: 176-193.
Kite, G.W., 1988. Frequency and Risk Analysis in Hydrology. 4th Edn., Water Resources Publications, Littleton, Colorado 80161-2841, USA, ISBN: 0-918334-64-0, pp: 257.
Kronmal, R. and M. Tarter, 1968. The estimation of probability densities and cumulatives by Fourier series methods. J. Am. Statist. Assoc., 63: 925-952.
Lall, U., Y.I. Moon and K. Bosworth, 1993. Kernel flood frequency estimators: Bandwidth selection and kernel choice. Water Resourc. Res., 29: 1003-1016.
Moon, Y.I., U. Lall and K. Bosworth, 1993. A comparison of tail probability estimators for flood frequency analysis. J. Hydrol., 151: 343-363.
Park, B.U. and J.S. Marron, 1990. Comparison of data driven bandwidth selectors. J. Am. Statist. Assoc., 85: 66-72.
Rao, A.R. and K.H. Hamed, 1999. Flood Frequency Analysis. 1st Edn., CRC Press, Boca Raton, FL, ISBN-10: 0849300835, Pages: 376.
Yue, S., T.B.M.J. Ouarda, B. Bobee, P. Legendre and P. Bruneau, 1999. The Gumbel mixed model for flood frequency analysis. J. Hydrol., 226: 88-100.
Yue, S., 2000. The Gumbel logistic model for representing a multivariate storm event. Adv. Water Resourc., 24: 179-185.