INTRODUCTION
The estimation of monthly precipitation is essential in the water resources
and water supply planning, irrigation and drainage system design, in agriculture,
crop water requirements and monitoring climate change. Precipitation in
Iran is mostly occurred between November and April, with annual mean equal
to 250 mm. The climate of Iran is arid and semiarid. Precipitation frequency
analysis is generally carried out using parametric methods in which a
statistical distribution such as normal (N), two parameter lognormal
(LN2), three parameter lognormal (LN3), two parameter gamma (G2), Pearson
type III (P3), logPearson type III (LP3) and Gumbel extreme value type
I (G) are used to fit the available data for frequency analysis and estimation
of rare events. These methods have been successfully applied in many cases,
but have some disadvantages because of not fitting to the observed data
very well, or diverting from the extreme tails. Some other conditions
that may cause problems with parametric methods are involved in difficulties
of estimation of the best parameters for these approaches particularly
for skewed data.
To estimate the probability density function and distribution function
of hydrologic events, several nonparametric methods such as variable kernel
method (Lall et al., 1993) have been introduced in recent years.
Guo (1991) proposed a nonparametric variable kernel estimation model which
provides an alternative way in flood quantile estimation when historical
floods data are available. It is shown that the nonparametric kernel estimator
fitted the real data points closer than its parametric counterparts. Gingras
and Adamowski (1992) applied both Lmoments and nonparametric frequency
analysis on the annual maximum floods. By coupling nonparametric frequency
analysis with Lmoment analysis, it is possible to confirm the Lmoment
selection of unimodal distribution, or to determine that the sample is
actually from a mixed distribution. Thus, the nonparametric method helps
to identify the underlying probability distribution, particularly when
samples arise from a mixed distribution. Moon et al. (1993) compared
selected techniques for estimating exceedance frequencies of annual maximum
flood events at a gaged site. They applied four tail probability and a
variable kernel distribution function estimators and concluded that the
variable kernel estimator appears useful because it automatically gives
stable and accurate flood frequency estimates without requiring a distributional
assumption. Adamowski (1996) developed a nonparametric method for lowflow
frequency analysis and compared with two commonly used parametric methods,
namely, logPearson Type III and Weibull distributions. The numerical
analysis indicates that the nonparametric method better fits the data
and gives more accurate results than currently used parametric methods.
Adamowski (2000) applied a Gaussian (normal) kernel function for regional
analysis of annual maximum (AM) and partial duration (PD) flood data by
nonparametric and Lmoment methods. The results pointed out deficiencies
in currently used parametric approaches for both AM and PD series, since
traditional regional flood frequency analysis procedures assume that all
floods within a homogeneous region are generated by the same, often unimodal
distribution, while this is not always true and the data series may be
multimodal. Also, Kim and Heo (2002) employed this nonparametric Gaussian
(normal) kernel function, however for their comparative study of flood
quantiles estimation by applying seven bandwidth selectors of Rule of
thumb (ROT), Least squares crossvalidation (LSCV), Jones, Marron and
Park crossvalidation (JMP), Smoothed crossvalidation (SCV), Biased crossvalidation
(BCV), Park and Marron plugin (PM) and Sheater and Jones plugin (SJ).
They concluded that among seven bandwidth selectors, the relative biases
of SJ were the smallest in most cases. Faucher et al. (2002) compared
the performance of parametric and nonparametric methods in estimation
of flood quantiles. The logPearson type III, two parameter lognormal
and generalized extreme value distributions were used to fit the simulated
samples. It was found that nonparametric methods perform quite similarly
to the parametric methods. They compared six different kernel functions
include biweight, normal, Epanechnikov, extreme value type I, rectangular
and Cauchy. They found no major differences between the first four above
mentioned kernels. Behnia and Jou (2007) applied Fourier series to estimate
annual flood probability of the Great Karoun river flowing southwest of
Iran. Then, the predicted results from the application of this method
were compared to results of seven parametric methods include normal, two
and three parameter log normal, two parameter gamma, Pearson and logPearson
type III and Gumbel extreme value type I distributions. Results of this
comparison showed a better ability for Fourier series method. Karmakar
and Simonovic (2008) used nonparametric methods based on kernel density
estimation and orthonormal series to determine the nonparametric distribution
functions for peak flow, volume and duration. They selected the subset
of the Fourier series consisting of cosine functions as orthonormal series.
They found that nonparametric method based on orthonormal series is more
appropriate than kernel estimation for determining marginal distributions
of flood characteristics as it can estimate the probability distribution
function over the whole range of possible values.
In the current study a Gaussian (normal) kernel function is performed
on a series of monthly precipitation from five old raingauge stations
in Iran. Results of the performance of this proposed nonparametric method
will be then compared to the seven above mentioned parametric treatments
to illustrate which method fit better to the data.
MATERIALS AND METHODS
This study was started from October 2007 at the Department of Hydrology
and Water Resources, Faculty of Water Sciences Engineering, Shahid Chamran
University, Ahwaz, Iran.
Sources of precipitation over Iran: There are five distinct sources
of precipitation over Iran which includes westerly winds blowing from
Mediterranean Sea, southwesterly winds flow from the Horn of Africa and
northern winds which flow from Siberia. These winds produce rainfall on
northwestern, western and southwestern parts of the country in winter.
Southeasterly Monsoon winds blowing from Indian Ocean, which produce scanty
and scattered rainfalls on southeast in summer and northerly winds blowing
from Caspian sea which only produce relatively heavy rainfall on littoral
provinces i.e., Gilan, Mazandaran and Golestan throughout the year.
Sources of data: The monthly precipitation data from five old
raingauge stations in Iran were selected to be analyzed. These stations
include Bushehr, Isfahan, Meshed, Tehran and Jask. Figure 1 shows geographical
location of the stations on the map of Iran. There are many Synoptic and
Meteorological stations in Iran, but the mentioned stations were selected
because they have long length records. The record lengths of these stations
range between 84 to 113 years. The data were collected from two sources
including World Weather Records and Meteorological year books of Iran
which are published by Iranian Meteorological Organization. Data up to
year 1960 were collected from the first source and the rest of them up
to year 2004 were collected from the second source. The sample sizes of
data and date of establishment for each of the stations are given in Table
1 and the geographical characteristics of the stations are shown in
Table 2.
Table 1: 
The sample sizes and date of establishment of stations 

Table 2: 
Geographical characteristics of the stations 


Fig. 1: 
Geographical location of rain gauge stations on the
map of Iran 
Table 3: 
Mean monthly precipitation (mm) for the old stations 

The statistical characteristics of monthly
precipitation data for the 5 stations are shown in Table
3 to be used in proposed methods.
A description of the distributions and parameter estimation methods are
not presented in this study, because they are available in other publications
such as (Kite, 1988; Haan, 1977; Rao and Hamed, 1999). Therefore, only
nonparametric kernel density function estimation is described here.
The kernel method: In the kernel method, a function k(x) is associated
with each observation in a sample. The main requirement to k is:
The nonparametric density function is constructed from the set of kernels
as:
where, n is the sample size, k is the kernel function and h is a parameter
that determines the degree of smoothing and is called bandwidth or smoothing
parameter. The kernel may be interpreted as a weight function that represents
the weight of each observation in the estimation of the density at x.
The kernel distribution function is the integration of the density function
(Eq. 2) as:
Where:
The kernel distribution function may serve to estimate quantiles corresponding
to a given probability of exceedance. For example, in the hydrological
context, one may be interested in determining the flood with a return
period of T years, that is:
where, F^{1} represents the inverse of distribution function,
F. In practice the value of x must be determined by solving Eq.
5 numerically.
In principle, all classical probability density functions like the Gaussian
(normal) or the Cauchy distributions are candidates for kernel functions.
Other types of functions subject to certain constraints could also be
considered. Some authors have argued that the kernel choice is not critical
compared to the choice of smoothing parameter.
In this study, a Gaussian or normal kernel (Gingras and Adamowski, 1992)
has been used and is given by:
Choosing the smoothing parameter: The problem of choosing the
value of smoothing parameter is of crucial in density estimation. One
strategy for selecting the smoothing parameter is to begin with a large
bandwidth and to decrease the amount of smoothing until the fluctuations
start to appear. Too large an h value will lead to a unimodal nonparametric
density regardless of the multimodality of the data while too small one
will lead to distorted multimodal density shape regardless of the unimodality
of the data. This approach is viable but there are also many cases where
it is beneficial to have the bandwidth automatically selected from the
data. In this study, four bandwidth selectors such as rule of thumb (ROT),
Adamowski criterion (AC), least squares crossvalidation (LSCV), Sheater
and Jones plugin (SJ) are employed. The optimal bandwidth for a kernel
density estimate is typically calculated based on an estimate for the
integrated square error (ISE) as follow:
And its expected value, the MISE is given by:
where, the first integral is integrated variance (IV) and the second
integral is integrated squared bias (IB). Hence the optimal bandwidths
depend on the unknown density for deriving of f.
Rule of thumb: The ROT was proposed to minimize the asymptotic
mean integrated square error (AMISE) (Silverman, 1986). The best tradeoff
between asymptotic variance and bias is given by:
where, h is the minimizer of the AMISE and R(f^{(2)}) is the
only unknown. Assuming the unknown distribution to be normal with parameter
μ (population mean) and σ (standard deviation of values), the
estimate of h_{–} for a Gaussian kernel was done as:
The advantage of ROT is that it provides a very practical method of bandwidth
selection while the disadvantage is that the bandwidth is wrong if the
population is not normally distributed.
Adamowski criterion for bandwidth selection: Adamowski (1989)
proposed the following formula for computing the smoothing parameter
where, x_{i} and x_{j} are order statistics of observation
and N is sample size.
Least square crossvalidation: The Least square crossvalidation
function is defined by Rudemo (1982) and Bowman (1985) as:
where,
is the density estimate based on the entire data set except for the ith
observation.
Sheater and Jones plugin: Sheater and Jones (1991) reconsidered
the problem of estimating R(f^{(2)}). They used the same idea
as Park and Marron (1990) but with bandwidth as follow:
where, R(f^{(2)}) and R(f^{(3)}) can be estimated by
and
and
both bandwidth g_{1} and g_{2} are determined by asymptotic
optimal values and only in this step the normal probable density function
(PDF) is used as a reference probability model. This improves the method
of Park and Marron (1990) but is not the best achievable rate yet. In
this study for comparison of the parametric and nonparametric methods,
the mean relative deviation (MRD) and the mean square relative deviation
(MSRD) were used to measure the goodness of fit of above mentioned methods,
these statistical terms are defined as follows:
where, x and are
the observed and calculated monthly precipitation and n is sample size.
In addition, observed and estimated data will be compared graphically
as well.
Data analysis: One of the most common characteristics of the monthly
precipitation data from the mentioned raingauge stations is the character
of their positive skewness. Histogram of January precipitation for Boushehr
station during 114 years is shown in Fig. 2 as an example.
Monthly precipitation data from five old raingauge stations in Iran were
fitted to the mentioned parametric and nonparametric methods to compare
the performance results of these various approaches.
For analysis of data by the nonparametric kernel function, the values
of smoothing parameter (h) for all months and all stations were calculated
by the mentioned four methods explained in the previous section at which
the LSCV method resulted in the minimum values for this parameter. Therefore,
the results of this method were selected to calculate MRD and MSRD values
for analysis of data by the nonparametric kernel function. Table
4 shows all of the smoothing parameter (h) values for Boushehr station
as an example and the selected minimum values resulted from LSCV method
are shown in Table 59 under the parameter
h for Boushehr, Isfahan, Meshed, Tehran and Jask stations respectively
to be used for calculation of MRD and MSRD for kernel function.

Fig. 2: 
Histogram of January precipitation over Boushehr station
during 114 years showing its positive skewness 
Table 4: 
Values of smoothing parameter, h for various months
resulted from various bandwidth selector methods for Boushehr station 

Table 5: 
Values of MRD and MSRD for best parametric method and
nonparametric normal kernel for Boushehr station 

Table 6: 
Values of MRD and MSRD for best parametric method and
nonparametric normal kernel for Isfahan station 

Table 7: 
Values of MRD and MSRD for best parametric method and
nonparametric normal kernel for Meshed station 

Table 8: 
Values of MRD and MSRD for best parametric method and
nonparametric normal kernel for Tehran station 

Table 9: 
Values of MRD and MSRD for best parametric method and
nonparametric normal kernel for Jask station 

For analysis of data by the parametric methods first, their parameters
were calculated by both the methods of moments and maximum likelihood
procedure then, MRD and MSRD were calculated for all months and all stations.
HYFA program was used for all calculations. Table 5
through Table 9 shows the minimum values of MRD and
MSRD for the best parametric methods (which are the methods with lowest
MRD and MSRD values).
RESULTS AND DISCUSSION
According to Table 4 and comparing the values of smoothing
parameter calculated from four methods, i.e., ROT, AC, LSCV, SJ for Boushehr
station, it is evident that the least values of bandwith results obtained
by LSCV evolve the least values of MRD and MSRD. Also, the results for
other stations in all months are similar to Boushehr. In general, for
all stations, in 82.7%, the least values of bandwidth resulted from LSCV
method, in 15.4% resulted from AC and in 1.9% resulted from SJ. These
results differ from the results of Kim and Heo (2002) because they obtained
the smallest values of the relative bias for SJ smoothing parameter in
most cases as mentioned in the introduction. However, it should be mentioned
that their conclusion was for flood quantiles estimation not for precipitation.
Due to the positive skewness character of monthly precipitation data for all
raingauge stations (see Fig. 2 as an example) and regarding
to the Table 59 and comparing the values
of MRD and MSRD for all methods, the best fitted distributions to the data obtained.
In accordance to this comparison, logpearson type III in 61.5% of cases, two
parameter gamma in 25% of cases and three parameter lognormal in 13.5% of cases
resulted the first three best fitness to the data. Other approaches including
normal, two parameter lognormal, Pearson type III, Gumbel extreme value type
I and Gaussian (normal) kernel function could not fit the data. In all cases
the values of MRD and MSRD for parametric methods are lower than the tabulated
values for the nonparametric normal kernel and hence the latter method is not
a good approach for frequency analysis of monthly precipitation in Iran.
CONCLUSION
One of the most common characteristics of the monthly precipitation in
Iran is the character of positive skewness. In general, for all stations,
the least values of the smoothing parameter (h) or bandwidth resulted
from LSCV method (in 82.7%). However, comparing the values of MRD and
MSRD for parametric methods and non parametric normal kernel function
concluded that the best approach to give the best fitness to the data
is logpearson type III. These results obtained from the long length periods
of monthly precipitation for the first time in the country and therefore,
are the unique findings of this study.