
Research Article


Cluster Analysis of Rainfall-Runoff Training Patterns for Flow Modeling Using Hybrid RBF Networks

H. Abghari,
M. Mahdavi,
A. Fakherifard
and
A. Salajegheh



ABSTRACT

Artificial intelligence modeling of nonstationary rainfall-runoff
processes is limited in simulation accuracy by the complexity and
nonlinearity of the training patterns. Statistical preprocessing of the
training data can determine the homogeneity of rainfall-runoff patterns
before artificial intelligence modeling. In this study, a new hybrid model
that combines artificial intelligence with statistical clustering is
introduced. The effect of statistically preprocessing 360 rainfall-runoff
patterns was considered before modeling with Radial Basis Function Neural
Networks (RBFNNs). In the first step, all 360 monthly rainfall-runoff
patterns were classified by cluster analysis into 4 groups and each class
was modeled by a different RBFNN topology. The results of the 4
cluster-based RBFNNs are compared with those of the model without
preprocessing and the optimized structure of the hybrid cluster-based RBFNN
models of Nazloochaei river flow is presented. Results show that clustering
the rainfall-runoff patterns and modeling each dataset with a different
RBFNN gives higher accuracy in the prediction and modeling of river flow
than no preprocessing of the patterns.







INTRODUCTION
Risk management is based on the modeling and prediction of natural hazards.
Intelligence models are distributed parallel processors that learn the relationship
between input and output signals and present the optimized topology for simulating
systems. In practice, many real-world dynamical system signals exhibit
two distinct characteristics: nonlinearity and nonstationarity, in the sense that
their statistical characteristics change over time due to either internal or external
nonlinear dynamics (Coulibaly and Baldwin, 2005). The
rainfall-runoff process is nonlinear because of the temporal
and spatial distribution of precipitation and other parameters. A major concern
in the prediction of hydrological events is whether a given process should be modeled
as linear or as nonlinear. The evidence of nonstationarity in some existing
long hydrological records has raised a number of questions about the adequacy of
conventional statistical methods. Different kinds of hydrological
models exist to simulate the discharge of a river basin, but the complexity of the
rainfall-runoff process causes restrictions and problems in river flow modeling (Nouri
and Abghari, 2007). Because this complexity causes a nonlinear relation
between flow and rainfall, researchers have tried to find better methodologies for
more accurate modeling. Artificial Neural Networks (ANNs) have been found to be better
suited to nonlinear input-output mapping. Recent reviews reveal that more than
90% of the applications of ANNs to the modeling of water resources
variables use standard feedforward neural networks (Coulibaly
and Baldwin, 2005), but Radial Basis Function Neural Networks (RBFNNs)
also have a high capability for modeling hydrological processes (Mason
et al., 1996; Dibik and Solomatine, 2001;
Nouri and Abghari, 2007). Preprocessing the dataset before
training an artificial intelligence model improves training time and modeling
accuracy (Nouri and Abghari, 2007).
Hu et al. (2001) developed Range-Dependent Hybrid
Neural Networks (RDNN), which are virtually threshold ANNs, to forecast annual
and daily streamflows. Pal et al. (2003) proposed
a hybrid ANN model that combines the Self-Organizing Feature Map (SOFM) and
the MLP network for temperature prediction, where the SOFM serves to partition
the training data. Other implementations include feedforward NNs (Hsu
et al., 1995; Lorrai and Sechi, 1995), recurrent
NNs (Gao and Joo, 2005), cluster-based multilayer perceptrons
(Wang et al., 2006), RBFNNs (Mason
et al., 1996), fuzzy NNs (Jang, 1993; Jang
et al., 1997), fuzzy-logic-based hybrid modeling (See
and Openshaw, 1999), SOM-cluster-based hybrid modeling (Abrahart
and See, 2000) and Bayesian-concept-based modular ANNs (Zhang
and Govindaraju, 2000; Nayak et al., 2004;
Dogan, 2005; Kisi, 2005, 2006).
Models for the prediction of time series were also proposed by Cigizoglu
(2004), Baratti et al. (2003), Yurekli
et al. (2004), Nilsson et al. (2005),
Dulkashi et al. (2006), Chen
et al. (2006), Bhattacharya and Solomatine (2006)
and Jain and Srinivasulu (2006). Wang
et al. (2006) developed hybrid threshold-based and cluster-based
multilayer perceptrons (MLPs) for the prediction of daily streamflow. Mason
et al. (1996) and Dibik and Solomatine (2001)
showed that river flow prediction using RBFNNs is more accurate than using an MLP
or the deterministic model HEC-HMS. Samani et al. (2007)
demonstrated that with Principal Component Analysis as a preprocessing step
to reduce the data dimension, the MLP is better able to model pumping tests
in wells. Also, Ceylan and Ozbay (2007) demonstrated that
classifying ECG signals with an ANN after preprocessing the data with methods such as
FCM, wavelet transformation and principal component analysis performs better than MLP
modeling without preprocessing. In this study, cluster analysis is used to determine
the homogeneity of the training patterns and each class of rainfall-runoff patterns
is trained and tested in a different RBF network. The main aims of this study are to
investigate the use of homogeneous monthly training patterns for river flow modeling
and to compare the results with RBF modeling of the nonstationary time series without
preprocessing.
MATERIALS AND METHODS
Study Area
One of the major tributaries of the Nazloochaei River Basin in the North West
of Iran was selected for the current study (Fig. 1). The
basin’s area is 2014 km^{2} and the main river drains to Urmia Lake.
Streamflow processes are always severely influenced by irregular rainfall
events. Three hundred and sixty monthly data records from 5 precipitation
gauges and a hydrometric station for 1976-2006 were considered for RBF and hybrid
RBF modeling.
Radial Basis Function Neural Networks
The activation function of the hidden layer in RBF neural networks is defined as
a radially symmetric basis function, such as the Gaussian function, and the output layer
is a linear function (Wasserman, 1993). RBFNNs are recommended
by researchers because of their fast training and their generalization in hydrological
processes (Mason et al., 1996; Dibik
and Solomatine, 2001). Figure 2 shows the structure of
the RBFNN model. The learning process in RBFNNs updates the weight matrix
in each iteration based on the model output, comparing the estimated discharge
with the observed one to determine the error.

Fig. 1: 
Nazloochaei River basin in the North West of Iran 
The output of the neurons in the hidden layer of the RBFNN (Eq. 3) is a function of the
radial difference (Eq. 2) between the precipitation gauge data, given as the input vector
p_{i} = (p_{1i}, p_{2i},..., p_{ki}), and the weight vectors w_{j} = (w_{1j},
w_{2j},..., w_{kj}), which are adjusted to minimize the error of the discharge estimated
by the model (Mason et al., 1996). The RBFNN topologies
were programmed using the MATLAB R2008a software package.

Fig. 3: 
Effect of different spread coefficients on the shape of the
Gaussian function
There are different choices for Y_{out} = f(δ_{j}), but the usual
one is the Gaussian function of Eq. 3,
in which δ_{j} is the radial difference between the precipitation p_{i}
and the weights w_{ij} and σ is defined as the spread coefficient.
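The forward pass described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB implementation used in the study: the Gaussian is assumed to take the common form exp(-δ²/(2σ²)), and the function names, centres, output weights and bias are hypothetical values chosen for the example.

```python
import math

def radial_distance(p, w):
    """Euclidean (radial) distance between input vector p and weight vector w."""
    return math.sqrt(sum((pi - wi) ** 2 for pi, wi in zip(p, w)))

def gaussian(delta, sigma):
    """Gaussian basis function; the exp(-delta^2 / (2 sigma^2)) form is an assumption."""
    return math.exp(-(delta ** 2) / (2.0 * sigma ** 2))

def rbf_forward(p, centres, out_weights, bias, sigma):
    """Hidden Gaussian layer followed by a linear output layer."""
    hidden = [gaussian(radial_distance(p, w), sigma) for w in centres]
    return sum(a * h for a, h in zip(out_weights, hidden)) + bias

# Hypothetical two-neuron network with two precipitation inputs.
centres = [[0.2, 0.4], [0.8, 0.6]]
y = rbf_forward([0.2, 0.4], centres, out_weights=[1.5, -0.5], bias=0.1, sigma=0.4)
print(round(y, 4))
```

Because the input coincides with the first centre, the first hidden neuron responds with its maximum value of 1, while the second neuron contributes only a small amount.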
Neurons are added to the RBF network until the sum-squared error falls
beneath an error goal. These networks tend to need more neurons
than feedforward NNs with tansig or logsig neurons in the hidden layer,
because sigmoid neurons can have outputs over a large region of the input
space, while RBF neurons only respond to relatively small regions of the
input space. In addition to the usual network optimization parameters, such as
the number of neurons, the training algorithm and the improvement of the weights,
RBFNNs have an extra parameter that controls the flexibility of the Gaussian
activation function, named the spread coefficient. The spread coefficient (σ in Eq.
3) determines the selectivity of the neurons. With different amounts of
spread, the Gaussian activation function contracts or expands (Fig.
3). If the spread is small, the radial basis function is very
steep and the neuron with the weight vector closest to the input
has a much larger output than the other neurons. As the spread becomes larger,
the slope of the radial basis function becomes smoother and several neurons
can respond to an input vector. Thus the training process of RBFNNs
consists of deciding how many hidden neurons there should be for modeling
and how sharp the Gaussians should be, using the spread parameter. The optimized
weights, which are estimated using the simple back-propagation algorithm as for the
Multi-Layer Perceptron approximation, are kept fixed for RBF network modeling.
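The selectivity controlled by the spread can be illustrated numerically. The sketch below assumes the Gaussian form exp(-δ²/(2σ²)) and uses hypothetical distances from one input to three neurons; it reports each neuron's share of the total hidden-layer response for a narrow and a wide spread.

```python
import math

def gaussian(delta, sigma):
    """Gaussian basis function; the exp(-delta^2 / (2 sigma^2)) form is an assumption."""
    return math.exp(-(delta ** 2) / (2.0 * sigma ** 2))

def normalised_responses(distances, sigma):
    """Share of the total hidden-layer response produced by each neuron."""
    activations = [gaussian(d, sigma) for d in distances]
    total = sum(activations)
    return [a / total for a in activations]

# Hypothetical radial distances from one input vector to three neurons.
distances = [0.1, 0.5, 0.9]
narrow = normalised_responses(distances, sigma=0.1)  # small spread, steep function
wide = normalised_responses(distances, sigma=1.0)    # large spread, smooth function

print([round(r, 3) for r in narrow])
print([round(r, 3) for r in wide])
```

With the small spread almost all of the response comes from the closest neuron, whereas with the large spread all three neurons respond to the same input.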
Model Performance Measures
There is an extensive literature on model forecasting evaluation indices
(Wang et al., 2006). The coefficient of determination (R^{2}) and the
Root Mean Squared Error (RMSE) are popular model measures because they are very
sensitive to even small errors, which is good for comparing small differences
between the estimated and observed discharge of the models.
In these measures, n is the number of observations, \bar{Q} is the
average discharge of the river, Q_{i} is the observed discharge at the hydrometric
station and \hat{Q}_{i}
is the discharge estimated by the RBF model.
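As a worked illustration, the two measures can be computed as follows. The formulas are the standard textbook definitions (R² as one minus the ratio of the residual sum of squares to the total sum of squares about the mean observed discharge), which is assumed here to match the study; the discharge values are invented for the example.

```python
import math

def rmse(observed, estimated):
    """Root mean squared error between observed and estimated discharge."""
    n = len(observed)
    return math.sqrt(sum((q - e) ** 2 for q, e in zip(observed, estimated)) / n)

def r_squared(observed, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_q = sum(observed) / len(observed)
    ss_res = sum((q - e) ** 2 for q, e in zip(observed, estimated))
    ss_tot = sum((q - mean_q) ** 2 for q in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and estimated discharges.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [2.5, 3.5, 6.5, 7.5]

print(rmse(observed, estimated))       # 0.5
print(r_squared(observed, estimated))  # 0.95
```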
Cluster Analysis of Monthly RainfallRunoff Patterns
Seasonality within the water year leads to some incongruity in the dataset
used for training neural networks, and overcoming seasonality in simulation is a
main problem of most hydrological models. The objective of cluster analysis
is to classify objects according to the similarities among them and to organize
the data into groups. Different classifications can be related to the algorithmic
approach of the clustering techniques. Variables that have high pairwise correlations
are assigned to the same cluster, whereas those having low pairwise correlations
are assigned to different clusters (Kamel and Selim, 1994).
Generally, cluster analysis is based on two ingredients: a distance measure and
a cluster algorithm. Distance can be measured among the data vectors themselves,
or as a distance from a data vector to some prototypical object of the cluster.
One of the recommended distance measures for quantifying the
(dis)similarity of objects is the Euclidean distance (Eq. 6). For
the cluster algorithm, K-means (Eq. 7) can be seen as an
optimization problem that minimizes the sum of squared within-cluster
distances. Each cluster in the partition is defined by its member objects
and by its centroid. The centroid of each cluster is the point to which
the sum of distances from all objects in that cluster is minimized. In
the presented hybrid model, K-means uses an iterative algorithm that minimizes
the sum of distances from each rainfall-runoff training pattern to its
cluster centroid, over all clusters. This algorithm moves training patterns
between clusters until the sum cannot be decreased further. The result
is a set of clusters that are as compact and well-separated as possible.
In Eq. 6 and 7, d_{e}(i, j) is the Euclidean distance between precipitation
p_{ij} and p_{jk} and W(C) is the sum of squared within-cluster
distances to be minimized.
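The iterative procedure described above can be sketched as a plain K-means loop. This is a minimal illustration rather than the routine used in the study: the initial centroids are passed in explicitly to keep the example deterministic, and the four two-dimensional points are hypothetical stand-ins for rainfall-runoff patterns.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two pattern vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid, then recompute centroids, until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no pattern changed cluster
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical patterns forming two obvious groups.
points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
print(labels, centroids, within_cluster_ss(points, labels, centroids))
```

In practice the result depends on the initial centroids, so the clustering is usually repeated from several starting points and the partition with the smallest within-cluster sum is kept.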
Using this method, the 360 rainfall-runoff patterns were separated into 4 clusters:
124 training sets in cluster one, 116 in cluster two, 52 in cluster three
and 68 in cluster four. The rainfall-runoff patterns of each cluster were divided into
two sections for the training and validation of the hybrid models: 70%
of the patterns in each class were used for training and the remainder
for model validation. Four hybrid models, Cl 1-RBFNNs, Cl 2-RBFNNs, Cl
3-RBFNNs and Cl 4-RBFNNs, were developed. Figure 4 shows the
structure of the hybrid cluster-based RBFNNs.
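The arithmetic of the split can be checked with a short sketch. The cluster sizes come from the text; rounding the 70% share down to a whole number of patterns is an assumption, since the rounding rule is not stated.

```python
# Cluster sizes reported in the text.
cluster_sizes = {1: 124, 2: 116, 3: 52, 4: 68}

def split_counts(n, train_percent=70):
    """Number of training and validation patterns; rounding down is an assumption."""
    n_train = (n * train_percent) // 100
    return n_train, n - n_train

splits = {c: split_counts(n) for c, n in cluster_sizes.items()}
total = sum(cluster_sizes.values())

for c, (n_train, n_valid) in splits.items():
    print(f"cluster {c}: {n_train} training, {n_valid} validation")
print("total patterns:", total)
```

The four clusters account for all 360 patterns, and even the smallest cluster keeps a few dozen patterns for training under this split.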

Fig. 4: 
The structure of the hybrid model (cluster-based RBFNNs)
Table 1: 
Optimized topology of the rainfall-runoff RBFNN without any preprocessing
of patterns

RESULTS AND DISCUSSION
Rainfall-runoff processes usually have pronounced seasonal means,
variances and dependence structures, and the underlying mechanisms of
streamflow are likely to be quite different during low, medium and high
flow periods, especially when extreme events occur. The results of RBFNN modeling
of all 360 rainfall-runoff training patterns without any preprocessing
are shown in Table 1. The optimized topology for training
and testing was obtained with a spread coefficient of 0.4. Table
1 shows that the model accuracy in terms of R^{2} is 76.21% for the training
phase and 68.94% for the validation phase, with RMSE of 0.8802
and 0.9808, respectively. This result shows the problem of modeling seasonality in time
series: even the RBFNN is restricted in learning the relation between the
precipitation input vector and the discharge output.
After clustering all datasets into 4 classes, each class was trained and tested
in a different Radial Basis Function Neural Network (Fig. 5)
and the cluster-based RBFNNs were developed. Because of the homogeneity within each
training cluster and the reduction of seasonality in the hybrid model, the accuracy of
each class improved to some extent in comparison with no action on the dataset. The
results of the hybrid models Cl 1-RBFNNs, Cl 2-RBFNNs, Cl 3-RBFNNs and Cl 4-RBFNNs are
shown in Tables 2-5. The optimized topology of Cl 1-RBFNNs for training
and testing was obtained with a spread of 0.3. Table 2 shows that the
R^{2} of the model is 90.95% for training and 82.37% for testing and the RMSE
is 0.0058 and 0.0987, respectively.
Table 2: 
Optimized topology of model 1 (cluster 1-RBFNN)

Table 3: 
Optimized topology of model 2 (cluster 2-RBFNN)

It is recognized that data preprocessing can have a significant effect on model
performance (Maier and Dandy, 2000). The results show that
clustering the rainfall-runoff patterns and modeling each dataset with a different
RBFNN gives higher accuracy in the prediction and modeling of river flow than no
preprocessing of the patterns. Table 6 shows the capability
of preprocessing in rainfall-runoff modeling using cluster-based RBFNNs, which
agrees with results for other preprocessing methods such as data reduction using PCA
(Samani et al., 2007).
Table 4: 
Optimized topology of model 3 (cluster 3-RBFNN)

Table 5: 
Optimized topology of model 4 (cluster 4-RBFNN)


Fig. 6: 
Monitoring of model accuracy in the training and testing process
using different spread coefficients
Table 6: 
Topology of all 5 models: the RBFNN and the cluster-based RBFNNs

Figure 6 shows the monitoring of the spread coefficient of the Gaussian
function in the 5 developed models. This result demonstrates that, at the selected
spread, the R^{2} of each cluster-based model is better than that of the model without
preprocessing in both the training and testing procedures.
CONCLUSION
There are big differences between extreme data in seasonal hydrological
time series, and clustering can separate the data into homogeneous groups.
Modeling with homogeneous rainfall-runoff training patterns is much easier
than modeling with no action on the dataset. Cluster-based RBFNNs can be used
as a hybrid model to overcome the nonlinearity of river flow modeling. The
comparison shows that preprocessing the training dataset, as in the hybrid
RBFNN model, gives better optimization than the RBFNN alone for simulating
the rainfall-runoff process in the Nazloochaei river basin. The RBFNN has an
efficient training algorithm (compared with the multilayer NN), but a large
training set is a problem. The cluster-based RBFNN hybrid model shows
a high capability for prediction. It would be interesting to further compare
the hybrid cluster-based RBFNN approach with other hybrid techniques,
such as Bayesian-concept-based MLP, fuzzy-logic NN modeling and SOM-cluster-based
hybrid modeling.

REFERENCES 
Abrahart, R.J. and L. See, 2000. Comparing neural network and autoregressive moving average techniques for the provision of continuous river flow forecasts in two contrasting catchments. Hydrol. Process., 14: 2157-2172.
Baratti, R., B. Cannas, A. Fanni, M. Pintus, G.M. Sechi and N. Toreno, 2003. River flow forecast for reservoir management through neural networks. J. Neurocomput., 55: 421-437.
Bhattacharya, B. and D.P. Solomatine, 2006. Machine learning in sedimentation modeling. J. Neural Networks, 19: 208-214.
Cigizoglu, H.K., 2004. Estimation and forecasting of daily suspended sediment data by multilayer perceptrons. Adv. Water Resour., 27: 185-195.
Ceylan, R. and Y. Ozbay, 2007. Comparison of FCM, PCA and WT techniques for classification ECG arrhythmias using artificial neural network. Expert Syst. Applic., 33: 286-295.
Chen, Y., B. Yang and J. Dong, 2006. Time-series prediction using a local linear wavelet neural network. Neurocomputing, 69: 449-465.
Coulibaly, P. and C.K. Baldwin, 2005. Nonstationary hydrological time series forecasting using nonlinear dynamic methods. J. Hydrol., 307: 164-174.
Dibik, Y.B. and D.P. Solomatine, 2001. River flow forecasting using artificial neural networks. Hydrol. Oceans Atmos., 26: 1-7.
Dogan, E., 2005. Suspended sediment load estimation in lower Sakarya river by using artificial neural networks. Fuzzy logic neuro-fuzzy models. Elect. Lett. Sci. Eng., 1: 22-32.
Gao, Y. and E.M. Joo, 2005. NARMAX time series model prediction: Feedforward and recurrent fuzzy neural network approaches. Fuzzy Sets Syst., 150: 331-350.
Hu, T.S., K.C. Lam and S.T. Ng, 2001. River flow time series prediction with a range-dependent neural network. Hydrol. Sci. J., 46: 729-745.
Hsu, K.I., H.V. Gupta and S. Sorooshian, 1995. Artificial neural network modeling of the rainfall-runoff process. Water Resour. Res., 31: 2517-2530.
Jain, A. and S. Srinivasulu, 2006. Integrated approach to model decomposed flow hydrograph using artificial neural network and conceptual techniques. J. Hydrol., 317: 291-306.
Jang, J.S.R., 1993. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern., 23: 665-685.
Jang, J.S.R., C.T. Sun and E. Mizutani, 1997. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. 1st Edn., Prentice Hall, Upper Saddle River, NJ., USA., ISBN: 0132610663
Kisi, O., 2005. Suspended sediment estimation using neuro-fuzzy and neural network approaches. Hydrol. Sci. J., 50: 683-696.
Kisi, O., 2006. Daily pan evaporation modelling using a neuro-fuzzy computing technique. J. Hydrol., 329: 636-646.
Kamel, M.S. and S.Z. Selim, 1994. New algorithm for solving the fuzzy clustering problem. Pattern Recog., 27: 421-428.
Lorrai, M. and H.M. Sechi, 1995. Neural nets for modeling rainfall-runoff transformations. Water Resour. Manage., 9: 299-313.
Maier, H.R. and G.C. Dandy, 2000. Neural networks for the prediction and forecasting of water resources variables: A review of modeling issues and applications. Environ. Model. Software, 15: 101-124.
Mason, J.C., R.K. Price and A. Ternme, 1996. A neural network model of rainfall-runoff using radial basis functions. J. Hydraulic Res., 34: 537-548.
Nayak, P.C., K.P. Sudheer, D.M. Rangan and K.S. Ramasastri, 2004. A neuro-fuzzy computing technique for modeling hydrological time series. J. Hydrol., 291: 52-66.
Nilsson, P., C.B. Uvo and R. Bentsen, 2005. Monthly runoff simulation: Comparing and combining conceptual and neural network models. J. Hydrol., 317: 1-20.
Nouri, M. and H. Abghari, 2007. Simulation of rainfall-runoff using RBFNNs base on probabilistic neural network classification. Proceedings of the 3rd Conference on Watershed Management Kerman and Water Resources Management, May 10-11, 2007, Iranian Society of Irrigation and Water Engineering Press, pp: 1108-1113
Pal, N.R., S. Pal, J. Das and K. Majumdar, 2003. SOFM-MLP: A hybrid neural network for atmospheric temperature prediction. IEEE Trans. Geosci. Remote Sensing, 41: 2783-2791.
Samani, N., M. Gohari-Moghadam and A. Safavi, 2007. A simple neural network model for the determination of aquifer parameters. J. Hydrol., 340: 1-11.
See, L. and S. Openshaw, 1999. Applying soft computing approaches to river level forecasting. Hydrol. Sci. J., 44: 763-778.
Wang, W., P. Van Gelder and J.K. Vrijling, 2006. Forecasting daily stream flow using hybrid ANN models. J. Hydrol., 324: 383-399.
Wasserman, P.D., 1993. Advanced Methods in Neural Computing. 1st Edn. John Wiley and Sons, Inc., New York, USA., ISBN: 0442004613, pp: 250
Yurekli, K., K. Kurunc and H. Simsek, 2004. Prediction of daily maximum streamflow based on stochastic approaches. J. Spatial Hydrol., 4: 1-12.
Zhang, B. and S. Govindaraju, 2000. Prediction of watershed runoff using Bayesian concepts and modular neural networks. Water Resour. Res., 36: 753-762.
Karunasinghe, D.S.K. and S.Y. Liong, 2006. Chaotic time series prediction with a global model artificial neural network. J. Hydrol., 323: 92-105.



