Cluster Analysis of Rainfall-Runoff Training Patterns to Flow Modeling Using Hybrid RBF Networks
The artificial intelligence modeling of nonstationary rainfall-runoff
processes is limited in simulation accuracy by the complexity and
nonlinearity of the training patterns. Statistical preprocessing of the
training data can establish the homogeneity of rainfall-runoff patterns
before they are modeled with artificial intelligence. In this study, a new
hybrid model that couples artificial intelligence with statistical
clustering is introduced. The effect of statistically preprocessing 360
rainfall-runoff patterns was examined before modeling with Radial Basis
Function Neural Networks (RBFNNs). In the first step, all 360 monthly
rainfall-runoff patterns were classified into 4 groups by cluster analysis,
and each class was modeled by an RBFNN with a different topology. The
results of the 4 cluster-based RBFNNs are compared with those of an RBFNN
trained without preprocessing, and the optimized structure of the hybrid
cluster-based RBFNN models of the Nazloochaei River flow is presented. The
results show that clustering the rainfall-runoff patterns and modeling each
dataset with a different RBFNN gives higher accuracy in the prediction and
modeling of river flow than using the patterns without preprocessing.
INTRODUCTION
The modeling and prediction of natural hazards is the basis of risk
management. Intelligent models are distributed parallel processors that
learn the relationship between input and output signals and provide an
optimized topology for simulating systems. In practice, many real-world
dynamical system signals exhibit two distinct characteristics: nonlinearity
and nonstationarity, in the sense that their statistical characteristics
change over time because of either internal or external nonlinear dynamics
(Coulibaly and Baldwin, 2005). Rainfall-runoff is a nonlinear process
because of the temporal and spatial distribution of precipitation and other
parameters. A major concern in the prediction of hydrological events is
whether a given process should be modeled as linear or nonlinear. Evidence
of nonstationarity in some existing long hydrological records has raised a
number of questions about the adequacy of conventional statistical methods.
Many kinds of hydrological models exist for simulating the discharge of a
river basin, but the complexity of rainfall-runoff processes imposes
restrictions and causes problems in river flow modeling (Nouri and Abghari,
2007). Because this complexity produces a nonlinear relation between flow
and rainfall, researchers have sought better methodologies for developing
more accurate models. Artificial Neural Networks (ANNs) have been found to
be well suited to nonlinear input-output mapping. Recent reviews reveal that
more than 90% of ANN applications for modeling water resources variables use
standard feedforward neural networks (Coulibaly and Baldwin, 2005), but
Radial Basis Function Neural Networks (RBFNNs) also have a high capability
for modeling hydrological processes (Mason et al., 1996; Dibik and
Solomatine, 2001; Nouri and Abghari, 2007). Preprocessing the dataset before
training an artificial intelligence model improves both training time and
modeling accuracy (Nouri and Abghari, 2007).
Hu et al. (2001) developed Range-Dependent Hybrid Neural Networks (RDNN),
which are essentially threshold ANNs, to forecast annual and daily
streamflows. Pal et al. (2003) proposed a hybrid ANN model that combines the
Self-Organizing Feature Map (SOFM) and the MLP network for temperature
prediction, where the SOFM serves to partition the training data. Other
approaches to time-series prediction include feedforward NNs (Hsu et al.,
1995; Lorrai and Sechi, 1995), recurrent NNs (Gao and Joo, 2005),
cluster-based multilayer perceptrons (Wang et al., 2006), RBFNNs (Mason et
al., 1996), fuzzy NNs (Jang, 1993; Jang et al., 1997), fuzzy-logic-based
hybrid modeling (See and Openshaw, 1999), SOM-cluster-based hybrid modeling
(Abrahart and See, 2000) and Bayesian-concept-based modular ANNs (Zhang and
Govindaraju, 2000; Nayak et al., 2004; Dogan, 2005; Kisi, 2005, 2006);
further models have been proposed by Cigizoglu (2004), Baratti et al.
(2003), Yurekli et al. (2004), Nilsson et al. (2005), Dulkashi et al.
(2006), Chen et al. (2006), Bhattacharya and Solomatine (2006) and Jain and
Srinivasulu (2006). Wang et al. (2006) developed hybrid threshold-based and
cluster-based multilayer perceptrons for predicting daily streamflow. Mason
et al. (1996) and Dibik and Solomatine (2001) showed that river flow
predictions from RBFNNs are more accurate than those from an MLP and from
the deterministic model HEC-HMS. Samani et al. (2007) demonstrated that when
principal component analysis is used to preprocess the dataset and reduce
its dimension, an MLP is better able to model pumping tests in wells.
Ceylan and Ozbay (2007) likewise demonstrated that classifying ECG
arrhythmias with an ANN after preprocessing the data by FCM, wavelet
transformation or principal component analysis performs better than MLP
modeling without preprocessing. In this study, cluster analysis is used to
determine the homogeneity of the training patterns, and each class of
rainfall-runoff patterns is trained and tested in a different RBF network.
The main aim of this study is to investigate homogeneous monthly training
patterns for river flow modeling and to compare the results with RBF
modeling of the nonstationary time series without preprocessing.
MATERIALS AND METHODS
One of the major tributaries of the Nazloochaei River Basin in the North
West of Iran was selected for the current study (Fig. 1). The basin area is
2014 km² and the main river drains into Urmia Lake. Stream flow processes
are strongly influenced by irregular rainfall events. Three hundred and
sixty monthly records (1976-2006) from 5 precipitation gauges and a
hydrometric station were used for RBF and hybrid cluster-based RBF modeling.
Radial Basis Function Neural Networks
In RBF neural networks, the activation function of the hidden layer is a
radially symmetric basis function such as the Gaussian function, and the
output layer is linear (Wasserman, 1993). Researchers recommend RBFNNs for
hydrological processes because of their fast training and good
generalization (Mason et al., 1996; Dibik and Solomatine, 2001). Figure 2
shows the structure of the RBFNN model. Learning in an RBFNN consists of
updating the weight matrix at each iteration, based on the error obtained by
comparing the discharge estimated by the model with the observed discharge.
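The learning rule described above can be illustrated with a minimal Python sketch. It assumes a Gaussian hidden layer scaled so that a neuron outputs 0.5 at a distance equal to the spread (the scaling MATLAB's newrb applies through its 0.8326/spread bias), a linear output layer, and an LMS (gradient-descent) update of the output weights from the discharge error. The function names and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def gaussian_layer(P, centres, spread):
    """Hidden-layer outputs for patterns P (n x k) and neuron centres (m x k).

    Assumed scaling: a neuron outputs 0.5 when the input lies exactly
    `spread` away from its weight (centre) vector.
    """
    d = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)  # n x m radial distances
    return np.exp(-np.log(2.0) * (d / spread) ** 2)

def train_output_layer(P, q_obs, centres, spread, lr=0.1, epochs=500):
    """Iteratively update the linear output weights from the discharge error (LMS rule)."""
    H = gaussian_layer(P, centres, spread)
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(epochs):
        q_est = H @ w + b            # linear output layer: estimated discharge
        err = q_obs - q_est          # observed minus estimated discharge
        w += lr * H.T @ err / len(q_obs)
        b += lr * err.mean()
    return w, b

# Usage on synthetic data (hypothetical: 60 patterns from 5 rain gauges)
rng = np.random.default_rng(0)
P = rng.random((60, 5))
q = P.sum(axis=1)
centres = P[:10]                     # 10 hidden neurons centred on training patterns
w, b = train_output_layer(P, q, centres, spread=1.0)
q_hat = gaussian_layer(P, centres, 1.0) @ w + b
```

Because every Gaussian output lies in (0, 1], a learning rate of 0.1 keeps this update stable for a handful of neurons, so the training error decreases from its starting value at every epoch.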
Fig. 1: Nazloochaei River basin in the North West of Iran
The output of each hidden-layer neuron of the RBFNN (Eq. 3) is a function of
the radial distance (Eq. 2) between the precipitation gauge input vector
pi = (p1i, p2i,..., pki) and the neuron's weight vector
wj = (w1j, w2j,..., wkj); the weights are adjusted to minimize the error of
the discharge estimated by the model (Mason et al., 1996). The RBFNN models
were programmed in the MATLAB R2008a software package.
Fig. 3: Effect of different spread coefficients in the Gaussian function
There are several possible choices for Yout = f(δj), but the usual one is
the Gaussian function of Eq. 3, where δj is the radial distance between the
precipitation input pi and the weight vector wj, and σ is the spread
coefficient.
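Eq. 2 and Eq. 3 are not reproduced in this copy of the text. One common way to write them, assuming the Euclidean radial distance and the Gaussian scaling used by MATLAB's newrb (bias 0.8326/σ), is:

```latex
\delta_j = \lVert \mathbf{p}_i - \mathbf{w}_j \rVert
         = \sqrt{\sum_{l=1}^{k} \left( p_{li} - w_{lj} \right)^{2}}
\qquad
Y_{out} = f(\delta_j) = \exp\!\left[ -\left( \frac{0.8326\,\delta_j}{\sigma} \right)^{2} \right]
```

Here 0.8326 ≈ √(ln 2), so Yout = 1 when the input equals the weight vector and Yout = 0.5 when δj = σ. Texts that write the Gaussian as exp(−δj²/(2σ²)) use the same shape with σ rescaled by a constant factor.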
Neurons are added to the RBF network until the sum-squared error falls
below an error goal. These networks tend to need more neurons than
feedforward NNs with tansig or logsig neurons in the hidden layer, because
sigmoid neurons can produce outputs over a large region of the input space,
whereas RBF neurons respond only to relatively small regions of it. In
addition to the usual network parameters, such as the number of neurons, the
training algorithm and the weight updates, the Gaussian activation function
of RBFNNs has an extra parameter for optimization and flexibility: the
spread coefficient. The spread coefficient (σ in Eq. 3) determines the
selectivity of the neurons; changing it contracts or expands the Gaussian
activation function (Fig. 3). If the spread is small, the radial basis
function is very steep, and the neuron whose weight vector is closest to the
input has a much larger output than the other neurons. As the spread
increases, the slope of the radial basis functions becomes smoother and
several neurons can respond to the same input vector. Training an RBFNN
therefore involves deciding how many hidden neurons the model needs and how
sharp the Gaussians should be, as set by the spread parameter. The hidden
layer is then kept fixed, while the output weights are optimized with a
simple backpropagation algorithm, as in multilayer perceptron training.
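The incremental design and the role of the spread can be sketched as follows. This is a simplified greedy procedure written for illustration, not the exact algorithm of MATLAB's newrb: at each step the training pattern with the largest residual becomes a new hidden neuron, and the linear output weights are re-fitted by least squares until the sum-squared error reaches the goal. The function names, the error goal and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_layer(P, centres, spread):
    # Assumed scaling: output 0.5 at a radial distance equal to the spread
    d = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-np.log(2.0) * (d / spread) ** 2)

def fit_incremental_rbf(P, q, spread, goal):
    """Add hidden neurons one at a time until the sum-squared error <= goal.

    Simplified greedy analogue of an incremental RBF design (not newrb itself):
    the pattern with the largest residual becomes the next centre, then the
    linear output weights and bias are re-fitted by least squares.
    """
    chosen = []                                     # indices of patterns used as centres
    while True:
        centres = P[np.array(chosen, dtype=int)]
        A = np.column_stack([gaussian_layer(P, centres, spread), np.ones(len(P))])
        w, *_ = np.linalg.lstsq(A, q, rcond=None)   # output weights followed by bias
        resid = q - A @ w
        sse = float(resid @ resid)
        if sse <= goal or len(chosen) == len(P):
            return centres, w, sse
        # next neuron: the not-yet-used pattern with the largest absolute residual
        order = np.argsort(-np.abs(resid))
        chosen.append(int(next(i for i in order if i not in chosen)))

# Usage: compare the number of neurons needed for a small and a large spread
rng = np.random.default_rng(1)
P = rng.random((40, 3))
q = np.sin(3 * P[:, 0]) + P[:, 1]
results = {s: fit_incremental_rbf(P, q, s, goal=0.05) for s in (0.1, 1.0)}
for s, (centres, w, sse) in results.items():
    print(f"spread={s}: {len(centres)} neurons, SSE={sse:.4f}")
```

With a small spread each neuron covers only the patterns nearest its centre, so more neurons are typically needed to reach the same error goal; with a large spread several neurons respond to every input.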
Model Performance Measures
There is an extensive literature on indices for evaluating model forecasts
(Wang et al., 2006). The coefficient of determination (R2) and the Root Mean
Squared Error (RMSE) are popular measures of model performance because they
are very sensitive to even small errors, which makes them suitable for
comparing small differences between the estimated and observed discharges
of the models.
where n is the number of observations, Q̄ is the average discharge of the
river, Qi is the discharge observed at the hydrometric station and Q̂i is
the discharge estimated by the RBF model.
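The two measures can be computed as below. RMSE follows its standard definition; for R2 the sketch assumes the form 1 − Σ(Qi − Q̂i)² / Σ(Qi − Q̄)², which uses the mean discharge Q̄ defined above, although some studies instead report the squared correlation coefficient. The function names and the small example series are hypothetical.

```python
import numpy as np

def rmse(q_obs, q_est):
    """Root mean squared error between observed and estimated discharge."""
    q_obs, q_est = np.asarray(q_obs, float), np.asarray(q_est, float)
    return float(np.sqrt(np.mean((q_obs - q_est) ** 2)))

def r_squared(q_obs, q_est):
    """Assumed form: 1 - SSE / SST, with SST taken about the mean observed discharge."""
    q_obs, q_est = np.asarray(q_obs, float), np.asarray(q_est, float)
    sse = np.sum((q_obs - q_est) ** 2)
    sst = np.sum((q_obs - q_obs.mean()) ** 2)
    return float(1.0 - sse / sst)

q_obs = [2.0, 3.0, 5.0, 6.0]
q_est = [2.5, 3.0, 4.5, 6.0]
print(rmse(q_obs, q_est))       # ≈ 0.354
print(r_squared(q_obs, q_est))  # ≈ 0.95
```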
Cluster Analysis of Monthly Rainfall-Runoff Patterns
Seasonality within the water year leads to inconsistencies in the dataset
used to train neural networks, and accounting for seasonality is one of the
main problems in hydrological simulation. The objective of cluster analysis
is to classify objects according to their similarities and to organize the
data into groups. Different classifications can result from the algorithmic
approach of the clustering technique. Variables with high pairwise
correlations are assigned to the same cluster, whereas those with low
pairwise correlations are assigned to different clusters (Kamel and Selim,
1994). Generally, cluster analysis is based on two ingredients: a distance
measure and a clustering algorithm. Distance can be measured between the
data vectors themselves, or from a data vector to some prototypical object
of the cluster. One recommended measure for quantifying the (dis)similarity
of objects is the Euclidean distance (Eq. 6). As a clustering algorithm,
K-means (Eq. 7) can be seen as an optimization problem that minimizes the
sum of squared within-cluster distances. Each cluster in the partition is
defined by its member objects and by its centroid, which is the point that
minimizes the sum of distances from all objects in that cluster. In the
presented hybrid model, K-means uses an iterative algorithm that minimizes,
over all clusters, the sum of distances from each rainfall-runoff training
pattern to its cluster centroid. The algorithm moves training patterns
between clusters until this sum cannot be decreased further. The result is
a set of clusters that are as compact and well separated as possible.
where de(i, j) is the Euclidean distance between the precipitation patterns
i and j (with components pik and pjk), and W(C) is the sum of squared
within-cluster distances that K-means minimizes.
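A minimal sketch of the Euclidean distance and of Lloyd's K-means iteration is given below. It is a generic implementation written for illustration (the study does not state which initialization it used), and W(C) is computed here as the plain within-cluster sum of squared distances, which is an assumption since some texts weight it by cluster size. The function names and the synthetic patterns are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (Euclidean) distance between two pattern vectors."""
    return float(np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)))

def assign(X, centroids):
    """Label each pattern with the index of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means: alternate nearest-centroid assignment and mean update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(X, centroids)
    # W(C): within-cluster sum of squared distances to the centroids
    wss = float(sum(np.sum((X[labels == c] - centroids[c]) ** 2) for c in range(k)))
    return labels, centroids, wss

# Usage on two synthetic groups of 4-dimensional patterns
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))])
labels, centroids, wss = kmeans(X, k=2)
```

Lloyd's algorithm only guarantees a local minimum of W(C), so in practice it is usually restarted from several initial centroids and the partition with the smallest W(C) is kept.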
Using this method, the 360 rainfall-runoff patterns were separated into 4
clusters: 124 patterns in cluster 1, 116 in cluster 2, 52 in cluster 3 and
68 in cluster 4. The rainfall-runoff patterns of each cluster were divided
into two sections for the training and validation of the hybrid models: 70%
of each class was used for training and the remainder for model validation.
Four hybrid models, Cl 1-RBFNNs, Cl 2-RBFNNs, Cl 3-RBFNNs and Cl 4-RBFNNs,
were developed. Figure 4 shows the structure of the hybrid cluster-based
RBFNNs.
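The per-cluster split and the routing of a pattern to its cluster's network (the structure of the hybrid model) can be sketched as follows. Random shuffling within each cluster, rounding the 70% share down, and routing new patterns to the nearest centroid of the same variables used for clustering are all assumptions made for illustration; the helper names are hypothetical.

```python
import numpy as np

def split_per_cluster(labels, train_frac=0.7, seed=0):
    """Split each cluster's pattern indices into training and validation sets.

    Random shuffling and rounding down with int() are assumptions; the text only
    states that 70% of each class was used for training.
    """
    rng = np.random.default_rng(seed)
    splits = {}
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(train_frac * len(idx))
        splits[int(c)] = (idx[:n_train], idx[n_train:])
    return splits

def route(x, centroids):
    """Index of the nearest cluster centroid, i.e. which cluster's RBFNN to use."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Usage with the cluster sizes reported above (124, 116, 52 and 68 patterns)
labels = np.repeat([0, 1, 2, 3], [124, 116, 52, 68])
splits = split_per_cluster(labels)
for c, (train_idx, valid_idx) in splits.items():
    print(f"Cl {c + 1}: {len(train_idx)} training, {len(valid_idx)} validation")
```

Under this rounding the training sets contain 86, 81, 36 and 47 patterns. In a full hybrid model each cluster would have its own RBFNN (for example in a dictionary keyed by cluster index), and route() would select which network produces the discharge estimate.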
Fig. 4: The structure of the hybrid model (cluster-based RBFNNs)
Table 1: Optimized topology of the rainfall-runoff RBFNN without any preprocessing
RESULTS AND DISCUSSION
Rainfall-runoff processes usually have pronounced seasonal means, variances
and dependence structures, and the underlying mechanisms of streamflow are
likely to differ considerably between low, medium and high flow periods,
especially when extreme events occur. The results of RBFNN modeling of all
360 rainfall-runoff training patterns without any preprocessing are shown in
Table 1. The optimized topology for training and testing was obtained with
a spread coefficient of 0.4. Table 1 shows that the model accuracy in terms
of R2 is 76.21% for the training phase and 68.94% for the validation phase,
with RMSE values of 0.8802 and 0.9808, respectively. This result reflects
the difficulty of modeling seasonality in the time series: even an RBFNN is
limited in learning the relation between the precipitation input vector and
the discharge output.
After all the data had been clustered into 4 classes, each class was trained
and tested in a different Radial Basis Function Neural Network (Fig. 5),
giving the cluster-based RBFNNs. Because the training clusters are
homogeneous and the hybrid model reduces seasonality, the accuracy for each
class improved to some extent compared with using the dataset without
preprocessing. The results of the hybrid models Cl 1-RBFNNs, Cl 2-RBFNNs,
Cl 3-RBFNNs and Cl 4-RBFNNs are shown in Tables 2-5. The optimized topology
of Cl 1-RBFNNs for training and testing was obtained with a spread of 0.3.
Table 2 shows that the model accuracy in terms of R2 is 90.95% for training
and 82.37% for validation, with RMSE values of 0.0058 and 0.0987.
Table 2: Optimized topology of model 1 (cluster 1-RBFNN)
Table 3: Optimized topology of model 2 (cluster 2-RBFNN)
It is recognized that data preprocessing can have a significant effect on
model performance (Maier and Dandy, 2000). The results show that clustering
the rainfall-runoff patterns and modeling each dataset with a different
RBFNN gives higher accuracy in the prediction and modeling of river flow
than using the patterns without preprocessing. Table 6 shows the benefit of
preprocessing for rainfall-runoff modeling with cluster-based RBFNNs, in
line with other preprocessing methods such as data reduction by PCA (Samani
et al., 2007).
Table 4: Optimized topology of model 3 (cluster 3-RBFNN)
Table 5: Optimized topology of model 4 (cluster 4-RBFNN)
Fig. 6: Monitoring of model accuracy in the training and testing processes using different spread coefficients
Table 6: Topology of all 5 models (RBFNN and cluster-based RBFNNs)
Figure 6 shows how the spread coefficient of the Gaussian function was
monitored in the 5 developed models. The results demonstrate that, at a
suitably selected spread, each cluster-based model achieves a higher R2 than
the model without preprocessing in both training and testing.
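The spread monitoring can be reproduced in outline with the sketch below, which fits an RBF network for each candidate spread with fixed centres and least-squares output weights, and records the training and testing R2. Fixed centres, the candidate spreads, the synthetic data and the function names are assumptions for illustration, not the study's configuration.

```python
import numpy as np

def gaussian_layer(P, centres, spread):
    # Assumed scaling: output 0.5 at a radial distance equal to the spread
    d = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-np.log(2.0) * (d / spread) ** 2)

def r_squared(q_obs, q_est):
    # Assumed form: 1 - SSE / SST about the mean observed discharge
    return float(1.0 - np.sum((q_obs - q_est) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2))

def spread_sweep(P_tr, q_tr, P_te, q_te, centres, spreads):
    """Return (spread, training R2, testing R2) for each candidate spread."""
    rows = []
    for s in spreads:
        A_tr = np.column_stack([gaussian_layer(P_tr, centres, s), np.ones(len(P_tr))])
        A_te = np.column_stack([gaussian_layer(P_te, centres, s), np.ones(len(P_te))])
        w, *_ = np.linalg.lstsq(A_tr, q_tr, rcond=None)  # output weights + bias
        rows.append((s, r_squared(q_tr, A_tr @ w), r_squared(q_te, A_te @ w)))
    return rows

# Usage on synthetic data: 70 training and 30 testing patterns from 5 inputs
rng = np.random.default_rng(2)
P = rng.random((100, 5))
q = P @ np.array([0.5, 1.0, 0.2, 0.8, 0.3]) + 0.05 * rng.standard_normal(100)
P_tr, q_tr, P_te, q_te = P[:70], q[:70], P[70:], q[70:]
centres = P_tr[:15]                                      # 15 fixed hidden neurons
rows = spread_sweep(P_tr, q_tr, P_te, q_te, centres, [0.1, 0.2, 0.3, 0.4, 0.6, 1.0])
best = max(rows, key=lambda r: r[2])                     # spread with the highest testing R2
for s, r2_tr, r2_te in rows:
    print(f"spread={s}: train R2={r2_tr:.3f}, test R2={r2_te:.3f}")
```

Choosing the spread by the testing R2, as in the monitoring above, uses the testing set for model selection; an independent set would be needed for an unbiased estimate of accuracy.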
Seasonal hydrological time series contain extreme values that differ
greatly from the rest of the data, and clustering can separate the data into
homogeneous groups. Modeling with homogeneous rainfall-runoff training
patterns is much easier than modeling without preprocessing, so
cluster-based RBFNNs can be used as a hybrid model to overcome the
nonlinearity of river flow modeling. This comparison shows that
preprocessing the training dataset, as in the hybrid RBFNN model, gives
better optimization than a single RBFNN for simulating the rainfall-runoff
process in the Nazloochaei River basin. An RBFNN has an efficient training
algorithm compared with a multilayer NN, but a large training set is a
problem. The hybrid cluster-based RBFNN model shows high capability for
prediction. It would be interesting to compare the hybrid cluster-based
RBFNN approach further with other hybrid techniques, such as
Bayesian-concept-based MLP, fuzzy-logic NN modeling and SOM-cluster-based
hybrid modeling.
Abrahart, R.J. and L. See, 2000.
Comparing neural network and autoregressive moving average techniques for the provision of continuous river flow forecasts in two contrasting catchments. Hydrol. Process., 14: 2157-2172.
Baratti, R., B. Cannas, A. Fanni, M. Pintus, G.M. Sechi and N. Toreno, 2003.
River flow forecast for reservoir management through neural networks. J. Neurocomput., 55: 421-437.
Bhattacharya, B. and D.P. Solomatine, 2006.
Machine learning in sedimentation modeling. J. Neural Networks, 19: 208-214.
Cigizoglu, H.K., 2004.
Estimation and forecasting of daily suspended sediment data by multi-layer perceptrons. Adv. Water Resour., 27: 185-195.
Ceylan, R. and Y. Ozbay, 2007.
Comparison of FCM, PCA and WT techniques for classification ECG arrhythmias using artificial neural network. Expert Syst. Applic., 33: 286-295.
Chen, Y., B. Yang and J. Dong, 2006.
Time-series prediction using a local linear wavelet neural network. Neurocomputing, 69: 449-465.
Coulibaly, P. and C.K. Baldwin, 2005.
Nonstationary hydrological time series forecasting using nonlinear dynamic methods. J. Hydrol., 307: 164-174.
Dibik, Y.B. and D.P. Solomatine, 2001.
River flow forecasting using artificial neural networks. Hydrol. Oceans Atmos., 26: 1-7.
Dogan, E., 2005.
Suspended sediment load estimation in lower Sakarya River by using artificial neural networks, fuzzy logic and neuro-fuzzy models. Elect. Lett. Sci. Eng., 1: 22-32.
Gao, Y. and E.M. Joo, 2005.
NARMAX time series model prediction: Feedforward and recurrent fuzzy neural network approaches. Fuzzy Sets Syst., 150: 331-350.
Hu, T.S., K.C. Lam and S.T. Ng, 2001.
River flow time series prediction with a range-dependent neural network. Hydrol. Sci. J., 46: 729-745.
Hsu, K.I., H.V. Gupta and S. Sorooshian, 1995.
Artificial neural network modeling of the rainfall-runoff process. Water Resour. Res., 31: 2517-2530.
Jain, A. and S. Srinivasulu, 2006.
Integrated approach to model decomposed flow hydrograph using artificial neural network and conceptual techniques. J. Hydrol., 317: 291-306.
Jang, J.S.R., 1993.
ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern., 23: 665-685.
Jang, J.S.R., C.T. Sun and E. Mizutani, 1997.
Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. 1st Edn., Prentice Hall, Upper Saddle River, NJ., USA., ISBN: 0132610663
Kisi, O., 2005.
Suspended sediment estimation using neuro-fuzzy and neural network approaches. Hydrol. Sci. J., 50: 683-696.
Kisi, O., 2006.
Daily pan evaporation modelling using a neuro-fuzzy computing technique. J. Hydrol., 329: 636-646.
Kamel, M.S. and S.Z. Selim, 1994.
New algorithm for solving the fuzzy clustering problem. Pattern Recog., 27: 421-428.
Lorrai, M. and H.M. Sechi, 1995.
Neural nets for modeling rainfall-runoff transformations. Water Resour. Manage., 9: 299-313.
Maier, H.R. and G.C. Dandy, 2000.
Neural networks for the prediction and forecasting of water resources variables: A review of modeling issues and applications. Environ. Model. Software, 15: 101-124.
Mason, J.C., R.K. Price and A. Ternme, 1996.
A neural network model of rainfall-runoff using radial basis functions. J. Hydraulic Res., 34: 537-548.
Nayak, P.C., K.P. Sudheer, D.M. Rangan and K.S. Ramasastri, 2004.
A neuro-fuzzy computing technique for modeling hydrological time series. J. Hydrol., 291: 52-66.
Nilsson, P., C.B. Uvo and R. Bentsen, 2005.
Monthly runoff simulation: Comparing and combining conceptual and neural network models. J. Hydrol., 317: 1-20.
Nouri, M. and H. Abghari, 2007.
Simulation of rainfall-runoff using RBFNNs base on probabilistic neural network classification. Proceedings of the 3rd Conference on Watershed Management Kerman and Water Resources Management, May 10-11, 2007, Iranian Society of Irrigation and Water Engineering Press, pp: 1108-1113
Pal, N.R., S. Pal, J. Das and K. Majumdar, 2003.
SOFM-MLP: A hybrid neural network for atmospheric temperature prediction. IEEE Trans. Geosci. Remote Sensing, 41: 2783-2791.
Samani, N., M. Gohari-Moghadam and A. Safavi, 2007.
A simple neural network model for the determination of aquifer parameters. J. Hydrol., 340: 1-11.
See, L. and S. Openshaw, 1999.
Applying soft computing approaches to river level forecasting. Hydrol. Sci. J., 44: 763-778.
Wang, W., P. Van Gelder and J.K. Vrijling, 2006.
Forecasting daily stream flow using hybrid ANN models. J. Hydrol., 324: 383-399.
Wasserman, P.D., 1993.
Advanced Methods in Neural Computing. 1st Edn., John Wiley and Sons, Inc., New York, USA., ISBN: 0442004613, pp: 250.
Yurekli, K., K. Kurunc and H. Simsek, 2004.
Prediction of daily maximum streamflow based on stochastic approaches. J. Spatial Hydrol., 4: 1-12.
Zhang, B. and S. Govindaraju, 2000.
Prediction of watershed runoff using bayesian concepts and modular neural networks. Water Resour. Res., 36: 753-762.
Karunasinghe, D.S.K. and S.Y. Liong, 2006.
Chaotic time series prediction with a global model artificial neural network. J. Hydrol., 323: 92-105.