Cluster Analysis of Rainfall-Runoff Training Patterns to Flow Modeling Using Hybrid RBF Networks

Abghari, H.; Mahdavi, M.; Fakherifard, A.; Salajegheh, A.

ABSTRACT

Artificial intelligence modeling of nonstationary rainfall-runoff processes is limited in simulation accuracy by the complexity and nonlinearity of the training patterns. Statistical preprocessing of the training data can establish the homogeneity of rainfall-runoff patterns before they are modeled. This study introduces a new hybrid model that combines artificial intelligence with statistical clustering. The effect of statistical preprocessing of 360 rainfall-runoff patterns was examined before modeling with Radial Basis Function Neural Networks (RBFNNs). First, all 360 monthly rainfall-runoff patterns were classified by cluster analysis into 4 groups and each class was modeled with a different RBFNN topology. The results of the 4 cluster-based RBFNNs were compared with those of an RBFNN trained without preprocessing, and the optimized structure of the hybrid cluster-based RBFNN models of Nazloochaei river flow is presented. The results show that clustering the rainfall-runoff patterns and modeling each dataset with a different RBFNN gives higher accuracy in predicting and modeling river flow than using the patterns without preprocessing.


INTRODUCTION

Risk management is based on the modeling and prediction of natural hazards. Intelligent models are distributed parallel processors that learn the relationship between input and output signals and provide an optimized topology for simulating systems. In practice, many real-world dynamical system signals exhibit two distinct characteristics: nonlinearity and nonstationarity, in the sense that their statistical characteristics change over time due to either internal or external nonlinear dynamics (Coulibaly and Baldwin, 2005). The rainfall-runoff process is nonlinear because of the temporal and spatial distribution of precipitation and other parameters. A major concern in the prediction of hydrological events is whether a given process should be modeled as linear or nonlinear. Evidence of nonstationarity in some long hydrological records has raised a number of questions about the adequacy of conventional statistical methods. Different kinds of hydrological models exist to simulate river basin discharge, but the complexity of the rainfall-runoff process imposes restrictions and problems on river flow modeling (Nouri and Abghari, 2007). Because this complexity causes a nonlinear relation between flow and rainfall, researchers have sought better methodologies for more accurate modeling. Artificial Neural Networks (ANNs) have been found to be well suited to nonlinear input-output mapping. Recent reviews reveal that more than 90% of the applications of ANNs to modeling water resources variables use standard feedforward neural networks (Coulibaly and Baldwin, 2005), but Radial Basis Function Neural Networks (RBFNNs) also have a high capability for modeling hydrological processes (Mason et al., 1996; Dibik and Solomatine, 2001; Nouri and Abghari, 2007). Preprocessing the dataset before training improves both training time and modeling accuracy (Nouri and Abghari, 2007).

Hu et al. (2001) developed Range-Dependent Hybrid Neural Networks (RDNN), which are virtually threshold ANNs, to forecast annual and daily streamflows. Pal et al. (2003) proposed a hybrid ANN model that combines the Self-Organizing Feature Map (SOFM) and the MLP network for temperature prediction, where the SOFM serves to partition the training data. Models for time series prediction have been implemented with feedforward NNs (Hsu et al., 1995; Lorrai and Sechi, 1995), recurrent NNs (Gao and Joo, 2005), cluster-based Multi-Layer Perceptrons (Wang et al., 2006), RBFNNs (Mason et al., 1996), fuzzy NNs (Jang, 1993; Jang et al., 1997), fuzzy-logic based hybrid modeling (See and Openshaw, 1999), SOM-cluster based hybrid modeling (Abrahart and See, 2000) and Bayesian-concept based modular ANNs (Zhang and Govindaraju, 2000; Nayak et al., 2004; Dogan, 2005; Kisi, 2005, 2006). Further models were proposed by Cigizoglu (2004), Baratti et al. (2003), Yurekli et al. (2004), Nilsson et al. (2005), Dulkashi et al. (2006), Chen et al. (2006), Bhattacharya and Solomatine (2006) and Jain and Srinivasulu (2006). Wang et al. (2006) developed hybrid threshold-based and cluster-based MLPs to predict daily streamflow. Mason et al. (1996) and Dibik and Solomatine (2001) showed that river flow prediction with RBFNNs is more accurate than with MLP and with the deterministic model HEC-HMS. Samani et al. (2007) demonstrated that, when principal component analysis is used to preprocess the dataset and reduce its dimension, an MLP is better able to model pumping tests in wells. Ceylan and Ozbay (2007) likewise demonstrated that classification of ECG with an ANN combined with preprocessing such as FCM, wavelet transformation or principal component analysis is better than MLP modeling without preprocessing. In this study, cluster analysis is used to determine the homogeneity of the training patterns and each class of rainfall-runoff patterns is trained and tested in a different RBF network. The main aim of the study is to investigate homogeneous monthly training patterns for river flow modeling and to compare the results with RBF modeling of the nonstationary time series without preprocessing.

MATERIALS AND METHODS

Study Area
One of the major tributaries of the Nazloochaei River Basin in the north-west of Iran was selected for this study (Fig. 1). The basin area is 2014 km² and the main river drains to Urmia Lake. Streamflow processes in the basin are strongly influenced by irregular rainfall events. Three hundred and sixty monthly data records from 5 precipitation gauges and the hydrometric station, covering 1976-2006, were used for RBF and hybrid RBF modeling.

Radial Basis Function Neural Networks
The activation function of the hidden layer in RBF neural networks is a radially symmetric basis function such as the Gaussian function, and the output layer is a linear function (Wasserman, 1993). Researchers recommend RBFNNs because of their fast training and their generalization ability for hydrological processes (Mason et al., 1996; Dibik and Solomatine, 2001). Figure 2 shows the structure of the RBFNN model. Learning in RBFNNs consists of updating the weight matrix in each iteration, based on the error between the discharge estimated by the model and the observed discharge.


Fig. 1:	Nazloochaei River basin in the North West of Iran


Fig. 2:	The basic structure of radial basis function neural networks

The output of the hidden-layer neurons of the RBFNN (Eq. 3) is a function of the radial difference (Eq. 2) between the precipitation gauge data, given as input vectors p_i = (p_1i, p_2i,..., p_ki), and the weight vectors w_j = (w_1j, w_2j,..., w_mj), which are adjusted to minimize the error of the discharge estimated by the model (Mason et al., 1996). The RBFNN topologies were programmed in the MATLAB R2008a software package.


(1)


Fig. 3:	Effect of different spread coefficients on the shape of the Gaussian function

(2)

There are different choices for Y_out = f(δ_j), but the usual one is Eq. 3:

(3)

where δ_j is the radial difference between the precipitation p_i and the weights w_ij, and σ is the spread coefficient.
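The forward pass described here can be sketched in a few lines of NumPy. This is an illustration only, not the authors' MATLAB code: it assumes the radial difference is the Euclidean norm and the Gaussian takes the common form exp(-δ² / (2σ²)), which may differ from the paper's Eq. 3 by a scaling constant. The function name `rbf_forward` and the arguments `v` (linear output weights) and `b` (output bias) are hypothetical.

```python
import numpy as np

def rbf_forward(P, W, sigma, v, b=0.0):
    """Forward pass of a Gaussian RBF network (illustrative sketch).

    P     : (n_patterns, k) precipitation input vectors
    W     : (m, k) hidden-layer weight vectors (one centre per neuron)
    sigma : spread coefficient of the Gaussian
    v, b  : assumed linear output-layer weights (m,) and bias
    """
    # Radial difference delta[i, j] = ||p_i - w_j|| (Euclidean norm assumed)
    delta = np.linalg.norm(P[:, None, :] - W[None, :, :], axis=2)
    # Gaussian activation of each hidden neuron (assumed form of Eq. 3)
    hidden = np.exp(-(delta ** 2) / (2.0 * sigma ** 2))
    # Linear output layer gives the estimated discharge
    return hidden @ v + b, hidden
```

A pattern that coincides with a neuron's weight vector gives that neuron an activation of exactly 1, and the activation decays as the pattern moves away from it.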

Neurons are added to the RBF network until the sum-squared error falls beneath an error goal. Networks of this type tend to need more neurons than feedforward NNs with tansig or logsig neurons in the hidden layer, because sigmoid neurons can have outputs over a large region of the input space, whereas RBF neurons respond only to relatively small regions of the input space. In addition to network parameters such as the number of neurons, the training algorithm and the weight updates, RBFNNs have an extra parameter that controls the flexibility of the Gaussian activation function: the spread coefficient. The spread coefficient (σ in Eq. 3) determines the selectivity of the neurons. Different values of spread contract or expand the Gaussian activation function (Fig. 3). If the spread is small, the radial basis function is very steep and the neuron whose weight vector is closest to the input has a much larger output than the other neurons. As the spread becomes larger, the slope of the radial basis function becomes smoother and several neurons can respond to an input vector. Training an RBFNN therefore consists of deciding how many hidden neurons are needed and how sharp the Gaussians should be, through the spread parameter. The optimized weights, estimated with a simple backpropagation algorithm as for Multi-Layer Perceptron approximation, are then kept fixed for RBF network modeling.
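The selectivity effect of the spread can be checked numerically. The sketch below assumes the Gaussian form exp(-d² / (2·spread²)); the function name `gaussian_response` and the sample distances are illustrative, and the spread values are chosen only to contrast a narrow with a wide basis function.

```python
import numpy as np

def gaussian_response(distance, spread):
    """Response of one Gaussian RBF neuron at a given radial distance."""
    return np.exp(-(distance ** 2) / (2.0 * spread ** 2))

# Radial distances of three inputs from the neuron's weight vector
distances = np.array([0.0, 0.5, 1.0])

# Small spread: steep function, only inputs near the centre respond
narrow = gaussian_response(distances, spread=0.3)
# Large spread: smooth function, more distant inputs still respond
wide = gaussian_response(distances, spread=1.0)
```

Both settings give a response of 1 at zero distance, but at any non-zero distance the narrow neuron responds more weakly than the wide one, which is why a small spread makes each neuron more selective.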

Models Performance Measures
There is an extensive literature on indices for evaluating model forecasts (Wang et al., 2006). R squared (R², Eq. 4) and the Root Mean Squared Error (RMSE, Eq. 5) are popular measures because they are very sensitive to even small errors, which makes them suitable for comparing small differences between estimated and observed discharge across models.

(4)

(5)

where n is the number of observations, Q̄ is the average discharge of the river, Q_i is the observed discharge at the hydrometric station and Q̂_i is the discharge estimated by the RBF model.
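As a sketch of how these two measures can be computed, the functions below assume the usual definitions: R² as one minus the ratio of the residual sum of squares to the total sum of squares about the mean observed discharge, and RMSE as the square root of the mean squared residual. The paper's own Eq. 4 and 5 are not reproduced here and may differ in detail; the function names are hypothetical.

```python
import numpy as np

def r_squared(q_obs, q_est):
    """Coefficient of determination (assumed form: 1 - SS_res / SS_tot)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    ss_res = np.sum((q_obs - q_est) ** 2)          # squared model errors
    ss_tot = np.sum((q_obs - q_obs.mean()) ** 2)   # spread about mean discharge
    return float(1.0 - ss_res / ss_tot)

def rmse(q_obs, q_est):
    """Root mean squared error between observed and estimated discharge."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    return float(np.sqrt(np.mean((q_obs - q_est) ** 2)))
```

Under these definitions a perfect model gives R² = 1 and RMSE = 0, while a model that always predicts the mean observed discharge gives R² = 0.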

Cluster Analysis of Monthly Rainfall-Runoff Patterns
Seasonality within the water year introduces inconsistency into the dataset used to train neural networks. Overcoming seasonality in simulation is one of the main problems of hydrological models. The objective of cluster analysis is to classify objects according to the similarities among them and to organize data into groups. Different classifications can result from the algorithmic approach of the clustering technique. Variables that have high pairwise correlations are assigned to the same cluster, whereas those having low pairwise correlations are assigned to different clusters (Kamel and Selim, 1994). In general, cluster analysis has two ingredients: a distance measure and a clustering algorithm. Distance can be measured among the data vectors themselves, or from a data vector to some prototypical object of the cluster.

One of the recommended distance measures for quantifying the (dis)similarity of objects is the Euclidean distance (Eq. 6). As the clustering algorithm, K-means (Eq. 7) can be seen as an optimization problem that minimizes the sum of squared within-cluster distances. Each cluster in the partition is defined by its member objects and by its centroid. The centroid of each cluster is the point for which the sum of distances from all objects in that cluster is minimized. In the presented hybrid model, K-means uses an iterative algorithm that minimizes the sum of distances from each rainfall-runoff training pattern to its cluster centroid, over all clusters. The algorithm moves training patterns between clusters until the sum cannot be decreased further. The result is a set of clusters that are as compact and well separated as possible.

(6)

(7)

where d_e(i, j) is the Euclidean distance between precipitation patterns i and j, with components p_ik and p_jk, and W(C) is the sum of squared within-cluster distances to be minimized.
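The iterative procedure can be sketched as a basic K-means loop with squared Euclidean distances. This is a minimal illustration under stated assumptions (random initial centroids drawn from the data, a fixed iteration cap), not the clustering routine used in the study; the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means: returns labels, centroids and the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumed initialisation: k distinct patterns picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every pattern to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # no pattern changes cluster any more
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = float(d2[np.arange(len(X)), labels].sum())
    return labels, centroids, wcss
```

For the rainfall-runoff data, `X` would hold one row per monthly pattern and `k` would be 4. Because K-means only finds a local minimum of the within-cluster sum of squares, running it from several seeds and keeping the lowest value is a common precaution.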

With this method, the 360 rainfall-runoff patterns were separated into 4 clusters: 124 patterns in cluster 1, 116 in cluster 2, 52 in cluster 3 and 68 in cluster 4. The rainfall-runoff patterns of each cluster were divided into two parts for training and validating the hybrid models: 70% of the patterns of each class were used for training and the remainder for model validation. Four hybrid models, Cl 1-RBFNNs, Cl 2-RBFNNs, Cl 3-RBFNNs and Cl 4-RBFNNs, were developed. Figure 4 shows the structure of the hybrid cluster-based RBFNNs.
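A per-cluster 70/30 split can be sketched as follows. The text does not say whether patterns were assigned to training and validation at random or in chronological order, nor how fractional counts were rounded; the sketch assumes a random shuffle and rounding to the nearest integer. The function name `split_by_cluster` is hypothetical, and the label array is built only from the cluster sizes reported above, not from the actual cluster assignments.

```python
import numpy as np

def split_by_cluster(labels, train_fraction=0.7, seed=0):
    """Split pattern indices into training/validation sets within each cluster."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = {}
    for c in np.unique(labels):
        # Indices of the patterns in cluster c, shuffled (assumed random split)
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_fraction * len(idx)))
        splits[int(c)] = (idx[:n_train], idx[n_train:])
    return splits

# Placeholder labels reproducing only the reported cluster sizes
labels = np.repeat([1, 2, 3, 4], [124, 116, 52, 68])
splits = split_by_cluster(labels)
train_sizes = {c: len(tr) for c, (tr, _) in splits.items()}
```

Each entry of `splits` then feeds one of the four cluster-specific RBFNNs, so that every network is trained and validated only on patterns from its own cluster.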


Fig. 4:	The structure of hybrid model (cluster base-RBFNNs)

Table 1:	Rainfall-runoff RBFNN optimized topology without any preprocessing of patterns

RESULTS AND DISCUSSION

Rainfall-runoff processes usually have pronounced seasonal means, variances and dependence structures, and the underlying mechanisms of streamflow are likely to be quite different during low, medium and high flow periods, especially when extreme events occur. The results of RBFNN modeling of all 360 rainfall-runoff training patterns without any preprocessing are shown in Table 1. The optimized topology for training and testing was obtained with a spread coefficient of 0.4. Table 1 shows that R² is 76.21% for the training phase and 68.94% for the validation phase, with RMSE of 0.8802 and 0.9808, respectively. This result illustrates the seasonality problem in time series modeling: even an RBFNN is limited in learning the relation between the precipitation input vector and the discharge output.

After all datasets had been clustered into 4 classes (Fig. 5), each class was trained and tested in a different Radial Basis Function Neural Network and the cluster-based RBFNNs were developed. Because the training patterns within each cluster are homogeneous and seasonality is reduced in the hybrid model, the accuracy for each class improved to some extent compared with no preprocessing of the dataset. The results of the hybrid models Cl 1-RBFNNs, Cl 2-RBFNNs, Cl 3-RBFNNs and Cl 4-RBFNNs are shown in Tables 2-5. The optimized topology of Cl 1-RBFNNs for training and testing was obtained with a spread of 0.3. Table 2 shows that R² is 90.95% for training and 82.37% for validation and that RMSE is 0.0058 and 0.0987, respectively.


Fig. 5:	Dendrogram of cluster analysis of training dataset

Table 2:	Optimized topology of model 1 (cluster 1-RBFNN)

Table 3:	Optimized topology of model 2 (cluster 2-RBFNN)

It is recognized that data preprocessing can have a significant effect on model performance (Maier and Dandy, 2000). The results show that clustering the rainfall-runoff patterns and modeling each dataset with a different RBFNN gives higher accuracy in predicting and modeling river flow than using the patterns without preprocessing. Table 6 shows the benefit of preprocessing in rainfall-runoff modeling with cluster-based RBFNNs, which is in line with other preprocessing methods such as the data reduction using PCA reported by Samani et al. (2007).

Table 4:	Optimized topology of model 3 (cluster 3-RBFNN)

Table 5:	Optimized topology of model 4 (cluster 4-RBFNN)


Fig. 6:	Model accuracy in the training and testing processes for different spread coefficients

Table 6:	Topology of all 5 models RBFNN and Cluster base-RBFNNs

Figure 6 shows how accuracy varies with the spread coefficient of the Gaussian function in the 5 developed models. It demonstrates that, at its selected spread, each cluster-based model has a higher R² than the model without preprocessing in both training and testing.

CONCLUSION

Seasonal hydrological time series contain large differences between extreme values, and clustering can separate them into homogeneous data. Modeling with homogeneous rainfall-runoff training patterns is much easier than modeling without preprocessing. Cluster-based RBFNNs can be used as a hybrid model to overcome the nonlinearity of river flow modeling. The comparison shows that preprocessing the training dataset, as in the hybrid RBFNN model, gives a better optimized model than a single RBFNN for simulating the rainfall-runoff process in the Nazloochaei river basin. RBFNNs have an efficient training algorithm (compared with multi-layer NNs), but a large training set is a problem. The cluster-based RBFNN hybrid model shows a high capability for prediction. It would be interesting to further compare the hybrid cluster-based RBFNN approach with other hybrid techniques, such as Bayesian-concept based MLP, fuzzy-logic NN modeling and SOM-cluster based hybrid modeling.

REFERENCES

Abrahart, R.J. and L. See, 2000. Comparing neural network and autoregressive moving average techniques for the provision of continuous river flow forecasts in two contrasting catchments. Hydrol. Process., 14: 2157-2172.
Baratti, R., B. Cannas, A. Fanni, M. Pintus, G.M. Sechi and N. Toreno, 2003. River flow forecast for reservoir management through neural networks. J. Neurocomput., 55: 421-437.
Bhattacharya, B. and D.P. Solomatine, 2006. Machine learning in sedimentation modeling. J. Neural Networks, 19: 208-214.
Cigizoglu, H.K., 2004. Estimation and forecasting of daily suspended sediment data by multi-layer perceptrons. Adv. Water Resour., 27: 185-195.
Ceylan, R. and Y. Ozbay, 2007. Comparison of FCM, PCA and WT techniques for classification ECG arrhythmias using artificial neural network. Expert Syst. Applic., 33: 286-295.
Chen, Y., B. Yang and J. Dong, 2006. Time-series prediction using a local linear wavelet neural network. Neurocomputing, 69: 449-465.
Coulibaly, P. and C.K. Baldwin, 2005. Nonstationary hydrological time series forecasting using nonlinear dynamic methods. J. Hydrol., 307: 164-174.
Dibik, Y.B. and D.P. Solomatine, 2001. River flow forecasting using artificial neural networks. Hydrol. Oceans Atmos., 26: 1-7.
Dogan, E., 2005. Suspended sediment load estimation in lower sakarya river by using artificial neural networks. Fuzzy logic neuro-fuzzy models. Elect. Lett. Sci. Eng., 1: 22-32.
Gao, Y. and E.M. Joo, 2005. NARMAX time series model prediction: Feedforward and recurrent fuzzy neural network approaches. Fuzzy Sets Syst., 150: 331-350.
Hu, T.S., K.C. Lam and S.T. Ng, 2001. River flow time series prediction with a range-dependent neural network. Hydrol. Sci. J., 46: 729-745.
Hsu, K.I., H.V. Gupta and S. Sorooshian, 1995. Artificial neural network modeling of the rainfall-runoff process. Water Resour. Res., 31: 2517-2530.
Jain, A. and S. Srinivasulu, 2006. Integrated approach to model decomposed flow hydrograph using artificial neural network and conceptual techniques. J. Hydrol., 317: 291-306.
Jang, J.S.R., 1993. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern., 23: 665-685.
Jang, J.S.R., C.T. Sun and E. Mizutani, 1997. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. 1st Edn., Prentice Hall, Upper Saddle River, NJ., USA., ISBN: 0132610663.
Kisi, O., 2005. Suspended sediment estimation using neuro-fuzzy and neural network approaches. Hydrol. Sci. J., 50: 683-696.
Kisi, O., 2006. Daily pan evaporation modelling using a neuro-fuzzy computing technique. J. Hydrol., 329: 636-646.
Kamel, M.S. and S.Z. Selim, 1994. New algorithm for solving the fuzzy clustering problem. Pattern Recog., 27: 421-428.
Lorrai, M. and H.M. Sechi, 1995. Neural nets for modeling rainfall-runoff transformations. Water Resour. Manage., 9: 299-313.
Maier, H.R. and G.C. Dandy, 2000. Neural networks for the prediction and forecasting of water resources variables: A review of modeling issues and applications. Environ. Model. Software, 15: 101-124.
Mason, J.C., R.K. Price and A. Ternme, 1996. A neural network model of rainfall-runoff using radial basis functions. J. Hydraulic Res., 34: 537-548.
Nayak, P.C., K.P. Sudheer, D.M. Rangan and K.S. Ramasastri, 2004. A neuro-fuzzy computing technique for modeling hydrological time series. J. Hydrol., 291: 52-66.
Nilsson, P., C.B. Uvo and R. Bentsen, 2005. Monthly runoff simulation: Comparing and combining conceptual and neural network models. J. Hydrol., 317: 1-20.
Nouri, M. and H. Abghari, 2007. Simulation of rainfall-runoff using RBFNNs base on probabilistic neural network classification. Proceedings of the 3rd Conference on Watershed Management Kerman and Water Resources Management, May 10-11, 2007, Iranian Society of Irrigation and Water Engineering Press, pp: 1108-1113.
Pal, N.R., S. Pal, J. Das and K. Majumdar, 2003. SOFM-MLP: A hybrid neural network for atmospheric temperature prediction. IEEE Trans. Geosci. Remote Sensing, 41: 2783-2791.
Samani, N., M. Gohari-Moghadam and A. Safavi, 2007. A simple neural network model for the determination of aquifer parameters. J. Hydrol., 340: 1-11.
See, L. and S. Openshaw, 1999. Applying soft computing approaches to river level forecasting. Hydrol. Sci. J., 44: 763-778.
Wang, W., P. Van Gelder and J.K. Vrijling, 2006. Forecasting daily stream flow using hybrid ANN models. J. Hydrol., 324: 383-399.
Wasserman, P.D., 1993. Advanced Methods in Neural Computing. 1st Edn., John Wiley and Sons, Inc., New York, USA., ISBN: 0442004613, pp: 250.
Yurekli, K., K. Kurunc and H. Simsek, 2004. Prediction of daily maximum streamflow based on stochastic approaches. J. Spatial Hydrol., 4: 1-12.
Zhang, B. and S. Govindaraju, 2000. Prediction of watershed runoff using bayesian concepts and modular neural networks. Water Resour. Res., 36: 753-762.
Karunasinghe, D.S.K. and S.Y. Liong, 2006. Chaotic time series prediction with a global model artificial neural network. J. Hydrol., 323: 92-105.

Asian Journal of Applied Sciences

Research Article
