
Journal of Applied Sciences

Year: 2008 | Volume: 8 | Issue: 15 | Page No.: 2687-2694
DOI: 10.3923/jas.2008.2687.2694
A Novel Intelligent Method of Experiment Design for Modeling
S. Jafari, H.R. Abdolmohammadi, H. Eliasi, M.B. Menhaj and M.R. Rajati

Abstract: The aim of this study is to provide an experiment design method for modeling and function approximation. Modeling real-life systems is of great interest nowadays. Models are useful in the analysis of systems and help us understand their behavior. From one point of view, models can be classified into three classes: black box models, gray box models and white box models. Our idea is related to black box modeling. The proper performance of a black box model depends on the structure of the model as well as on the data used to determine its parameters. Although the number of data points is one of the important factors affecting the richness of a dataset, increasing it is limited in real problems; for instance, gathering data from many systems requires a great deal of time and cost. In this study, inspired by the honey bee algorithm, we have designed a method which, for a fixed number of data points, enriches the dataset in comparison to other conventional data extraction methods. In this method, after extracting some data by the grid method, the remaining data points are extracted according to an intelligent analysis of the available data. The results illustrate the efficiency of the proposed method.


How to cite this article
S. Jafari, H.R. Abdolmohammadi, H. Eliasi, M.B. Menhaj and M.R. Rajati, 2008. A Novel Intelligent Method of Experiment Design for Modeling. Journal of Applied Sciences, 8: 2687-2694.

Keywords: modeling, evolutionary algorithms, data extraction

INTRODUCTION

In many areas of science, system modeling is an important task. A model is a useful tool for system analysis and helps the designer obtain a better scope of the behavior of the system. Furthermore, a model enables us to simulate and predict the behavior of a system.

In engineering applications, models are especially needed for analysis and design of systems. For instance, advanced technologies for controller design, optimization, supervision and surveillance, fault detection and so on are based on models of the systems.

From one viewpoint, models can be divided into three groups: white box, black box and gray box.

Our idea is related to black box modeling. Black box models rely on experimental data and need no a priori knowledge. The parameters and the structure of the model may have no relation to the actual structure of the system.

It is well-known that a model is always an approximation of the system's real relationships and its response does not always correspond to the real responses of the system. This is more noticeable in black box models.

Actually, we cannot test the correctness of the model's responses for all inputs. All efforts are made to increase the probability of proper performance of the model. It is well-known that the proper performance of a model depends on the model structure and the data used to tune the model parameters. Training data rich enough to represent the system appropriately enhances the accuracy of the model.

One of the factors which influence the richness of the data is the number of data points. However, in real-world problems, there are limits on increasing the number of data points: acquiring data from many systems and processes requires a great deal of time and cost. For instance, one can point to modeling the mechanical characteristics of an alloy in terms of its ingredients (Datta and Banerjee, 2006), estimating the amount of mineral reserves (Tercan and Karayigit, 2001) and modeling the dyeing properties of materials in the textile industry (Senthilkumar, 2006; Daneshvar et al., 2006; Keong et al., 2004; Dai et al., 2004; Hebbar et al., 2006; Khamis et al., 2006; Elkamel, 1998; Turkoglu et al., 1999).

Sometimes, it is necessary to model the system with an already available set of data. That case is not of interest in this study. Our idea is to obtain a proper set of data which represents the system appropriately.

Suppose that the limitations permit us to obtain only M data points. We can obtain the data points with the following methods:

Selecting M data points randomly in the domain of interest
Gridding each dimension of the input space
Gridding each dimension of the inputs and selecting a data point randomly in each interval
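For the one-dimensional case, the three conventional extraction methods above can be sketched as follows (the function names and the use of NumPy are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_points(m, lo, hi):
    """Method 1: draw m points uniformly at random in [lo, hi]."""
    return rng.uniform(lo, hi, size=m)

def grid_points(m, lo, hi):
    """Method 2: m equally spaced grid points over [lo, hi]."""
    return np.linspace(lo, hi, m)

def grid_random_points(m, lo, hi):
    """Method 3: split [lo, hi] into m equal intervals and draw
    one random point inside each interval."""
    edges = np.linspace(lo, hi, m + 1)
    return rng.uniform(edges[:-1], edges[1:])
```

In higher dimensions the same three schemes apply per input dimension, with the grid taken as the Cartesian product of the per-dimension grids.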

The problem with all three methods above is the batch nature of acquiring the data: there is no intelligence in obtaining it. On the other hand, it seems that as the data points are being extracted, our knowledge of the system behavior increases and we can better judge the nature of the system.

The idea is to use this kind of knowledge to enrich the procedure of data extraction. To do this, inspired by the honey bee algorithm (Seifipour and Menhaj, 2001), we have designed a method which analyzes the available data set and selects the next data point appropriately.

THE PROPOSED ALGORITHM

At the first step, we determine the number of needed data points (M) according to the existing constraints (for example, cost, time and so on). Then, we obtain a number of data points (N) by the grid method to initialize the algorithm. Inspired by the honey bee algorithm, one of the data points is selected as the queen, and the new data point is generated around the queen. To do this, we need a fitness function to choose the queen. A data point at which the gradient is maximum seems a suitable candidate. On the other hand, those parts of the input space in which the number and density of data points are low should be considered for data extraction. Accordingly, we selected the following fitness function:

(1)

Fig. 1: Neighbor selection algorithm's flowchart

In which (xi, f(xi)) is the data point whose fitness is to be evaluated, H is the number of data points in xi's neighborhood, f is the unknown function to be modeled and dj is the Euclidean distance between xi and its jth neighbor xj. The method of determining the neighbors of each point in the input space is shown in Fig. 1.
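The exact form of Eq. 1 is not reproduced in this text, so the sketch below is only one plausible fitness combining the two stated criteria: a steep local gradient and a sparse neighborhood. The function name and the averaging over the H neighbors are our assumptions.

```python
def fitness(i, x, fx, neighbors):
    """Hypothetical fitness of point i (stand-in for Eq. 1).
    x, fx: lists of input points and their function values;
    neighbors: indices of x[i]'s H neighbors."""
    grad = 0.0      # rewards steep local slope
    sparsity = 0.0  # rewards sparsely covered regions
    for j in neighbors:
        d = abs(x[i] - x[j])            # Euclidean distance d_j (1-D case)
        grad += abs(fx[i] - fx[j]) / d  # slope toward neighbor j
        sparsity += d                   # larger gaps -> sparser region
    h = len(neighbors)                  # H in the text
    return grad / h + sparsity / h
```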

It is notable that any other criterion or constraint on data selection could be incorporated in the fitness function. Although sparsity is already rewarded by the fitness function, the data can also be kept from becoming very dense in some regions of the input space by means of a tabu list. After the selection of the queen, the best individual in its neighborhood is selected as its mate according to a relative fitness function. Considering the fact that the neighboring data point (xi, f(xi)) whose f(xi) is most distant from the queen's is a suitable candidate to breed with the queen, we consider:

(2)

In which xi is the point for which the relative fitness is to be calculated, Q is the queen and dj is the Euclidean distance between xj and Q. The offspring is created by the crossover of the queen and its mate. The crossover could be performed by many methods; we used the arithmetic mean as the crossover operator.
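Mate selection and the arithmetic-mean crossover can be sketched as follows. Since Eq. 2 is not reproduced in this text, the mate-selection rule below (pick the neighbor whose f-value differs most from the queen's) is our reading of the surrounding description, not the exact formula:

```python
def select_mate(q, neighbors, x, fx):
    """Hypothetical reading of Eq. 2: the mate is the queen's
    neighbor whose function value differs most from the queen's."""
    return max(neighbors, key=lambda j: abs(fx[j] - fx[q]))

def crossover(q, mate, x):
    """Arithmetic-mean crossover: the offspring is the midpoint
    of the queen and its mate in the input space."""
    return 0.5 * (x[q] + x[mate])
```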

Fig. 2: Flowchart of the proposed algorithm

This algorithm continues until the remaining M-N data points are generated. Figure 2 shows the flowchart of the proposed algorithm.
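Putting the steps together, the overall loop for the one-dimensional case can be sketched as below. The inline fitness (slope to the adjacent neighbor plus the gap size) is a stand-in for Eq. 1, which is not reproduced in this text; the function name is ours.

```python
import numpy as np

def extract_data(f, lo, hi, n, m):
    """Sketch of the proposed loop: start from n grid points in
    [lo, hi], then breed m - n new points, one per iteration,
    around the fittest point (the 'queen')."""
    x = list(np.linspace(lo, hi, n))   # initial grid data
    y = [f(v) for v in x]
    while len(x) < m:
        order = np.argsort(x)          # neighbors along the axis
        best, queen, mate = -1.0, 0, 1
        for a, b in zip(order[:-1], order[1:]):   # adjacent pairs
            gap = abs(x[b] - x[a])
            fit = abs(y[b] - y[a]) / gap + gap    # steep and sparse is good
            if fit > best:
                best, queen, mate = fit, a, b
        child = 0.5 * (x[queen] + x[mate])        # arithmetic-mean crossover
        x.append(child)
        y.append(f(child))             # query the system at the new point
    return np.array(x), np.array(y)
```

Each iteration queries the real system (or function) exactly once, so the total number of extracted data points is exactly M, as required by the constraints.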

SIMULATION RESULT

To verify the algorithm, we extracted data points to approximate the functions of Tables 1 and 2, using four methods: random, grid, grid-random and the proposed method. In addition, from each function we obtained a large set of test data points (100 M) to examine the ability of each method to provide rich data.

To approximate the functions of Table 1, we used the following methods:

Training an MLP neural network with 5 neurons in the first layer, 3 neurons in the second layer and 1 neuron in the output layer, with tangent-sigmoid activation functions in the first and second layers and a linear activation function in the output layer. This structure was obtained by trial and error and is not necessarily optimal, but it is adequate for our purpose, which is the comparison of the methods

For each data extraction method, we trained the network 20 times and calculated the mean of the network's mean square error on the test data.

Piecewise Cubic Hermite Interpolating Polynomial

The error criterion is the MSE (Mean Square Error). It can be seen that the proposed method usually outperforms the other methods (Table 3).
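The comparison criterion over the test set is straightforward (the function name is ours):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Square Error between true and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))
```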

Table 1: Single-variable functions used for data extraction

Table 2: A two-variable function used for data extraction

Table 3: The simulation results for the single-variable functions

Fig. 3:
(a) The diagram of the function f1, (b) data extracted from f1 by grid method, (c) data extracted from f1 randomly, (d) data extracted from f1 by the grid-random method, (e) extracted data from f1 by the grid method applied in proposed method as initial data and (f) data extracted from f1 by the proposed method

Table 4: The simulation result for the two-variable function

Fig. 4:
(a) The diagram of the function f4, (b) data extracted from f4 by the grid method, (c) data extracted from f4 randomly, (d) data extracted from f4 by the grid-random method (e) extracted data from f4 by the grid method applied in proposed method as initial data and (f) data extracted from f4 by the proposed method

To approximate the function shown in Table 2 we used two methods:

A three-layer MLP neural network with 10 neurons in the first layer and 6 neurons in the second layer with tangent-sigmoid activation functions, and 1 neuron in the output layer with a linear activation function
Triangle-based Cubic Interpolation (TCI)

It is easily seen that the error of the proposed method is less than that of the other methods (Table 4).

For example, in Fig. 3-5 the data extracted by the proposed method are shown and compared to the data extracted by the other methods. Figure 6 shows more examples demonstrating the efficiency of the proposed method.

Fig. 5:
(a) The diagram of the function f6, (b) data extracted from f6 by the Grid method, (c) data extracted from f6 randomly, (d) data extracted from f6 by the grid-random method, (e) extracted data from f6 by the grid method applied in proposed method as initial data and (f) data extracted from f6 by the proposed method

Fig. 6:
(a) The diagram of the function f5, (b) extracted data from f5 by the grid method applied in proposed method as initial data, (c) data extracted from f5 by the proposed method, (d) the diagram of the function f7, (e) extracted data from f7 by the grid method applied in proposed method as initial data and (f) data extracted from f7 by the proposed method

CONCLUSION

In this study, a novel method was presented for data extraction for function approximation. The proposed method was explained completely, and its efficiency was demonstrated by simulation results. More efficient fitness functions and other evolutionary operators should be investigated in forthcoming contributions. Other intelligent algorithms, such as neural networks, should also be examined for data extraction in the future.

REFERENCES

  • Dai, Q.X., Z.Z. Yuan, M.X. Luo and X.N. Cheng, 2004. Numerical simulation of Cr2N age-precipitation in high nitrogen stainless steels. Mater. Sci. Eng., 385: 445-448.


  • Daneshvar, N., A.R. Khataee and N. Djafarzadeh, 2006. The use of artificial neural networks (ANN) for modeling of decolorization of textile dye solution containing C.I. basic yellow 28 by electro coagulation process. J. Hazard. Mater., 137: 1788-1795.


  • Datta, S. and M.K. Banerjee, 2006. Mapping the input-output relationship in HSLA steels through expert neural network. Mater. Sci. Eng., 420: 254-264.


  • Elkamel, A., 1998. An artificial neural network for predicting and optimizing immiscible flood performance in heterogeneous reservoirs. Comput. Chem. Eng., 22: 1699-1709.


  • Hebbar, A., M. Mechmache, D. Ouinas and A. Berras, 2006. Application of the experimental designs on the modeling of the combustion's parameters. J. Applied Sci., 6: 2599-2604.


  • Keong, K.G., W. Sha and S. Malinov, 2004. Artificial neural network modeling of crystallization temperatures of the Ni-P based amorphous alloys. Mater. Sci. Eng., 365: 212-218.


  • Khamis, A., Z. Ismail, K. Haron and A.T. Mohammed, 2006. Neural network model for oil palm yield modeling. J. Applied Sci., 6: 391-399.


  • Seifipour, N. and M.B. Menhaj, 2001. A GA-based algorithm with a very fast rate of convergence. 1st Edn. London, UK., pp: 185-193


  • Senthilkumar, M., 2006. Modeling of CIELAB values in vinyl sulphone dye application using feed-forward neural networks. Dyes Pigments, 75: 356-361.


  • Tercan, A.E. and A.I. Karayigit, 2001. Estimation of lignite reserve in the Kalburcayiri field, Kangal basin, Sivas, Turkey. Int. J. Coal Geol., 47: 91-100.


  • Turkoglu, M., I. Aydin, M. Murray and A. Sakr, 1999. Modeling of a roller-compaction process using neural networks and genetic algorithms. Eur. J. Pharm. Biopharm., 48: 239-245.
