INTRODUCTION
Data classification methods provide an efficient means of resolving the “data
overload” problem faced by many decision-makers. Typically, data classification
methods are designed to identify the unique characteristics of the input data
so that they can be assigned to appropriate classes. Common data classification
methods include the nearest neighbor classifier, ID3 classification trees, fuzzy
decision trees and Artificial Neural Networks (ANNs) (Lotfi
et al., 2006; Giordano et al., 2007;
Umer and Khiyal, 2007; Wu et
al., 2008; Gketsis et al., 2009; Saravanan
and Ramachandran, 2010; Wang et al., 2011).
ANN schemes have been widely applied for such uses as forecasting and classification
(Sun et al., 2000; Zhang
et al., 2001; Srivaree-Ratana et al.,
2002; Ambrogi et al., 2007; Fahimifard
et al., 2009; Solaimani, 2009; Gu
et al., 2011; Tahir and Manap, 2012). In
addition, Backpropagation Neural (BPN) networks are one of the most commonly
employed of all ANNs and have been successfully applied to solve many data classification
problems (Salchenberger et al., 1992; Nussbaum
et al., 1995; Liu et al., 2001; Wakaf
and Saii, 2009). However, BPN networks have notoriously unreliable performance
when trained using the Gradient Steepest Descent Learning Algorithm (GSDLA).
GSDLA-trained BPN networks have two major drawbacks, namely: (1) the GSDLA
converges slowly to the optimal solution and (2) it tends to become trapped in
local optima during the iterative optimization procedure (Ikuno et al., 1994).
Genetic Algorithms (GAs) have a proven ability to improve the classification
performance of ANN schemes (Sexton et al., 1998;
Goffe et al., 1994; Ramasamy
and Rajasekaran, 1996; Panda et al., 2007;
Soltani et al., 2007; Hsu
et al., 2009; Liu, 2010). GAs have also been
successfully applied to identify the optimal BPN network topology and parameter
settings. Sexton et al. (1998) compared the optimization performance of a
genetic-algorithm-derived BPN network with that of a conventional BPN network
and found that in most cases genetic algorithms could improve the BPN network.
Sexton et al. (1999) further compared
the performance of optimizing the BPN network derived using Simulated Annealing
(SA) and GAs on certain topologies of a BPN network model. The values assigned
to the GA’s parameters, i.e. the crossover rate, the mutation rate and
the size of the population, have a direct impact on the results obtained for
a particular optimization problem (Kirkpatrick et al.,
1983; Sexton et al., 1999).
In decision-making, the data type and the distribution pattern of the data
values have a direct bearing on the choice of statistical test used for their
evaluation. It is therefore important to recognize, or at least to immunize
against, the attribute noise of the data points in a sample drawn to represent
the entire population. Broadly speaking, these attributes may be categorized
as continuous, ordinal or nominal (Sheats and Pankratz, 2002).
Choosing a suitable classification model and assigning appropriate level settings to each factor within the model is a difficult task, even for experienced decision-makers. Generally, trial-and-error experimental techniques are used to identify the significant factors and to assign appropriate factor level settings. However, such approaches are invariably time-consuming and thus expensive. A straightforward and systematic technique is therefore required for calibrating the parameters of a GA so that it can be used to optimize the topology and parameter settings of a BPN network classification scheme.
Taguchi Orthogonal Arrays (OAs) (Taguchi, 1986) enable
robust, noise-immune design solutions to be obtained from a minimum number of
experimental trials. Many studies have demonstrated the use of OAs in calibrating
heuristic algorithms. For example, Gupta (1999) considered
the problem of a tabu search-based heuristic mechanism designed to solve the
two-stage flow shop problem. In the proposed approach, the Taguchi method was
used to analyze the effects of four factors, namely the initial solution, the
type of move, the neighborhood size and the list size, on the performance of
the heuristic mechanism. The results enabled the optimal factor settings to
be determined and therefore improved the effectiveness of the heuristic algorithm.
Chien and Tsai (2003) developed an analytical model
for the prediction of tool flank wear and then used a GA-based optimization
scheme, in which the parameters were calibrated using the Taguchi method, to
determine the optimal cutting conditions when machining 17-4PH stainless steel.
They showed that the model was capable of successfully predicting tool flank
wear.
The Taguchi method has emerged as the method of choice for analyzing the main
and interaction effects of the individual factors of a design problem and for
screening and ranking these factors such that optimal design solutions can be
obtained at a minimal experimental cost (Roy, 1990). In
the present study, the Taguchi method is employed to calibrate the parameters
of a GA scheme used to optimize the topology and parameter settings of a BPN
network designed to classify data of different types.
The objectives of the present study can be summarized as follows:
• To provide decision-makers with the ability to robustly immunize the classification model against noise in the data types, such that they can establish the data collection method and construct a highly efficient classification model
• To calibrate the GA used to design the BPN network parameters and topology so as to enhance its optimization performance
• To investigate the optimal combination of the GA’s controllable factors and their level settings using a robust, noise-immune experimental design
• To use ANOVA and the Analysis of the Mean (ANOM) (Roy, 1990) to rank and screen the controllable factors of the GA and thereby reduce the experimental cost
MATERIALS AND METHODS
This section describes the use of the Taguchi method to screen and rank the
main and interaction effects of the controllable factors in the GA-based BPN
classification model, so that the network’s classification performance remains
robust to noise in the different data types. The design process starts by
identifying the various attributes of the data types of common interest to
decision-makers. Appropriate level-setting ranges are then specified for the
controllable factors of the GA, i.e., the crossover rate, the mutation rate
and the size of the population.
These factors are then arranged in a Taguchi orthogonal array so that their
main and interaction effects can be systematically examined when the GA is applied
to optimize the topology and parameter settings of three BPN networks designed
to classify data of different types. Figure 1 shows a schematic
overview of the design methodology applied in the present study. The basic steps
in this framework can be summarized as follows:
Step 1: 
Identify the attributes of different data types: In
general, data classification problems involve the processing of a different
number of attributes. In practice, this implies that the topology and parameter
settings of the BPN network employed to carry out the data classification
process depend on the nature of the input data. Accordingly, this step of
the proposed methodology identifies the attributes of three representative
datasets (containing continuous, ordinal and nominal data, respectively)
downloaded from the server of the Department of Information and Computer
Science (ICS) at the University of California 
Step 2: 
Identify the BPN network topology and parameters: The BPN network
parameters include the initial weight value, learning rate, momentum factor,
number of hidden layers, number of neurons in the first hidden layer
and number of neurons in the second hidden layer. In this step, the calibrated
genetic algorithm is used to specify appropriate network values for the
BPN. Appropriate ranges are assigned to each of these BPN network parameters
to form an initial search space for the GA optimization procedure 
Step 3: 
Code the BPN network parameters in binary form: The chromosomes
of the genetic algorithm should be in the form of a simple binary sequence
comprising a series of 0s and 1s. This study thus uses a binary method
to encode and decode the parameter and topology settings of the BPN model,
which are optimized using a GA 
Step 4: 
Identify the parameters of the GA used to optimize the BPN network
model: Appropriate level settings are assigned to the crossover rate,
mutation rate and size of the population parameters of the GA to establish
an initial search space for the Taguchi design procedure 
Step 5: 
Rank and screen the GA factors: Employing the ANOVA statistical
method, the effects of each of the main GA parameters in the BPN network
(and the interactions between them) on the classification performance are
analyzed so that the parameters with the greatest effect can be identified 
Step 6: 
Calibrate and Pool the GA factors: Based on the ANOVA results obtained
in step 5, the effects of pooling one or more of the controllable factors
are considered so that an informed decision can be made regarding a potential
tradeoff between the quality of the robust design solution and the experimental
cost 
Step 7: 
Optimize the BPN parameters and topology: The calibrated GA is
used to optimize the BPN topology and parameter settings 

Fig. 1: 
Framework of proposed robust design procedure for GA-based
BPN network 
Definition of different data types: Broadly speaking, data can be classified
as either continuous or categorical in nature (Table 1). Qualitative
data are always categorical, whereas quantitative data are generally continuous
but may be categorical in some cases. Categorical data may be further classified
as either ordinal or nominal. Ordinal data (e.g., ranking or grading data) have
an order to their levels of assignment, with each data point being more (or
less) severe than the previous one. Conversely, nominal data (e.g., race or
gender) have no inherent order to their levels of assignment (Sheats
and Pankratz, 2002). Consequently, the data points in the sample used in
this study include continuous data, ordinal data and nominal data.
Identifying the BPN network parameters and specifying appropriate ranges:
Many studies have demonstrated the ability of GAs to optimize the topology and
parameter settings of BPN networks so that their classification performance
can be improved. However, before the GA can be applied to carry out this optimization
process, it is first necessary to specify exactly which BPN network parameters
are to be considered in the optimization procedure and to assign an appropriate
range to each one. As discussed in step 2 above, six BPN network parameters
are considered in the current GA optimization procedure, namely the initial
weight value, the learning rate, the momentum factor, the number of hidden layers,
the number of neurons in the first hidden layer and the number of neurons in
the second hidden layer. Table 2 indicates the ranges assigned
to each of these six parameters.
Table 1: 
Overview of basic data types 

Table 2: 
BPN network parameters and setting ranges 

In accordance with conventional BPN network theory, the initial weight value,
the learning rate and the momentum factor parameters are assigned ranges of
0~1.0. Although one hidden layer is generally sufficient in most BPN networks,
two hidden layers may be required for more complex classification problems.
Accordingly, in the current study, the hidden layer parameter is specified as
either 1 or 2. The number of neurons in each hidden layer is related to the
total number of hidden layers in the BPN structure (Zurada,
1995; Khaw et al., 1995; Sexton
et al., 1998). As a result, in the current study, the number of neurons
in the first and second hidden layers is specified in the wider range
of 1~63 in both cases.
Code the BPN parameters in binary form: In general, GA optimization
procedures commence with a population of randomly chosen chromosomes, where
each chromosome represents a potential solution to the specified problem (Khoo
and Loi, 2002; Kamrani and Gonzalez, 2003; Siani
and de Peretti, 2007). For computational simplicity, Goldberg
(1989) proposed that these chromosomes should be expressed in the form of
a simple binary sequence comprising a series of 0s and 1s. The search space
for the current GA-based BPN optimization process is defined by the BPN network
parameter settings specified in Table 2. Therefore, in accordance
with the GA methodology, each parameter range is transformed into a binary sequence
and the individual binary sequences are then concatenated to form a single chromosome
string. As shown in Fig. 2, this process results in the construction
of a string with a total of 39 bits of binary code. The initial weight range
(i.e., 0~1.0) is mapped using a seven-bit substring. When decoding this string
back into a real number, the maximum decoded value is 2^{7}-1 (= 127).
Therefore, having identified the optimal chromosome via the GA iterative
procedure, the corresponding real value of the weight factor is obtained by
decoding the optimal seven-bit substring and dividing the result by 127. As
shown in Fig. 2, the learning rate, momentum factor, number of hidden layers,
number of neurons in the first hidden layer and number of neurons in the
second hidden layer are coded using eight-bit, ten-bit, two-bit, six-bit and
six-bit substrings, respectively. The real optimal values of each parameter
are obtained using the same decoding and division procedure as that described
above for the weight factor.
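The decoding procedure described above can be sketched in Python as follows. The bit layout (7, 8, 10, 2, 6 and 6 bits) follows Fig. 2, while the exact range scaling and the rounding of integer-valued parameters are illustrative assumptions, not the paper's verbatim implementation:

```python
def decode_substring(bits, lo=0.0, hi=1.0):
    """Decode a binary substring into a real value in [lo, hi]: the
    integer value of the bits is divided by the maximum decodable value
    (2**n - 1) and the result is scaled into the parameter's range."""
    n = len(bits)
    value = int(bits, 2) / (2 ** n - 1)
    return lo + value * (hi - lo)

# 39-bit chromosome layout per Fig. 2: (name, bits, lower, upper).
layout = [("initial_weight", 7, 0.0, 1.0),
          ("learning_rate", 8, 0.0, 1.0),
          ("momentum", 10, 0.0, 1.0),
          ("hidden_layers", 2, 1, 2),
          ("neurons_h1", 6, 1, 63),
          ("neurons_h2", 6, 1, 63)]

def decode_chromosome(chromosome):
    params, pos = {}, 0
    for name, nbits, lo, hi in layout:
        val = decode_substring(chromosome[pos:pos + nbits], lo, hi)
        if isinstance(lo, int):
            # Integer-valued parameters are rounded to the nearest level
            # (an assumption for illustration).
            val = round(val)
        params[name] = val
        pos += nbits
    return params

print(decode_chromosome("1" * 39))
```

For instance, an all-ones chromosome decodes every parameter to the upper end of its range (a seven-bit substring of ones gives 127/127 = 1.0, matching the division by 127 described in the text).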
Construct orthogonal design: The OAs used in the Taguchi method enable
the optimal design parameters to be established from a minimum number of experiments
and therefore reduce the total experimental cost. Moreover, the use of an OA
enhances the reproducibility of the experimental results and enables an objective
setting of the experimental factor levels. Various OA layouts are commonly employed
and the choice of an appropriate array depends on the degrees of freedom of
the particular experiment. The OA notation L_{4} (2^{3}) indicates
that the experiment requires a total of four trial runs, where each trial run
involves a maximum of three factors, the values of which can be assigned on
one of two different levels. Similarly, an L_{9} (3^{4}) OA
prescribes nine experimental runs, each involving a maximum of four factors
with three permissible level settings. As shown in Table 3,
the Taguchi procedure used to calibrate the controllable factors of the GA is
performed using an OA of the latter type. In this table, the four columns
correspond to the three controllable factors of the GA (i.e., the crossover
rate, the mutation rate and the size of the population) and the interaction
between the crossover rate and the mutation rate.

Fig. 2: 
Binary code representations of BPN network parameters 
The table indicates that in the first experimental run, the three controllable
factors and the interaction parameter are all assigned their Level-1 settings.
Similarly, in the second experimental run, the crossover rate is assigned its
Level-1 setting, while the mutation rate, size of population and interaction
effect are all assigned their Level-2 settings. The remaining trial runs are
arranged in such a way as to achieve the orthogonal property (Roy,
1990).
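The L_{9} (3^{4}) array described above can be written out and its orthogonality verified with a short Python sketch. The standard L9 layout is assumed here; per Table 3, columns 1, 2 and 4 hold factors A, B and C and column 3 the AxB interaction:

```python
from itertools import combinations, product

# Standard L9(3^4) orthogonal array, levels coded 1-3.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

# Orthogonality check: every pair of columns contains each of the nine
# possible (level, level) combinations exactly once.
for c1, c2 in combinations(range(4), 2):
    pairs = {(row[c1], row[c2]) for row in L9}
    assert pairs == set(product((1, 2, 3), repeat=2))

print("L9 array is orthogonal")
```

Note that row 2 is (1, 2, 2, 2), matching the description in the text: the first factor at Level 1 and the remaining columns at Level 2.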
The quality of the design solution achieved in each experimental run is assessed
using the following signal-to-noise (S/N) metric:

S/N = -10 log_{10} (MSD)

where MSD is the mean-squared deviation of the experimental results from the
target value of the specified quality characteristic, i.e., MSD = (y_{1}^{2}+y_{2}^{2}+…+y_{n}^{2})/n,
where y_{1}, y_{2}, … are the deviations of the experimental results y_{i}
from the target value (i = 1…n) and n is the total number of repetitions of y_{i}.
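A minimal Python sketch of this computation, assuming the standard smaller-the-better form of the Taguchi S/N ratio (S/N = -10 log10(MSD)), which is consistent with the MSD definition above:

```python
import math

def signal_to_noise(deviations):
    """Smaller-the-better S/N ratio: -10 * log10(MSD), where MSD is the
    mean-squared deviation of the results from the target value."""
    n = len(deviations)
    msd = sum(y ** 2 for y in deviations) / n
    return -10.0 * math.log10(msd)

# Smaller deviations from the target yield a larger (better) S/N ratio.
print(signal_to_noise([0.01, 0.02]))  # ≈ 36.02 dB
print(signal_to_noise([0.10, 0.20]))  # ≈ 16.02 dB
```

Scaling all deviations by a factor of ten lowers the S/N ratio by 20 dB, which is why larger S/N values indicate a more robust design solution.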
Rank selected factors: Having performed the experimental trials prescribed in the OA, the experimental outcomes are analyzed using conventional ANOVA and ANOM statistical methods. In the analysis procedure, the ANOVA statistical technique is applied to rank the controllable factors of the GA (and the interaction effect between them) in terms of the extent to which they affect the classification performance of the BPN. The ANOM approach is then used to establish the optimal level settings for each of the controllable factors and the interaction term.
Pool selected factors: The ANOVAbased ranking and screening results
may indicate that one or more of the controllable factors can be pooled, since
variations in their values have a negligible effect on the BPN network classification
performance.
Table 3: 
L_{9} (3^{4}) orthogonal array for current
classification model of GA 

A: Crossover rate, B: Mutation rate, C: Size of population,
AxB: Interaction between crossover and mutation rate 
Having pooled the least influential factors, a further optimization experiment
is then conducted using the remaining controllable factors to ensure that the
quality of the solution is not seriously degraded.
Optimize the BPN network topology and parameters using calibrated GA: In the calibration process described above, the GA parameters are assigned in accordance with the level settings in each row of the orthogonal array and the GA is then used to search for the optimal BPN parameter settings for each of the classification problems (Glass, Dermatology and Hayes-Roth in this study). The classification performance of each BPN network is then evaluated by computing E_{RMSN}, as shown in Table 3. Once the GA has been calibrated, it is used to optimize the BPN networks used to solve the three data classification problems.
In general, the quality of the BPN network classification results is evaluated using the following root-mean-square normalized error (E_{RMSN}) metric:

E_{RMSN} = sqrt( Σ_{i=1}^{P} Σ_{j=1}^{K} (d_{ij}-o_{ij})^{2} / (P×K) )

where P is the number of input examples, K is the number of output neurons,
d_{ij} is the expected output value of the jth neuron in the ith example
and o_{ij} is the actual output value of the jth neuron in the ith example.
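Given these definitions, E_{RMSN} can be computed with a short Python sketch (the nested sum over P examples and K output neurons is assumed from the definitions above):

```python
import math

def rmsn_error(expected, actual):
    """Root-mean-square normalized error over P examples and K output
    neurons: sqrt( sum_ij (d_ij - o_ij)^2 / (P * K) )."""
    P = len(expected)
    K = len(expected[0])
    total = sum((d - o) ** 2
                for d_row, o_row in zip(expected, actual)
                for d, o in zip(d_row, o_row))
    return math.sqrt(total / (P * K))

# Two examples, one output neuron each.
d = [[1.0], [0.0]]
o = [[0.9], [0.1]]
print(rmsn_error(d, o))  # approximately 0.1
```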
Illustrated example: In this study, the performance of GAoptimized BPN networks was evaluated by considering their effectiveness in solving data classification problems involving three different types of data, namely continuous, ordinal and nominal.
Table 4 summarizes the characteristics of the databases downloaded for evaluation purposes from the server of the Department of Information and Computer Science (ICS) at the University of California. The major features of these three databases are discussed below.
Glass database: Continuous data values; 100 training samples and 258
testing samples. Glass splinters left at the scene of a crime may provide investigators
with valuable clues. Therefore, a typical data classification problem involves
analyzing the properties of a glass splinter such that its type can be identified.
The data records in this database comprise nine input attributes, namely the
refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium,
barium and iron, and one output attribute, the glass type (float-processed
building windows, non-float-processed building windows, float-processed
vehicle windows, non-float-processed vehicle windows, containers, tableware
or headlamps).
Table 4: 
Problem data types 

Therefore, in using the GA to optimize the structure of a BPN to perform this
classification process, a total of nine neurons are specified in the input layer
and one neuron is specified in the output layer.
Dermatology database: Ordinal values; 100 training samples and 114 testing samples. In the dermatological field, performing differential diagnoses of erythemato-squamous diseases is a common requirement. The dataset includes 34 input attributes and a single output attribute. As a result, during the GA optimization procedure, the input and output layers are assumed to have 34 neurons and one neuron, respectively. Every feature used for classification purposes is assigned a degree in the range of 0 to 3, where 0 indicates that the feature is not present, 1 and 2 indicate relative intermediate values and 3 indicates the largest amount possible. The six diseases classified in the output neuron are psoriasis, seboreic dermatitis, lichen planus, pityriasis rosea, cronic dermatitis and pityriasis rubra pilaris.
HayesRoth database: Nominal values; 100 training samples and 32 testing samples. Each record in this dataset has four input attributes, namely hobby, age, education and marital status and one output attribute corresponding to three different classes. As a result, during the GA calibration procedure the BPN is specified as having four nodes in the input layer and a single node in the output layer.
Calibration of GA control factors using Taguchi method: The controllable
factors of the GA were calibrated by performing a Taguchi design process in
which the GA was used to identify the optimal topologies and parameter settings
of the BPNs required to solve the classification problems described above. In
performing the Taguchi design process, the controllable factors of the GA were
assigned three levels, i.e., crossover rate (0.5, 0.7, 0.95) (Maniezzo,
1994; Srinivas and Patnaik, 1994), mutation rate
(0.001, 0.01, 0.05) (Perez and Holzmann, 1997) and size
of population (10, 20, 30) (Maniezzo, 1994; Perez
and Holzmann, 1997). The treatment levels of these controllable factors
are summarized in Table 5.
Table 5: 
Factor levels of GA 

RESULTS AND DISCUSSION
As discussed in the previous section, the controllable factors of the GA were calibrated using a Taguchi design procedure based on an L_{9} (3^{4}) OA. As shown in Table 3, this array prescribes nine experimental trials involving four factors, each with three level settings. Columns 1, 2 and 4 are assigned to the controllable factors as follows: Column 1: Factor A (crossover rate), Column 2: Factor B (mutation rate) and Column 4: Factor C (size of population). Column 3 is used to investigate the effect of the interaction between factors A and B. The numerals in columns A, B and C correspond to the factor level settings shown in Table 5. In the calibration procedure, the optimal BPN network structure for each of the three data classification problems was identified using the factor level settings specified in the nine experimental trials. The quality of the corresponding solution (i.e., the classification performance of the resulting BPN network) was evaluated using the E_{RMSN} metric. The corresponding results are presented in the Glass, Dermatology and Hayes-Roth columns of Table 3.
Ranking selected factors: Having conducted the experimental trials,
the E_{RMSN} data were analyzed using the ANOVA statistical method to
rank the controllable factors and the interaction factor in terms of their relative
influences on the classification performance of the BPN network. The full details
of this analysis process are presented in Roy (1990) and
are therefore omitted here. The ANOVA analysis results for the current Taguchi
experiments are presented in Tables 6-8, in which the column headings are
defined as follows:
f = Degrees of freedom
S = Sum of squares
S' = Pure sum of squares (pooled)
V = Mean squares (variance)
F = Variance ratio
P = Percentage contribution
Note that in the discussions which follow, the term “interaction”
(indicated by the symbol “x” between the two interacting factors)
describes the condition in which the effect of one factor on the result
depends on the level setting of the other. The right-hand column
of Table 6 indicates that the percentage contributions of
factors A, B and C are 4.4, 14.2 and 69.8%, respectively, while that of the
interaction term AxB is 11.6%.
Table 6: 
ANOVA analysis for GA controllable factors 

In other words, these factors can be ranked in order of decreasing influence
as follows: factor C, factor B and factor A. To evaluate the potential for
reducing the experimental cost and difficulty, the following section considers
the step-by-step pooling of factor A, the interaction term AxB and factor B,
respectively, and evaluates the corresponding effects on the quality of the
design solution.
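The percentage-contribution calculation and the resulting ranking can be illustrated with a small Python sketch. The sums of squares below are hypothetical values chosen only so that the contributions reproduce those reported in Table 6:

```python
def percent_contribution(sums_of_squares):
    """Percentage contribution of each factor: P = S_factor / S_total * 100.
    (After pooling, the pure sums of squares S' would be used instead.)"""
    total = sum(sums_of_squares.values())
    return {k: 100.0 * s / total for k, s in sums_of_squares.items()}

# Hypothetical sums of squares, chosen only to reproduce the contributions
# reported in Table 6 (4.4, 14.2, 69.8 and 11.6%).
S = {"A": 0.044, "B": 0.142, "C": 0.698, "AxB": 0.116}
P = percent_contribution(S)
ranked = sorted(P, key=P.get, reverse=True)
print({k: round(v, 1) for k, v in P.items()})
print(ranked)  # ['C', 'B', 'AxB', 'A']
```

Sorting the contributions reproduces the ranking stated in the text: factor C dominates, followed by factor B, the AxB interaction and factor A.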
Factor pooling analysis: Table 7 presents the ANOVA
analysis results for the case where Factor A and interaction factor AxB are
pooled. It can be seen that factor C (with an F value of 8.7332 in Table
7) has a significantly greater effect on the classification performance
of the BPN than factor B. An F-test is conducted at the 95% confidence level,
with critical value F_{0.05}(f_{1}, f_{2}) = F_{0.05}(2, 4) = 6.94. Table
7 shows that F_{B} = 1.7774 and F_{C} = 8.7332. Since F_{B} = 1.7774
is less than the critical value, factor B can also be pooled with factor A
and the interaction factor AxB. The corresponding ANOVA analysis results are
presented in Table 7 and 8. The results indicate that the values of factors
A and B (i.e., the crossover rate and mutation rate) can be arbitrarily
assigned within the defined range without significantly influencing the
performance of the GA-optimized BPN network classification scheme.
Using calibrated GA to identify the BPN network parameters and topology:
As described previously, the optimal levels of each GA parameter are obtained
using the ANOM statistical method. Observing the entries in Column 1 of Table
3 corresponding to factor A, it can be seen that “Level 1” occurs
in experimental runs 1, 2 and 3. The average effect of A_{1} (i.e.,
the Level-1 setting of factor A) is therefore calculated by averaging the
S/N ratios of these three experiments:

A_{1} = (S/N_{1}+S/N_{2}+S/N_{3})/3
The average effects of the Level-2 and Level-3 settings for factor A can be
computed in a similar manner. The procedure is then repeated for the three level
settings of factors B and C and the interaction term. The corresponding results
are summarized in Table 9, where it can be seen that the S/N
ratio of factor A is maximized when the Level-3 setting is specified.
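The ANOM averaging described above can be sketched in Python; the per-run S/N ratios below are hypothetical stand-ins for the actual values listed in Table 3:

```python
# ANOM sketch: average S/N effect of each level of factor A, assuming the
# standard L9 column assignment (Level 1 in runs 1-3, Level 2 in runs 4-6,
# Level 3 in runs 7-9).
L9_col_A = [1, 1, 1, 2, 2, 2, 3, 3, 3]   # factor A level in runs 1-9
sn = [30.1, 31.2, 29.8, 32.5, 31.9, 30.7, 33.4, 34.0, 33.1]  # hypothetical

def average_effect(levels, sn_ratios, level):
    """Mean S/N ratio over the runs in which the factor takes `level`."""
    vals = [s for lv, s in zip(levels, sn_ratios) if lv == level]
    return sum(vals) / len(vals)

for lv in (1, 2, 3):
    print(f"A{lv}: {average_effect(L9_col_A, sn, lv):.2f}")
# The level with the highest average S/N ratio is selected as optimal.
```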
Table 7: 
ANOVA analysis for GA controllable factors (factors A, AxB
pooled) 

Table 8: 
ANOVA analysis for GA controllable factors (factors A, B,
AxB pooled) 

Table 9: 
ANOM analysis of main effects (S/N) 

In the Taguchi experiments, a higher S/N ratio indicates improved BPN network
classification performance. Hence, the results presented in Table
9 indicate that the GA parameters should be specified as follows: Factor
A: Level 3 (crossover rate of 0.95); Factor B: Level 3 (mutation rate of 0.05);
and Factor C: Level 3 (size of population of 30).
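A minimal GA loop using these calibrated settings (crossover rate 0.95, mutation rate 0.05, population size 30) can be sketched as follows. The OneMax objective is a placeholder; the real objective would train a BPN for each decoded 39-bit chromosome and return, e.g., the reciprocal of E_{RMSN}:

```python
import random

random.seed(0)

# Calibrated GA settings from the Taguchi analysis.
CROSSOVER, MUTATION, POP, BITS, GENS = 0.95, 0.05, 30, 39, 60

def fitness(chrom):
    return sum(chrom)  # placeholder objective: count of 1-bits (OneMax)

def select(population):
    # Binary tournament selection.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover, applied with probability CROSSOVER.
    if random.random() < CROSSOVER:
        cut = random.randrange(1, BITS)
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(chrom):
    # Flip each bit independently with probability MUTATION.
    return [b ^ 1 if random.random() < MUTATION else b for b in chrom]

population = [[random.randint(0, 1) for _ in range(BITS)]
              for _ in range(POP)]
for _ in range(GENS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

best = max(population, key=fitness)
print("best fitness:", fitness(best), "of", BITS)
```

The operator choices (tournament selection, single-point crossover) are illustrative assumptions; the paper does not specify its GA operators beyond the three calibrated parameters.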
Effects of calibrating the GA factors on the BPN classification performance: The experiments were performed using MATLAB version 6.5 (MathWorks) running on a Pentium(R) 1.73 GHz PC.
It can be deduced that when all three of the GA’s controllable factors are calibrated, the BPN network classification performance is 0.0105 (simulation test). When only Factors B and C are calibrated (i.e., as considered in Table 7), the BPN classification performance is found to be 0.0111. When only Factor C is calibrated (i.e., as considered in Table 8), the BPN network classification performance is 0.0114. Finally, when none of the controllable factors are calibrated, the BPN network classification performance is 0.0134. The RMSE is thus reduced from 0.0134 to 0.0105, i.e., the error is effectively reduced by about 21.64%. These results confirm that the use of the orthogonal array in calibrating the GA’s controllable factors extends the solution search space, thereby increasing the probability of achieving improved performance.
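The reported error reduction follows directly from the two RMSE values:

```python
# Relative reduction in classification error when all three GA factors
# are calibrated, using the values reported above.
baseline, calibrated = 0.0134, 0.0105
reduction = (baseline - calibrated) / baseline * 100
print(f"{reduction:.2f}%")  # 21.64%
```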
CONCLUSION
This study employed the Taguchi design method to calibrate the controllable
factors of a GA used to optimize the parameters of BPN networks designed to
classify data of different types, namely continuous, ordinal and nominal. Overall,
the results show that the calibrated GA yields a significant improvement in
the classification performance of the BPN networks.
Furthermore, the ANOM and ANOVA results not only indicate appropriate level settings for each of the controllable factors in the GA but also reveal which of the controllable factors have the greatest effect on the BPN performance, thereby enabling decision-makers to identify those factors which need to be carefully controlled.
Calibrating the genetic algorithm used in this study extends the search space, raising the possibility of finding the optimal solution. In conclusion, the approach proposed in this study provides a method for developing robust, noise-immune classification models for a variety of data types.