Artificial Neural Networks (ANNs) are increasingly used in problem domains
involving classification. They are adept at finding commonalities in a set of
seemingly unrelated data and for this reason are used in a growing number of
classification tasks. The, one of their major drawbacks is that their knowledge
is not easily interpretable by a human, the main challenge to the use of supervised
neural networks in data mining applications is to get explicit knowledge from
these models. Generally speaking, there are two types of approaches to extracting
rules from multi-layer networks: local and global. In the global methods (Markowska-Kaczmar,
2008; Markowska-Kaczmar and Mularczyk, 2006; Thrun,
1995), the neural network is treated as a black box, a set of global rules
characterize the output classes directly in terms of the inputs (these), the
decompositionnel or local approaches, methods go into details of the neural
networks structure, describing each neuron separately in terms of rules, followed
by concatenation of them (Kamruzzaman and Monirul Islam,
2010; Taha and Ghosh, 1999; Andrews
and Geva, 1994; Towell and Shavlik, 1993; McMillan
et al., 1991). Extracting rules from complex ANNs may therefore be
intractable (Zhou et al., 2003). In this study
we propose a novel, evolutionary global approach which integrates traditional
ANNs with genetic algorithms for extracting simple, intelligible and useful
rules from trained ANNs. The genetic algorithm uses chromosomes which can be
mapped directly onto intelligible rules (phenotypes). In brief, the study proposes
the use of a genetic algorithm to search the best rules for classification which
represents the knowledge in the trained neural networks.
GenRGA (generation of rules by genetic algorithm) system is composed of three
(03) main parts: genetic module, neural networks module and rules simplification
module. The first stage of the explanation is to create a list of initial rules,
The initial population of rules is chosen randomly from the truth table of n
variables, where n is the number of attributes. An initial population consists
of different types of genotypes, each one of them codifying the rules. This
population is passed to neural network module for evaluation in order to determine
the fitness of each individual. Afterwards, this group is made to evolve repeatedly
by means of different genetic operators (selection, crossover, mutation) until
a determined termination criteria is satisfied (for example, a sufficiently
good individual is obtained, or that a predetermined maximum number of generations
is reached). The obtained rules are simplified by module of simplifying the
rules based on the exact method of QuineMccluskey algorithm. The general idea
of has been presented in Fig. 1.
Genetic module: The extraction algorithm is based on the Pitt approach which means that every individual encodes a entire sets of rules.
Individual representation: The chromosome consists of a vector of rules.
So, the length of the vector is equal to the number of rules (Fig.
2). The length of the chromosome is initially fixed and can be changed by
user. The premises can take different values as follows:
||-1 means that the attribute is not involved in the rule
||0 means that the attribute is written not (x) in the rule generated
||1 means that the attribute is written as (x) in the rule generated
Genetic operators: Two genetic operators are used: crossover and mutations.
Crossover: it generates new chromosomes by exchanging the whole sets of membership
functions for a randomly selected attribute of the parent chromosomes. The mutation
consists just of randomly choosing a character in a string and change it . Mutation
is more complicated, since it must allow modifying the contents of a given set
by adding or removing premises (attributes) of rules and exclude them from the
|| GenRGA architecture
|| Chromosome representation in GenRGA
The next generation is created from the current population by using crossover
and mutation. The individuals from the current generation are chosen on the
basis of a roulette wheel.
Fitness function: In this study, two fitness measures are used: fidelity
and comprehensibility. Fidelity is calculated for each individual passed in
the neural network for classification, the percentage of the good answers is
the value of fidelity associated to the individual. Comprehensibility is calculated
for each rule as follows: (Markowska-Kaczmar and Chumieja,
If the value compr is close to 1, we say this rule has a high comprehensibility.
In each generation, the fitness of every individual in the population is evaluated
and the best individual moves to the next generation. After many generations
this process should tend to produce individuals with increasing fitness.
Neural networks module: Neural networks are regarded commonly as black
boxes but can be used to provide simple and accurate sets of logical rules.
The neural network we used is the multilayer perceptron, During phase of the
evaluation of rules, the attributes whose value is equal to -1 (not involved
in the rule ) are omitted and therefore are not included by retro-propagation
algorithm. We changed the standard backprobagation algorithm (developed by Rumelhart
Hinton, Williams (Rumelhart et al., 1986), so
that inactive attributes (which are not involved in the rule, value = -1) are
omitted in the calculation (Fig. 3). So, we gain in computational
Simplification rules: The rules obtained by the genetic module are simplified
by the method of Quine-Mc-cluskey (McCluskey and Schorr,
1956) which reduces the number of rules and premises in the rule and removes
rules that do not add new information (Yedjour et al.,
|| A modified backpropagation algorithm
MC CLUSKEY METHOD FOR SIMPLIFYING RULES
Theoretical background of Quine-McCluskey algorithm: Quine-McCluskey
Algorithm (QMC) is a method used for minimization of Boolean functions. This
involves two main stage: 1) Find all the prime implicants: The first step is
to express the function as a sum of minterms, to group minterms by the number
of 1s, to represent each minterm by its minterm number in binary form,
finally Eliminate as many attributes as possible from each term by applying
xy+xy' =‘x’. The resulting terms are called prime implicants. if G is number
of group then this stage ends after (G-1) iterations. 2) Find all essential
prime implicants: the table with the minterms in the top row and the reduced
terms (the prime implicants) in the left column was built. The first step is
to scan the columns for minterms that are covered by only a single unique prime
implicant. The column is called distinguished column and the row in which the‘x’ occurs is called essential row. The second step is to determine
the distinguished columns and essential rows (if any exists). The third step
is to draw a line through each column which contains a‘x’ in any
of the essential rows, since inclusion of the essential rows in the solution
will guarantee that these columns contain at least one‘x’. for the
rest of the minterms that are yet to have a‘x’, choose PIs as economically
as possible to cover the remaining minterms. Finally, write out the final solutions
as a set of PIs (The detail of the QM algorithm can be found by McCluskey
and Schorr (1956).
|| Modified QMC algorithm (stage1)
|| Algorithm of optimisation of rules
|| Relationship with algorithm QMC and rule simplification module
The modified Quine-Mccluskey algorithm: The computational and spatial complexity of QMC algorithm is exponential and can quickly become unreasonable for complex formulas involving many variables. It is guaranteed to find a solution with the minimum number of terms. Thats why its it is preferable to used as a post-processing of the solution generated by the genetic algorithm. A set of rules generated by the genetic algorithm could be expressed as combinations (or strings) of ones and zeros and -1 of the corresponding inputs, an attribute in true-form is denoted by 1, in inverted-form by 0 and the absence attribute is represented by (-1). Unlike the method of QMC which works with minterms, our method can start by any term (Table 1). This reduces the runtime complexity of the algorithm.
The second stage of QMC algorithm is modified as follows: each minterms is replaced by training samples and prime implicants by rules generated by above algorithm (Table 3). The idea underlying this step is to identify dominant rules. Table 2 shows how to obtain the final set of rules as small as possible in the whole population, rules covers the most of the training samples is chosen. Table 3 shows the correspondence with QMC algorithm and rule simplification module.
EXPERIMENTS AND RESULTS
This section evaluates GenRGA in the context of 03 problem domains: breast cancer, Iris and diabetes databases. The results are compared with those obtained by other methods.
Wisconsin breast cancer (uci repository): This database contains 699 samples belonging to 02 classes (458 are benign and 241 are malignant). Each pattern is described by nine attributes; each one takes an ordinal integer value from 1 to 10. Half of each class is randomly selected for training and the remaining set is used for testing.
Iris basis (uci repository): It consists of 150 iris divided in similar proportions in 03 classes: setosa, versicolor and virginica, each flower is described by 4 quantitative variables measuring the length and width of the sepal and the length and width of its petals. Table 4 shows the results for the data sets.
Pima indian diabetes database: This data set was obtained from the UCI Repository of Machine Learning Databases. The data set was selected from a larger data set held by the National Institutes of Diabetes and Digestive and Kidney Diseases. All patients in this database are Pima-Indian women at least 21 years old and living near Phoenix, Arizona, USA. The binary response variable takes the values 0 or 1, where 1 means a positive test for diabetes and 0 is a negative test for diabetes. There are 268 (34.9%) cases in class 1 and 500 (65.1%) cases in class 0. There are eight clinical findings: 1. Number of times pregnant 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test 3. Diastolic blood pressure (mm Hg) 4. Triceps skin fold thickness (mm) 5. 2-Hour serum insulin (mu U/ml) 6. Body mass index 7. Diabetes pedigree function 8. Age (years).
If the original attribute values are not binary, then a binarization process
(2) will be applied to non-binary data (Taha and Ghosh, 1999):
where, xi is the value of the attribute Xi, ui is the
average value of Xi and yi is the corresponding binary value.
The parameters of the genetic algorithm play an important role on the networks
convergence. Several experiments are conducted to achieve the optimum values.
For example, for each value of the probability of mutation, we calculate the
accuracy and comprehensibility of the rules extracted, Fig. 4
shows the average results obtained by fifteen 15 experiences after 500 generations.
||Influence of parameter on the accuracy and comprehensibility
The following parameters are obtained in the same way: probability of mutation
= 0.2, the probability of crossover = 0.8, population size = 30, the number
of individuals = 15 rules.
The initial population is created on the basis of the table of truth, containing all possible combinations of inputs. The method used in selection is the roulette wheel. The next generation is created from the current population by using crossover and mutation. For mutation, the gene has a high chance (2 / 3) to switch from the value "1" or "0" value to "-1", it can also switch from "-1" to "1" or "0" with a probability of 1 / 3. The chromosomes with a higher fitness value survive and participate in the creation of new populations. The population continuously evolves toward better fitness and the algorithm converges to the best chromosome after several generations. The modified algorithm of QMC allows a horizontal simplification (fewer rules) and vertical simplification (reduction of attributes in the resulting rule), so the comprehensibility grows significantly. the finally phase of modified QMC, eliminate the useless rules and keeps only those rules that cover the majority of data sets.
The final set of rules is given by:
|Breast cancer database
|| Performance comparison between Nn and Rule sets
|| Comparison of GenRGA with Gex and full-Re
|| Comparison of GenRGA with Gex and Santos methods
||Correct classification (%) of the rule sets extracted by different
||Performance comparison of GenRGA with other algorithms for
Table 4 shows number of extracted rules and rules accuracy for three benchmark problems. In most of the cases GenRGA produces fewer rules with better accuracy. The aim of experiments was to test the quality of the rule extraction made by GenRGA for data sets with different types of attributes.
COMPARING WITH RELATED WORK
Table 5 and 6 show the performance of the
extracted rules on the test and the training database for different techniques.
The distribution of samples in each base plays an important role on the performance
(Markowska-Kaczmar and Chumieja, 2004).
Table 7 compares the classification rates obtained using the rules extracted by four techniques (Bio-Re, Partial-Re, Full-Re, Neurorule, C4.5rules and GEX). The rules extracted by GenRGA are more comprehensible than those extracted by the other techniques. The results presented in this paper indicate that GenRGA is able to extract a set of rules of better performance from trained networks.
Table 8 compares GenRGA results of diabetes data with those produced by REANN, NN RULES, C4.5, NN-C4.5, OC1 and CART algorithms. GenRGA achieved 91,% accuracy although REANN was closest second with 76.56% accuracy. Due to the high noise level, the diabetes problem is one of the most challenging problems in our experiments. GenRGA has outperformed all other algorithms.
A new global method of rule extraction from neural networks called GenRGA, has been presented in the study. The purpose of GenRGA is to produce a set of rules that describe a networks performance with the highest fidelity possible, taking into account its ability to generalize, in a comprehensible way. The approach combines both metaheuristics (genetic algorithms) and exact algorithms (Quine-Mc-cluskey) to extract the binary rules of the form if-then). The computational results have shown that the performance of the rules extracted by GenRGA, is very high. With the rules extracted by the method introduced here, ANNs should no longer be regarded as black boxes. Finally the current paper may lead to future work in data mining in general and in rule extraction, in particular these rules may be used as an initial theory for a similar problem. In the near future, we want applied GenRGA to continuous data.