INTRODUCTION
Data mining uses a variety of data analysis tools to discover patterns
and relationships among variables in different data sets and is applied
in many multidisciplinary fields. Clustering is one of the most powerful
data mining techniques and can discover intensional structures in data
(Kao et al., 2008). It groups instances that have similar features
and builds a concept hierarchy; as a result, it often extracts new knowledge
from a database. Because of this ability to find intensional descriptions,
it is one of the most important techniques in managing databases (Lu et
al., 2008). Clustering problems arise in many different applications,
such as knowledge discovery, data compression and vector quantization,
and pattern recognition and classification. Many clustering methods exist;
they can be roughly divided into hierarchical, iterative and optimization
methods. Iterative methods are usually based on the K-means algorithm,
which applies two optimization steps iteratively (Amir and Lipika, 2007):
(i) calculate optimal partitions for the given codebook and (ii) calculate
a new codebook as the cluster centroids. K-means has two shortcomings:
dependency on the initial state and convergence to local optima; moreover,
global solutions of large problems cannot be found with a reasonable
amount of computational effort (Fathian et al., 2007).
To overcome the local optima problem, many studies have been carried out
on clustering. In these studies, researchers have used metaheuristic algorithms
such as genetic algorithms, simulated annealing and tabu search. Over the
last decade, modeling the behavior of social insects, such as ants and
bees, for the purpose of search and problem solving has been the context
of the emerging area of swarm intelligence.
The ant colony optimization (ACO) paradigm was first proposed by Dorigo
(2006). Since then, a number of studies based on ant colony algorithm
have been carried out in order to solve some optimization problems such
as the traveling salesman problem, unit commitment, etc. (Ho et al.,
2006; Huang, 2001; Merkle et al., 2002; Kwang and Sun, 2003); however,
in these studies, the search domains are in discrete form. In addition,
a number of methods based on ant colonies have been presented to solve
optimization problems in both continuous and discrete search domains,
such as economic dispatch and reactive power pricing in restructured networks
(Niknam et al., 2005a, b). The main problem associated with these
methods is that the convergence time increases drastically as the
number of variables increases. In this study, a new hybrid algorithm
based on the ant colony algorithm and simulated annealing is presented
to solve optimization problems in continuous and discrete domains simultaneously.
Since SA is a good local search algorithm, it is used as the local search
component of ACO in the proposed hybrid algorithm. The proposed hybrid algorithm
is applied to the optimal clustering of N objects into K clusters. The simulation
results show that this algorithm not only produces better solutions but also
converges more quickly than ordinary evolutionary methods such as ACO, SA
and K-means.
CLUSTER ANALYSIS PROBLEM
Cluster analysis is an effective tool in scientific or managerial
inquiry. The K-means clustering method is one of the simplest unsupervised
learning algorithms for solving the well-known clustering problem. The
goal is to divide the data points in a data set into K clusters, fixed
a priori, such that some metric relative to the centroids of the clusters
(called the fitness function) is minimized. The algorithm consists of
two stages: an initial stage and an iterative stage. The initial stage
involves defining K initial centroids, one for each cluster. These centroids
must be selected carefully, because differing initial centroids cause
differing results. One policy for selecting the initial centroids is to
place them as far as possible from each other. The second, iterative stage
repeatedly assigns each data point to the nearest centroid and recalculates
the K centroids according to the new assignments. This iteration stops
when a certain criterion is met, for example, when there is no further
change in the assignment of the data points. Given a set of n data samples
to be classified into K groups, the algorithm aims to minimize a fitness
function, such as the squared error function defined as:

J = \sum_{j=1}^{K} \sum_{i=1}^{n} \left\| x_{i}^{(j)} - c_{j} \right\|^{2}    (1)

where \| x_{i}^{(j)} - c_{j} \|^{2} is a chosen distance measure between
the i^{th} data point x_{i}^{(j)} (which was classified into the j^{th}
group) and the j^{th} cluster center c_{j}, 1 ≤ i ≤ n, 1 ≤ j ≤ K; J is
an indicator of the distance of the n data samples from their respective
cluster centroids. The K-means clustering algorithm is summarized in the
following steps (Fathian et al., 2007):
Step 1: Place K points into the space represented by the objects
that are being clustered. These points represent the initial group centroids.
Step 2: Assign each object to the group that has the closest centroid.
Step 3: When all objects have been assigned, recalculate the positions
of the K centroids.
Step 4: Repeat steps 2 and 3 until a certain criterion is met,
such as the centroids no longer moving or a preset number of iterations
have been performed.

Fig. 1: An example of finding the shortest path by ants
This results in the separation of objects into groups
for which the score of the fitness function is minimized.
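To make the procedure concrete, the following is a minimal Python (NumPy) sketch of the K-means steps above; the random-sample initialization, function names and convergence tolerance are illustrative choices of ours, not details taken from the original study.

    import numpy as np

    def kmeans(X, K, max_iter=100, seed=0):
        # Minimal K-means sketch: X is an (n, d) data matrix, K the number
        # of clusters fixed a priori.
        rng = np.random.default_rng(seed)
        # Step 1: place K points as initial centroids (here: K distinct
        # data points chosen at random).
        centroids = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(max_iter):
            # Step 2: assign every object to the group with the closest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: recalculate the positions of the K centroids.
            new_centroids = np.array([X[labels == j].mean(axis=0)
                                      if (labels == j).any() else centroids[j]
                                      for j in range(K)])
            # Step 4: stop once the centroids no longer move.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Fitness of Eq. 1: total squared distance of samples to their centroids.
        fitness = ((X - centroids[labels]) ** 2).sum()
        return labels, centroids, fitness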
Ant colony algorithm: Ants are insects that live together in colonies. Since
they are almost blind, they find the shortest path from their nest to
food with the aid of pheromone. Pheromone is a chemical material deposited
by ants that serves as a critical communication medium among them,
thereby guiding the determination of the next movement. In other words,
ants find the shortest path based on the intensity of pheromone deposited
on the different paths. Assume that ants want to move from point A to
point B (Fig. 1).
At first, if there is no obstacle, all of them will move along the straight
path (AB) (Fig. 1a). At the next stage, assume that
there is an obstacle; in this case, ants will not be able to follow the
original trail in their movement. They therefore turn randomly to the
left (ACB) or to the right (ADB) (Fig. 1b). Since the ADB
path is shorter than ACB, the intensity of pheromone deposited on ADB
is higher than that on the other path, so the ants are increasingly guided
to move along the shorter path (Fig. 1c). This behavior forms the
fundamental paradigm of the ant colony system.
As was indicated in Fig. 1, the intensity of deposited
pheromone is one of the most important factors for ants to find the shortest
path. Generally, the intensity of pheromone and path length are two important
factors that should be used to simulate the ant system. To select the
next path, the state transition probability is defined as follows:
P_{ij} = \frac{\tau_{ij}^{\gamma_{1}} \, (1/L_{ij})^{\gamma_{2}}}{\sum_{k=1}^{N} \tau_{ik}^{\gamma_{1}} \, (1/L_{ik})^{\gamma_{2}}}    (2)

where τ_{ij} and L_{ij} are the intensity of pheromone
and the length of the path between nodes i and j, respectively. γ_{1}
and γ_{2} are the control parameters for determining the
weight of trail intensity and length of path, respectively. N is the number
of ants.
After selecting the next path, the trail intensity of pheromone is updated
as:

\tau_{ij}(k+1) = \rho \, \tau_{ij}(k) + \Delta\tau_{ij}    (3)

In the above equation, ρ is a coefficient such that (1 - ρ) represents
the evaporation of the trail between time k and k+1 and Δτ_{ij} is the
amount of pheromone trail added to τ_{ij} by the ants.
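As a sketch of how Eq. 2 and Eq. 3 might be computed, consider the Python fragment below; the array layout (per-candidate vectors tau_i and L_i) and the default parameter values are our illustrative assumptions.

    import numpy as np

    def transition_probabilities(tau_i, L_i, gamma1=1.0, gamma2=1.0):
        # Eq. 2 (sketch): probability of moving from node i to each candidate
        # node j, given pheromone intensities tau_i[j] and path lengths L_i[j].
        weights = tau_i ** gamma1 * (1.0 / L_i) ** gamma2
        return weights / weights.sum()

    def update_trail(tau, delta_tau, rho=0.99):
        # Eq. 3 (sketch): a (1 - rho) fraction of the trail evaporates between
        # time k and k+1, then the new pheromone delta_tau is deposited.
        return rho * tau + delta_tau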
To apply the ACO algorithm for clustering, the following steps have to
be taken:
Step 1: Generate the initial population and trail intensity
Step 2: Generate the initial population and trail intensity for
ants in each colony (local search)
Step 3: Determine the next position
Step 4: Check the convergence condition
Simulated annealing algorithm: Simulated annealing is a generalization
of a Monte Carlo method for examining the equations of state and frozen
states of n-body systems. The concept is based on the manner in which
liquids freeze or metals recrystallize in the process of annealing. In
an annealing process, a melt is initially at high temperature and disordered
and is slowly cooled so that the system at any time is approximately in
thermodynamic equilibrium. As cooling proceeds, the system becomes more
ordered and approaches a frozen ground state at zero temperature. Hence,
the process can be thought of as an adiabatic approach to the lowest energy
state. If the initial temperature of the system is too low, the system
may become quenched (Tai-Hsi et al., 2008).
The original Metropolis scheme was that an initial state of a thermodynamic
system was chosen at energy E and temperature T. Holding T constant, the
initial configuration is perturbed and the change in energy, ΔE,
is computed. If the change in energy is negative, the new configuration
is accepted. If the change in energy is positive, it is accepted with
a probability factor exp(-ΔE/T). This process is then repeated sufficient
times to give good sampling statistics for the current temperature and
then the temperature is decremented and the entire process repeated until
a frozen state is achieved at T = 0.
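The Metropolis acceptance rule amounts to a few lines of Python; the sketch below uses the exp(-ΔE/T) factor described above.

    import math
    import random

    def metropolis_accept(dE, T):
        # Always accept a move that lowers the energy; otherwise accept it
        # with probability exp(-dE / T).
        return dE < 0 or random.random() < math.exp(-dE / T)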
The general procedure for the SA algorithm can be summarized as follows
(Christober et al., 2007):
Step 1: Select an initial solution X and an initial temperature
T.
Step 2: Find another solution, namely X_{next}, by modifying
the last answer X.
Step 3: Calculate the energy differential ΔE = f(X_{next}) - f(X).
Step 4: If ΔE < 0, go to Step 9.
Step 5: Generate a random number, namely R, between 0 and 1.
Step 6: If R < exp(-ΔE/T), go to Step 9.
Step 7: Repeat Steps 2-6 for a number of optimization steps at
the given temperature.
Step 8: If no new solution X_{next} is accepted, go to Step 10.
Step 9: Decrease the temperature T, replace X with X_{next}
and go to Step 2.
Step 10: Reheat the environment by setting T to a higher value.
Step 11: Repeat Steps 1 through 10 until no further improvement
is obtained.
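These steps can be rendered compactly in Python as follows; the neighborhood move, cooling factor, reheating policy and iteration limits are illustrative assumptions, and the loop is reorganized into the conventional outer-temperature/inner-sampling form.

    import math
    import random

    def simulated_annealing(f, x0, neighbor, T=100.0, cooling=0.95,
                            steps_per_temp=20, max_outer=200):
        # Minimal SA sketch: minimize f starting from x0; neighbor(x)
        # returns a modified copy of x (Step 2).
        x, fx = x0, f(x0)
        best, fbest = x, fx
        for _ in range(max_outer):
            accepted = False
            for _ in range(steps_per_temp):          # Step 7: repeat Steps 2-6
                x_next = neighbor(x)                 # Step 2
                dE = f(x_next) - fx                  # Step 3
                # Steps 4-6: accept downhill moves always, uphill moves
                # with probability exp(-dE / T).
                if dE < 0 or random.random() < math.exp(-dE / T):
                    x, fx = x_next, fx + dE
                    accepted = True
                    if fx < fbest:
                        best, fbest = x, fx
            if accepted:
                T *= cooling                         # Step 9: cool down
            else:
                T /= cooling                         # Step 10: reheat
        return best, fbest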
APPLICATION OF ACO-SA TO CLUSTERING
This section presents a new hybrid approach based on combining the
ant colony optimization and simulated annealing algorithms for the clustering
problem. To apply the hybrid algorithm, named ACO-SA, to clustering,
the following steps should be repeated:
Step 1: Generate an initial population: An initial population
is randomly generated as follows:
where Center_{j} is the j^{th} cluster center for the
i^{th} individual, X_{i} is the position of the i^{th}
individual and d is the dimension of each cluster center. x_{i}^{max}
and x_{i}^{min} (taken per feature of the center) are the maximum
and minimum values of the points belonging to the j^{th} cluster
center, respectively.
Step 2: Generate an initial trail intensity: At initialization
phase, it is assumed that the trail intensity between each pair of ants
is the same and is generated as follows:
where τ_{ij} and τ_{0} are the trail intensity
between the i^{th} and j^{th} ants and the initial trail intensity,
respectively.
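Steps 1 and 2 might be initialized as in the sketch below, where each individual holds K cluster centers drawn uniformly between the per-feature minima and maxima of the data; the (n_ants, K, d) array layout and the value of tau0 are our assumptions.

    import numpy as np

    def initialize(X, K, n_ants, tau0=1.0, seed=0):
        # X: (n, d) data matrix.  Returns n_ants individuals, each holding
        # K randomly placed cluster centers, plus the initial trail matrix.
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        x_min, x_max = X.min(axis=0), X.max(axis=0)
        # Step 1: draw each center uniformly within the data's feature bounds.
        population = x_min + rng.random((n_ants, K, d)) * (x_max - x_min)
        # Step 2: the trail intensity between every pair of ants starts equal.
        tau = np.full((n_ants, n_ants), tau0)
        return population, tau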
Step 3: Determination of the next path: To determine the next
path, at first, for each X_{i} a set A_{i} is defined
by:
where D(k) is the visibility, which is defined by:
For determining the next path, there are two cases:
Case 1: A_{i} is empty: In this case, the best solution
(Y) is determined in the neighborhood of X_{i} by SA.
The neighborhood is defined as follows:
where α is a positive parameter between 0 and 1.
The neighborhood of X_{i} is explored by simulated annealing, applying
Steps 1 through 11 of the SA algorithm described above with X_{i} as
the initial solution.
After SA finds the best local solution (Y), X_{i} is replaced
by Y and then the trail intensity matrix is updated as follows:
where r is a positive parameter between 0 and 1.
Case 2: A_{i} is not empty: In this case, selection of
the next path is based on Eq. 2. Since in some optimization problems,
L_{ij} is not known, we can define the inverse of L_{ij}
as follows:
In this algorithm, we also consider X_{i} itself as a possible selection.
In this case, φ_{ii} and 1/L_{ii} are defined by:
where m is the number of elements of A_{i}.
The transition probabilities are defined as:
where γ_{1} and γ_{2} are the control parameters.
The roulette wheel is used for the stochastic selection. If the selection
result is P_{ij}, then:

Fig. 2: Flowchart of the hybrid algorithm
If the selection result is P_{0}, the best solution (Y) is found
in the neighborhood of X_{i} by simulated annealing, as described
in Case 1 (A_{i} is empty). After that, X_{i} is replaced
by Y and the trail intensity matrix is updated as follows:
Step 4: Check the convergence condition: After all ant colonies
(X_{i}) find the next path, the convergence condition is checked
by:
If the convergence condition is not satisfied, cancel the new results
and go back to Step 3. If all ants are guided to one path after NN iterations,
stop and print the results. If the results have not changed after NNN iterations,
increase the visibility (D(k)) and the neighborhood of search (α);
here, NN and NNN are preset coefficients. Figure 2 shows the flowchart
of the hybrid algorithm.
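The roulette-wheel selection used in Step 3 can be sketched as follows; probs is assumed to collect P_{0} and the transition probabilities P_{ij} defined above.

    import numpy as np

    def roulette_select(probs, rng=None):
        # Pick an index with probability proportional to its entry in probs.
        rng = np.random.default_rng() if rng is None else rng
        cum = np.cumsum(probs)             # cumulative "wheel" segments
        r = rng.random() * cum[-1]         # spin: a uniform point on the wheel
        return int(np.searchsorted(cum, r))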
RESULTS AND DISCUSSION
Experimental results comparing the ACO-SA clustering algorithm with
several typical stochastic algorithms, including the ACO, SA and K-means
algorithms, are provided for six real-life data sets (Iris, Wine, Vowel,
Contraceptive Method Choice (CMC), Wisconsin breast cancer and Ripley's
glass), which are described as follows:
Iris data (N = 150, d = 4, K = 3): This data set contains 150 random samples
of flowers from the iris species setosa, versicolor and virginica collected
by Anderson. For each species there are 50 observations of sepal length,
sepal width, petal length and petal width in cm. This data set was used
by Fisher in his initiation of the linear-discriminant-function technique.
Wine data (N = 178, d = 13, K = 3): This data set, also taken
from the UCI repository, contains the results of a
chemical analysis of wines grown in the same region of Italy but derived
from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three types of wines. There are
178 instances with 13 numeric attributes in the wine data set. All attributes
are continuous and there are no missing attribute values.
Contraceptive method choice (N = 1473, d = 10, K = 3): This data
set is a subset of the 1987 National Indonesia Contraceptive Prevalence
Survey. The samples are married women who were either not pregnant or
did not know if they were at the time of the interview. The problem is to predict
the current contraceptive method choice (no use, long-term methods or
short-term methods) of a woman based on her demographic and socioeconomic
characteristics.
Vowel data set (N = 871, d = 3, K= 6): This data set consists
of 871 patterns. There are six overlapping vowel classes and three input
features.
Wisconsin breast cancer (N = 683, d = 9, K = 2): This data set consists of 683
objects characterized by nine features: clump thickness, cell size uniformity,
cell shape uniformity, marginal adhesion, single epithelial cell size,
bare nuclei, bland chromatin, normal nucleoli and mitoses. There are two
categories in the data: malignant (444 objects) and benign (239 objects).
Ripley's glass (N = 214, d = 9, K = 6): The data were sampled from
six different types of glass: building windows float processed (70 objects),
building windows non-float processed (76 objects), vehicle windows float
processed (17 objects), containers (13 objects), tableware (9 objects)
and headlamps (29 objects), each with nine features, which are refractive
index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium
and iron.
The algorithms are implemented using Matlab 7.1 on a Pentium IV, 2.8
GHz, 512 MB RAM computer. The parameters required for implementation of
the ACO-SA algorithm are γ_{1}, γ_{2}, ρ,
D_{0} and T. In this study, the best values for the aforementioned
parameters, determined over 10 runs of the algorithm, are
γ_{1} = γ_{2} = 1.0, ρ = 0.99, D_{0} = 10 and T = 100.
Table 1-6 present a comparison among the results of ACO-SA,
ACO (Fathian et al., 2007; Shelokar et al., 2004), SA (Fathian
et al., 2007) and K-means (Kao et al., 2008) for 100 random trials
on the mentioned data sets.
For the Iris data (Table 1), it is found that the ACO-SA
clustering algorithm provides the optimal value of 96.6602 most of the
time, while the ACO, SA and K-means algorithms attain 96.753, 97.4573 and 97.333,
respectively. The standard deviation of the fitness function for the ACO-SA
clustering algorithm is 0.12196, which is significantly smaller than that of the
other methods. The numbers of function evaluations of ACO-SA, ACO, SA and
K-means are 3,629, 4,931, 5,314 and 120, respectively. The best value obtained
by the ACO-SA clustering algorithm for the Wine data (Table 2)
is 16,298.628, which is obtained in 90% of the total runs; the ACO, SA
and K-means algorithms, on the other hand, fail to attain this value in
any of their runs. From Table 3, for the CMC data, it is
found that ACO-SA was able to find the optimum in most of its runs. The
ACO-SA clustering algorithm attains 5,696.6075, which is better than
the best values of the ACO, SA and K-means algorithms (5,701.923, 5,849.038
and 5,842.20). For the Vowel data set (Table 4), the
best global solution, the worst global solution, the average and the standard
deviation of ACO-SA are 148,978.6294, 149,183.1834, 149,057.8251 and
75.539868, respectively. In terms of the number of function evaluations,
K-means needs the fewest. The results for the
Wisconsin breast cancer data are shown in Table 5. The
best value that ACO-SA attains is 2,965.97 and the worst is 2,969.01,
both of which are better than those obtained by the ACO, SA and K-means algorithms.
For the Ripley's glass data, the ACO-SA algorithm attains the best value of
199.89; the results of the other algorithms are inferior to those of ACO-SA.
The number of function evaluations of ACO-SA is less than those of ACO and SA but
greater than that of the K-means algorithm.
Table 1: Results obtained by the five algorithms for 100 different runs on Iris data

Table 2: Results obtained by the five algorithms for 100 different runs on Wine data

Table 3: Results obtained by the five algorithms for 100 different runs on CMC data

Table 4: Results obtained by the five algorithms for 100 different runs on Vowel data

Table 5: Results obtained by the five algorithms for 100 different runs on Wisconsin breast cancer data

Table 6: Results obtained by the five algorithms for 100 different runs on Ripley's glass data
The simulation results in Table 1-6 demonstrate that
the proposed hybrid evolutionary algorithm converges to the global optimum
with a smaller standard deviation and fewer function evaluations, which leads
naturally to the conclusion that the ACO-SA algorithm is a viable and
robust technique for data clustering.
CONCLUSION
In this study, a new hybrid evolutionary algorithm is proposed to
find optimal or near-optimal solutions for clustering problems of allocating
N objects to K clusters. The proposed algorithm is based on a combination of the ACO and SA algorithms.
The ACO-SA algorithm is tested on several well-known real-life data sets.
The experimental results indicate that the proposed optimization algorithm
is at least comparable to the other algorithms in terms of function evaluations
and standard deviations. Despite the robustness and efficiency of the ACO-SA
algorithm, it is applicable only when the number of clusters is known a priori.
The results illustrate that the proposed ACO-SA optimization algorithm
can be considered a viable and efficient heuristic.