HOME JOURNALS CONTACT

Pakistan Journal of Biological Sciences

Year: 2014 | Volume: 17 | Issue: 2 | Page No.: 266-271
DOI: 10.3923/pjbs.2014.266.271
Ant-cuckoo Colony Optimization for Feature Selection in Digital Mammogram
J.B. Jona and N. Nagaveni

Abstract: Digital mammogram is the only effective screening method to detect the breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. All the features are not essential to detect the mammogram. Therefore identifying the relevant feature is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS) is proposed for feature selection in Digital Mammogram. ACO is a good metaheuristic optimization technique but the drawback of this algorithm is that the ant will walk through the path where the pheromone density is high which makes the whole process slow hence CS is employed to carry out the local search of ACO. Support Vector Machine (SVM) classifier with Radial Basis Kernal Function (RBF) is done along with the ACO to classify the normal mammogram from the abnormal mammogram. Experiments are conducted in miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithm. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.

Fulltext PDF Fulltext HTML

How to cite this article
J.B. Jona and N. Nagaveni, 2014. Ant-cuckoo Colony Optimization for Feature Selection in Digital Mammogram. Pakistan Journal of Biological Sciences, 17: 266-271.

Keywords: support vector machine, Cuckoo search, ant colony optimization and ROC curve

INTRODUCTION

Breast cancer is the second leading causes of mortality in women. So far mammography is the only effective screening method for detection of breast cancer in early stage. There exist two errors like false negative and false positive which may arise in interpretation of mammography by the radiologist. To overcome such errors in the interpretation of mammography the researchers developed Computer Aided Diagnosis (CADx) which evaluate or assess mammographic abnormality by automating the segmentation, detection, feature extraction and classification processes.

During the past two decades the focus of researchers falls on the nature inspired metaheuristic algorithms. They concentrated more on the Nature-Inspired Computation (NIC). The NIC are computational algorithms that are mimicked from the natural phenomena and biological models of animals and birds to solve a problem. Many nature inspired algorithms had evolved in the past few decades. The most well-known NIC are the artificial neural networks, Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Artificial Immune System (AIS).

ACO is a population based optimization technique which was first introduced by Marco Dorigo, as Ant System (Dorigo et al., 1996; Dorigo and Gambardella, 1997a). In 1999 it was redefined as Ant Colony Optimization and used first to solve the travelling salesman problem (Dorigo and Stutzle, 2004; Dorigo and Gambardella, 1997b). It is inspired by the behavior of ants in finding shortest paths from the colony to its food. Al-Ani used ACO for feature selection (Al-Ani, 2005). Cuckoo Search (CS) is an optimization technique developed by Xin-She Yang and Suash Deb (Yang and Deb, 2009, 2010; Tuba et al., 2011) which was inspired from the egg laying and breeding behavior of the cuckoo bird. Cuckoo search is simple and takes few parameters compared to other metaheuristic algorithms. They are proved that “Cuckoo search outperforms PSO”. The major disadvantage of ACO is that the local search it performs is not much faster. So Cuckoo search is proposed to carry out the local search of ACO. In this paper a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search are proposed.

AN OVERVIEW OF CUCKOO SEARCH ALGORITHM

Cuckoo Search (CS) is population based metaheuristic optimization technique. The basic behavior of cuckoo is that it lays its eggs in some other bird’s (host bird’s) nest. If the cuckoo’s egg is similar to the host bird’s egg then the egg is not discovered by the host bird. If the egg is recognized by host bird then the egg will be destroyed by the host bird. So the cuckoo mimics the color and pattern of their own eggs to match that of their host. If the cuckoo’s egg hatches first then the cuckoo chick destroy the eggs of the host. By imitating the egg laying behavior of the cuckoo an optimization algorithm has been developed.

Cuckoo optimization algorithm is based on three simple rules stated:

Cuckoo lays only one egg at a time and it dumps its egg in a randomly chosen nest
Only the best nests with high quality eggs will be passed into the next generation
The number of available host nests is fixed. Egg laid by a cuckoo bird is discovered by the host bird with a probability Pd. In this case, the host bird has two options. It can either throw the egg away, or it may abandon the nest and build a brand new nest at nearby location

CUCKOO SEARCH FOR FEATURE SELECTION

In cuckoo search optimization algorithm each egg in a nest represents a solution and a cuckoo egg represents a new solution. The aim of the CS is to use the new and potentially better solutions to be replaced by not-so-good solution in the nests. In the simplest form, each nest has one egg. The host nest is represented by N and the number of cuckoos is referred as S. Each cuckoo’s egg represents the selected feature. For each host nest, the quality of eggs is either 1 or 0, which represent whether the selected feature will carry on to the generation or not.i.e. whether that particular feature is selected or not. Probability that the egg laid by a cuckoo is discovered by the host bird is Pd. A minimum value of Pd leads to discarding the feature and high value of Pd takes that particular feature to the next generation. The fitness of a nest is obtained by evaluation of fitness function fi.

To start the optimization algorithm, a candidate matrix of size SxN is generated. Initialization of the nests and the random initial solution is performed. Evaluate the fitness of the initial solution and get the current best solution. The current best solution performed using Eq. 1:

(1)

where, β denotes Levy distribution parameter, β=1.5 and Γ denotes gamma function. Generate new solution but keep the current best nest. A new solution x(t+1) for cuckoo i is generated using a Lévy flight:

(2)

where, α>0 is the step size, in most cases α = 1 and ^ represents entry-wise multiplications. The algorithm perform random walk using the Levy flight of Mantegna’s algorithm with step size. Step size is calculated using Eq. 3. From the normal distribution:

(3)

Evaluate the new solution to obtain the current best nest. Destroy the nest which has the probability less than Pd and retain remaining nests to the next generation. Perform the new nest generation by retaining the current best nest of the previous generation, evaluate the fitness of each nest, find current best nest and eliminate the nest which has a value less than Pd. All these steps are repeated until it meets the maximum number of generations. The pseudo code of the cuckoo search is given below.

Algorithm 1: CS algorithm

The advantage of the Cuckoo search is that it takes only one parameter in addition to the population size. Cuckoo search is so simple and more efficient than other metaheuristic algorithms. The randomization is more efficient as the step length is heavy-tailed and any large step is possible.

METHODOLOGY

The mammographic images from MiniMIAS database is used in this research. Seventy eight Gray Level Co-occurrence Matrix (GLCM) textural features are extracted. The extracted feature set is reduced by the proposed hybrid Ant-Cuckoo Colony optimization technique. SVM classifier is used along with ACO to classify the normal mammogram from an abnormal mammogram. The performances of all the proposed techniques are compared by Receiver Operating Characteristic curve (ROC).

Feature extraction: Feature extraction is the process of reducing the original mammogram image into a set of features, by measuring certain properties or features that distinguish one input pattern from another pattern. GLCM features are extracted from the mammogram. GLCM features are calculated based on the haralick’s texture feature. The haralick features are namely energy, correlation, inertia, entropy, inverse difference moment, sum average, sum variance, sum entropy, difference average, difference variance, difference entropy, information measure of correlation 1 and information measure of correlation 2 are extracted at four directions (0°, 45°, 90°, 135°). The mean and variance of each of the thirteen haralick feature at four directions are extracted making a total of 78 features (Haralick et al., 1973).

Feature selection using ant-cuckoo colony optimization: ACO is inspired by the foraging behavior of real ants. While walking from food sources to the nest and vice versa, ants deposit a chemical substance called pheromone on the ground. When they decide about a direction to go, they choose probabilistically paths marked by strong pheromone concentrations. This behavior is the basis for a cooperative interaction which leads to the emergence of shortest paths between food sources and their nest. In ACO algorithms, the artificial ants incrementally construct a solution by adding appropriately defined solution components to the current partial solution. Each of the construction steps is a probabilistic decision based on local information, which is represented by the pheromone information.

The pseudo code of the ACO algorithm is shown in Algorithm 2. The algorithm is initialized with the No. of features, No. of iteration, No. of ants, No. of selected features, trail intensity (Ti), evaporation rate, etc., In the first iteration, each ant will randomly choose a feature subset of m features. Only the best k subsets, (k<na), will be used to update the pheromone trial and influence the feature subsets of the next iteration. In the second and following iterations, each ant will start with m-p features that are randomly chosen from the previously selected k-best subsets, where p is an integer that ranges between 1 and m-1. This process of feature selection is a sequential process, which may delay the entire system of feature selection. So the Cuckoo search of Algorithm 1 selects the optimal feature. The feature selected by the cuckoo search is now carried to the next generation. In this way, the features that constitute the best k subsets will have more chance to be present in the subsets of the next iteration. However, it will still be possible for each ant to consider other features as well. For a given ant j, those features are the ones that achieve the best compromise between previous knowledge, i.e., pheromone trails and the current best of the cuckoo search.

The problem of feature selection can be stated as follows: given the feature set, F, of n features, find the feature subset S, which consists of m features where m<n and S⊂F), such that the classification accuracy is maximized. The feature selection problem representation exploited by the artificial ants includes the following:

n features that constitute the original set, F = {f1, …, fn}
na, the number of artificial ants to search through the feature space
Ti, the intensity of pheromone trail associated with feature fi
For each ant j, a list that contains the selected feature subset, Sj = {s1, …, sm}

Algorithm 2: Ant-Cuckoo Colony Optimization algorithm for feature selection

Support vector machine classifier: Support Vector Machine (SVM) is a classification techniques based on statistical learning theory (Bevilacqua et al., 2000; El-Naqa et al., 2002). SVM are Statistical Learning Theory (SLT) problems used for classification. The SVM algorithm constructs a separating hypersurface in the input space by transforming the input space into a high dimensional feature space through some nonlinear mapping chosen a priori (Kernel). It constructs the maximal margin hyperplane in the feature space and the support vectors that lies on this hyperplane. The discriminant function of SVM using the Kernel function is:

(6)

where, xi is the support vector, yi is the classification output (+1 for benign and -1 for malignant), αi and b are quadratic programmimg coefficients. Radial Basis Function Kernel (RBF) is used and it is defined as:

(7)

where γ, is kernel parameter.

EXPERIMENTAL RESULTS AND DISCUSSION

Image database: In this research, mammograms from the Mammographic Image Analysis Society (MIAS), a Mini Mammographic Database is used. Each mammogram image has a spatial resolution of 1024x1024 pixels. This database is chosen since it contains various types of abnormalities such as calcification, well-defined, circumscribed masses, spiculated masses, ill-defined masses, architectural distortion, asymmetry and normal. Each of these abnormalities has been diagnosed and confirmed by a biopsy.

Experimental setup: The experiments implemented in MATLAB. These techniques are experimented on 100 mammogram images with various abnormalities, 50 abnormal images with microcalcification, spiculation, circumscribed and 50 normal mammogram images. Table 1 shows the parameters used in the algorithms. The parameters of the Ant-Cuckoo Colony Optimization algorithm are listed below.

Experimental results: The features selected by ACO, PSO and the proposed Ant-Cuckoo Colony Optimization are listed in Table 2. The 78 GLCM features are considered for the experiment and only the best 5 features are extracted.

Table 1: Parameters of Ant-Cuckoo colony optimization

Table 2: Features selected by ACO, PSO and Ant-Cuckoo colony optimization

With the selected features the testing is performed on the same set of 100 mammograms. The SVM classification results are pictorially depicted in Fig. 1.

Receiver Operating Characteristic (ROC) is a statistical tool used in medical decision making and is graphical plot of Sensitivity Vs 1-Specificity for a classifier. Sensitivity is the ratio of malignant samples which have a positive test result. Specificity is ratio of benign samples which have a negative test result. The True Positive Fraction (TPF) (or True Positive Ratio (TPR) or Sensitivity), False Positive Fraction (FPF) (or False Positive Ratio (FPR) or 1-Specificity) and accuracy are defined as:

(8)

(9)

(10)

Where:
TP = True positives is the No. of mammograms correctly classified as abnormal mammogram
FP = False positive is the No. of mammograms incorrectly classified as abnormal mammogram
TN = True negative is the No. of mammograms correctly classified as normal mammogram
FN = False negatives is the No. of mammograms incorrectly classified as normal mammogram

The confusion matrix is constructed from the obtained results is shown in Table 3. The accuracy of the hybrid technique is shows in Table 4. The results are compared with the traditional ACO and PSO. Figure 2 plots the excel graph to compare the performances of the algorithms.

Fig. 1: Result of Ant-Cuckoo Colony Optimization-SVM

Fig. 2: Performance measure of the proposed techniques

Table 3: Confusion matrix

Table 4: Performance of the proposed techniques

ROC curve plotted with TPR against FPR in Fig. 3 compares the accuracy percentage of all the three classifiers.

From Table 2, it is inferred that the features correlation and sum variance are selected by all the techniques. Energy, inertia and information measures of correlation 2 are the features selected by atmost two of the techniques. The overall performance of the hybrid techniques is better than both PSO and ACO.

Fig. 3: ROC curves of ACO, PSO and Ant-Cuckoo colony optimization

Ant-Cuckoo Colony Optimization shows 2% better accuracy than ACO and 4% better accuracy than PSO.

CONCLUSION

This study presents the hybrid of Ant-Cuckoo colony optimization to select the best feature of digital mammogram. The local search of ACO, stimulated by the cuckoo search works well and gives an accuracy of 95%. The SVM classifier is the added feature of the proposed system. The SVM classifier plays dual role; it goes along with the Ant-Cuckoo Colony optimization and also used in testing the performance of total system. The hybrid algorithm shows promising accuracy and is better than the ACO and PSO algorithm.

REFERENCES

  • Al-Ani, A., 2005. Feature subset selection using ant colony optimization. Int. J. Comput. Intell., 2: 53-58.


  • Bevilacqua, A., D. Bollini, R. Brancaccio, R. Campanini and N. Lanconelli, 2000. Automatic detection of clustered microcalcifications in digital mammograms using an SVM classifier. Proceedings of the European Symposium on Artificial Neural Networks, April 26-28, 2000, DBLP., Bruges, Belgium, pp: 195-200.


  • El-Naqa, I., Y. Yang, M.N. Wernick, Nikolas and P. Galatsanos et al., 2002. A support vector machine approach for detection of microcalcifications. IEEE Trans. Med. Imaging, 21: 1552-1563.


  • Dorigo, M. and L.M. Gambardella, 1997. Ant colonies for the traveling salesman problem. Bio Systems, 43: 73-81.
    PubMed    Direct Link    


  • Dorigo, M. and L.M. Gambardella, 1997. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE. Trans. Evol. Comput., 1: 53-66.
    CrossRef    


  • Dorigo, M. and T. Stutzle, 2004. Ant Colony Optimization. MIT Press, Cambridge, MA., USA


  • Dorigo, M., V. Maniezzo and A. Colorni, 1996. Ant system: Optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. Part B: Cybern., 26: 29-41.
    CrossRef    Direct Link    


  • Tuba, M., M. Subotic and N. Stanarevic, 2011. Modified cuckoo search algorithm for unconstrained optimization problems. Proceedings of the 5th European Conference on European Computing, April 28-30, 2011, Paris, France, pp: 263-268.


  • Haralick, R.M., K. Shanmugam and I.H. Dinstein, 1973. Textural features for image classification. IEEE Trans. Syst. Man Cybern., SMC-3: 610-621.
    CrossRef    Direct Link    


  • Yang, X.S. and S. Deb, 2009. Cuckoo search via Levy flights. Proceedings of the World Congress on Nature and Biologically Inspired Computing, December 9-11, 2009, Coimbatore, India, pp: 210-214.


  • Yang, X.S. and S. Deb, 2010. Engineering optimisation by cuckoo search. Int. J. Math. Modell. Numer. Optim., 1: 330-343.
    Direct Link    

  • © Science Alert. All Rights Reserved