
Information Technology Journal

Year: 2012 | Volume: 11 | Issue: 1 | Page No.: 134-140
DOI: 10.3923/itj.2012.134.140
A Novel Pattern Recognition Approach Based on Immunology
Shulin Liu, Yinghui Liu, Youfu Tang and Ruihong Jiang

Abstract: Artificial Immune Systems (AIS), inspired by the biological immune system, have been widely used in many fields. In this work, a novel pattern recognition approach based on AIS is proposed. When an antibody recognizes antigen epitopes, the recognition is in fact a combination between corresponding chemical bonds, and the recognition rate is positively correlated with the combination forces of those bonds. The pattern recognition algorithm was tested on two well-known benchmarks, Fisher’s Iris data and the wine data. Preliminary results demonstrated that the new approach performed better than other pattern recognition methods.


How to cite this article
Shulin Liu, Yinghui Liu, Youfu Tang and Ruihong Jiang, 2012. A Novel Pattern Recognition Approach Based on Immunology. Information Technology Journal, 11: 134-140.

Keywords: pattern recognition, recognition algorithm, Artificial immune system, local binding energy and local binding sites

INTRODUCTION

In recent years, researchers in many fields have paid considerable attention to Artificial Immune Systems (AIS), which are inspired by the immune system (Timmis et al., 2008). The immune system metaphor was first applied to fault diagnosis by Ishida (1990). The well-known Negative Selection Algorithm (NSA), applied to computer security and virus detection, was proposed by Forrest et al. (1994) and seemed to set off a renaissance in the investigation of the immune system. Most existing AIS algorithms mimic one of the following metaphors of the immune system: negative selection, immune network, clonal selection and danger theory, among others. Ji and Dasgupta (2009) proposed variable-sized and variable-shaped detectors, which reduced the false alarm rate and raised coverage of the non-self space. Inspired by the immune network, MOBAIS and GAIS were proposed for multi-objective optimization problems (Castro and Von Zuben, 2008, 2010). Recently, some new immune theories have been proposed, including the Pattern Recognition Receptors (PRRs) Model and Danger Theory. The Conserved Self Pattern Recognition Algorithm (CSPRA), inspired by the PRRs Model, greatly reduces the false positive error rate (Yu and Dasgupta, 2009). Danger Theory explains some phenomena that Immune Negative Selection (INS) and Self-Nonself Selection (SNS) cannot explain (Matzinger, 1994). Inspired by Danger Theory, the DCA was proposed and evaluated on the KDD Cup 99 data, which is also referred to by Muda et al. (2011), and the experimental results are convincing (Greensmith et al., 2010). Yap et al. (2011) proposed a Hybrid AIS (HAIS) combining the good features of AIS and Particle Swarm Optimization (PSO), which addressed the problems of convergence rate and local minima. The immune-based approach has also been used to find pure and mixed Nash equilibria (Cheheltani and Ebadzadeh, 2010).

Pattern recognition has attracted extensive attention in many fields. In language pattern recognition and classification, Artificial Neural Networks are widely used (Lotfi et al., 2006; Khanale, 2010; Khanale and Chitnis, 2011). Jalil et al. (2003) proposed an unsupervised learning algorithm for feature extraction, which is a basis of pattern recognition. Al-Bashish et al. (2011) proposed a neural-network-based method for classifying plant leaf diseases. Al-Daoud (2009) compared three neural network models for classification problems.

Recently, attention has been paid to the metaphor of combination between antigen and antibody. The relationships between antibody paratopes and antigen epitopes were considered by Ji-zhong and Bo (2005): by calculating sub-affinity, high-affinity antibodies can be obtained more rapidly. When an antibody recognizes antigen epitopes, the recognition is in fact a combination between corresponding chemical bonds. These chemical bonds include hydrogen bonds, electrostatic bonds, Van der Waals forces and hydrophobic bonds. The combination forces of the chemical bonds influence the recognition rate. Based on these metaphors, antibodies and antigen epitopes are treated as feature vectors and chemical bonds as attributes. The higher the combination forces of the chemical bonds, the better the recognition of antigen epitopes. The main purpose of this paper is to enhance the recognition rate through a secondary recognition step that evaluates the combination forces of chemical bonds. Preliminary results demonstrate that the new approach performs better than other methods (Srinivasa et al., 2007; Chang and Lilly, 2004) under the same experimental conditions.

THEORETICAL BASIS OF IMMUNOLOGY

The immune system is a typical defense system that effectively resists and kills infectious agents. Substances that can provoke an immune response are called antigens. There are two basic types of immunity: nonspecific (innate) and specific (adaptive). The adaptive immune system mainly consists of B and T lymphocytes. These cells play an important role in recognizing and destroying antigens. Each B cell secretes multiple copies of one kind of antibody. Activated B cells become memory cells or plasma cells; the latter actively secrete antibodies (Fig. 1) (Dasgupta and Nino, 2009). An antigen carries a set of epitopes, which are the distinct molecular surface features bound by an antibody. The epitopes of one antigen (Ag1) are always different from those of another (Ag2), and some antigens (Ag3) even have repeated epitopes. In particular, one type of antibody can recognize only one kind of antigen epitope (Fig. 2) (Zhou, 2002). The combination of antigen epitopes and antibody is a complex chemical process and the combination forces are all non-covalent in nature. These forces ensure that the antigen is bound tightly to the antibody (Zhou, 2002). The chemical bonds between antibody and antigen are shown in Fig. 3.

As mentioned above, antibodies and antigen epitopes are treated as feature vectors and chemical bonds as attributes. First, the affinity between antibody and antigen epitopes is calculated. Then, the combination forces of the chemical bonds are considered.

Fig. 1: Differentiation process of B cell

Fig. 2: Combination between antigen epitopes and antibody

Fig. 3: Chemical bonds between antigen epitopes and antibody

An algorithm is proposed to describe the process.

Fig. 4: Flowchart for the proposed approach

PROPOSED PATTERN RECOGNITION APPROACH

Some definitions:

Antigen epitopes: Samples to be recognized
Antibody: Instances to recognize antigen epitopes
Affinity: Euclidean distance between antigen epitopes and antibody
Local binding sites: Corresponding chemical bonds
Local binding energy: Amount of combination forces between corresponding local binding sites
Match coefficient: Coefficient that determines whether the local binding energy is large enough for combination

Description of pattern recognition approach: The feature space T is expressed by a set of antibodies X containing q classes, X = [X1, X2, ..., Xq]T. Each class contains m antibodies, Xi = [Xi1, Xi2, ..., Xim]T, i = 1, 2, ..., q. Each antibody has n local binding sites, Xik = (xik1, xik2, ..., xikn) ∈ Rn, i = 1, 2, ..., q, k = 1, 2, ..., m, where xikl, l = 1, 2, ..., n, represents local binding site l of the k-th antibody in class i. Each local binding site should reflect the main information of the antibody.

For an antigen epitope Yj = (yj1, yj2, ..., yjn), j = 1, 2, ..., p, the affinity between Yj and every antibody is calculated to recognize which class it belongs to. The affinity is calculated by Euclidean distance as follows:

$$|YX|_{ikj} = \sqrt{\sum_{l=1}^{n} \left(y_{jl} - x_{ikl}\right)^{2}} \qquad (1)$$

where |YX|ikj represents the affinity between antigen epitope Yj and the k-th antibody in class i. The mean affinity between antigen epitope Yj and all of the antibodies in the i-th class is as follows:

$$\overline{|YX|}_{ij} = \frac{1}{m} \sum_{k=1}^{m} |YX|_{ikj} \qquad (2)$$

From Eq. 2, the mean affinity is obtained and the antigen epitope is assigned to the class that has the maximum mean affinity value.
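The first recognition step described by Eq. 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy vectors are assumptions made for illustration. The sketch also assumes that the affinity is used directly as a Euclidean distance, so the best-matching class is taken to be the one with the smallest mean distance; if the affinity were defined so that larger values mean a closer match, `min` would become `max`.

```python
import math

def affinity(epitope, antibody):
    """Euclidean distance between one antigen epitope and one antibody (Eq. 1)."""
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(epitope, antibody)))

def mean_affinities(epitope, classes):
    """Mean affinity between the epitope and the antibodies of each class (Eq. 2)."""
    return [
        sum(affinity(epitope, antibody) for antibody in antibodies) / len(antibodies)
        for antibodies in classes
    ]

def classify_by_mean_affinity(epitope, classes):
    """Return the 0-based index of the best-matching class.

    Assumption: affinity is a distance, so the best match is the class
    with the smallest mean distance.
    """
    means = mean_affinities(epitope, classes)
    return min(range(len(means)), key=means.__getitem__)

# Toy values, made up for illustration: two classes with two antibodies each.
toy_classes = [
    [(1.0, 1.0), (1.2, 0.8)],
    [(5.0, 5.0), (4.8, 5.2)],
]
toy_epitope = (1.1, 1.0)
predicted_class = classify_by_mean_affinity(toy_epitope, toy_classes)
```

Here `predicted_class` is the index of the class whose antibodies lie closest to the toy epitope on average.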

However, some antigen epitopes are still classified in error. As stated above, the local binding energy is then used to check whether the local binding sites between antibodies and antigen epitopes meet the match coefficient. This is illustrated in detail below.

For an antigen epitope Yj = (yj1, yj2, ..., yjn), j = 1, 2, ..., p, and the k-th antibody in class i, Xik = (xik1, xik2, ..., xikn) ∈ Rn, k = 1, 2, ..., m, i = 1, 2, ..., q, the local binding energy between yjl and xikl is calculated for l = 1, 2, ..., n. A local binding site l meets the match coefficient if |yjl - xikl| < s × |xikl| holds, where s is the match coefficient. The antibody and its matching locations l are stored in matrix C and the number of matching locations is then summed. The more matching locations an antibody has, the more likely it is that the antigen epitope belongs to the class of that antibody. The algorithm is outlined below.

Define the ith class antibody set Xi = [Xi1, Xi2, ..., Xim]T, i = 1,2, ..., q. For each antibody Xik, there are n local binding sites, Xik = (xik1, xik2, ..., xikn)∈Rn. Antigen epitopes Yj = (yj1, yj2, ..., yjn), j = 1,2, ..., p.

Calculate the affinity between each antigen epitope Yj and every antibody Xik in the i-th class with Eq. 1. Then store every affinity in matrix F, where f111 represents the affinity between Y1 and X11, fqmp represents the affinity between Yp and Xqm, and so on.

Obtain the mean affinities with Eq. 2 and store them in a matrix whose entries range from the mean affinity between Y1 and the first-class antibodies to the mean affinity between Yp and the q-th class antibodies. The antigens are then classified according to the maximum mean affinity.

For each wrongly classified antigen epitope Yw = (yw1, yw2, ..., ywn), w ∈ [1, p], calculate the local binding energy, accumulate the locations that meet the match coefficient and store them in matrix C. Set w = w + 1, then repeat step 3 until termination is reached.
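The secondary recognition step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, matrix C is represented simply by the list of matching locations per antibody, and the per-class vote counts only antibodies whose local binding sites all satisfy |yl - xl| < s × |xl|. The toy vectors are made up.

```python
from collections import Counter

def matching_sites(epitope, antibody, s):
    """Return the 0-based locations l where |y_l - x_l| < s * |x_l|."""
    return [
        l for l, (y, x) in enumerate(zip(epitope, antibody))
        if abs(y - x) < s * abs(x)
    ]

def full_match_votes(epitope, classes, s):
    """Count, per class, the antibodies whose sites all match the epitope.

    `classes` is a list of classes, each a list of antibody tuples.
    Returns a Counter mapping 0-based class index -> number of antibodies.
    """
    votes = Counter()
    for i, antibodies in enumerate(classes):
        for antibody in antibodies:
            if len(matching_sites(epitope, antibody, s)) == len(antibody):
                votes[i] += 1
    return votes

# Toy values, made up for illustration (not taken from the Iris or wine data).
toy_classes = [
    [(1.0, 2.0), (1.5, 2.9)],
    [(1.5, 3.0), (1.6, 2.2), (1.45, 2.9)],
]
toy_epitope = (1.5, 2.8)
sites = matching_sites(toy_epitope, (1.6, 2.2), s=0.1)
votes = full_match_votes(toy_epitope, toy_classes, s=0.1)
```

The class with the most fully matching antibodies in `votes` is then the candidate class for the epitope.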

The flow chart of the algorithm is displayed in Fig. 4.

Discussion of the range of the match coefficient: The Iris, wine and breast cancer data are used to decide the best values of the match coefficient s, which is first varied from 0.13 to 0.7. The recognition accuracy curves for the different data sets are shown in Fig. 5.

Fig. 5: Curves of recognition accuracy with match coefficient s

The combined effect of the different curves is considered. Finally, s is chosen from 0.18 to 0.3 according to the mean effect curve, where the recognition rate is relatively high.

NUMERICAL RESULTS AND COMPARISONS

The well-known benchmark Fisher’s Iris data (Fisher, 1988) is used to illustrate and test the proposed approach. The Iris data set is perhaps the best known database in the area of pattern recognition. The data set contains 3 classes, where each class refers to a type of iris plant. There are 50 instances in each class and 4 attributes in each feature vector. The first 10 instances in each class are chosen as antibodies and the remaining 40 instances as antigen epitopes.
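The split into antibodies and antigen epitopes can be sketched as below. The helper name `split_first_k` is an assumption, and the placeholder data only reproduces the shape of the Iris set (3 classes, 50 instances, 4 attributes); the zeros are stand-ins, not real measurements.

```python
def split_first_k(instances_by_class, k=10):
    """Split each class into antibodies (first k instances) and antigen epitopes (the rest)."""
    antibodies = [instances[:k] for instances in instances_by_class]
    epitopes = [instances[k:] for instances in instances_by_class]
    return antibodies, epitopes

# Placeholder data with the Iris shape: 3 classes x 50 instances x 4 attributes.
placeholder = [[(0.0, 0.0, 0.0, 0.0)] * 50 for _ in range(3)]
antibodies, epitopes = split_first_k(placeholder, k=10)
shape = ([len(c) for c in antibodies], [len(c) for c in epitopes])
```

With this split there are 30 antibodies and 120 antigen epitopes in total.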

The affinity between every antigen epitope Yj and the antibodies Xik is calculated by Eq. 1. Most of the epitopes are classified in the correct class, but some are still classified in error. In Table 1, values in italic indicate the correct class and values in bold indicate the wrong class to which the epitope was assigned.

For the epitopes in Table 1, the local binding sites that meet the match coefficient are accumulated. Take the wrongly classified antigen epitope Y81 as an example: the local binding energy between Y81 and each of the 30 antibodies Xik is calculated. Matrix C can then be obtained, which shows how many local binding sites in each antibody meet the match coefficient. The antibodies whose 4 local binding sites all meet the match coefficient are listed in Table 2. The match coefficient is s = 0.2.

Table 1: Wrongly classified antigen epitopes and their mean affinity with each class (iris data)
Values in italic indicate correct class. Values in bold indicate wrong class

Table 2: Antibodies in Xik whose 4 local binding sites all meet the match coefficient when binding wrongly classified antigen epitopes (iris data)
Values in italic indicate correct class. Values in bold indicate wrong class

As described above, antigen epitopes Y1 to Y40 belong to class 1, antigen epitopes Y41 to Y80 belong to class 2 and the rest belong to class 3. From Table 2, antibodies X22,3, X23,3, X24,3, X25,3 and X28,3 all have 4 local binding sites that meet the match coefficient when binding antigen epitope Y81. Consequently, antigen epitope Y81 should be classified in class 3. Antigen epitope Y90 is classified in class 3 in the same way: antibody X17,2 belongs to class 2, but X22,3, X23,3, X24,3, X25,3 and X29,3 are all in class 3, so 5 of the 6 matching antibodies (83.33%) indicate that Y90 should be classified in class 3. In another case, antibodies X11,2, X12,2, X13,2, X15,2, X16,2, X17,2 and X19,2 are all in class 2 and antibodies X24,3, X28,3 and X29,3 are in class 3, so 70% of the matching antibodies indicate that the antigen epitope should be classified in class 2. Antigen epitope Y140 is classified in the wrong class in this way.
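The vote shares quoted above follow directly from the class labels of the fully matching antibodies. The short Python check below reproduces the arithmetic; the variable names are assumptions, and the label lists are read off the antibody subscripts given in the text.

```python
from collections import Counter

# Class labels of the fully matching antibodies, read off the subscripts in the text.
votes_y90 = Counter([2, 3, 3, 3, 3, 3])        # X17,2 plus five class 3 antibodies
votes_70_case = Counter([2] * 7 + [3] * 3)     # seven class 2 and three class 3 antibodies

def majority_share(votes):
    """Return (winning class label, share of votes in percent)."""
    label, count = votes.most_common(1)[0]
    return label, 100.0 * count / sum(votes.values())

y90_class, y90_share = majority_share(votes_y90)        # class 3, about 83.33%
case_class, case_share = majority_share(votes_70_case)  # class 2, 70%
```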

Overall, the number of wrongly classified antigen epitopes is reduced to 1 and the accuracy is enhanced from 95% to 99.17% (one misclassification). The test is then repeated in another way: the middle 10 instances in each Iris class are chosen as antibodies and the remaining 40 instances as antigen epitopes. Most of the antigen epitopes are recognized correctly, but some are still wrongly classified, as shown in Table 3. The match coefficient is s = 0.2.

Antigen epitope Y64 should be in class 2 but is classified in class 3; antigen epitopes Y87, Y100 and Y104 should be in class 3 but are incorrectly classified in class 2. The antibodies in Xik whose 4 local binding sites all meet the match coefficient are shown in Table 4.

Antibodies X1 to X10 belong to class 1, antibodies X11 to X20 belong to class 2 and the rest belong to class 3. According to Table 4, antibodies X11,2, X13,2, X16,2, X17,2, X18,2, X19,2, X24,3, X26,3, X27,3, X28,3 and X30,3 all have 4 local binding sites that meet the match coefficient when binding the wrongly classified antigen epitope Y64.

Table 3: Wrongly classified antigen epitopes and their mean affinity with each class (iris data)

Table 4: Antibodies in Xik whose 4 local binding sites all meet the match coefficient when binding wrongly classified antigen epitopes (iris data)

Table 5: Classification accuracy of iris data with different methods

Table 6: Antigen epitopes classified in error and their mean affinity with each class (wine data)
Values in italic indicate correct class. Values in bold indicate wrong class

Table 7: Maximum number of local binding sites meeting the match coefficient and the antibodies in which they are located (wine data)

From the above results, antigen epitope Y64 can be classified in class 2, and Y87 is handled in the same way. The other wrongly classified antigen epitopes are analyzed in the same way. The recognition rate is 98.33% (two misclassifications). The average recognition rate for the iris data set is 98.75%. Comparisons with other pattern recognition methods are given in Table 5.
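The reported Iris recognition rates can be checked with a few lines of Python, assuming 120 antigen epitopes per experiment (3 classes with 40 epitopes each). The function name is an assumption made for this sketch.

```python
def accuracy_percent(n_total, n_wrong):
    """Recognition rate in percent for n_wrong misclassifications out of n_total."""
    return 100.0 * (n_total - n_wrong) / n_total

n_epitopes = 3 * 40  # 40 antigen epitopes in each of the 3 Iris classes

first_run = accuracy_percent(n_epitopes, 1)   # first 10 instances as antibodies
second_run = accuracy_percent(n_epitopes, 2)  # middle 10 instances as antibodies
average = (first_run + second_run) / 2
```

Rounded to two decimals, these give 99.17%, 98.33% and an average of 98.75%, matching the values in the text.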

The wine data is also used to test the algorithm. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. There are 59 instances in class 1, 71 instances in class 2 and the remaining 48 instances in class 3. There are 13 attributes in the data set. The first 40 instances of each class are chosen as antibodies and the remaining instances of each class are treated as antigen epitopes. The mean affinity between antigen epitopes and antibodies is calculated. The results are listed in Table 6.

The affinity between the wrongly classified antigen epitopes and the antibodies is calculated. Each such antigen epitope is then classified again according to the antibody with the maximum number of local binding sites meeting the match coefficient. The match coefficient is s = 0.25. The results are given in detail in Table 7.
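The reclassification rule used for the wine data, where the epitope takes the class of the antibody with the most matching local binding sites, can be sketched as follows. The function names, the tie-breaking choice and the three-attribute toy vectors are assumptions made for illustration; the real wine data has 13 attributes.

```python
def count_matches(epitope, antibody, s):
    """Number of local binding sites with |y_l - x_l| < s * |x_l|."""
    return sum(1 for y, x in zip(epitope, antibody) if abs(y - x) < s * abs(x))

def class_of_best_antibody(epitope, classes, s):
    """Return (class_index, match_count) for the antibody with the most matching sites.

    Ties go to the antibody encountered first, an arbitrary choice for this sketch.
    """
    best_class, best_count = None, -1
    for i, antibodies in enumerate(classes):
        for antibody in antibodies:
            n = count_matches(epitope, antibody, s)
            if n > best_count:
                best_class, best_count = i, n
    return best_class, best_count

# Toy values, made up for illustration (three attributes instead of 13).
toy_classes = [
    [(10.0, 2.0, 5.0)],
    [(12.0, 2.1, 7.0), (10.5, 3.0, 5.2)],
]
toy_epitope = (10.4, 2.2, 5.1)
result = class_of_best_antibody(toy_epitope, toy_classes, s=0.25)
```

In this toy case the single antibody of the first class matches on all three sites, so the epitope is assigned to that class.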

One wrongly classified antigen epitope has 13 local binding sites that meet the match coefficient when binding an antibody, so it should be classified in class 1. Similarly, another antigen epitope belongs to class 1 because 12 of its local binding sites meet the match coefficient when binding an antibody. The other wrongly classified antigen epitopes are likewise reclassified in the right class, except for one. The recognition accuracy rate is 98.25% (one misclassification).

CONCLUSION

The biological immune system, as an essential part of organisms, plays an indispensable role in defending against invaders. In this work, particular attention is paid to the relationships between antibodies and antigen epitopes, especially the chemical bonds between them. The higher the combination forces of the chemical bonds, the better the recognition of antigen epitopes. A novel pattern recognition approach is proposed and the range of the match coefficient is discussed. The Iris and wine data sets from the UCI machine learning repository are used to test the proposed approach. The numerical results show that patterns can be recognized effectively. In future work, real-world data should be applied to the proposed pattern recognition approach.

ACKNOWLEDGMENT

This work is supported by National Natural Science Foundation of China (Grant No. 51175316), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20103108110006), Shanghai Talent Development Fund (Grant No. 047) and Innovation Fund of Shanghai University.

REFERENCES

  • Al-Daoud, E., 2009. A comparison between three neural network models for classification problems. Int. Artif. Intell., 2: 56-64.


  • Al-Bashish, D., M. Braik and S. Bani-Ahmad, 2011. Detection and classification of leaf diseases using K-means-based segmentation and neural-networks-based classification. Inform. Technol. J., 10: 267-275.


  • Castro, P.A.D. and F.J. Von Zuben, 2008. MOBAIS: A bayesian artificial immune system for multi-objective optimization. Artif. Immune Syst., 5132: 48-59.


  • Castro, P.A.D. and F.J. Von Zuben, 2010. GAIS: A gaussian artificial immune system for continuous optimization. Artif. Immune Syst., 6209: 171-184.


  • Cheheltani, S.H. and S.M. Ebadzadeh, 2010. Immune based approach to find mixed Nash equilibrium in normal form games. J. Applied Sci., 10: 487-493.


  • Chang, X. and J.H. Lilly, 2004. Evolutionary design of a fuzzy classifier from data. IEEE Trans. Syst. Man Cybernetics B Cybernetics, 34: 1894-1906.


  • Dasgupta, D. and L.F. Nino, 2009. Immunological Computation: Theory and Applications. In: Immunology Basics, Dasgupta, D. and L.F. Nino (Eds.). Auerbach Publications, USA., pp: 4-4


  • Forrest, S., A.S. Perelson, L. Allen and R. Cherukuri, 1994. Self-nonself discrimination in a computer. Proceedings of the IEEE Computer Society Symposium on Security and Privacy, May 16-18, 1994, Oakland, CA, USA., pp: 202-212.


  • Greensmith, J., U. Aickelin and G. Tedesco, 2010. Information fusion for anomaly detection with the dendritic cell algorithm. Inform. Fusion, 11: 21-34.


  • Zhou, G.Y., 2002. Immunology. In: Introduction to Immune System, Zhou, G.Y. (Ed.). People's Medical Publishing House, China, Pages: 496


  • Ishida, Y., 1990. Fully Distributed Diagnosis by PDP learning algorithm: Towards immune network PDP Model. Proceedings of the IEEE International Joint Conference on Neural Networks, June 17-21, 1990, San Diego, USA., pp: 777-782.


  • Ji, Z. and D. Dasgupta, 2009. V-Detector: An efficient negative selection algorithm with "Probably Adequate" detector coverage. Inform. Sci., 179: 1390-1406.


  • Ji-zhong, L. and W. Bo, 2005. AIS hypermutation algorithm based pattern recognition and its application in ultrasonic defects detection. Proceedings of the 2005 International Conference on Control and Automation, June 27-29, 2005, Budapest, Hungary, pp: 1268-1272.


  • Jalil, A., I.M. Qureshi, A. Naveed and T.A. Cheema, 2003. Feature extraction by using non-linear and unsupervised neural networks. Inform. Technol. J., 2: 40-43.


  • Khanale, P.B. and S.D. Chitnis, 2011. Handwritten devanagari character recognition using artificial neural network. J. Artif. Intell., 4: 55-62.


  • Khanale, P.B., 2010. Recognition of marathi numerals using artificial neural network. J. Artif. Intell., 3: 135-140.


  • Lotfi, F., F. Nadir and B. Mouldi, 2006. Arabic words recognition by fuzzy classifier. J. Applied Sci., 6: 647-650.


  • Matzinger, P., 1994. Tolerance, danger and the extended family. Annu. Rev. Immunol., 12: 991-1045.


  • Fisher, R.A., 1988. Iris data set. UCI, Machine Learning Repository, Center for Machine Learning and Intelligent Systems.


  • Srinivasa, K.G., M. Jagadish, K.R. Venugopal and L.M. Patnaik, 2007. Data mining based query processing using rough sets and genetic algorithms. Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining, March 1-April 5, 2007, Honolulu, HI., pp: 275-282.


  • Timmis, J., P. Andrews, N. Owens and E. Clark, 2008. An interdisciplinary perspective on artificial immune systems. Evol. Intell., 1: 5-26.


  • Yu, S. and D. Dasgupta, 2009. Conserved self pattern recognition algorithm with novel detection strategy applied to breast cancer diagnosis. J. Artif. Evol. Appl.,


  • Yap, D.F.W., S.P. Koh, S.K. Tiong and S.K. Prajindra, 2011. Particle swarm based artificial immune system for multimodal function optimization and engineering application problem. Trends Applied Sci. Res., 6: 282-293.


  • Muda, Z., W. Yassin, M.N. Sulaiman and N.I. Udzir, 2011. A K-means and naive bayes learning approach for better intrusion detection. Inform. Technol. J., 10: 648-655.
