Research Article

Application of Improved AAM and Probabilistic Neural network to Facial Expression Recognition

N. Neggaz, M. Besnassi and A. Benyettou

Automatic facial expression analysis is an interesting and challenging problem with important applications in areas such as human-computer interaction. This study discusses the application of an improved Active Appearance Model (AAM), based on evolutionary feature extraction, in combination with a Probabilistic Neural Network (PNN) for the recognition of six different facial expressions from still pictures of the human face. Experimental results demonstrate an average expression recognition accuracy of 96% on the JAFFE database, which exceeds the rates of the other reported methods on the same database. The present study therefore supports the feasibility of computer-vision-based facial expression recognition for practical applications such as surveillance and human-computer interaction.


  How to cite this article:

N. Neggaz, M. Besnassi and A. Benyettou, 2010. Application of Improved AAM and Probabilistic Neural network to Facial Expression Recognition. Journal of Applied Sciences, 10: 1572-1579.

DOI: 10.3923/jas.2010.1572.1579

Received: January 28, 2010; Accepted: April 02, 2010; Published: June 26, 2010


A facial expression is a visible manifestation of the affective state, cognitive activity, intention, personality and psychopathology of a person; it plays a communicative role in interpersonal relations. Facial expressions and other gestures convey non-verbal communication cues in face-to-face interactions. These cues may also complement speech by helping the listener to infer the intended meaning of spoken words. With the availability of low-cost imaging and computational devices, automatic facial expression recognition systems now have the potential to be useful in several day-to-day application environments, such as operator fatigue detection in industry, user mood detection in Human Computer Interaction (HCI) and possibly the identification of suspicious persons in airports, railway stations and other places with a higher threat of terrorist attacks. Early work in this area is due mainly to Darwin, who proposed an evolutionary theory of facial expressions (Ekman et al., 1969).

In the 19th century, Guillaume Duchenne was the first to locate the various facial muscles by electrical activation and the first to deliver a set of photographs showing the activation of different facial muscles. Paul Ekman, a psychologist, became interested in facial expressions and human emotions in the mid-1960s. Tian et al. (2001) proposed a system for the automatic analysis of facial expressions based on action unit recognition, with classification done by neural networks. Pantic and Rothkrantz (2004) proposed a system for action unit recognition from static pictures of faces (frontal and profile views); part of the analysis relies on extracting the contour of the face in the profile picture.

Viola and Jones (2001) used the AdaBoost method, which selects features in the learning phase using a greedy strategy, to solve computer vision problems such as image retrieval and face detection. AdaBoost does not perform well in the small-sample case (Guo and Dyer, 2003), which is the case in our experiments. Yacoob and Davis (1994) used the inter-frame motion of edges extracted in the area of the mouth, nose, eyes and eyebrows. Bartlett et al. (1996) used a combination of optical flow and principal components obtained from image differences. Hoey and Little (2000) approximated the flow of each frame with a low-dimensional vector based on a set of orthogonal Zernike polynomials and applied their method to the recognition of facial expressions with hidden Markov models (HMMs). Lyons et al. (1998, 1999), Zhang (1999) and Zhang et al. (1998) used Gabor wavelet coefficients to code facial expressions. In their work, they first extracted a set of geometric facial points and then used multi-scale and multi-orientation Gabor wavelet filters to extract the Gabor wavelet coefficients at the chosen facial points. Similarly, Wiskott et al. (1997) used a labeled graph, based on the Gabor wavelet transform, to represent facial expression images. They performed face recognition through elastic graph matching.

Lyons et al. (1999) proposed a method for classifying facial images automatically based on labeled elastic graph matching, a 2D Gabor wavelet representation and Linear Discriminant Analysis (LDA). The recognition rate was 92%. Gao et al. (2003) used the structural and geometrical features of a user-sketched expression model to match the Line-Edge Map (LEM) descriptor of an input face image. The Active Appearance Model (AAM) has proved effective for interpreting images of deformable objects (Cootes et al., 1992; Cootes and Taylor, 2004). Davoine et al. (2004) used an AAM for facial expression recognition and synthesis; their approach can normalize the facial expression of a given face and artificially synthesize new expressions on the same face.


Facial feature extraction attempts to find the most appropriate representation of face images for recognition and is a key step in facial expression recognition. For this phase we use the Active Appearance Model (AAM), either in its basic form or improved by Differential Evolution (DE).

Basic active appearance model: AAM (Cootes et al., 1998) uses PCA to encode both the shape and texture variation of the training dataset. The shape of an object can be represented by a vector s and the texture (gray level) by a vector g. We apply one PCA on shape and another PCA on texture to create the model, given by:

$$s_i = \bar{s} + \phi_s b_s, \qquad g_i = \bar{g} + \phi_g b_g \tag{1}$$

where $s_i$ and $g_i$ are the shape and texture, and $\bar{s}$ and $\bar{g}$ are the mean shape and mean texture. $\phi_s$ and $\phi_g$ contain the orthogonal modes of variation of shape and texture, respectively. $b_s$ and $b_g$ are the vectors of shape and texture parameters. i is the index of the image in the dataset. By applying a third PCA on:


H is a matrix of $d_c$ eigenvectors obtained by PCA and c is the appearance vector.

Modifying the parameters c changes both the shape and the texture of the object. Each object is defined by the appearance vector c and the pose vector t:

$$t = [t_x,\; t_y,\; \theta,\; S]^T \tag{3}$$

where $t_x$ and $t_y$ are the translations along the x and y axes, $\theta$ is the angle of orientation and S is the scale.
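To make the roles of the model parameters and the pose concrete, here is a minimal sketch, assuming a linear shape model (mean plus weighted modes) and a similarity transform built from tx, ty, θ and S. The function names and the toy two-point shape are hypothetical and are not taken from the paper.

```python
import math

def synthesize_shape(mean_shape, modes, b):
    """Linear shape model: s = mean + sum_k b[k] * modes[k].

    Shapes are flat lists [x0, y0, x1, y1, ...]; `modes` is a list of
    mode vectors of the same length (toy stand-ins for PCA eigenvectors).
    """
    s = list(mean_shape)
    for weight, mode in zip(b, modes):
        s = [si + weight * mi for si, mi in zip(s, mode)]
    return s

def apply_pose(shape, tx, ty, theta, scale):
    """Apply scale, rotation (radians) and translation to every point."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in zip(shape[0::2], shape[1::2]):
        out.append(scale * (cos_t * x - sin_t * y) + tx)
        out.append(scale * (sin_t * x + cos_t * y) + ty)
    return out

# Toy example: two points, one mode that moves the second point along x.
mean_shape = [0.0, 0.0, 1.0, 0.0]
modes = [[0.0, 0.0, 1.0, 0.0]]
shape = synthesize_shape(mean_shape, modes, b=[0.5])
posed = apply_pose(shape, tx=1.0, ty=1.0, theta=math.pi / 2, scale=2.0)
```

In this toy case the second point moves from (1, 0) to (1.5, 0) under the shape parameter, and the pose then rotates it by 90 degrees, doubles its distance from the origin and shifts it by (1, 1).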

AAM learns linear regression models that give the predicted modifications $\delta c$ and $\delta t$ of the model parameters:

$$\delta c = R_c\, G, \qquad \delta t = R_t\, G \tag{4}$$

$R_c$ and $R_t$ are the appearance and pose regression matrices, respectively. The model search is driven by the residual G between the search image and the model reconstruction.

Required memory: the number of columns of each regression matrix is equal to the number of model pixels.

The number of rows is the product of the number of experiments q and the number of parameters to be optimized: 4 for $R_t$ (Eq. 3) and, for $R_c$, as many parameters as eigenvectors retained in the matrix of Eq. 2.

The search algorithm in a new image is as follows:

1. Generate the texture $g_m$ and the shape s from the parameters c (initially equal to 0).
2. Compute $g_i$, the image texture sampled under the shape s.
3. Evaluate $\delta g_0 = g_i - g_m$ and $E_0 = |\delta g_0|^2$.
4. Predict the modifications $\delta c_0 = R_c\,\delta g_0$ and $\delta t_0 = R_t\,\delta g_0$ to be applied to the model.
5. Find the first attenuation coefficient k (among [1, 0.5, 0.25]) that yields an error $E_i < E_0$, with $E_i = |\delta g_1|^2 = |g_m - g_{m1}|^2$, where $g_m$ is the texture generated by $c_1 = c - k\,\delta c_0$ and $g_{m1}$ is the image texture sampled under the shape $s_{m1}$ (the shape given by $c_1$).
6. If the error $E_i$ is not stable, i.e., the difference $E_i - E_{i-1}$ is higher than a predefined threshold $\xi$, return to step 1 and replace c by $c_1$.

When convergence is reached, the shape and texture of the searched face are generated by the model ($g_m$ and s) using $c_1$. The number of iterations required depends on how quickly the error $E_i$ stabilizes.
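The search steps above can be sketched as a short loop. This is only an illustration under stated assumptions: `model_texture` and `image_texture` are hypothetical callbacks standing in for texture synthesis and image sampling, the pose update with Rt is omitted for brevity, and the toy linear model at the end is invented purely to exercise the code.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def squared_norm(vector):
    return sum(v * v for v in vector)

def residual(c, model_texture, image_texture):
    """delta_g = g_i - g_m for the current parameters c."""
    return [gi - gm for gi, gm in zip(image_texture(c), model_texture(c))]

def aam_search(model_texture, image_texture, R_c, n_params,
               xi=1e-12, max_iter=50):
    c = [0.0] * n_params                      # step 1: start from c = 0
    prev_error = None
    for _ in range(max_iter):
        dg0 = residual(c, model_texture, image_texture)   # steps 2-3
        e0 = squared_norm(dg0)
        dc0 = matvec(R_c, dg0)                # step 4: predicted correction
        accepted = None
        for k in (1.0, 0.5, 0.25):            # step 5: attenuation coefficients
            c1 = [ci - k * di for ci, di in zip(c, dc0)]
            e1 = squared_norm(residual(c1, model_texture, image_texture))
            if e1 < e0:
                accepted = (c1, e1)
                break
        if accepted is None:                  # no coefficient lowers the error
            return c, e0
        c, error = accepted
        # step 6: stop once the error has stabilised within the threshold xi
        if prev_error is not None and abs(prev_error - error) <= xi:
            return c, error
        prev_error = error
    return c, prev_error

# Toy linear "model": 2 pixels, 1 parameter, texture g = A c (shape warp ignored).
A = [[2.0], [1.0]]
c_true = [1.5]
model_texture = lambda c: matvec(A, c)
image_texture = lambda c: matvec(A, c_true)
# Because the update is c - k * delta_c, R_c is the negated pseudo-inverse of A.
R_c = [[-0.4, -0.2]]
c_fit, err = aam_search(model_texture, image_texture, R_c, n_params=1)
```

With an exact linear model the first full step (k = 1) already lands on the true parameter; on real images the regression is only approximate, which is why the attenuation coefficients and the stability test are needed.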

Differential evolution: Differential evolution (DE) is a heuristic approach with three main advantages: it is reported to find the true global minimum of classical benchmarks such as the Sphere, Rosenbrock and Rastrigin functions regardless of the initial parameter values; it converges quickly; and it uses few control parameters (Karaboga and Akay, 2009). Like genetic algorithms (GA), DE is population based and uses similar operators (crossover, mutation and selection), but there are some differences. For example, GA uses stochastic selection such as roulette wheel selection (RWS), whereas DE uses deterministic selection (Storn and Price, 1997). Furthermore, the structure of mutation in DE differs from that in GA, and crossover in GA produces two new individuals.

DE supports several mutation strategies, which are described below (Price et al., 2005):

DE/rand/1: This notation specifies that the vector to be perturbed is randomly chosen and that the perturbation consists of one difference vector

For each individual x_{i,G}, i = 0, 1, ..., NP-1, a mutated individual is generated according to:


DE/best/1: Unlike the previous strategy the individual of the next generation is generated by the best member of the population using the following formula:


DE/best/2: This strategy uses two difference vectors as a perturbation:


DE/rand to best/1: This strategy places the perturbation at a location between the randomly chosen population member and the best population member:


DE/rand/2: This strategy is generated by:



x_{i,G}: ith individual of the current generation G
x: the population
F: mutation constant, F ∈ [0, 2]

x_{best,G} is the best individual of the current generation and x_{r1,G}, x_{r2,G}, x_{r3,G}, x_{r4,G}, x_{r5,G} are randomly selected individuals of the current generation. After mutation, crossover is applied to the individuals by the following rule:


Where, pc is a crossover constant.

All solutions in the population can be selected as parents regardless of their fitness value. The offspring produced by the mutation and crossover operations is then evaluated, and the performance of the child vector is compared with that of its parent; the better of the two is selected (Neggaz and Benyettou, 2009). The parent is therefore retained in the population if it still represents the best element.

The general pseudo-code of DE is summarized in Algo 1 (Lampinen and Zelinka, 2000).

Algo 1: Pseudo-code for a general DE

Improved active appearance model: We propose to adapt Differential Evolution to optimize the AAM parameters in the segmentation phase. To align an object (detecting its characteristic points and texture) with an AAM, we must find a vector v that minimizes the sum of quadratic errors E, with:



where, c is the appearance vector, M is the number of pixels of the model and ei the error in the pixel i.

Differential evolution needs neither memory beyond that necessary to store the model nor a priori information on the directions that minimize the error E. The size of the two regression matrices (appearance $R_c$ and pose $R_t$) is equal to $M \times (d_c + d_t)$ bytes, where M is the number of pixels contained in the texture of the model and $d_c$ and $d_t$ are the sizes of c (Eq. 2) and t (Eq. 3).
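As a rough illustration of the memory that the regression matrices would occupy, the expression M x (dc + dt) can be evaluated for plausible sizes. The numbers below are assumptions made for the example: M is taken as a 340 x 340 texture, dt = 4 follows from the four pose parameters, and dc = 30 is an arbitrary placeholder because the number of retained appearance modes is not stated.

```python
def regression_memory_bytes(n_pixels, d_c, d_t, bytes_per_entry=1):
    """Memory for R_c and R_t together: n_pixels * (d_c + d_t) entries."""
    return n_pixels * (d_c + d_t) * bytes_per_entry

M = 340 * 340   # assumed number of model pixels
d_t = 4         # pose parameters: tx, ty, theta, S
d_c = 30        # hypothetical number of appearance parameters

one_byte = regression_memory_bytes(M, d_c, d_t)                       # 3930400
four_bytes = regression_memory_bytes(M, d_c, d_t, bytes_per_entry=4)  # 15721600
print(one_byte, four_bytes)
```

Under these assumptions the matrices take about 3.9 MB at one byte per entry and about 15.7 MB with 4-byte floats, which is the storage that a DE-based search avoids.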


In this section we present the classification method: the Probabilistic Neural Network (PNN). Several input architectures were evaluated to assess the performance of facial expression recognition with connectionist models, varying the input parameters (shape, texture, texture + shape).

The overall objective of our study is to construct a system for the automatic recognition of facial expressions (joy, sadness, anger, disgust, fear, surprise) on the JAFFE database (Press et al., 1988).

Mechanism of PNN: Neural networks are widely used in pattern classification since they do not need any information about the probability distribution and the a priori probabilities of the different classes.

PNNs are basically pattern classifiers. They combine the well-known Bayes decision strategy with the Parzen non-parametric estimator of the probability density functions (PDFs) of the different classes. PNNs have gained interest because they yield a probabilistic output and are easy to implement.

Take a two-category situation as an example: we must decide whether the state of nature θ is θA or θB. Suppose a set of measurements is obtained as a p-dimensional vector x = [x1, …, xp]; the Bayes decision rule becomes:

$$d(x) = \begin{cases} \theta_A & \text{if } h_A\, l_A\, f_A(x) > h_B\, l_B\, f_B(x) \\ \theta_B & \text{if } h_A\, l_A\, f_A(x) < h_B\, l_B\, f_B(x) \end{cases}$$

Here, fA(x) and fB(x) are the PDFs for categories A and B, respectively. lA is the loss associated with the wrong decision d(x) = θB when θ = θA, lB is the loss associated with the wrong decision d(x) = θA when θ = θB, and the losses associated with correct decisions are taken to be zero. hA and hB are the a priori probabilities of occurrence of patterns from categories A and B, respectively.

In the simple case where the losses and the a priori probabilities are assumed equal, the Bayes rule assigns an input pattern to the class with the higher PDF. The accuracy of the decision boundaries therefore depends on how accurately the underlying PDFs are estimated. Parzen's results can be extended to the multivariate case in the special case where the multivariate kernel is a product of univariate kernels. In the particular case of the Gaussian kernel, the multivariate estimate can be expressed as:

$$f_A(x) = \frac{1}{(2\pi)^{p/2}\,\sigma^{p}}\,\frac{1}{m}\sum_{i=1}^{m}\exp\!\left[-\frac{(x - x_{Ai})^{T}(x - x_{Ai})}{2\sigma^{2}}\right]$$

Here, m is the number of training vectors in category A, p is the dimensionality of the training vectors, xAi is the ith training vector of category A and σ is the smoothing parameter. It should be noted that fA(x) is the sum of small multivariate Gaussian distributions centered at each training sample, but the sum is not limited to being Gaussian (Specht, 1990).
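The Gaussian Parzen estimate and the two-category Bayes rule described above can be sketched in a few lines. The sketch assumes the standard form of the estimator from Specht (1990); the function names and the toy training points are hypothetical and serve only to show the mechanics.

```python
import math

def parzen_gaussian_pdf(x, samples, sigma):
    """Average of isotropic Gaussians centred on each training sample."""
    p = len(x)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p * len(samples)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / norm

def bayes_decide(x, samples_a, samples_b, sigma,
                 h_a=0.5, h_b=0.5, l_a=1.0, l_b=1.0):
    """Choose A when h_A * l_A * f_A(x) exceeds h_B * l_B * f_B(x)."""
    score_a = h_a * l_a * parzen_gaussian_pdf(x, samples_a, sigma)
    score_b = h_b * l_b * parzen_gaussian_pdf(x, samples_b, sigma)
    return "A" if score_a > score_b else "B"

# Toy 2-D training sets for the two categories (made-up values).
samples_a = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]]
samples_b = [[2.0, 2.0], [2.1, 1.8], [1.9, 2.2]]
label_near_a = bayes_decide([0.1, 0.0], samples_a, samples_b, sigma=0.5)
label_near_b = bayes_decide([2.0, 2.1], samples_a, samples_b, sigma=0.5)
```

A small σ makes the estimate spiky around individual training samples, while a large σ smooths the classes together, so σ is the main quantity to tune in practice.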

PNN structure: Figure 1 shows the outline of a PNN. When an input is presented, the first layer computes the distances from the input vector to the Input Weights (IW) and produces a vector whose elements indicate how close the input is to the IW. The second layer sums these contributions for each class of inputs to produce as its net output a vector of probabilities. Finally, a compet transfer function on the output of the second layer picks the maximum of these probabilities and produces a 1 for that class and a 0 for the other classes (Gill and Sohal, 2005). In Fig. 1, R, Q and K represent the number of elements in the input vector, the number of input/target pairs and the number of classes of the input data, respectively. IW and LW represent the input weights and the layer weights, respectively.

The mathematical expression of the PNN can be expressed as:



In this study, the radbas is selected as:

Fig. 1: Architecture of a Probabilistic Neural Network (PNN)


The compet function is defined as:


This type of setting produces a network with zero errors on the training vectors, and it does not need any training.
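The layer structure described above can be illustrated with a small sketch. It assumes the conventions of the MATLAB neural network toolbox, from which the names radbas, compet, IW and LW come: radbas(n) = exp(-n^2), a bias of sqrt(-ln 0.5)/spread, IW holding the Q training vectors and LW holding their one-hot class membership. These conventions, the `spread` value and the toy data are assumptions for the example and are not confirmed by the text.

```python
import math

def radbas(n):
    """Radial basis transfer function (assumed form: exp(-n^2))."""
    return math.exp(-n * n)

def compet(values):
    """Competitive transfer function: 1 for the largest entry, 0 elsewhere."""
    winner = max(range(len(values)), key=values.__getitem__)
    return [1 if i == winner else 0 for i in range(len(values))]

def pnn_classify(x, IW, LW, spread=0.5):
    b = math.sqrt(-math.log(0.5)) / spread              # assumed bias
    # First layer: one radial-basis unit per training vector (Q units).
    a = [radbas(math.dist(w, x) * b) for w in IW]
    # Second layer: sum the activations belonging to each class (K sums).
    n2 = [sum(l * ai for l, ai in zip(row, a)) for row in LW]
    return compet(n2)

# Toy data: Q = 4 training vectors with R = 2 elements, K = 2 classes.
IW = [[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.0, 2.2]]
LW = [[1, 1, 0, 0],
      [0, 0, 1, 1]]
out_first = pnn_classify([0.1, 0.1], IW, LW)
out_second = pnn_classify([1.9, 2.0], IW, LW)
```

Because the weights are simply the stored training vectors and their labels, building the network amounts to copying data, which is why no iterative training is involved.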


Excessive features increase computation time and storage memory. Furthermore, they sometimes make classification more complicated, which is known as the curse of dimensionality. The number of features therefore needs to be reduced.

Principal Component Analysis (PCA) is an efficient tool to reduce the dimension of a data set consisting of a large number of interrelated variables while retaining most of the variation. It is achieved by transforming the data set to a new set of ordered variables. This technique has three effects: it orthogonalizes the components of the input vectors so that they are uncorrelated with each other, it orders the resulting orthogonal components so that those with the largest variation come first and it eliminates the components contributing the least to the variation in the data set.

It should be noted that the input vectors should be normalized to have zero mean and unit variance before performing PCA, as shown in Fig. 2.

The normalization is a standard procedure. Details about PCA can be found in Zhang et al. (2009).
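The normalization step that precedes PCA can be sketched as a per-feature z-score. This is a minimal illustration using only the Python standard library; the function name and the small data matrix are hypothetical, and the population standard deviation is used as one reasonable choice.

```python
import statistics

def zscore_columns(rows):
    """Scale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    # Constant features (std == 0) are mapped to 0 to avoid division by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

data = [[1.0, 10.0],
        [2.0, 20.0],
        [3.0, 60.0]]
normalized = zscore_columns(data)
```

After this step every feature contributes on the same scale, so the principal components are not dominated by whichever input happens to have the largest numeric range.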


The JAFFE database contains 213 pictures of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. The database was planned and assembled by Miyuki Kamachi, Michael Lyons and Jiro Gyoba. The photos were taken in the Department of Psychology at Kyushu University (Lyons et al., 1998, 1999). The training set contains 110 pictures and the test set contains 103 pictures, covering 6 facial expressions in different poses.

Fig. 2: Using normalization before PCA


Results of the AAM: Figure 3 shows an original picture on the left, its corresponding shape in the middle and its texture on the right.

Figure 4 shows the iterative process of the AAM.

Architecture and parameters: Our facial expression recognition network is composed of two layers: an input layer of 84 neurons (the 84 principal components obtained by PCA on the results of the active appearance model) and an output layer of 6 neurons representing the 6 expressions (joy, sadness, anger, disgust, fear, surprise). The input parameters are determined by the active appearance model and its variants. These parameters are represented by:

Face texture: a square matrix of 340 × 340 pixels
Face shape: a vector of 68 points
Face texture + face shape: concatenation of the texture and shape vectors

For PNN, we used the results of active appearance model:

Basic Active Appearance Model
Active appearance Model improved by Differential Evolution

Table 1 shows recognition rates for both types of neural networks.

According to the results shown in Table 1, the PNN fed with the texture of the face gave better results than the other two input architectures. In addition, the AAM improved by Differential Evolution has a clear advantage over the basic active appearance model.

Memory use was reduced and we obtained the shape and texture of faces in different poses without the experience matrix.

Table 1: The input features recognition rates vs. classification approaches

Fig. 3: Original picture (left), its corresponding shape (middle) and texture (right)

Fig. 4: Iterative process on the face of a sad expression

Table 2 shows the confusion matrix of the facial expression recognition when the PNN classifier is used. It shows that most of the joy, anger, disgust and sadness expressions were classified properly, but some of the fear and surprise expressions were misclassified as disgust or fear because of their similarity.

Table 2: Confusion matrix using PNN based on improved AAM using DE
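For readers who want to reproduce this kind of analysis, the following sketch shows how a confusion matrix and an overall recognition rate are computed from true and predicted labels. The label lists are made up purely to exercise the code and are not the paper's results; the function names are hypothetical.

```python
LABELS = ["joy", "sadness", "anger", "disgust", "fear", "surprise"]

def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def recognition_rate(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up labels, NOT data from the study.
y_true = ["joy", "joy", "fear", "fear", "surprise", "anger"]
y_pred = ["joy", "joy", "disgust", "fear", "fear", "anger"]
cm = confusion_matrix(y_true, y_pred)
rate = recognition_rate(cm)   # 4 of the 6 toy samples are correct
```

Off-diagonal entries such as the fear row, disgust column show which expressions are confused with which, which is the information a single accuracy figure hides.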

Table 3: Comparison of recognition rate on JAFFE database

Comparative study: Table 3 compares the facial expression recognition performance of different types of classifiers: k-NN, MLP, RBF and SOM (Zhang, 1999; Praseeda and Sasikumar, 2008; Gu et al., 2010). Table 3 shows that the texture-based PNN is among the best classifiers, with a recognition rate of 96%. The contour-based MLP classifier outperforms other classifiers such as k-NN and SOM, with a recognition rate of 90.29%.


The aim of this study is a robust classification of facial expressions. Various inputs derived from the results of the AAM (the shape of the face, the texture of the face, texture + shape of the face) were fed to the PNN. After several experiments we found that the texture of the face improves the classification results. For the AAM we used the differential evolution optimization method to limit the memory space and to obtain realistic face shapes and textures in different images.

The results are encouraging enough to explore real-life applications of facial expression recognition in fields such as surveillance and user mood evaluation. As future work, we are interested in applying the artificial bee colony algorithm together with a probabilistic neural network to dynamic facial expression recognition.

1:  Bartlett, M., P. Viola, T. Sejnowski, L. Larsen, J. Hager and P. Ekman, 1996. Classifying Facial Action. In: Advances in Neural Information Processing Systems, Touretzky, D., M. Mozer, M. Hasselmo (Eds.). Vol. 8, MIT Press, Cambridge, Massachusetts, pp: 823-829.

2:  Cootes, T.F., D.H. Cooper, C.J. Taylor and J. Graham, 1992. A trainable method of parametric shape description. Image Vision Comput., 10: 289-294.

3:  Cootes, T.F., G.J. Edwards and C.J. Taylor, 1998. Active appearance models. Proceedings of the 5th European Conference on Computer Vision, June 2-6, 1998, Freiburg, Germany, pp: 484-498.

4:  Cootes, T. and C. Taylor, 2004. Statistical Models of Appearance for Computer Vision. Imaging Science and Biomedical Engineering, USA.

5:  Davoine, F., B. Abboud and M. Dang, 2004. Analyse de visages et d'expressions faciales par modèle actif d'apparence. Traitement du Signal, 21: 179-193.

6:  Ekman, P., E.R. Sorenson and W.V. Friesen, 1969. Pan-cultural elements in facial displays of emotion. Science, 164: 86-88.

7:  Gao, Y., M.K.H. Leung, S.C. Hui and M.W. Tananda, 2003. Facial expression recognition from line-based caricatures. IEEE Trans. Syst. Man Cybernet-Part A Syst. Human, 33: 407-412.

8:  Gill, G.S. and J.S. Sohal, 2005. Battlefield decision making: A neural network approach. J. Theor. Applied Inform. Technol., 4: 697-699.

9:  Gu, W.F., Y.V. Venkatesh and C. Xiang, 2010. A novel application of self-organizing network for facial expression recognition from radial encoded contours. Soft Comput., 14: 113-122.

10:  Guo, G. and C.R. Dyer, 2003. Simultaneous feature selection and classifier training via linear programming: A case study for face expression recognition. IEEE Conf. Comput. Vision Pattern Recogn., 1: I-346-I-352.

11:  Hoey, J. and J.J. Little, 2000. Representation and recognition of complex human motion. IEEE Conf. Comput. Vision Pattern Recogn., 1: 752-759.

12:  Karaboga, D. and B. Akay, 2009. A comparative study of artificial bee colony algorithm. Applied Math. Comput., 214: 108-132.

13:  Lampinen, J. and I. Zelinka, 2000. On stagnation of the differential evolution algorithm. Proceedings of the 6th International Conference on Soft Computing, June 7-9, 2000, Brno, Czech Republic, pp: 76-83.

14:  Lyons, M.J., J. Budynek and S. Akamatsu, 1999. Automatic classification of single facial images. IEEE Trans. Pattern Anal. Mach. Intell., 21: 1357-1362.

15:  Lyons, M., S. Akamatsu, M. Kamachi and J. Gyoba, 1998. Coding facial expressions with Gabor wavelets. Proceedings of the 3rd International Conference on Automatic Face and Gesture Recognition, April 14-16, 1998, Institute of Electrical and Electronics Engineers (IEEE), pp: 200-205.

16:  Neggaz, N. and A. Benyettou, 2009. Hybrid models based on biological approaches for speech recognition. Artificial Intel. Rev., 32: 45-57.

17:  Pantic, M. and L.J.M. Rothkrantz, 2004. Facial action recognition for facial expression analysis from static face images. IEEE Trans. Syst. Man Cybernet. Part B, 34: 3-3.

18:  Praseeda, L.V. and M. Sasikumar, 2008. A neural network based facial expression analysis using gabor wavelets. World Acad. Sci. Eng. Technol., 42: 563-568.

19:  Press, W.H., B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, 1988. Numerical Recipes in C. Cambridge University Press, Cambridge, UK.

20:  Price, K.V., R.M. Storn and J.A. Lampinen, 2005. Differential Evolution: A Practical Approach to Global Optimization. 1st Edn., Springer-Verlag, New York, ISBN-10: 3540209506, pp: 538.

21:  Specht, D.F., 1990. Probabilistic neural networks. Neural Networks, 3: 109-118.

22:  Viola, P. and M. Jones, 2001. Rapid object detection using a boosted cascade of simple features. IEEE Conf. Comput. Vision Pattern Recogn., 1: 511-518.

23:  Wiskott, L., J.M. Fellous, N. Kruger and C. von der Malsburg, 1997. Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Machine Intell., 19: 775-779.

24:  Yacoob, Y. and L. Davis, 1994. Recognizing facial expressions by spatiotemporal analysis. Proc. Int. Conf. Pattern Recog., 1: 747-749.

25:  Tian, Y.L., T. Kanade and J.F. Cohn, 2001. Recognizing action units for facial expressions analysis. IEEE Trans. Pattern Anal. Mach. Intell., 23: 97-115.

26:  Zhang, Z., M. Lyons, M. Schuster and S. Akamatsu, 1998. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. Proceedings of the International Conference Automatic Face and Gesture Recognition, April 14-16, 1998, IEEE Computer Society Washington, DC., USA., pp: 454-459.

27:  Zhang, Z., 1999. Feature-based facial expression recognition: Sensitivity analysis and experiments with a multi-layer perceptron. J. Pattern Recogn. Artif. Intell., 13: 893-911.

28:  Zhang, Y., L. Wu, N. Neggaz, S. Wang and G. Wei, 2009. Remote-sensing image classification based on an improved probabilistic neural network. Sensors, 9: 7516-7539.

29:  Storn, R. and K. Price, 1997. Differential evolution-A simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim., 11: 341-359.
