
Research Article


Nonlinear Principal Component Embedding for Face Recognition

Eimad Eldin Abdu Ali Abusham
and
Wong Eng Kiong



ABSTRACT

A new face recognition method based on local nonlinear mapping is proposed in this study. Face images are typically acquired in frontal views and often illuminated by a frontal light source. Unfortunately, recognition performance is found to degrade significantly when face recognition systems are presented with patterns that go beyond these controlled conditions. Face images acquired under uncontrolled conditions have been proven to be highly complex and nonlinear in nature; thus, linear methods fail to capture the nonlinear nature of the variations. The proposed method, known as Nonlinear Principal Component Embedding (NPCE), aims to overcome the limitations of both linear and nonlinear methods by extracting discriminant linear features from highly nonlinear features; the method can be viewed as a linear approximation which preserves the local configurations of the nearest neighbours. The NPCE automatically learns the local neighbourhood characteristics and discovers the compact linear subspace which optimally preserves the intrinsic manifold structure; principal component analysis is then carried out on the low-dimensional embedding with reference to the variance of the data. To validate the proposed method, the Carnegie Mellon University Pose, Illumination and Expression (CMU-PIE) database was used. Experiments conducted in this research revealed the efficiency of the proposed method in face recognition as follows: (1) discriminant linear features are extracted from highly nonlinear features based on the local mapping and (2) runtime speed is improved as face feature values are reduced in the embedding space. The proposed method achieves better recognition performance in comparison with both linear and nonlinear methods.







INTRODUCTION
State-of-the-art face recognition systems are found to yield satisfactory
performance under controlled conditions, i.e., where face images are typically
acquired in frontal views and often illuminated by a frontal light source.
Unfortunately, recognition performance is found to significantly degrade when
the face recognition systems are presented with patterns which go beyond these
controlled conditions. Some examples of unconstrained conditions include illumination,
pose variations, etc. In particular, variations in face images have been proven
to be highly complex and nonlinear in nature. Linear subspace analysis has
been extensively applied to face recognition. A successful face recognition
methodology is largely dependent on particular choice of features used by the
classifier. Although linear methods are easy to understand and very simple
to implement, the linearity assumption does not hold in many real-world scenarios.
A disadvantage of the linear techniques is that they fail to capture the characteristics
of the nonlinear appearance manifold. This is due to the fact that the linear
methods extract features only from the input space without considering the nonlinear
information between the components of the input data. However, nonlinear mapping
can often be approximated using a linear mapping in a local region. This has
motivated the design of the nonlinear mapping method in this study. The history
of nonlinear mapping is long; it can be traced back to Sammon's nonlinear mapping
(Sammon, 1969). Over time, different techniques have
been proposed such as the projection pursuit (Friedman and
Tukey, 1974), the projection pursuit regression (Friedman
and Stuetzle, 1981), self-organizing maps or SOM, principal curves and their
extensions (Hastie and Stuetzle, 1989; Kegl
et al., 2000; Smola et al., 2001;
Tibshirani, 1992), autoencoder neural networks (Baldi
and Hornik, 1989; DeMers and Cottrell, 1993) and
generative topographic maps or GTM (Bishop et al.,
1998). A comparison of some of these methods can be found in Mao
and Jain (1995). Recently, a new line of nonlinear mapping algorithms
was proposed based on the notion of manifold learning. Given a data set which
is assumed to be lying approximately on the manifold in a high dimensional space,
dimensionality reduction can be achieved by constructing a mapping which respects
certain properties of the manifold. Manifold learning has been demonstrated
in different applications including face pose detection (Hadid
et al., 2002; Li et al., 2001), high
dimensional data discriminant analysis (Bouveyron et al.,
2007), face recognition (Yang, 2002; Zhang
and Wang, 2004), analysis of facial expressions (Chang
et al., 2004; Elgammal and Lee, 2004), human
motion data interpretation (Jenkins and Mataric, 2004),
gait analysis (Elgammal and Lee, 2004a,
b), visualization of fibre traces (Brun et al.,
2003), wood texture analysis (Niskanen and Silvén, 2003)
and kernel fractional-step discriminant analysis (KFDA) for nonlinear
feature extraction and dimensionality reduction by Guang
et al. (2006). Recently, Li et al. (2008)
proposed the nonlinear DCT discriminant feature which analyzes the nonlinear
discriminabilities of the DCT frequency bands and selects appropriate bands.
Nevertheless, these methods still lack discriminant feature representations
based on the local structure of the data, which is very important for recognition
when variations in face images are present. Therefore, the aim of this study
is to devise local nonlinear discriminant feature representations which are reliable
and have more discriminative power for face recognition.
MATERIALS AND METHODS
Preprocessing: Face preprocessing and normalization is a significant
part of the face recognition systems. Changes in lighting conditions have been
found to dramatically decrease the performance of face recognition. Therefore,
all images have been preprocessed to obtain a representation of the face which
is invariant to illumination, while keeping the information necessary to allow
a discriminative recognition of the subjects. A Gaussian kernel has been used
to estimate the local mean and standard deviation of the images to correct
non-uniform illumination. The local normalization is computed as follows:

g(x, y) = (f(x, y) - m(x, y)) / s(x, y)

where f(x, y) is the original image, m is an estimate of the local mean of f
and s is an estimate of the local standard deviation (SD). Figure 1
illustrates the block diagram of the developed method.
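The normalization above can be sketched in code. Below is a minimal NumPy/SciPy sketch; the kernel width `sigma` and the small `eps` added for numerical stability are assumptions, as the paper does not specify them:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_normalize(image, sigma=8.0, eps=1e-8):
    """Local normalization: subtract a Gaussian-kernel estimate of the
    local mean m(x, y), then divide by a Gaussian-kernel estimate of the
    local standard deviation s(x, y)."""
    f = image.astype(np.float64)
    m = gaussian_filter(f, sigma)                       # local mean estimate
    centered = f - m
    s = np.sqrt(gaussian_filter(centered ** 2, sigma))  # local SD estimate
    return centered / (s + eps)                         # eps avoids division by zero
```

The result has approximately zero local mean and unit local variance, which suppresses slowly varying illumination while preserving facial detail.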
The NPCE algorithm: This method finds the reconstruction weights by capturing the intrinsic geometry of each neighbourhood. The NPCE creates a locally linear mapping from the high-dimensional coordinates to the low-dimensional embedding, as shown in Fig. 2.
Compute the weights which represent every face sample by its neighbours, by minimising the reconstruction error:

ε(W) = Σ_{i} ||x_{i} − Σ_{j} w_{ij} x_{ij}||², subject to Σ_{j} w_{ij} = 1

where x_{i} refers to the ith unknown sample and x_{ij} is the corresponding training sample according to the K value (the number of nearest neighbours).
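The weight step can be sketched as follows; this is a minimal NumPy sketch under the usual locally linear formulation (Euclidean nearest neighbours and a small regulariser on the local Gram matrix are assumptions):

```python
import numpy as np

def reconstruction_weights(X, k=5, reg=1e-3):
    """For each row x_i of X, find weights over its k nearest neighbours
    that minimise ||x_i - sum_j w_ij x_ij||^2 subject to sum_j w_ij = 1."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # exclude each point itself
    neighbours = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        nb = neighbours[i]
        Z = X[nb] - X[i]                       # neighbours shifted to x_i
        G = Z @ Z.T                            # local Gram matrix
        G = G + reg * np.trace(G) * np.eye(k)  # regularise for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nb] = w / w.sum()                 # enforce sum-to-one constraint
    return W, neighbours
```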
Compute the low-dimensional embedding D by minimising the following cost function:

Φ(D) = Σ_{i=1}^{N} ||d_{i} − Σ_{j=1}^{K} w_{ij} d_{ij}||²

where N is the number of training samples, K is the number of nearest neighbours and d_{i} is the low-dimensional counterpart of x_{i}.
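The embedding that minimises this cost can be obtained from the bottom eigenvectors of M = (I − W)ᵀ(I − W), discarding the constant eigenvector; below is a minimal NumPy sketch (a dense eigendecomposition is an assumption; large problems would use a sparse solver):

```python
import numpy as np

def low_dimensional_embedding(W, d):
    """Minimise sum_i ||d_i - sum_j w_ij d_ij||^2 by taking the
    eigenvectors of M = (I - W)^T (I - W) with the smallest
    non-zero eigenvalues."""
    n = W.shape[0]
    IW = np.eye(n) - W
    M = IW.T @ IW                    # symmetric positive semi-definite
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    # vecs[:, 0] is the constant eigenvector (eigenvalue ~ 0); skip it.
    return vecs[:, 1:d + 1]
```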
Then, the principal components of the training embedding are calculated as follows:

Fig. 1: 
Block diagram of the NPCE 

Fig. 2: 
Reconstruction weights 
where

μ = (1/N) Σ_{i=1}^{N} d_{i}

is the mean and C = (1/N) Σ_{i=1}^{N} (d_{i} − μ)(d_{i} − μ)^{T} is the covariance matrix; {P_{1}, P_{2}, …, P_{N}}
are the eigenvectors of C. The eigenvectors then project
a vector in the low-dimensional face subspace into the discriminatory feature space,
which can be formulated as follows:

z_{i} = P^{T}(d_{i} − μ)
Once the weight values of each neighbour of the unknown sample are
obtained, the mapping can be formulated as follows:

z = Σ_{j} w_{ij} q_{ij}

where q_{ij} is the projection of the closest training samples; the neighbour indices
are the same as those of the sample in the original high-dimensional space and
z is the corresponding representation of the unknown sample in the discriminant space.
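The final two steps, PCA over the embedding and the out-of-sample mapping, can be sketched together; the variable names below are illustrative, not the paper's notation:

```python
import numpy as np

def pca_projection(D, d):
    """Centre the low-dimensional embedding D (one row per sample),
    eigendecompose its covariance matrix C and project onto the d
    leading eigenvectors."""
    mu = D.mean(axis=0)
    C = np.cov(D - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(C)   # ascending eigenvalues
    P = vecs[:, ::-1][:, :d]         # d leading eigenvectors
    Z = (D - mu) @ P                 # discriminant features
    return Z, mu, P

def map_unknown(weights, neighbour_indices, Z):
    """Map an unknown sample into the discriminant space as the weighted
    sum of its neighbours' projected features."""
    return weights @ Z[neighbour_indices]
```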
RESULTS AND DISCUSSION
CMU-PIE database: This is one of the largest datasets developed to investigate
the effect of pose, illumination and expression. It contains images of
68 people, each under 13 different poses, 43 different illumination conditions
and 4 different expressions (Sim et al., 2002).
In the experiments conducted in this study, 6 out of the 13 poses were selected
for each person. Out of the 43 illumination configurations, 21 were selected to
span the typical set of variations, covering the left to the right profile.
Nonlinear Principal Component Embedding (NPCE): In this set of experiments,
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA),
two powerful tools for dimensionality reduction and feature extraction
in most pattern recognition applications, were used as baselines to assess
the efficiency of the method proposed in this study.
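The paper does not state the classifier used to produce the recognition rates; a nearest-neighbour classifier on the projected features is a common choice in subspace face recognition. Below is a minimal sketch on synthetic data (the 1-NN classifier and the synthetic features are assumptions, purely for illustration):

```python
import numpy as np

def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test vector by the label of its nearest training
    vector (Euclidean distance) and return the recognition rate."""
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    pred = train_y[np.argmin(d2, axis=1)]
    return float((pred == test_y).mean())

# Synthetic stand-in for projected face features: two separated classes.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.5, (20, 10)),
                      rng.normal(3.0, 0.5, (20, 10))])
labels = np.array([0] * 20 + [1] * 20)
acc = one_nn_accuracy(features[::2], labels[::2],
                      features[1::2], labels[1::2])
```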

Fig. 3: 
The recognition rates of the PCA, LDA and NPCE 
Table 1: 
The average error rates (%) of the PCA, LDA, NPCE, across
ten tests and four dimensions 

Figure 3 shows the recognition rates over feature dimensions ranging between 10 and 150 used for testing the NPCE;
the proposed method was found to significantly outperform the PCA and LDA. More experiments were conducted on selected reduced dimensions (65, 75, 90 and 110) to assess the performance of the NPCE. Good recognition rates were obtained: 49.66, 55.9 and 82.87% for the PCA, LDA and NPCE, with feature dimensions of 110, 67 and 100, respectively. As for the LDA, the maximum feature dimension cannot exceed 67, which is C-1 (the number of classes minus 1).
Table 1 shows the average recognition error rates across ten tests and four dimensions (65, 75, 90 and 110). From these results, the NPCE was found to achieve the lowest error rate, as compared to the standard linear methods of PCA and LDA.
Figure 4 shows the results when the NPCE was used, as compared
to the KPCA and LDA (Jian et al., 2005), as well
as the Generalized Discriminant Analysis (GDA) (Baudat and
Anouar, 2000). The method was shown to achieve 82.87% accuracy and significantly
outperformed the KPCA plus LDA and GDA; the latter methods achieved maximum
accuracies of 77.22 and 79.92%, respectively.
The proposed method was developed to learn the embedding of the nonlinear manifold
based on the k-nearest neighbour method and to preserve the local geometry of the
original high-dimensional data in a low-dimensional space as well as possible.
In addition to these, the NPCE was found to minimize the reconstruction error
of the neighbour weights for every data point in the lowdimensional space.
The training sets are projected into the intrinsic low-dimensional space to
improve their classification ability and runtime speed, while the principal
components are projected into the low-dimensional embedding, with reference
to the variance of the data, as given in Eq. 5. As a result,
the maximum feature dimension can exceed C-1 (the number of classes minus 1).

Fig. 4: 
The recognition rate of the KPCA plus LDA, GDA and NPCE 
Therefore, this is considered a solution to the Small Sample Size (SSS) problem, where the number of samples is smaller than the sample dimensionality. In addition, the performance of the proposed method was compared with several state-of-the-art nonlinear methods. Based on the results presented in Table 1 and Fig. 4, the feature representations are proven to have more discriminative power and the NPCE achieves better recognition performance compared to both linear and nonlinear methods.
CONCLUSION
A new Nonlinear Principal Component Embedding (NPCE) for face recognition has been introduced in this research. The proposed method is based on a local nonlinear discriminant representation, which is particularly robust against the SSS problem compared to the traditional representation used in LDA. The NPCE utilizes a novel discriminant principal component to estimate the face feature values in the reduced embedding space. At the same time, the proposed method has been found to perform an implicit reduction over the whole set of features, as shown by the experimental results. The researchers regard this as significant because runtime speed is as important as the actual recognition rate, i.e., only a subset of the features is used. The experiments conducted in this study clearly reveal that the proposed method is superior to the state-of-the-art methods. Thus, future study will concentrate on continuously improving the devised method and extending it to incorporate more local features of the subjects.
ACKNOWLEDGMENT
This study was conducted as part of the first author's Ph.D. at Multimedia University from 2004 to 2007. The authors are thankful to Multimedia University for providing the research facilities and financial support which enabled them to carry out this research successfully.

REFERENCES 
1: Baldi, P. and K. Hornik, 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2: 53-58.
2: Baudat, G. and F. Anouar, 2000. Generalized discriminant analysis using a kernel approach. Neural Comput., 12: 2385-2404.
3: Bishop, C.M., M. Svensen and C.K.I. Williams, 1998. GTM: The generative topographic mapping. Neural Comput., 10: 215-234.
4: Bouveyron, C., S. Girard and C. Schmid, 2007. High dimensional data clustering. Comput. Stat. Data Anal., 52: 502-519.
5: Brun, A., H.J. Park, H. Knutsson and C.F. Westin, 2003. Colouring of DT-MRI fiber traces using Laplacian eigenmaps. Lecture Notes Comput. Sci., 2809: 518-529.
6: Chang, Y., C. Hu and M.T. Matthew, 2004. Probabilistic expression analysis on manifolds. Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition, June 27-July 2, 2004, IEEE Computer Society, Washington, DC, USA, pp: 520-527.
7: DeMers, D. and G. Cottrell, 1993. Nonlinear dimensionality reduction. Adv. Neural Inf. Process. Syst., 5: 580-587.
8: Elgammal, A. and C.S. Lee, 2004. Inferring 3D body pose from silhouettes using activity manifold learning. Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition, June 26-July 2, 2004, Washington, DC, USA, pp: 681-688.
9: Elgammal, A. and C.S. Lee, 2004. Separating style and content on a nonlinear manifold. Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition, June 26-July 2, 2004, Washington, DC, USA, pp: 478-489.
10: Friedman, J.H. and J.W. Tukey, 1974. A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput., 23: 881-890.
11: Friedman, J.H. and W. Stuetzle, 1981. Projection pursuit regression. J. Am. Stat. Assoc., 76: 817-823.
12: Guang, D., D. Yeung and Y. Qian, 2006. Face recognition using a kernel fractional-step discriminant analysis algorithm. Pattern Recogn., 40: 229-243.
13: Hadid, A., O. Kouropteva and M. Pietikainen, 2002. Unsupervised learning using locally linear embedding: Experiments in face pose analysis. Proceedings of the 16th International Conference on Pattern Recognition, August 11-15, 2002, University of Oulu, Finland, pp: 111-114.
14: Hastie, T. and W. Stuetzle, 1989. Principal curves. J. Am. Stat. Assoc., 84: 502-516.
15: Jenkins, O. and M. Mataric, 2004. A spatio-temporal extension to Isomap nonlinear dimension reduction. Proceedings of the 21st International Conference on Machine Learning, July 4-8, 2004, Banff, Alberta, Canada, pp: 441-448.
16: Jian, Y., F. Alejandro, J.Y. Frangi, Z. David and J. Zhong, 2005. KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans. Pattern Anal. Mach. Intell., 27: 230-244.
17: Kegl, B., A. Krzyzak, T. Linder and K. Zeger, 2000. Learning and design of principal curves. IEEE Trans. Pattern Anal. Mach. Intell., 22: 281-297.
18: Li, S.Z., X. Lv and H. Zhang, 2001. View-subspace analysis of multi-view face patterns. Proceedings of the ICCV Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-Time Systems, July 13, 2001, Vancouver, Canada, pp: 125-132.
19: Li, S., Y. Yong-Fang, J. Xiao-Yuan, S. Zhu-Li, Z. David and Y. Jing-Yu, 2008. Nonlinear DCT discriminant feature extraction with generalized KDCV for face recognition. Proceedings of the 2nd International Symposium on Intelligent Information Technology Application, December 20-22, 2008, Shanghai, China, pp: 338-341.
20: Mao, J. and A.K. Jain, 1995. Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Networks, 6: 296-317.
21: Niskanen, M. and O. Silvén, 2003. Comparison of dimensionality reduction methods for wood surface inspection. Proceedings of the 6th International Conference on Quality Control by Artificial Vision, May 19-23, 2003, Gatlinburg, Tennessee, USA, pp: 178-188.
22: Sim, T., S. Baker and M. Bsat, 2002. The CMU pose, illumination and expression (PIE) database. Proceedings of the International Conference on Automatic Face and Gesture Recognition, May 20-21, 2002, Washington, DC, USA, pp: 53-58.
23: Smola, A.J., S. Mika, B. Scholkopf and R.C. Williamson, 2001. Regularized principal manifolds. J. Mach. Learn. Res., 1: 179-209.
24: Tibshirani, R., 1992. Principal curves revisited. Stat. Comput., 2: 183-190.
25: Yang, M.H., 2002. Face recognition using extended isomap. Proceedings of the International Conference on Image Processing, August 18-21, 2002, Guangzhou, pp: 117-120.
26: Zhang, J., S.Z. Li and J. Wang, 2004. Nearest manifold approach for face recognition. Proceedings of the 6th International Conference on Automatic Face and Gesture Recognition, May 17-19, 2004, IEEE Computer Society, Washington, DC, USA, pp: 223-228.
27: Sammon, Jr. J.W., 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput., 18: 401-409.



