Detection of Pose and Scale Variant Human Faces in Color and Gray Images

Park, Chang-Woo; Lee, Taegun; Kim, Young-Ouk; Sung, Ha-Gyeong

Research Article


Chang-Woo Park
Korea Electronics Technology Institute, 401-402 B/D 193 Yakdae-Dong, Wonmi-Gu, Puchon-Si, Kyunggi-Do, 420-140, Korea

Taegun Lee
Korea Electronics Technology Institute, 401-402 B/D 193 Yakdae-Dong, Wonmi-Gu, Puchon-Si, Kyunggi-Do, 420-140, Korea

Young-Ouk Kim
Korea Electronics Technology Institute, 401-402 B/D 193 Yakdae-Dong, Wonmi-Gu, Puchon-Si, Kyunggi-Do, 420-140, Korea

Ha-Gyeong Sung
Korea Electronics Technology Institute, 401-402 B/D 193 Yakdae-Dong, Wonmi-Gu, Puchon-Si, Kyunggi-Do, 420-140, Korea

ABSTRACT

This study presents an effective method for detecting faces and facial features under pose variation of the user's face against complex backgrounds. The approach is flexible: it works on both color and gray facial images and can detect facial features in quasi-real time. Based on the intensity characteristics of the neighborhood of facial features, a new directional template for facial features is defined. Applying this template to the input facial image yields a novel Edge-like Blob Map (EBM) with multiple intensity strengths. Using this map together with conditions on facial characteristics, and regardless of the color information of the input image, the locations of the face and its features (two eyes and a mouth) can be successfully estimated. Without information on the boundary of the facial area, the final candidate face region is determined from both the obtained facial feature locations and weighted correlation values with standard facial templates. Experimental results on many color images and on images from a well-known gray-level face database confirm the usefulness of the proposed algorithm.


INTRODUCTION

The intelligent service robot has emerged as one of the recent trends in robotics research. One of the important functions of such a service robot is Human-Robot Interaction (HRI). A camera-based facial image interface for HRI requires only minimal user cooperation while still acquiring sufficient information about people. HRI based on facial images is therefore also suitable for high-level interaction functions such as emotional communication between humans and robots. For this to work, recognition of the user's face and facial expressions plays a vital role, and both require face detection and precise estimation of facial feature locations. This study proposes a new method for detecting the face location and estimating facial features that is suitable for HRI.

Most previous face detection research^[1-4] uses a fixed camera; face detection for HRI, however, has its own properties arising from how the robot is embodied. First, faces in the acquired image show significant pose variation because the robot platform is mobile. Second, images of a user located around the robot include complex backgrounds and considerable illumination changes. Third, because the environment surrounding the robot changes spatially over time, detection must run in quasi-real time to locate the user immediately.

To overcome these limitations, this study proposes a new method of face and feature detection. First, we propose a novel directional template for effectively estimating the locations of facial features such as the two eyes and the mouth. Applying this template to the input image produces a new edge-like blob map with multiple intensity strengths. We show that this edge-like blob map is relatively robust to facial pose and to moderate illumination changes, and that it can estimate detailed locations of facial features even without rough information about the facial area. As a result, the method does not depend on whether the image acquired through the camera is color or gray, which makes face detection more flexible. We demonstrate the suitability of the proposed method through experiments on a well-known gray-level face image database and on various color images.

The main goal of face detection is to locate faces in an uncontrolled environment. Previous approaches^[1,2,5] limit their goal to improving detection performance as a standalone process. In our approach, by contrast, face detection is treated as a step preceding face recognition or expression recognition, viewed from the standpoint of the entire Human-Robot Interaction process. For this reason, locating facial features within the face area is as necessary as detecting the face in an uncontrolled environment. Both detecting the face and precisely locating its features under pose variation should run as fast as possible, in quasi-real time.

Among the many studies on facial feature detection, using facial color characteristics^[6] is a typical recent approach for pose-variant facial images. In that study, chromatic maps of facial features are created from facial color information with a lighting compensation technique, and the locations of the eyes and mouth are estimated within the facial area from the specific color properties of each feature in these maps. In our experiments, however, for color images of moderate or lower chrominance quality, it was difficult to clearly distinguish facial features in the eye or mouth chromatic maps. Moreover, if the facial area is tiny or the eye (mouth) features are small, the features can be concealed within the facial color region and are hard to find.

On the other hand, when facial features are unclear in this way, their detailed shapes can still be represented effectively in an edge image created by 2D differential mask operations. Many approaches use edge information for feature detection, and several recent studies propose various types of 'edge map' images. For example, an Edge Orientation Map (EOM) that parameterizes local features in the facial area has been defined^[7], and a Line Edge Map (LEM) has been defined and applied to recognizing facial area images^[8]. However, these studies build edge maps based on frontal face edge patterns, and their similarity measures are computed against these frontal maps. Therefore, for pose-variant facial images or images from an unknown viewpoint, the correct detection rate decreases considerably.

Face detection in color and gray images: We now present our method for detecting the face and the locations of its features. One characteristic of our method is that the algorithm can be applied to gray images as well as color images. Depending on the image type, an additional preprocessing step is included so that facial features can be detected more effectively.

Estimation of candidate face region for color images: If the input image from the camera is a color image, a preprocessing step using color information can be carried out. The probable location of the candidate facial region is determined from the chromatic properties of facial color. This step effectively reduces the range over which faces must be searched in the obtained image, so that facial feature detection can be accomplished in quasi-real time. Skin color appearance varies widely with race, individual skin tone and lighting conditions. To segment characteristic facial colors effectively under such variation, we use the YCbCr color space transform^[3,9]. As shown in Fig. 1, when the facial color pixel distribution is transformed from RGB to YCbCr space, the facial color pixels are more tightly clustered in YCbCr space than in RGB space. The boundary between facial color pixels and non-facial color pixels therefore becomes clear, making it easy to distinguish facial color regions in the obtained color image^[9]. In YCbCr color space, a binary facial-color image is created from simple boundary conditions on facial color pixels, as in Fig. 2.
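The skin-color step can be sketched as follows. This is a minimal illustration, assuming the full-range ITU-R BT.601 (JPEG) RGB-to-YCbCr conversion; the paper does not list its "simple boundary conditions", so the Cb/Cr bounds `CB_RANGE` and `CR_RANGE` below are hypothetical placeholders chosen only for illustration.

```python
import numpy as np

# Hypothetical chrominance bounds for skin pixels (not the paper's values).
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JPEG) RGB -> Y, Cb, Cr for an H x W x 3 array."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def skin_mask(rgb, cb_range=CB_RANGE, cr_range=CR_RANGE):
    """Binary facial-color image: True where (Cb, Cr) fall inside the box.
    Luminance Y is ignored, which is what makes the test lighting-tolerant."""
    _, cb, cr = rgb_to_ycbcr(rgb)
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

Only the chrominance pair is thresholded, which reflects the clustering argument of Fig. 1: skin pixels that differ mainly in brightness still fall in a compact Cb/Cr region.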

However, not all of these binary regions can be assumed to be candidate facial regions. Even if a region belongs to the white area of the binary image, it is not a suitable face candidate if it is too small or too large relative to the expected face size. Conversely, because light reflects differently as the face pose varies, some parts of the facial area that contain important facial information, such as an eye or mouth region, can be excluded (Fig. 2).

To treat these problems effectively, we obtain candidate face regions using a hierarchically transformed image and a modified morphology technique, which adapt to various face scales and select probable regions flexibly.

First, the binary facial image is reduced hierarchically. A pyramid image is built with the Haar wavelet transform, the simplest wavelet kernel, for fast computation. From these reduced binary images, suitable face candidate regions can be found effectively with a small number of scale variations, rather than by searching the original-size binary image over all scale changes^[3,9].
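A sketch of the hierarchical reduction, under the assumption that each pyramid level keeps only the Haar LL (low-frequency) band and is re-binarized at a hypothetical threshold `keep`. The paper does not state how the reduced images are binarized, so that step is an illustrative choice.

```python
import numpy as np


def haar_ll(img):
    """One level of the 2D Haar transform, keeping only the LL band.
    Scaled here as the mean of each 2x2 block (the orthonormal Haar LL
    would be the block sum divided by 2); odd trailing rows/cols are dropped."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w].astype(np.float64)
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0


def binary_pyramid(mask, levels=3, keep=0.5):
    """Pyramid of a binary facial-color image: each level is the Haar LL band
    of the previous level, re-binarized at the hypothetical threshold `keep`."""
    out = [mask.astype(bool)]
    for _ in range(levels):
        if min(out[-1].shape) < 2:
            break  # cannot halve a single row or column any further
        out.append(haar_ll(out[-1]) >= keep)
    return out
```

Each level halves both dimensions, so a face that spans many pixels at full resolution can be matched at a coarse level with a small, fixed set of search scales.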

On the low-frequency band of the wavelet-transformed image, flexibly scaled candidate face regions are determined with a morphology technique. A previous method for handling face scale changes^[9] can find face areas accurately over a wide range of scales, but at a high computational cost. Since precise face scales are not required in this coarse preprocessing step, we adopt a modified morphology technique to find ample, approximate face regions. That is, using a modified closing operation (Eq. 1), a generous region that includes some of the area around the facial color regions is formed as the probable range of the face.


Fig. 1:	Facial color distribution in the color space and its projection plane (1st row: RGB space, 2nd row: YCbCr space)


Fig. 2:	Original color images and binary image from skin color characteristics


Fig. 3:	A preprocessed image for locating candidate face regions
(a) original image (b) binary image by facial color (c) tolerant region of candidate face location by modified closing operation

More details on morphological operations can be found in^[10]. After these operations, as in Fig. 3, a variable geometric area (white area in Fig. 3) is obtained; it does not constrain the face region to an elliptical shape and tolerantly includes the area near the face in the probable regions.


(1)
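The exact modified closing operation of Eq. 1 is not recoverable here, so the following sketch shows only the standard binary closing (dilation followed by erosion) with a square k×k structuring element, written with NumPy alone. A larger `k` yields the kind of ample, tolerant region described above; the value of `k` is a hypothetical parameter.

```python
import numpy as np


def _windows(padded, shape, k):
    """Yield the k*k shifted views of `padded` that cover each pixel's window."""
    h, w = shape
    for dy in range(k):
        for dx in range(k):
            yield padded[dy:dy + h, dx:dx + w]


def dilate(mask, k):
    """Binary dilation with a k x k square (outside the image counts as False)."""
    p = np.pad(mask, k // 2, mode="constant", constant_values=False)
    out = np.zeros(mask.shape, dtype=bool)
    for win in _windows(p, mask.shape, k):
        out |= win
    return out


def erode(mask, k):
    """Binary erosion; padding with True keeps the image border from eroding."""
    p = np.pad(mask, k // 2, mode="constant", constant_values=True)
    out = np.ones(mask.shape, dtype=bool)
    for win in _windows(p, mask.shape, k):
        out &= win
    return out


def closing(mask, k=5):
    """Standard binary closing: fills gaps narrower than about k pixels."""
    assert k % 2 == 1, "use an odd structuring-element size"
    return erode(dilate(mask.astype(bool), k), k)
```

Closing never removes foreground pixels, so holes left by specular reflection around the eyes or mouth (Fig. 2) are filled without shrinking the facial-color region.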

Edge-like blob map for facial feature detection: Regardless of the color type of the input image in the step above, the following four steps are common to face and facial feature detection. In this section, we present a method for selecting probable blob regions for facial features using the newly proposed edge-like blob map.

A human face has several biometric features, such as the two eyes, eyebrows, nose, nose tip, mouth and ears. Among these, the eyes are the most prominent features and the most important clue in many face detection and recognition studies^[11-13]. The mouth is also a salient feature, but its shape is more variable than that of any other facial feature, so it is difficult to model the mouth with general geometric descriptors of biometric features. A few related studies have attempted to model mouth features. One study^[4] finds the two horizontal end points of the mouth in high-resolution images and uses them as prior knowledge for face recognition. Another approach^[6] locates the mouth region using the chromatic property that red pixels are frequent near the mouth.

We present a novel approach that uses the gray intensity of facial features, irrespective of their color characteristics. The eyes have the intensity property that the eye region is darker than the surrounding facial area. The region around an eye typically has a distinctive intensity shape: the small dark area of the eye is wider than it is tall, a shape similar to a horizontal 'blob' edge.

The region around the mouth also has darker intensity areas, like the region around the eyes. Despite the variability in mouth size, the dark regions of the mouth are not generally different from those of the eyes.


Fig. 4:	Example of directional template and the intensity differences of 8-directions from center pixel of template

In addition, horizontal blob edges are more distinct around the mouth region. Considering these intensity characteristics of facial features, we therefore propose a new 'directional blob template' and use it to estimate candidate feature locations.

The template size is determined by the facial area size that this method is intended to detect, so that the template is about as large as the eye blob region. The template is wider than it is tall, as in Fig. 4.

For each pixel P(x, y) in the obtained intensity image (size W×H), the center pixel P_cent(x_c, y_c) is defined as the pixel at which the template is placed. From P_cent, the average intensities (Eq. 2, 3) in the eight directions of the feature template (size h_FF×w_FF) are obtained, and the differences between these directional averages and I_cent (the intensity value of P_cent) are found as in Eq. 4. An example of the directional template for facial features is shown in Fig. 4. The directional intensity difference with the largest magnitude (the strongest intensity gradient) is defined as the principal intensity I_Pr (Eq. 5).

(2)

(3)

(4)

(5)
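Equations 2 to 5 are not reproduced in this text, so the following is only a sketch of one plausible reading of the directional template: for a center pixel, average the intensities along each of eight rays that stay inside an h_FF×w_FF window, subtract I_cent, and take the difference with the largest magnitude as I_Pr. The ray sampling and the sign convention (directional mean minus I_cent, positive when a dark blob has brighter surroundings) are assumptions.

```python
import numpy as np

# Unit steps (dy, dx) for the eight directions: E, NE, N, NW, W, SW, S, SE.
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def directional_differences(img, yc, xc, h_ff, w_ff):
    """Mean intensity along each of 8 rays from (yc, xc), minus I_cent.
    Rays stop at the edge of the h_ff x w_ff template or of the image."""
    i_cent = float(img[yc, xc])
    hy, hx = h_ff // 2, w_ff // 2
    diffs = []
    for dy, dx in DIRS:
        samples = []
        for t in range(1, max(hy, hx) + 1):
            oy, ox = dy * t, dx * t
            if abs(oy) > hy or abs(ox) > hx:
                break  # the ray has left the template window
            y, x = yc + oy, xc + ox
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                samples.append(float(img[y, x]))
        diffs.append(np.mean(samples) - i_cent if samples else 0.0)
    return np.array(diffs)


def principal_intensity(diffs):
    """I_Pr: the directional difference with the largest magnitude."""
    return float(diffs[np.argmax(np.abs(diffs))])
```

For a thin dark horizontal bar, the horizontal rays see no contrast while the vertical and diagonal rays see the bright surroundings, which is the "horizontal blob" signature the template is meant to capture.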

Using the principal intensity I_Pr, an edge-like blob map with multiple intensity strengths is then created as follows. The directional template is applied as a mask at every pixel location P(x, y) of the obtained intensity image, and the strength of each pixel is determined using thresholds weighted by the principal intensity I_Pr. If the horizontal intensity difference ΔI_width on both sides of P(x, y) is larger than the α-weighted threshold (Eq. 6), one level of strength is assigned. Next, if the difference in the vertical direction is larger than the β-weighted threshold (Eq. 6), one more level is assigned. Similarly, for each of the two diagonal directions at P(x, y) (Fig. 4), one more level is assigned if the difference is larger than the γ-weighted threshold. This process converts the entire gray intensity image into an edge-like blob map with four strength levels, and the brightest edge-like blob pixels have strength level 4. The intensity values assigned to strength levels 1 to 4 are 40, 80, 120 and 200, respectively. Figure 5c is the negative of the edge-like blob map, which shows the differences in edge strength more clearly than the original blob map image (Fig. 5b).

(6)
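The level assignment can be sketched as below, assuming the eight directional differences from the template are already available per pixel in the order E, NE, N, NW, W, SW, S, SE. Reading "both sides" as requiring both opposing directions to exceed the threshold (hence the element-wise minimum) is an interpretation, and the weights `alpha`, `beta` and `gamma` are hypothetical defaults since Eq. 6 is not reproduced here. Mapping level 0 to intensity 0 is also an assumption.

```python
import numpy as np

# Display intensity for strength levels 0..4 (levels 1-4 follow the text).
LEVEL_VALUES = np.array([0, 40, 80, 120, 200], dtype=np.uint8)


def blob_map(diffs, i_pr, alpha=0.5, beta=0.5, gamma=0.5):
    """diffs: array (8, H, W) of directional differences (E, NE, N, NW, W,
    SW, S, SE); i_pr: scalar or (H, W) principal intensity.
    Returns (strength level 0..4, edge-like blob map intensities)."""
    e, ne, n, nw, w, sw, s, se = diffs
    t = np.abs(i_pr)  # assume thresholds scale with the magnitude of I_Pr
    horiz = np.minimum(e, w)    # both horizontal sides must be brighter
    vert = np.minimum(n, s)     # both vertical sides
    diag1 = np.minimum(ne, sw)  # first diagonal
    diag2 = np.minimum(nw, se)  # second diagonal
    level = ((horiz > alpha * t).astype(np.int64)
             + (vert > beta * t)
             + (diag1 > gamma * t)
             + (diag2 > gamma * t))
    return level, LEVEL_VALUES[level]
```

The negative image of Fig. 5c would then be `255 - values`.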

Finding eye pairs: From the edge-like blob map obtained above, eye-like blob regions are marked and all probable eye-pair sets are selected.

Eye-like regions are darker than other feature regions, such as the mouth, so only pixels of edge strength level 4 are used as candidate eye pixels. First, the multiple-level blob map is divided into level 4 and levels 1-3, and properties of each small region (blob) are obtained with a component labeling technique^[9]. Basic geometric conditions (Eq. 7) are then applied to all candidate eye blobs, and only suitable eye blobs are marked: if the width and height of the bounding rectangle of an eye-like blob (width_E.B., height_E.B.) are smaller than 1.5 times the width and height, respectively, of the feature template, the blob is selected as a candidate eye, except for noisy blobs whose area (area_E.B.) is below 6 pixels.

(7)
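A sketch of the eye-blob screening, assuming 8-connected component labeling (implemented here with a breadth-first search rather than any particular library) and reading the bounding-box condition as width < 1.5·w_FF and height < 1.5·h_FF. The returned dictionary fields are hypothetical names used for illustration.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """8-connected component labeling by breadth-first search.
    Returns a list of blobs, each a list of (y, x) pixel coordinates."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            seen[y0, x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def candidate_eye_blobs(level_map, h_ff, w_ff, min_area=6, scale=1.5):
    """Keep level-4 blobs whose bounding box is smaller than `scale` times
    the template size and whose area is at least `min_area` pixels."""
    result = []
    for pixels in label_components(level_map == 4):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        width = max(xs) - min(xs) + 1
        height = max(ys) - min(ys) + 1
        if width < scale * w_ff and height < scale * h_ff and len(pixels) >= min_area:
            center = (sum(ys) / len(ys), sum(xs) / len(xs))
            result.append({"center": center, "width": width,
                           "height": height, "area": len(pixels)})
    return result
```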

All probable eye pairs are formed from the selected eye blobs, and only pairs that satisfy facial geometric conditions are kept as candidates. As shown in Fig. 6, the eye-pair distance, the direction of the eye-pair vector and the ratio of the two eye regions are considered, and suitable eye pairs are chosen based on the size of the face area.
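The pairwise geometric test can be sketched as follows. The thresholds `d_min`, `d_max`, `max_tilt_deg` and `max_area_ratio` are hypothetical parameters standing in for the conditions of Fig. 6; in practice `d_min` and `d_max` would be derived from the expected face size. Each blob is assumed to be a dictionary with a `center` given as (y, x) and an `area`.

```python
import math
from itertools import combinations


def eye_pairs(eyes, d_min, d_max, max_tilt_deg=30.0, max_area_ratio=2.0):
    """Enumerate pairs of candidate eye blobs that pass tests on distance,
    direction of the eye-pair vector and ratio of the two blob areas."""
    pairs = []
    for a, b in combinations(eyes, 2):
        # Order each pair left-to-right so the pair vector points rightwards.
        left, right = (a, b) if a["center"][1] <= b["center"][1] else (b, a)
        dy = right["center"][0] - left["center"][0]
        dx = right["center"][1] - left["center"][1]
        dist = math.hypot(dx, dy)
        tilt = abs(math.degrees(math.atan2(dy, dx)))  # 0 = horizontal pair
        ratio = max(a["area"], b["area"]) / min(a["area"], b["area"])
        if d_min <= dist <= d_max and tilt <= max_tilt_deg and ratio <= max_area_ratio:
            pairs.append((left, right))
    return pairs
```

Because every combination is enumerated, the cost grows quadratically with the number of eye blobs, which is why the size and area filters of Eq. 7 are applied first.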

The number of eye-pair candidates can be reduced further by considering intensity properties near the eye-pair blobs, namely (1) the ratio of the two eye blob intensities; (2) the intensity relationship between the two eyes and the middle of the forehead; and (3) the brightness pattern near the eye blob regions. Only eye pairs that satisfy these conditions are kept as highly likely pairs. Linear scaling of the eye patch region (Fig. 7) followed by histogram equalization allows the intensity properties of eye pairs to be obtained robustly. Figure 7 shows an example of a candidate eye-pair patch region.
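The patch normalization can be sketched as bilinear enlargement followed by classic CDF-based histogram equalization. Treating "linear scaling" as spatial enlargement by bilinear interpolation (consistent with the enlarged patches of Fig. 7) is an assumption, as is the output size passed by the caller; the intensity checks (1) to (3) are not reproduced.

```python
import numpy as np


def resize_bilinear(patch, out_h, out_w):
    """Enlarge (or shrink) a 2D patch by bilinear interpolation, corners aligned."""
    p = patch.astype(np.float64)
    h, w = p.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = p[y0][:, x0] * (1 - wx) + p[y0][:, x1] * wx
    bot = p[y1][:, x0] * (1 - wx) + p[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy


def equalize_hist(patch):
    """Histogram equalization of an 8-bit patch via its cumulative histogram."""
    hist = np.bincount(patch.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]
    n = patch.size
    if n == cdf_min:
        return patch.copy()  # constant patch: nothing to spread
    lut = np.round((cdf - cdf_min) * 255.0 / (n - cdf_min))
    return np.clip(lut, 0, 255).astype(np.uint8)[patch]


def eye_patch(patch, out_h, out_w):
    """Enlarge an eye-pair patch, then equalize its histogram."""
    big = resize_bilinear(patch, out_h, out_w)
    return equalize_hist(np.clip(np.round(big), 0, 255).astype(np.uint8))
```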

Estimation of mouth location: When the face pose varies, a tightly cropped facial bounding area takes a highly variable rectangular shape. For this reason, the location of the mouth, as well as the eye-pair locations, must be estimated to detect the facial region precisely. Normalizing this precisely selected facial region yields a fixed, standard face image patch, which is expected to help increase the correct face recognition rate in future work.

The edge-like blob map presented above for eye-pair detection is also effective for estimating the location of the mouth, a facial feature of highly variable shape. As with the eyes, the blobs near the mouth area are darker to some extent.


Fig. 5:	Edge-like blob map from original gray image and its negative image


Fig. 6:	Facial geometric conditions for eye-pair


Fig. 7:	Eye-pair detection example (original and enlarged eye patches with histogram equalization)


Fig. 8:	Various shapes of edge-like blob regions near the mouth area

Because mouth shapes vary, the edge strengths of the mouth in the edge-like blob map are less prominent than those of the eyes. On the other hand, narrow horizontal edges are more distinctive around the mouth region (Fig. 8). Therefore, the mouth location is estimated by finding multiple probable locations that have strong, suitable horizontal edges. Several candidate facial regions are then defined by combining the previously obtained eye pairs with these probable mouth locations. The similarity of these facial regions is analyzed in the next section.

For eye-pair selection, only level 4 edge pixels were used. Around the mouth region, as in Fig. 8, relatively weak and narrow edges with strength below level 3 are found more often than in the eye regions. However, these weak edges are frequently parallel to the eye-pair vector. Candidate mouth locations are therefore estimated as positions where narrow, moderately strong edges run in the direction of the eye-pair vector. The estimation of probable mouth locations is summarized as follows.

1.	Determine the direction normal to the eye-pair vector, pointing toward the mouth,
2.	Form the area range of probable mouth locations (from the nose tip to the jaw),
3.	Rotate and normalize this area,
4.	Select probable locations along the vertical direction where the number of edge pixels with strength above level 2 along the horizontal direction exceeds a prescribed threshold,
5.	Convert this area to a vector image and apply component labeling to the locations selected in step 4,
6.	Accept a location as a mouth candidate if its blob thickness is below a prescribed threshold based on the eye-pair distance,
7.	Choose up to three locations, starting from the bottom (Fig. 9)
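Steps 1, 4, 6 and 7 can be sketched as below, assuming that steps 2 and 3 have already produced a rotated, normalized strip of the blob-map levels whose rows run from the nose tip (row 0) toward the jaw. Step 5 is simplified to grouping consecutive selected rows rather than full 2D component labeling, and `min_count`, `min_level` and `max_thick_ratio` are hypothetical parameters; "above level 2" is read here as level 2 or higher.

```python
import math

import numpy as np


def mouth_search_direction(left_eye, right_eye):
    """Step 1: unit normal to the eye-pair vector, pointing down the face.
    Points are (y, x) in image coordinates, where y grows downwards."""
    vy = right_eye[0] - left_eye[0]
    vx = right_eye[1] - left_eye[1]
    norm = math.hypot(vx, vy)
    # In (x, y) order the downward normal of (vx, vy) is (-vy, vx);
    # it is returned here in (y, x) order.
    return (vx / norm, -vy / norm)


def mouth_candidates(strip, eye_dist, min_count, min_level=2,
                     max_thick_ratio=0.15, max_candidates=3):
    """Candidate mouth rows in a rotated, normalized strip of blob-map levels
    (row 0 nearest the eyes). Returns run centers, bottom first."""
    # Step 4: count edge pixels of at least `min_level` along each row.
    counts = (strip >= min_level).sum(axis=1)
    selected = counts > min_count
    # Step 5 (simplified): group consecutive selected rows into runs.
    runs, start = [], None
    for r, s in enumerate(selected):
        if s and start is None:
            start = r
        elif not s and start is not None:
            runs.append((start, r - 1))
            start = None
    if start is not None:
        runs.append((start, len(selected) - 1))
    # Step 6: keep thin runs; the limit scales with the eye-pair distance.
    thin = [(a + b) / 2 for a, b in runs
            if (b - a + 1) <= max_thick_ratio * eye_dist]
    # Step 7: up to `max_candidates` locations, starting from the bottom.
    return sorted(thin, reverse=True)[:max_candidates]
```

Tying the thickness limit to the eye-pair distance keeps the test scale-invariant: the same ratio works for near and far faces.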

Detection of face region and localization of facial features: The final location of the face and its features is determined from the multiple candidate facial areas obtained from the eye pairs and mouth locations in the previous sections. The similarity between each candidate area and predefined standard face templates is calculated with a weighted correlation technique.

First, each candidate facial area must be normalized before its similarity to the predefined templates can be measured. The normalization proceeds as follows. For the rectangular area that includes the two eyes and a mouth, a basic width is determined as in Fig. 9. The basic height is determined from both the eye-mouth distance and the basic width. This variable rectangle of the facial area is then scaled in two stages to a fixed square patch of 60×60 pixels. Through the two-stage scaling, the two eyes and the mouth are always placed at fixed positions (Fig. 10). For each eye pair, at most three candidate facial areas are obtained from the multiple mouth locations, and their normalized square patches are compared with the standard face templates. The similarity between a patch and a template is based on the basic form of the correlation equation, Eq. 8. As shown in Fig. 10c, a weighted region around the main facial features is also defined (the dotted circular regions), and the modified correlations are computed with weighting values in this region.

(8)

where I_FD is the obtained facial area and I_tmpl is a face template.
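Because Eq. 8 is not reproduced in this text, the following sketch assumes a weighted Pearson correlation between I_FD and I_tmpl, with larger weights inside circular regions around the three fixed feature positions. The feature coordinates in `FEATURES`, the radius and the weight are hypothetical values, not the paper's.

```python
import numpy as np

PATCH = 60  # normalized patch size from the text (60 x 60 pixels)
# Hypothetical fixed feature positions (row, col): left eye, right eye, mouth.
FEATURES = [(20, 18), (20, 42), (45, 30)]
RADIUS, INNER_WEIGHT = 8, 2.0  # hypothetical circle radius and weight


def feature_weights(size=PATCH, features=FEATURES, radius=RADIUS, inner=INNER_WEIGHT):
    """Weight map: 1 everywhere, `inner` inside circles around the features."""
    yy, xx = np.mgrid[0:size, 0:size]
    w = np.ones((size, size))
    for fy, fx in features:
        w[(yy - fy) ** 2 + (xx - fx) ** 2 <= radius ** 2] = inner
    return w


def weighted_correlation(i_fd, i_tmpl, w):
    """Weighted Pearson correlation between a face patch and a template."""
    a = i_fd.astype(np.float64)
    b = i_tmpl.astype(np.float64)
    ws = w.sum()
    da = a - (w * a).sum() / ws  # deviations from the weighted means
    db = b - (w * b).sum() / ws
    den = np.sqrt((w * da * da).sum() * (w * db * db).sum())
    return float((w * da * db).sum() / den) if den > 0 else 0.0
```

Raising the weights around the eyes and mouth makes mismatches in those regions cost more than mismatches in the cheeks or forehead.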

The number of predefined standard facial templates can range from 5 to 25. Since increasing the number of templates does not significantly increase the computational burden, we adopt 20 templates to improve detection accuracy. Meanwhile, the square regions of both candidate facial areas and standard templates unavoidably include non-facial parts, e.g., at the corners of the patch. A fillet mask removes these non-facial pixels from the normalized image patch (Fig. 11), and histogram equalization is also performed on the patches. Among these normalized face patches, the region with the maximum average correlation over all standard facial templates is taken as the most likely face region and is chosen as the final face region. The three fixed positions in the face patch, mapped back to the corresponding pixel positions in the full image, are taken as the final facial feature locations. These corresponding positions can form a feature triangle of various configurations. The final facial feature triangle and face region patch are shown in Fig. 12.
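The final selection can be sketched as below: a rounded-corner ("fillet") mask removes the patch corners, and the candidate with the highest mean correlation over all templates is chosen. The corner radius is a hypothetical choice, plain (unweighted) correlation is used for brevity, and the histogram equalization step is omitted.

```python
import numpy as np


def fillet_mask(size=60, radius=15):
    """Square mask with rounded corners: False outside a quarter circle of
    `radius` in each corner (points farther than `radius` from the inner
    square [radius, size-1-radius]^2)."""
    yy, xx = np.mgrid[0:size, 0:size]
    cy = np.clip(yy, radius, size - 1 - radius)
    cx = np.clip(xx, radius, size - 1 - radius)
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def correlation(a, b, mask):
    """Pearson correlation over the masked pixels only."""
    a = a[mask].astype(np.float64)
    b = b[mask].astype(np.float64)
    a -= a.mean()
    b -= b.mean()
    den = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / den) if den > 0 else 0.0


def select_face(candidates, templates, mask):
    """Index of the candidate patch with the highest average correlation
    over all standard templates, together with that average."""
    scores = [np.mean([correlation(c, t, mask) for t in templates])
              for c in candidates]
    best = int(np.argmax(scores))
    return best, float(scores[best])
```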

Experimental results: To present more practical results on varied facial images, we use the BioID face database^[14]. In recent face detection research^[12,13], the BioID face database has been valued for representing realistic, real-world environments. Its images are head-and-shoulder shots against complex backgrounds, which suits our target task, human-robot interaction. The database consists of 1521 grayscale images (384×288 pixels) of 23 different persons, collected over several sessions at different places and dates. These facial images contain considerable variation in face scale, pose, illumination and background.


Fig. 9:	Summary of estimating probable mouth locations


Fig. 10:	Two-step scaling process of the rectangular facial region


Fig. 11:	Some face templates for modified correlation


Fig. 12:	Detection result of face area and feature locations


Fig. 13:	Some examples of detection results in Test set #2-BioID face database


Fig. 14:	Some examples of correct and erroneous cases in Test set #2

For a test set from the BioID face database, each entire facial image is converted to an edge-like blob map and probable facial feature locations are found, as in the examples in Fig. 13. Examples of both successful results and typical errors are shown in Fig. 14.

CONCLUSION

We have presented a method for detecting a user's face and its features for human-robot interaction on a mobile robot platform. Faces can be detected successfully against complex backgrounds, and facial feature locations can also be estimated despite some pose variation of the user's face. In particular, the method applies to gray images as well as color images. Owing to this flexibility with respect to input image type, various input images can be used, and the method can also be evaluated on widely used face databases.

REFERENCES

Nikolaidis, A. and I. Pitas, 2000. Facial feature extraction and pose determination. Pattern Recog., 33: 1783-1791.
Yang, M.H., N. Ahuja and D. Kriegman, 2000. Mixtures of linear subspaces for face detection. Proceedings of 4th International Conference on Automatic Face and Gesture Recognition, Mar. 28-30, Grenoble, France, pp: 70-76.
Garcia, C. and G. Tziritas, 1999. Face detection using quantized skin color regions merging and wavelet packet analysis. IEEE Trans. Multimedia, 1: 264-277.
Wang, Y., C.S. Chua and Y.K. Ho, 2002. Facial feature detection and face recognition from 2D and 3D images. Pattern Recog. Lett., 23: 1191-1202.
Rowley, H.A., S. Baluja and T. Kanade, 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell., 20: 23-38.
Hsu, R.L., M. Abdel-Mottaleb and A.K. Jain, 2002. Face detection in color images. IEEE Trans. Pattern Anal. Mach. Intell., 24: 696-706.
Fröba, B. and C. Küblbeck, 2002. Robust face detection at video frame rate based on edge orientation features. Proceedings of 5th International Conference on Automatic Face and Gesture Recognition, May 20-21, Washington, DC., USA., pp: 342-347.
Gao, Y. and M.K.H. Leung, 2002. Face recognition using line edge map. IEEE Trans. Pattern Anal. Mach. Intell., 24: 764-779.
Lee, T., S.K. Park and M. Kim, 2002. Face detection and recognition with multiple appearance models for mobile robot application. Proceedings of the International Conference on Computer Applications in Shipbuilding, Oct. 16-19, Jeonbuk, Korea, pp: 215-218.
Jain, A.K., 1989. Fundamentals of Digital Image Processing. 5th Edn., Prentice-Hall, Englewood Cliffs, NJ., ISBN: 9780133361650, Pages: 569.
Yang, M.H., D. Kriegman and N. Ahuja, 2002. Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 24: 34-58.
Zhou, Z. and X. Geng, 2004. Projection functions for eye detection. Pattern Recog., 37: 1049-1056.
Jesorsky, O., K.J. Kirchberg and R.W. Frischholz, 2001. Robust face detection using the Hausdorff distance. Proceedings of 3rd International Conference on Audio and Video Based Biometric Person Authentication, Jun. 6-8, Springer Verlag, London, UK., pp: 90-95.

Journal of Applied Sciences
