Intelligent service robots have emerged as a recent trend in robotics research. One important function of such a service robot is Human-Robot Interaction (HRI). A facial-image interface for HRI, using a camera, requires only minimal user cooperation while still acquiring sufficient information about people. HRI based on facial images is therefore also suitable for realizing high-level interaction functions such as emotional communication between human and robot. For this to work, user face and facial expression recognition play a vital role, and these require face detection technology and a precise method for estimating facial feature locations. This study proposes a new method for detecting face location and estimating facial features that is suitable for HRI.
Previous face detection research[1-4] mostly uses a fixed camera; face detection for HRI, however, has its own unique requirements. First, faces in the acquired images show significant pose variation due to the mobility of the robot platform. Second, images of users in the robot's surroundings include complex backgrounds and significant illumination changes. Third, because the environment around the robot changes spatially over time, detection must run in quasi-real time so that the user can be located instantaneously.
To overcome these limitations, this study proposes a new method of face and facial feature detection. First, we propose a novel directional template for effectively estimating the locations of facial features such as the two eyes and the mouth. Applying this template to the input image yields a new edge-like blob map with multiple intensity strengths. We show that this edge-like blob map is relatively robust to both facial pose and illumination changes, and that it can estimate detailed facial feature locations even without rough information about the facial area. With this approach, the face image acquired through the camera may be either color or gray, so a more flexible face detection method becomes possible. We demonstrate the appropriateness of the proposed method through experiments on a well-known gray-level facial image database and on various color images.
The main goal of face detection is locating the position of a face in an uncontrolled environment. Previous approaches[1,2,5] limit their goal to enhancing detection performance, treating face detection as an isolated process. In our approach, by contrast, face detection is a prior step for face recognition or expression recognition, considered from the viewpoint of the entire Human-Robot Interaction process. For this reason, locating facial features within the face area is as necessary a task as detecting the face itself in uncontrolled environments. Both steps, detecting the face and precisely locating its features under pose variation, must run as fast as possible, in quasi-real time.
In much research related to facial feature detection, the use of facial color characteristics is a typical recent approach for pose-variant facial images. In that line of work, chromatic maps of facial features are created using facial color information together with a lighting compensation technique. From the specific color properties of each feature (eyes and mouth) in these maps, feature locations can be estimated within the facial area. In our experiments, however, with images of moderate or lower chrominance quality, it is difficult to clearly distinguish facial features in the chromatic maps of the eyes or mouth. Moreover, if the facial area is tiny or the eye and mouth features are small, the facial features can be concealed within the facial color region and cannot be found easily.
On the other hand, when facial features are unclear in color, their detailed shapes can be effectively presented in the edge image created by 2D differential mask operations. Many approaches use edge information for feature detection, and several related works proposing various types of edge map image have appeared recently. For example, Edge Orientation Map (EOM) information, which parameterizes the local features in the facial area, and the Line Edge Map (LEM) have been defined and applied to the recognition of facial area images. However, these works compose edge maps based on frontal face edge figures, and their similarity measures are computed from these frontal normal maps. Therefore, for pose-variant or unknown-viewpoint facial images, the correct detection rate decreases considerably.
Face detection in color and gray images: We now present our method for detecting the face and its feature locations. One characteristic of our method is that the algorithm can be adapted to gray-image as well as color-image inputs. Depending on the image type, an additional preprocessing step is included so that facial features can be detected more effectively.
Estimation of candidate face regions for color images: If the input from the camera is a color image, a preprocessing step using color information can be carried out. The probable locations of candidate facial regions are determined from the chromatic properties of facial color. This step effectively reduces the search range for faces in the obtained image, so that facial feature detection can be accomplished in quasi-real time. Skin color appearance varies widely with race, individual skin tone and lighting conditions. For effective segmentation of facial colors under such variation, we use the YCbCr color space transform[3,9]. As shown in Fig. 1, when the facial color pixel distribution is transformed from RGB to YCbCr space, facial color pixels cluster more tightly in YCbCr space than in RGB space. The boundary between facial and non-facial color pixels therefore becomes clear, and it is easy to distinguish facial color regions in the obtained color image. In YCbCr space, a binary facial-color image is created from simple boundary conditions on facial color pixels, as in Fig. 2.
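The segmentation step above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the Cb/Cr bounds are commonly cited skin-tone ranges assumed here, and the BT.601 conversion is one standard choice of RGB-to-YCbCr transform.

```python
import numpy as np

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary facial-color mask via RGB -> YCbCr thresholding.

    The Cb/Cr bounds are illustrative values from the literature,
    not the exact thresholds used in the paper.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 chrominance conversion (full-range approximation)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    mask = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
    return mask.astype(np.uint8)
```

Only the chrominance channels are thresholded, which is what gives the method its partial robustness to brightness (Y) changes.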
However, these binary regions cannot all be assumed to be candidate facial regions. Even if a region is part of the white area of the binary image, a region that is too small or too large relative to a plausible facial area is not a suitable face candidate. Also, owing to light reflection effects that vary with facial pose, some regions within the facial area that contain important facial information, such as an eye or mouth region, can be excluded (Fig. 2).
To treat these problems effectively, we use a hierarchically transformed image and a modified morphology technique to obtain candidate face regions; this adapts to various scales of the face area and selects probable regions flexibly.
To begin with, the binary facial image is reduced hierarchically. A pyramid image is built with the Haar wavelet transform, the simplest wavelet kernel, for fast computation of the image transform. From these reduced binary images, suitable face candidate regions can be found effectively with a small number of scale variations, rather than searching the original-size binary image over all scale changes[3,9].
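The pyramid reduction can be sketched with the Haar low-pass (LL) band, which for a 2D Haar transform is simply the 2x2 block average. This is a sketch of the hierarchical reduction step, assuming even image dimensions at each level:

```python
import numpy as np

def haar_reduce(img, levels=1):
    """Halve image resolution per level using the Haar low-pass
    filter (2x2 block average), i.e., the LL band of a 2D Haar
    wavelet transform. Assumes even dimensions at each level."""
    out = img.astype(np.float32)
    for _ in range(levels):
        out = 0.25 * (out[0::2, 0::2] + out[1::2, 0::2] +
                      out[0::2, 1::2] + out[1::2, 1::2])
    return out
```

Because only the low-frequency band is needed for coarse region search, the detail (LH/HL/HH) bands are never computed, keeping the transform cheap.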
For the low-frequency band of the wavelet-transformed image, flexibly scaled candidate face regions are determined with a morphology technique. Previous methods for handling face scale changes can find face areas accurately over a wide range of scales, but carry a heavy computational burden. In our case, where precise face region scales are not required in the coarse-detection preprocessing step, a modified morphology technique is adopted to find ample, approximate face regions. That is, using a modified closing operation (Eq. 1), a generous region containing some area around the facial color regions is formed as the probable face range.
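One way to read the modified closing of Eq. 1 is a closing whose dilation uses a larger structuring element than its erosion, so the result keeps a tolerant margin around the skin-color blobs. The element sizes below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def dilate(mask, k):
    """Binary dilation with a k x k square structuring element."""
    r = k // 2
    p = np.pad(mask, r)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(mask, k):
    """Binary erosion with a k x k square structuring element."""
    r = k // 2
    p = np.pad(mask, r, constant_values=1)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def tolerant_close(mask, dilate_k=7, erode_k=3):
    """Sketch of a 'modified closing': dilate with a larger element
    than the erosion, leaving a spacious margin around the facial
    color regions (element sizes are illustrative assumptions)."""
    return erode(dilate(mask, dilate_k), erode_k)
```

A standard closing (equal element sizes) would shrink back to roughly the original blobs; the asymmetric pair keeps the inflated, tolerant region the coarse-detection step wants.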
||Facial color distribution in the color space and its projection plane (1st row: RGB space, 2nd row: YCbCr space)
||Original color images and binary image from skin color characteristics
||A preprocessed image for locating candidate face regions
(a) original image (b) binary image by facial color (c) tolerant region of candidate face location by modified closing
Further details on these morphological operations can be found in the cited literature.
After these operations, as in Fig. 3, a variable geometric area (the white area in Fig. 3) is obtained; it does not constrain the face region to an elliptical shape and tolerantly includes the area near the face among the probable regions.
Edge-like blob map for facial feature detection: Regardless of the color type of the input image in the step above, the following steps are common to face and facial feature detection. In this section, we present a method for selecting probable blob regions for facial features using the newly proposed edge-like blob map.
The human face contains several biometric features such as two eyes, eyebrows, nose, nose tip, mouth and ears. Among these, the eyes are the most prominent features and the most important clue in much face detection and recognition research[11-13]. The mouth is also salient, but its shape is far more changeable than that of other facial features, so it is difficult to model with general geometric descriptors of biometric features. A few related studies attempt to model mouth features: in one, the two horizontal end tips of the mouth are found in a high-resolution image and used as prior knowledge for face recognition; another uses color chromaticity, exploiting the frequency of red pixels near the mouth, to locate the mouth region.
We present a novel approach that uses the gray intensity of facial features, irrespective of their color characteristics. The eye region is darker than the nearby facial area, and the region around an eye has a distinctive intensity shape: the small dark area of the eye is wider than it is tall, a shape similar to a horizontal blob edge.
The region near the mouth also has a darker intensity area, like the near-eye region. Despite the variety of mouth region sizes, the dark regions of the mouth are generally not different from those of the eyes. Moreover, the horizontal blob edge is more distinct around the mouth region. Therefore, considering these intensity characteristics of facial features, we propose a new directional blob template and estimate candidate feature locations.
||Example of directional template and the intensity differences of 8-directions from center pixel of template
The template size is determined by the facial area size intended for detection with this method, roughly as large as the eye blob region. The template width is larger than its height, as in Fig. 4.
For a pixel P(x, y) in the obtained intensity image (size W x H), the center pixel P_cent(x_c, y_c) is defined as the pixel at which the template is placed. From P_cent, the average intensities (Eq. 2, 3) along the eight directions of the feature template (size h_FF x w_FF) are obtained, and their differences from I_cent (the intensity value of P_cent) are found as in Eq. 4. An example of the directional template of facial features is shown in Fig. 4. The intensity value with the largest magnitude of intensity gradient is defined as the principal intensity I_pr (Eq. 5).
Now, using the principal intensity I_pr, the edge-like blob map with multiple intensity strengths is created as follows. For every pixel P(x, y) in the obtained intensity image, the masking operation with the directional template is applied, and a multiple-strength intensity level for each pixel is determined using thresholds weighted on I_pr. If the intensity difference ΔI_width across both sides of the horizontal direction at P(x, y) is larger than the α-weighted threshold (Eq. 6), a +1 intensity strength level is assigned. Next, for the vertical direction, if the difference is larger than the β-weighted threshold (Eq. 6), another +1 edge strength level is assigned. Similarly, for the two diagonal directions at P(x, y) (Fig. 4), if the difference is larger than the γ-weighted threshold, a +1 edge strength level is assigned as well. Through this process, the entire gray intensity image is converted into an edge-like blob map with four intensity strength levels; the brightest edge-like blob pixels have level +4. The intensity values of the strength levels are defined as 40, 80, 120 and 200. Figure 5c is a negative image of the edge-like blob map, which shows the differences in edge strength more clearly than the original blob map image (Fig. 5b).
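The blob-map construction can be sketched as below. This is a reconstruction from the description, with stated assumptions: the weights α, β, γ, the template size, and the exact rule by which the four direction pairs each contribute one level are illustrative choices, not the paper's parameters.

```python
import numpy as np

LEVEL_VALUES = {0: 0, 1: 40, 2: 80, 3: 120, 4: 200}

def edge_blob_map(img, h_ff=5, w_ff=9, alpha=0.5, beta=0.5, gamma=0.5):
    """Sketch of the directional-template edge-like blob map.

    At each pixel, mean intensities along the 8 template directions
    are compared with the centre intensity; the largest difference
    magnitude is the principal intensity I_pr. Each direction pair
    (horizontal, vertical, two diagonals) whose two-sided difference
    exceeds a weighted fraction of I_pr adds one strength level
    (maximum 4). Weights and the level rule are assumptions."""
    img = img.astype(np.float32)
    h, w = img.shape
    ry, rx = h_ff // 2, w_ff // 2
    out = np.zeros((h, w), np.uint8)
    dirs = {  # name -> (dy, dx, reach)
        'E': (0, 1, rx), 'W': (0, -1, rx),
        'N': (-1, 0, ry), 'S': (1, 0, ry),
        'NE': (-1, 1, ry), 'NW': (-1, -1, ry),
        'SE': (1, 1, ry), 'SW': (1, -1, ry),
    }
    for y in range(ry, h - ry):
        for x in range(rx, w - rx):
            c = img[y, x]
            diff = {}
            for name, (dy, dx, reach) in dirs.items():
                vals = [img[y + dy * s, x + dx * s]
                        for s in range(1, reach + 1)]
                diff[name] = float(np.mean(vals)) - c
            i_pr = max(abs(d) for d in diff.values())
            if i_pr == 0:
                continue  # flat area: no edge strength
            level = 0
            if min(diff['E'], diff['W']) > alpha * i_pr:
                level += 1  # horizontal pair: dark blob between brights
            if min(diff['N'], diff['S']) > beta * i_pr:
                level += 1  # vertical pair
            if min(diff['NE'], diff['SW']) > gamma * i_pr:
                level += 1  # first diagonal pair
            if min(diff['NW'], diff['SE']) > gamma * i_pr:
                level += 1  # second diagonal pair
            out[y, x] = LEVEL_VALUES[level]
    return out
```

A pixel surrounded on all sides by brighter neighbours (an eye-like dark blob) collects all four increments and reaches the brightest value, 200, matching the level-4 pixels used later for eye candidates.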
Finding eye pairs: From the previously obtained edge-like blob map, eye-analogue blob regions are marked and all probable eye-pair sets are selected.
The eye region has a darker intensity than other feature regions such as the mouth, so we choose only level-4 edge strength pixels as candidate eye pixels. First, the multiple-intensity-level blobs are divided into level 4 and levels 1-3, and the properties of each small region (blob) are acquired with a component labeling technique. Basic geometric conditions (Eq. 7) are then applied to all candidate eye blob regions, and only suitable eye blobs are marked: if the width and height of the bounding rectangle of an eye-analogue blob (width_E.B., height_E.B.) are smaller than 1.5 times the width and height of the feature template, and the blob is not mere noise (area_E.B. above 6 pixels), the blob is selected as a candidate eye.
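The geometric test of Eq. 7 can be sketched as a simple predicate applied to each labeled blob; the template size below is a hypothetical value carried over from the earlier sketch, and only the 1.5x and 6-pixel constants come from the text:

```python
def is_candidate_eye(width_eb, height_eb, area_eb, w_ff=9, h_ff=5):
    """Geometric screening of an eye-analogue blob (sketch of Eq. 7):
    bounding box no larger than 1.5x the feature template in each
    dimension, and area above the 6-pixel noise floor."""
    return (area_eb > 6 and
            width_eb < 1.5 * w_ff and
            height_eb < 1.5 * h_ff)
```

Blobs failing either test are discarded before the much more expensive pairwise eye-pair checks, which keeps the candidate set small.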
All probable eye pairs are composed from the selected eye blobs, and candidate eye pairs are kept only if facial geometric conditions are satisfied. As shown in Fig. 6, the eye-pair distance, the direction of the vector formed by the eye pair and the area ratio of the two eye regions are considered. Based on the area of the detected face, suitable eye pairs are chosen.
The number of candidate eye-blob pairs can be reduced further by considering intensity properties near the eye-pair blobs: (1) the ratio of the two eye blob intensities; (2) the intensity relation between the two eyes and the middle of the forehead; and (3) the brightness pattern near the eye blob regions. Only eye pairs satisfying these conditions are kept. Through linear scaling of the eye patch region (Fig. 7) and histogram equalization, the intensity properties of eye pairs can be obtained robustly. Figure 7 shows an example of a candidate eye-pair patch region.
Estimation of mouth location: For a face with varying pose, a precisely cropped facial bounding area takes quite a variable rectangular form. For this reason, estimating the mouth location as well as the eye-pair locations is needed to detect a precise facial region. Normalizing this precisely selected facial region yields a fixed standard face image patch, which can be effective in increasing the correct face recognition rate in future work.
The edge-like blob map presented above for eye-pair detection is also effective for estimating the mouth location, a facial feature of quite variable shape. Like the eye blobs, the blobs near the mouth area are darker than the surrounding facial region.
||Edge-like blob map from original gray image and its negative
||Facial geometric conditions for eye-pair
||Eye-pair detection example (original and enlarged eye patches with histogram equalization)
||Various shape of edge-like blob regions near the mouth area
Owing to the variety of mouth feature shapes, the edge strengths of the mouth in the edge-like blob map are not as prominent as those of the eyes. On the other hand, the horizontal narrow edge is more distinctive around the mouth region (Fig. 8). Therefore, the mouth location is estimated by obtaining multiple probable locations that have strong, suitable horizontal edges. Several candidate facial regions are then decided by the pre-obtained eye pairs together with these probable mouth locations. The similarity analysis over these facial regions is performed in the next section.
For eye-pair selection, we chose only level-4 edge strength pixels. Around the mouth region, as in Fig. 8, relatively weak and narrow edges, with strength below level 3, are found more often than in the eye regions. However, these weak edges are frequently parallel to the direction of the eye-pair vector. Candidate mouth locations are therefore estimated as the positions where narrow, fairly strong edges exist along the eye-pair vector direction. The estimation of probable mouth locations is summarized as follows:
1) Determine the normal vector to mouth locations on the basis of the eye pair,
2) Form the area range of probable mouth locations (including from the nose tip),
3) Rotate this area and normalize it,
4) Select probable locations in the vertical direction where the number of edge pixels above level-2 strength along the horizontal direction exceeds a prescribed threshold,
5) Convert this area to a vector image and apply component labeling to the locations selected in 4),
6) Determine candidate mouth locations where the location blob thickness is below a prescribed threshold relative to the eye-pair distance,
7) Choose up to three locations from the bottom (Fig. 9).
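Steps 4) and 7) amount to a horizontal projection of edge strengths inside the normalized strip below the eye pair, keeping the bottom-most candidate rows. A minimal sketch, in which the level-2 value (80) follows the strength levels defined earlier while the row threshold is an illustrative assumption:

```python
import numpy as np

def mouth_candidates(strip, level2_value=80, row_thresh=3, max_picks=3):
    """Sketch of mouth-location steps 4)-7): count per-row edge pixels
    at strength level >= 2 inside the rotated/normalized strip and keep
    up to three qualifying rows, starting from the bottom."""
    counts = (strip >= level2_value).sum(axis=1)  # horizontal projection
    rows = [r for r in range(strip.shape[0]) if counts[r] > row_thresh]
    rows.reverse()  # bottom-most candidates first
    return rows[:max_picks]
```

Scanning from the bottom reflects the observation that the strongest horizontal mouth edge tends to lie below weaker nose-tip edges within the strip.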
Detection of face region and localization of facial features: From the multiple candidate facial areas obtained from the eye-pair and mouth locations of the previous sections, the final location of the face and its features is determined. The similarity between each candidate area and a predefined standard face patch template is calculated with a weighted correlation technique.
First, a normalized form of the candidate facial area must be prepared for measuring similarity with the predefined template. The preparation step is as follows. For the pre-obtained rectangular area that includes the two eyes and the mouth, the basic width is determined as in Fig. 9, and the basic height is decided from both the eye-mouth distance and the basic width. This variable rectangle of the obtained facial area is scaled in two stages to a fixed square patch of 60 x 60 pixels. Although two scaling stages are performed, the two eyes and the mouth always land on fixed positions (Fig. 10). For each eye-pair location, up to three candidate facial areas are obtained according to the multiple mouth locations, and their normalized square areas are compared with the standard face templates. The similarity measure between an area and the templates is based on the basic form of the correlation equation, Eq. 8. As shown in Fig. 10c, a weighted region of the main facial features (the dotted circular regions) is also defined, and the modified correlations are computed with weighting values over this region,
where I_FD is the obtained facial area and I_tmpl a face template.
The number of predefined standard facial templates ranges from 5 to 25; even as the number of templates grows, the computational burden does not increase significantly, so we adopt 20 templates to improve detection correctness. Meanwhile, the square regions of both the candidate facial areas and the standard templates unavoidably include non-facial parts, e.g., at the corners of the patch. Using a fillet mask, pixels of the non-facial area in the normalized image patch are removed (Fig. 11), and histogram equalization is performed on the patches. Among these normalized face image patches, the region with the maximum average correlation over all standard facial templates is taken as the most likely face region and determined as the final face region. The three fixed positions in the face patch are also determined as the final facial feature locations, corresponding to pixel positions in the entire obtained image. These corresponding positions construct a feature triangle of various configurations. The final facial feature triangle and face region patch image are shown in Fig. 12.
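The weighted correlation of Eq. 8 can be sketched as a normalized cross-correlation in which a weight map (encoding the circular feature regions of Fig. 10c, and zero where the fillet mask removes corner pixels) scales each pixel's contribution. The exact weighting scheme is an assumption; only the correlation form is taken from the text:

```python
import numpy as np

def weighted_correlation(patch, template, weight):
    """Weighted normalized cross-correlation between a 60x60 candidate
    face patch and a standard template (sketch of Eq. 8). `weight`
    emphasizes main-feature regions and can zero out masked corners."""
    patch = patch.astype(np.float64)
    template = template.astype(np.float64)
    p = (patch - patch.mean()) * weight
    t = (template - template.mean()) * weight
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom else 0.0
```

The candidate whose average of this score over all 20 templates is highest becomes the final face region.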
Experimental results: To present more practical results on various facial images, the BioID face database is adopted. In recent face detection research[12,13], the BioID database is valued for describing more realistic real-world environments. It consists of head-and-shoulder images with complex backgrounds, which suits our target task, human-robot interaction. The database contains 1521 grayscale images (384 x 288 pixels) of 23 different persons, collected over several sessions at different places and dates. These facial images contain a fair degree of variation in face scale, pose and illumination.
||Summary of estimating probable mouth locations
|| Two step scaling process of rectangular facial region
||Some face templates for modified correlation
|| Detection result of face area and feature locations
||Some examples of detection results in Test set #2-BioID face
|| Some examples of correct and erroneous cases in Test set
For a test set of the BioID face database, each facial image is converted to an edge-like blob map and probable facial feature locations are found, as in the examples in Fig. 13. Results for both successful and typical erroneous cases are shown in Fig. 14.
We have presented a user face and facial feature detection method for human-robot interaction on a mobile robot platform. Successful face detection is achieved in complex backgrounds, and additional estimation of facial feature locations is also possible, irrespective of some pose variation of the user's face. In particular, the method can be applied to gray images as well as color images. Owing to this flexibility in input type, various input images can be used, and the method can be evaluated on widely used face databases.