One common machine vision application is to teach a computer to discriminate between items in a dataset automatically, saving the man-hours of tedium these tasks would otherwise demand. Classification of human posture is a very challenging problem. Its importance is evident in the increasing requirement for machines to interact intelligently and effortlessly with human-inhabited environments. Past attempts reported in the literature to recognize human actions by machine require intrusive devices that limit the scope of their applications to situations where people specifically intend to communicate with computers[1,2]. Major early works involved the use of Moving Light Displays (MLD) on subjects in a darkened room, and Structure from Motion (SFM) techniques, in which a three-dimensional model of the person is reconstructed to recognize human actions. MLD was a useful experiment but very intrusive; SFM is more complex and computationally expensive. This study reports preliminary research conducted to identify simple human postures using computationally simple techniques that do not require intrusive devices. It is the first step towards developing applications in real-time surveillance, pedestrian detection and gait recognition that will extend the capability of machines into understanding the human domain. To improve machine capabilities in real time, e.g., in surveillance, activity recognition and human interaction with machines, it is desirable to represent human shapes in a low-dimensional space rather than the original very high-dimensional space, without loss of true shape characteristics. A two-category pattern recognition task involving standing and sitting postures was chosen to create a supposedly simple discrimination task for machine vision methods. The eigenspace transform was chosen as the method for discrimination. Utilization of the eigenspace, whose basis vectors we named 'eigenpostures', required special formatting of the images.
MATERIALS AND METHODS
Computational models of human posture recognition, in particular, are interesting because they can contribute not only theoretical insights but also practical applications. Computers that possess the ability to recognize human shapes could be applied to a wide variety of problems, including intruder alerts, security systems, sports-science studies and medical gait applications. The database of images was captured during the day under normal office lighting. Our approach treats the human shape problem as a 2-D classification problem.
||Fig. 1: Overview of the overall system
The system functions by projecting human posture images onto a feature space
that spans the significant variations among known human posture images. We
call these significant features 'eigenpostures' since they are the eigenvectors
(principal components) of the set of postures; they do not necessarily correspond
to features such as the head, body, or torso. The projection operation characterizes
a human shape by a weighted sum of the eigenposture features, so to recognize
a particular posture it is necessary only to compare these weights to those
of known postures. A particular advantage of our approach is that it allows
new postures to be learned and later classified in an unsupervised manner,
making it easy to implement using a neural network architecture.
System Overview: Figure 1 depicts an overview of the overall system. It consists of the following steps: segmentation, feature extraction and, finally, classification.
The segmentation stage extracts the silhouette of a person using a binary image extraction process, which consists of background differencing followed by thresholding to obtain a binary mask of the foreground region. To remove noise, median filtering and morphological operations are used. Next, the feature extraction component functions by projecting the training images onto a feature space that spans the significant variations among known images. The significant features, which we term 'eigenpostures', are the eigenvectors (principal components) of the set of images and serve as inputs to the classifiers chosen in this study.
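The segmentation stage described above can be sketched as follows. This is a minimal illustration assuming grayscale NumPy arrays for the frame and the static background; the threshold value and filter sizes here are illustrative choices, not the paper's parameters:

```python
import numpy as np
from scipy import ndimage

def extract_silhouette(frame, background, threshold=30):
    """Background differencing + thresholding + noise removal."""
    # Absolute difference between the current frame and the background
    diff = np.abs(frame.astype(int) - background.astype(int))
    # Threshold to obtain a binary mask of the foreground region
    mask = (diff > threshold).astype(np.uint8)
    # Median filtering to suppress salt-and-pepper noise
    mask = ndimage.median_filter(mask, size=3)
    # Morphological opening then closing to clean up the mask
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    return mask.astype(np.uint8)

# Synthetic example: a bright rectangular "person" on a dark background
background = np.zeros((40, 40), dtype=np.uint8)
frame = background.copy()
frame[10:30, 15:25] = 200
silhouette = extract_silhouette(frame, background)
```

The same pipeline applies unchanged to real office-lit frames, with the threshold tuned to the lighting conditions.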
Principal Component Analysis (PCA) is a useful statistical technique that has found numerous applications in fields such as recognition, classification and image data compression. It is also a common technique for extracting features from data in a high-dimensional space, which makes it an interesting tool for our study. It is essentially a systematic method for reducing the dimensionality of the input space by projecting the data from a correlated high-dimensional space to an uncorrelated low-dimensional space. PCA uses the eigenvalues and eigenvectors generated by the correlation matrix to rotate the original data coordinates along the directions of maximum variance. When the eigenvalues and their corresponding eigenvectors, or principal components, are ordered in decreasing magnitude, the first principal component accounts for the largest variance in the original training data set, the second orthogonal principal component for the largest remaining variance, and so forth. Several techniques from numerical analysis have been suggested to compute Principal Components efficiently[5,6]. The most popular method, which we adopt in this study, is based on results from matrix theory, namely the Singular Value Decomposition (SVD), which relates to PCA in several respects.
For the training set matrix X, of dimension N x p and rank r, SVD lets us rewrite it as

X = U S V^T                                                  (1)

where U is an orthogonal N x r matrix, V is an orthogonal p x r matrix whose
columns are the eigenvectors (e1, e2, …, er) and S is an r x r diagonal matrix
containing the square roots of the eigenvalues of the correlation matrix X^T X,
which give the variances of the Principal Components. The r eigenvectors, i.e.
the Principal Components in the columns of V, form an orthogonal basis that
spans a new vector space, called the feature space. Thus, each training vector
xi can be projected to a single point in this r-dimensional feature space.
However, according to the theory of PCA, for highly correlated data each
training set vector can be approximated by taking only the first few k ≤ r
Principal Components (e1, e2, …, ek). By linearly transforming the images into
eigenspace, we project the images into a new k-dimensional space that exhibits
the properties of the samples most clearly along the coordinate axes. The most
significant features, or information, of the images lie in the first few
principal components.
Principal component analysis has been applied in many fields.
Turk and Pentland took the same approach to extract features
from faces. Here, we apply it to human shape silhouette images.
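As a sketch of this decomposition, the eigenpostures can be obtained from a centered training matrix with NumPy's SVD. The random data here are stand-ins for real silhouette row vectors, and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 60, 100            # 60 training images, p pixels each (illustrative)
X = rng.normal(size=(N, p))

# Center the training set: PCA operates on mean-subtracted data
mean_image = X.mean(axis=0)
Xc = X - mean_image

# Thin SVD: Xc = U @ diag(s) @ Vt; the rows of Vt are the eigenpostures,
# already ordered by decreasing singular value
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep the k components with the largest singular values
k = 5
eigenpostures = Vt[:k]                # k x p basis of the feature space

# Eigenvalues of the covariance matrix follow from the singular values
eigenvalues = s**2 / (N - 1)
```

Because `np.linalg.svd` returns singular values in decreasing order, no explicit sorting of eigenvalues is needed.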
||Fig. 2: Some human shape silhouettes used as the training data
||Fig. 3: Five of the eigenpostures calculated from the input images
In simple terms, relevant information is extracted from human silhouettes,
encoded as efficiently as possible, and each human shape silhouette is compared
with a database of models encoded in the same way. Extracting the information
contained in an image of a human shape means capturing the variation in a
collection of human shapes, independent of any judgement of features, and using
this information to encode, compare and classify new human shapes. In
mathematical terms, we find the principal components of the distribution of
human shapes, i.e. the eigenvectors of the covariance matrix of the set of
human shape images, treating each image as a vector in a very high-dimensional
space. These eigenpostures can be thought of as a set of features that together
characterize the variation between human shape images; each image location
contributes more or less to each eigenvector. Figure 2 depicts selected sample
images of human postures, whilst Fig. 3 presents the images of human postures
regenerated using the five selected eigenvectors.
The eigenposture approach includes the following steps:
||Build feature space: calculate the eigenvectors from the covariance
matrix of the training set, keeping only the first k eigenvectors that correspond
to the highest eigenvalues. These k eigenvectors define the feature space,
or the eigenspace.
||Project a new human shape silhouette image into the feature space: calculate
a set of weights based on the k-dimensional feature space by projecting the
image onto the eigenspace. This is done by subtracting the mean of the training
images followed by a simple projection. Assume P is the new human shape image
to be projected into the eigenspace; we first center the image by subtracting
the mean of the training images, X̄, and then project:

w = V^T (P − X̄)

The vector w, an r-dimensional vector, can be seen as the new encoding of the
image in the eigenspace. V is the corresponding orthogonal p x r matrix with
the eigenvectors (e1, e2, …, er) from the training images X.
||Perform eigenposture training by using the eigenvectors/eigenpostures as
training inputs to our classifiers for classification purposes.
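The projection step above can be sketched in the same hedged NumPy notation; the training matrix and the image P are random stand-ins, and only the first k eigenpostures are kept:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, k = 60, 100, 5
X = rng.normal(size=(N, p))           # training images as row vectors

mean_image = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean_image, full_matrices=False)
V = Vt[:k].T                          # p x k matrix of eigenpostures

def project(P, mean_image, V):
    """Encode image P as a weight vector w in the eigenspace."""
    # w = V^T (P - mean): center the image, then project onto the basis
    return V.T @ (P - mean_image)

P = X[0]                              # project a training image as a demo
w = project(P, mean_image, V)
```

The resulting weight vector w (one weight per kept eigenposture) is what gets fed to the classifiers.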
Classification is a Pattern Recognition (PR) problem of assigning an object to a class; the output of the PR system is thus an integer label. The task of the classifier is to partition the feature space into class-labeled decision regions, in this case a two-category problem. For our classification purposes, three types of classifiers are used: the Probabilistic Neural Network (PNN), the Multilayer Perceptron (MLP) and the Nearest Neighbour classifier.
Probabilistic Neural Network (PNN): Probabilistic neural networks can
be used for classification problems. A PNN approximates the probability density
function of the training examples presented. The architecture of a PNN is as
follows: A PNN consists of three layers after the input layer. The first layer
is the so called pattern layer. Each training example has a pattern node; the
pattern node forms a product of the weight vector and the given example for
classification. When an input is presented, this layer computes the distances
from the input vector to the training input. The outcome is a vector that indicates
how close the input is to a training input. After that, the product is passed
through the activation function to the next layer, the summation layer. A summation
node receives the outputs from the pattern nodes associated with a particular
class, so each class has a particular summation node. The contributions for
each class are summed up; the outcome is a vector of probabilities. The last
layer is the output layer, where a competitive transfer function on the output
of the second layer picks the maximum of these probabilities. The outcome is a
classification decision in binary form: a 1 is produced for the class that an
input is assigned to and a 0 for all other classes. A PNN is guaranteed
to converge to a Bayesian classifier when it is given enough training data,
and these networks generalize well.
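The pattern/summation/output structure described above can be sketched with Gaussian pattern nodes; this is a minimal illustration, not the study's implementation, and the smoothing parameter sigma and toy data are assumptions:

```python
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.5):
    """Minimal Probabilistic Neural Network with Gaussian kernels."""
    # Pattern layer: one Gaussian activation per training example,
    # based on the distance from the input to that example
    d2 = np.sum((X_train - x) ** 2, axis=1)
    activations = np.exp(-d2 / (2 * sigma ** 2))
    # Summation layer: sum the activations belonging to each class
    classes = np.unique(y_train)
    scores = np.array([activations[y_train == c].sum() for c in classes])
    # Output layer: competitive choice of the highest summed score
    return classes[np.argmax(scores)]

# Toy two-class data: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.05, 0.05]), X_train, y_train))  # 0
print(pnn_classify(np.array([0.95, 1.0]), X_train, y_train))   # 1
```

In the posture task the inputs x would be eigenposture weight vectors rather than 2-D points.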
Multilayer Perceptron (MLP): The Multilayer Perceptron (MLP) neural network
is an extremely popular and widely documented architecture and a good tool
for classification purposes: it can approximate almost any regularity between
its input and output. The network weights are adjusted by a supervised
training procedure called backpropagation.
||Fig. 4: 2-D plot of eigenpostures selected accordingly as in Table 1
Backpropagation is a gradient-descent method that searches for an acceptable
local minimum in the neural network weight space in order to minimize the
error. An MLP is composed of an input layer, an output layer and one or more
hidden layers. With the exception of the input layer, all layers compute their
output with a weighted-sum formula, an optional bias and an activation
function. For this study, only a single hidden node is used.
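As an illustration only (not the study's implementation), a minimal single-hidden-layer MLP trained by backpropagation can be sketched as below; the toy data, learning rate and epoch count are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=1, lr=1.0, epochs=5000, seed=0):
    """Tiny MLP (one hidden layer) trained with plain backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass: weighted sums, biases and sigmoid activations
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2).ravel()
        # Backward pass: gradients of the squared error
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out[:, None] @ W2.T) * h * (1 - h)
        # Gradient-descent weight updates
        W2 -= lr * h.T @ d_out[:, None] / len(y)
        b2 -= lr * d_out.mean()
        W1 -= lr * X.T @ d_h / len(y)
        b1 -= lr * d_h.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5).astype(int)

# Linearly separable toy problem (stand-in for eigenposture weights)
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([0, 0, 1, 1])
params = train_mlp(X, y, hidden=1)
preds = predict(X, *params)
```

With `hidden=1` this mirrors the single-hidden-node configuration used in the study.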
Nearest Neighbour Classifier (NN): The nearest neighbour algorithm is extremely simple yet rather powerful and is used in many applications. Among the various methods of supervised statistical pattern recognition, the Nearest Neighbour rule achieves consistently high performance without a priori assumptions about the distributions from which the training examples are drawn. It involves a training set of both positive and negative cases. A new sample is classified by finding the nearest training case; the class label of that case then determines the classification of the sample. In our case there are only two possible output classes.
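The rule amounts to a few lines; this sketch uses Euclidean distance and illustrative labels for the two posture classes:

```python
import numpy as np

def nearest_neighbour(x, X_train, y_train):
    """1-NN: assign the label of the closest training example."""
    d2 = np.sum((X_train - x) ** 2, axis=1)  # squared Euclidean distances
    return y_train[np.argmin(d2)]

# Two training cases, one per posture class (illustrative)
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array(['non-standing', 'standing'])
print(nearest_neighbour(np.array([0.1, 0.2]), X_train, y_train))  # non-standing
```

No training phase is needed beyond storing the examples, which is why the method pairs naturally with the low-dimensional eigenposture weights.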
In order to develop a classification system, we first collected a data set of
60 images of human postures as training data to generate the eigenpostures,
and another 100 unseen images of human postures (50 each for standing and
non-standing) for testing. The training and testing postures consist of humans
standing and non-standing (facing front and side views). In this work, each
image was reshaped into a row vector of 120000 elements. Then the eigenvectors
and eigenvalues were computed according to (1) and the five most significant
eigenvectors were selected. We consider different combinations of two
eigenvectors in our classification experiments and evaluated the use of
different types of classifiers (MLP, PNN and NN). Each classifier was trained
to classify the human postures of standing and non-standing. Combinations of
these five most significant eigenpostures of the training images serve as
inputs to each classifier; for each experiment a combination of two
eigenpostures is selected, as in Table 1.
The 2-D plots of the selected eigenpostures are shown in Fig. 4a-j,
respectively. The plots suggest that the two human postures can be linearly
classified using the eigenpostures selected in Experiment 1 (the first and
second eigenpostures) and Experiment 6 (the second and fourth eigenpostures).
||Table 1: Selection of eigenpostures for classification performance
||Table 2: Summary and comparison of classifiers
To verify the performance of the classifiers, these ten combinations of eigenpostures serve as training and testing sets for all classifiers. The performance of the three classifiers is given in Table 2. For the standing posture, the combination of the second and fourth eigenpostures gives the highest classification results for both the NN and MLP classifiers and the second highest for the PNN. For non-standing, the best classification for all three classifiers also comes from this combination of eigenpostures.
A method for human body posture classification based on eigenvector analysis has been presented. The ability to use eigenpostures for classification of two main human postures has been demonstrated, but the ability to classify other postures such as bending and lying has yet to be quantified. As can be seen from the experimental results, the eigenspace technique can be used for human posture classification with a high degree of accuracy. All three classifiers selected for this study showed surprisingly strong generalization behaviour, to such an extent that over 90% of the unseen data was correctly classified in Case 6. This suggests that the eigenspace approach can be extended to posture recognition, which can contribute to a wide variety of applications such as security systems, intruder alerting, gait analysis, action recognition for surveillance, human-computer interaction and tracking techniques for video coding and image display. The process of human shape recognition and classification looks promising.
This study was supported by MOSTI under the IRPA Grant No. 03-02-02-0017- SR0003/07-03. The authors also acknowledge Prof. Dr. Burhanuddin Yeop Majlis as the Programme Head and UiTM for the UiTM-JPA SLAB scholarship awards.