Abstract: The goal of this study is to build a system that detects and classifies car objects amidst background clutter and mild occlusion. The study addresses the problem of distinguishing real-world images containing side views of cars against cluttered backgrounds from non-car images of natural scenes. A threshold technique with background subtraction is used to segment the background region and extract the object of interest. The background-segmented image containing the region of interest is divided into equal-sized blocks of sub-images, and spectral texture features are extracted from each sub-block. The features are fed to a back-propagation neural classifier, and the performance of the classifier is compared across block sizes. Quantitative evaluation shows a best classification accuracy of 85.5%. A critical evaluation of the present approach under the proposed standards is presented.
INTRODUCTION
Object detection and classification are necessary components of an artificially intelligent autonomous system. Object classification in particular plays a major role in applications such as security systems, traffic surveillance and target identification. Such autonomous systems are expected to venture onto the streets of the world, which requires detecting and classifying the car objects commonly found there. In practice, these classification systems face two types of problem: (i) objects of the same category show large variation in appearance, and (ii) objects appear under different viewing conditions, such as occlusion and complex backgrounds containing buildings, people, trees, road views, etc. This study brings out the importance of background elimination combined with a statistical feature extraction method over sub-blocks of varying size for object classification. Because dynamic motion information is not available in static images, background elimination becomes a more difficult task. The background-removed images are divided into square sub-blocks, whose statistical features are fed to the neural classifier, which classifies each image as car or non-car.
Image understanding is a major area in which researchers design computational systems that can identify and classify objects automatically. Identification and classification of vehicles has been a focus of investigation over the last decades (Hsieh et al., 2006; Shan et al., 2005; Sun et al., 2006). A new approach to object detection that makes use of a sparse, part-based representation was proposed by Agarwal et al. (2004). That study gives very promising results in the detection of vehicles among non-vehicle images of natural scenes. Nagarajan and Balasubramanie (2007) proposed a study based on wavelet features for object classification with cluttered background. Nagarajan and Balasubramanie (2008a, b) presented studies based on moment invariant features and statistical features to classify objects with mild occlusion and complex background. Jong-Hun and William (1991) integrated spectral and textural information in the classification of arbitrary shape and size. Papageorgiou and Poggio (2000) utilized appropriate global statistical features for classification to detect car objects. The advantage of such an approach is that it has some self-learning ability. Zhang and Marszalek (2006) demonstrated that image representations based on distributions of local features are effective for classification of texture and object images under challenging real-world conditions and background clutter.
BACKGROUND REMOVAL AND MAPPING FUNCTION
The overall complexity increases for natural images because the object of interest lies on a background region. In an object classification problem, it is essential to distinguish the object of interest from the background. The object is segmented using a background subtraction technique, which is most suitable when the intensity levels of the object fall outside the range of levels in the background.
An object with natural background is shown in Fig. 1. Initially, morphological operations are applied to suppress residual errors using an open and close pair of operations (Li et al., 2004; Radke et al., 2005). The small regions are removed by filling the holes.
Image subtraction is then applied with the earlier result, which segments the object from the background. A mapping function (1) is used to restore the object of interest from the subtracted image.
(1) |
where f(x,y) is the transformed image, d(x,y) is the image difference after the fill operation and I(x,y) is the original image.
Fig. 1:(a) | An occluded image with natural background denoted as I(x,y). (b) The small regions are removed by filling the holes. (c) Image difference obtained by subtraction (a) by (b) denoted as d(x,y) and (d) Image obtained by mapping function f(x, y) |
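The body of the mapping function (1) is not reproduced in this text, so the following is only a sketch of one plausible reading: the original intensity I(x,y) is kept wherever the difference image d(x,y) is non-zero, and the pixel is set to zero elsewhere. The function name `map_object`, the `threshold` parameter and the sample pixel values are assumptions made for illustration.

```python
def map_object(original, difference, threshold=0):
    """Keep original intensities where the difference image marks the object.

    Assumed form of the mapping (not taken from the source):
        f(x, y) = I(x, y) if |d(x, y)| > threshold, else 0
    """
    return [
        [pix if abs(diff) > threshold else 0 for pix, diff in zip(row_i, row_d)]
        for row_i, row_d in zip(original, difference)
    ]


# Tiny 3x4 grayscale example with hypothetical values.
I = [[12, 200, 210, 15],
     [10, 190, 205, 11],
     [14, 13, 12, 16]]
d = [[0, 35, 40, 0],
     [0, 28, 33, 0],
     [0, 0, 0, 0]]
f = map_object(I, d)
for row in f:
    print(row)
```

In this example only the four pixels flagged by d(x,y) keep their original intensity; everything else becomes background.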
SPECTRAL FEATURES
Spectral measures of texture are based on the Fourier spectrum, which is ideally suited for describing the directionality of periodic or almost periodic 2-D patterns in an image. Global texture patterns are easily distinguishable as concentrations of high-energy bursts in the spectrum.
Thus spectral texture is useful for discriminating between periodic and non-periodic texture patterns and for quantifying differences between periodic patterns. In this context, the background-segmented regions of interest in car images tend to show periodic patterns, whereas natural scenes tend to show non-periodic patterns. The differences between periodic and non-periodic patterns are utilized for classification.
Interpretation of spectrum features is simplified by expressing the spectrum in polar coordinates to yield a function S(r, θ), where S is the spectrum function and r and θ are the variables in this coordinate system. For each direction θ, S(r, θ) may be considered a 1-D function Sθ(r); similarly, for each frequency r, Sr(θ) is a 1-D function. Analyzing Sθ(r) for a fixed value of θ yields the behavior of the spectrum along a radial direction from the origin, whereas analyzing Sr(θ) for a fixed value of r yields the behavior along a circle centered on the origin. A global description is obtained by integrating (summing, for discrete variables) these functions, as in (2) and (3).
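$$S(r) = \sum_{\theta=0}^{\pi} S_{\theta}(r) \qquad (2)$$
$$S(\theta) = \sum_{r=1}^{R_0} S_{r}(\theta) \qquad (3)$$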
where R0 is the radius of a circle centered at the origin. The results of these two equations constitute a pair of values [S(r), S(θ)] for each pair of coordinates (r, θ). By varying these coordinates, two 1-D functions, S(r) and S(θ), are generated that constitute a spectral-energy description of texture for the entire image or region under consideration.
Descriptors of these functions themselves can be computed in order to characterize their behavior quantitatively. Descriptors typically used for this purpose are the location of the highest value, the mean and variance of both the amplitude and axial variations and the distance between the mean and the highest value of the function.
Thus spectral measures S(r) of texture are calculated for every sub-block of an image. The feature vector varies with the size of the sub-blocks chosen. The number of features populated by varying the block size is shown in Fig. 3.
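As a rough illustration of the radial and angular profiles described above, the sketch below computes S(r) and S(θ) for one small square block using only the Python standard library. The naive DFT, the nearest-neighbour polar sampling, the 1-degree angular step and the choice R0 = n/2 - 1 are all assumptions made for the example; the sketch does not reproduce the per-block feature counts reported in this study.

```python
import cmath
import math


def dft2_magnitude(block):
    """Magnitude of the 2-D DFT of a small square block (naive, O(n^4))."""
    n = len(block)
    mag = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = -2.0 * math.pi * (u * x + v * y) / n
                    total += block[x][y] * cmath.exp(1j * angle)
            mag[u][v] = abs(total)
    return mag


def spectral_profiles(block, n_angles=180):
    """Return (S(r), S(theta)) sampled on a half circle around the origin."""
    n = len(block)
    mag = dft2_magnitude(block)
    r0 = n // 2 - 1  # assumed largest radius, kept inside the spectrum
    s_r = [0.0] * r0
    s_theta = [0.0] * n_angles
    for t in range(n_angles):
        theta = math.pi * t / n_angles
        for r in range(1, r0 + 1):
            # Negative frequencies wrap around via the modulo.
            u = round(r * math.cos(theta)) % n
            v = round(r * math.sin(theta)) % n
            s_r[r - 1] += mag[u][v]      # sum over theta for fixed r
            s_theta[t] += mag[u][v]      # sum over r for fixed theta
    return s_r, s_theta


def describe(profile):
    """Location of the peak, mean and variance of a 1-D profile."""
    mean = sum(profile) / len(profile)
    variance = sum((p - mean) ** 2 for p in profile) / len(profile)
    return profile.index(max(profile)), mean, variance


# Hypothetical 8x8 block: a cosine with two cycles along the second axis.
block = [[math.cos(2 * math.pi * 2 * y / 8) for y in range(8)] for x in range(8)]
s_r, s_theta = spectral_profiles(block)
peak_r = describe(s_r)[0] + 1  # radii start at 1
print(peak_r)
```

For this periodic test pattern the radial profile peaks at r = 2, the number of cycles in the block, and the angular profile peaks near 90 degrees, which is the kind of directional, high-energy concentration the text attributes to periodic textures.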
BUILDING A NEURAL CLASSIFIER
A binary Artificial Neural Network (ANN) classifier is built with the back-propagation algorithm; it learns to classify an image as a member or non-member of a class.
The number of input layer nodes is equal to the dimension of the feature space obtained from the spectral features. The number of output nodes is usually determined by the application (Khotanzand and Chung, 1998); here it is 1 (Yes/No), where an output value nearer to 1 represents Yes and a value nearer to 0 represents No. The neural classifier is trained with different choices for the number of hidden layers. The final architecture, shown in Fig. 2, uses a single hidden layer, which gives the better performance.
The connections that carry the outputs of one layer to the inputs of the next layer each have an associated weight. The node outputs are multiplied by these weights before reaching the inputs of the next layer. The output neuron (4) represents the presence of a particular class of object.
(4) |
Fig. 2: | The three layer neural architecture |
Fig. 3: | The description of the proposed study |
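The forward pass of such a three-layer network can be sketched as follows. Equation (4) is not reproduced in this text, so the sigmoid activation, the bias terms and the random initial weights below are assumptions; the 900 inputs and 10 hidden nodes follow the sizes reported for Method-I, and 0.7 is the activation threshold used in the discussion of results. Back-propagation training itself is not shown.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (assumed; the source does not name the activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through an input-hidden-output network."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Node outputs are multiplied by the connection weights, then squashed.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def classify(output, threshold=0.5):
    """Outputs nearer to 1 mean Yes (car); nearer to 0 mean No (non-car)."""
    return "car" if output >= threshold else "non-car"


# Untrained network with hypothetical random weights and a random feature vector.
rng = random.Random(0)
n_in, n_hidden = 900, 10
w_hidden = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0
features = [rng.gauss(0.0, 1.0) for _ in range(n_in)]

score = forward(features, w_hidden, b_hidden, w_out, b_out)
label = classify(score, threshold=0.7)
print(round(score, 3), label)
```

Because the weights here are random, the printed label carries no meaning; the point is only the data flow from feature vector to a single thresholded output.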
PROPOSED WORK
This study addresses the problem of classifying real-world images containing side views of cars amidst background clutter and mild occlusion. The objects of interest to be classified are car (positive) and non-car (negative) images taken from the University of Illinois at Urbana-Champaign (UIUC) standard database. The image data set consists of 1000 real images for training and testing, with 500 in each class. The images are of uniform size, 100x40 pixels.
The proposed framework consists of two methods, each applied after background removal. Method-I: 10 blocks of size 20x20 each; Method-II: 40 blocks of size 10x10 each. Spectral features are calculated from each block of the sub-image using the equations given earlier. Method-I extracts 90 features per block, giving 900 features (90 features x 10 blocks), and Method-II extracts 160 features per block, giving 6400 features (160 features x 40 blocks). The spectral features are normalized by subtracting the mean and dividing by the standard deviation. This process improves the performance of the neural classifier. The overall flow of the framework is shown in Fig. 3.
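The block arithmetic and the normalization step can be checked with a short sketch. The helper names `split_blocks` and `zscore`, the placeholder pixel values, and the use of the population standard deviation are assumptions; the text does not say whether normalization is done per feature across images or within one feature vector, so `zscore` simply normalizes whatever list of values it is given.

```python
import statistics


def split_blocks(image, size):
    """Split a 2-D list into non-overlapping size x size blocks, row-major."""
    rows, cols = len(image), len(image[0])
    if rows % size or cols % size:
        raise ValueError("image dimensions must be multiples of the block size")
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, rows, size)
        for c in range(0, cols, size)
    ]


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# 40 rows x 100 columns stands in for one 100x40 image (placeholder pixels).
image = [[(r * 100 + c) % 256 for c in range(100)] for r in range(40)]

blocks_i = split_blocks(image, 20)   # Method-I
blocks_ii = split_blocks(image, 10)  # Method-II
print(len(blocks_i), len(blocks_i) * 90)     # 10 900
print(len(blocks_ii), len(blocks_ii) * 160)  # 40 6400

normalized = zscore([1.0, 2.0, 3.0])
```

The two printed lines confirm that a 100x40 image yields 10 blocks of 20x20 (900 features at 90 per block) and 40 blocks of 10x10 (6400 features at 160 per block).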
IMPLEMENTATION
The positive training samples include different kinds of cars against a variety of backgrounds, as well as partially occluded cars. The negative training samples include images of natural scenes, buildings and road views. Training is done with 400 images (200 positive and 200 negative) for all the methods. Testing is done with 1000 images (500 positive and 500 negative) taken from the same image database.
Feed-forward networks are trained for Method-I (10 blocks of size 20x20) and Method-II (40 blocks of size 10x10). The number of input nodes is 900 for Method-I (10 blocks x 90 features) and 6400 for Method-II (40 blocks x 160 features). Optimal structure validation is done, and the proposed structure performs well and leads to better results. The optimal structure (Fig. 2) of the neural classifier is therefore 900-10-1 for Method-I and 6400-10-1 for Method-II.
The various parameters for training the neural classifier for all the methods are given in Table 1. The performance graphs of the neural classifier for Method-I and Method-II are shown in Fig. 4 and 5, respectively.
Fig. 4: | The performance graph of neural network training for Method-I: 10 Blocks of size 20x20 |
Fig. 5: | The performance graph of neural network training for Method-II: 40 Blocks of size 10x10 |
Table 1: | Parameters for training of the neural classifier |
DISCUSSION
In an object classification problem, the four categories of result are given below:
(i) True Positive (TP) | = | Classify a car image into class of cars |
(ii) False Negative (FN) | = | Misclassify a car image into class of non-cars |
(iii) True Negative (TN) | = | Classify a non-car image into class of non-cars |
(iv) False Positive (FP) | = | Misclassify a non-car image into class of cars |
The objective of any classification is to maximize the number of correct classifications, measured by the True Positive Rate (TPR) and True Negative Rate (TNR), while minimizing the wrong classifications, measured by the False Positive Rate (FPR) and False Negative Rate (FNR).
The values of nP and nN used as testing samples are 500 and 500, respectively.
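The rates can be computed from the four counts as sketched below, using the usual convention in which TP and FN count car images (labelled car and non-car, respectively) and TN and FP count non-car images (labelled non-car and car, respectively). The counts in the example are hypothetical, chosen only so that nP = nN = 500; they are not the values of Table 2.

```python
def rates(tp, fn, tn, fp):
    """Classification rates from confusion-matrix counts."""
    n_pos = tp + fn  # nP: number of car test images
    n_neg = tn + fp  # nN: number of non-car test images
    return {
        "TPR": tp / n_pos,
        "FNR": fn / n_pos,
        "TNR": tn / n_neg,
        "FPR": fp / n_neg,
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


# Hypothetical counts for illustration only.
result = rates(tp=430, fn=70, tn=420, fp=80)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

By construction TPR + FNR = 1 and TNR + FPR = 1, so reporting one rate of each pair together with nP and nN is enough to recover the overall accuracy.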
Most classification algorithms include a threshold parameter that can be varied to reach different trade-off points between correct and false classifications. The comparison of results of the proposed methods is shown in Table 2, obtained with an activation threshold value of 0.7. Sample classified images of the car and non-car categories are shown in Fig. 6 and 7, respectively.
Fig. 6: | Sample results of the neural classifier of the category car images with cluttered background and mild occlusion |
Fig. 7: | Sample results of the neural classifier of the category non-car images containing trees, road view, bike, wall, buildings and persons |
Table 2: | Comparison of experimental methods |
It is evident that the classifier with 40 blocks of size 10x10 (Method-II) gives the better overall classification accuracy, 85.5%, compared with 84.2% for the classifier with 10 blocks of size 20x20 (Method-I).
The proposed work is compared with recent studies, and the results are found to have improved to a certain extent. Nagarajan and Balasubramanie (2007) achieved a classification accuracy of 79% based on wavelet features for object classification with cluttered background. Nagarajan and Balasubramanie (2008b) presented a study based on statistical texture moments with a classification accuracy of 83.8%. Nagarajan and Balasubramanie (2008a) also extended their study to object classification with mild occlusion and complex background based on invariant moment features, with a classification accuracy of 84.9%. Relative to these previous results, the proposed work adds a further data point for the object classification problem, with an improved classification accuracy of 85.5%. The improvement over the previous studies is depicted in Fig. 8.
Fig. 8: | Classification accuracy of the proposed method with earlier study |
CONCLUSION
An attempt has been made to build a system that classifies objects amidst background clutter and mild occlusion, and this goal is achieved to a certain extent. The classification of real-world images containing side views of cars with cluttered background against non-car images of natural scenes is presented. Comparing the results in Table 2, the proposed method with 40 blocks of size 10x10 and spectral features after background removal gives a satisfactory classification rate of 85.5%. A limitation of this method is the classification of objects with a high degree of occlusion. The study can be extended to improve the performance of the classifier system by including a feature selection process. The complete study is implemented using the neural network and image processing toolboxes of Matlab 6.5.
ACKNOWLEDGMENT
The authors acknowledge the reviewers' valuable advice and comments, which have helped to improve the quality of this study.