Yu-Long Qiao and Xian-Rui Song

College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China

Moving target detection is a key problem in many image-based applications. For scenes whose background is made complex by stochastic motion, it becomes more difficult to detect moving targets well. This study proposes a center-surround discriminant saliency based detection method using the Stationary Wavelet Transform (SWT) and the generalized Gamma distribution (GΓD). It exploits the fact that the properties of the moving target and the complex background are different in the stationary wavelet domain and can be described by the GΓD model. The center-surround discriminant saliency, which measures the difference between the current location and its surround, is determined with the Symmetrized Kullback-Leibler Distance (SKLD) between the GΓD models of the center and surround in the wavelet domain. The moving target is then detected based on this discriminant saliency. Experimental results demonstrate the effectiveness of the proposed method.


Yu-Long Qiao and Xian-Rui Song, 2012. Moving Target Detection in Complex Background. *Research Journal of Information Technology, 4: 195-203.*

**URL:** https://scialert.net/abstract/?doi=rjit.2012.195.203

The human visual system not only focuses on salient static image features, including luminance, color and contrast with the surrounding area, but also pays attention to the movement of interesting objects. Therefore, moving target detection has become an important research topic in image sequence and video processing. Many algorithms exist for moving target detection. However, for scenes whose background is made complex by stochastic motion, it becomes more difficult to detect the moving targets well (Kaawaase *et al*., 2011; Nagarajan and Balasubramanie, 2008; Sulaiman *et al*., 2008).

In order to detect the moving target in image sequences, there are some typical algorithms such as the mixture Gaussian model (Hui *et al*., 2011). The mixture-Gaussian-model-based method realizes background estimation and adaptive background update and uses background subtraction between frames to extract the moving target. However, for complex dynamic backgrounds, e.g., crowds, rain, swaying trees, other moving objects such as birds, or smoke-filled environments, this traditional method may detect the dynamic scene itself as a moving target. Therefore, Gao *et al*. (2008) proposed the center-surround discriminant saliency algorithm and demonstrated its performance for discriminant saliency detection both in static imagery and in dynamic scenes. The wavelet-domain Generalized Gaussian Distribution (GGD) is used to describe the interesting region during the saliency detection. However, Van de Wouwer *et al*. (1999) verified that some wavelet detail histograms may not be fitted well by the GGD, because the observed histograms do not have a sharp peak at zero but are shifted toward the left or right (Do and Vetterli, 2002). In such cases, the GΓD, a three-parameter (scale, shape and index-shape) distribution model, is a good choice to replace the GGD. Furthermore, Song (2008) showed that the GΓD often outperforms the GGD for modeling Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) coefficients.

In this study, we propose a moving target detection algorithm based on the center-surround discriminant saliency, using the Stationary Wavelet Transform (SWT) and the GΓD model. The SWT is performed on the image sequence (video) to obtain a multiscale decomposition. The GΓD is then used to model the wavelet coefficients of the center and surround windows in the wavelet transform domain. The center-surround saliency map is formed by combining the center-surround discrepancies, which are measured with the Symmetrized Kullback-Leibler Distance (SKLD). Finally, the moving target is determined from the saliency map.

**THEORETICAL BACKGROUND**

**Stationary wavelet transform:** The standard DWT is a non-redundant and compact representation of a signal in the transform domain. Although the DWT possesses many useful characteristics and achieves satisfying results (Chen *et al*., 2003; Venkateswaran and Rao, 2007; Loum *et al*., 2007; Logashanmugam and Ramachandran, 2008), it is translation-variant: when a signal is translated, its numerical descriptors are not simply translated but modified. The reason is the decimation step that follows the filtering operation. The Stationary Wavelet Transform (SWT) has a similar tree-structured implementation without any decimation (subsampling) step. Therefore, the SWT is a shift-invariant transform that is suitable for signal analysis.

The implementation of the stationary wavelet transform is similar to that of the discrete wavelet transform, except that there is no downsampling operation. For a one-dimensional signal x, the block diagram of SWT is shown in Fig. 1. It can also be expressed as follows:

S_{i+1} = S_{i} * h^{[i]}, W_{i+1} = S_{i} * g^{[i]} (1)

with S_{0} = x, where * is the convolution operator. Equation 1 involves two basic filters h and g. At each step, the low-frequency output of the previous step, S_{i}, is convolved with the filters h^{[i]} or g^{[i]}, where [i] denotes upsampling of the basic filters by a factor of 2^{i}, i.e., inserting 2^{i}-1 zeros between two taps of the basic filters.

Fig. 1: 1D stationary wavelet transform (1D SWT)

The 2D separable wavelet transform of an image can be easily derived from the 1D wavelet decomposition. A one-level decomposition of an image is obtained by first applying the 1D SWT to each row of the image and then to each column of the resulting image. Successively performing the one-level decomposition on the resulting low-frequency image yields the stationary wavelet representation of the image. The 2D SWT produces three detail subbands at each level, each of which is strongly oriented at 0, 45 or 90°.
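The à trous recursion of Eq. 1 can be sketched in a few lines of NumPy. The Haar filter pair and the circular (periodic) convolution below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

def upsample(f, level):
    """Insert 2**level - 1 zeros between filter taps (the "a trous" trick)."""
    up = np.zeros((len(f) - 1) * 2**level + 1)
    up[::2**level] = f
    return up

def swt1d(x, levels):
    """Undecimated 1D wavelet transform; every output has len(x) samples."""
    h = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar low-pass (assumed)
    g = np.array([1.0, -1.0]) / np.sqrt(2)  # Haar high-pass (assumed)
    s, details = np.asarray(x, float), []
    for i in range(levels):
        H = np.fft.fft(upsample(h, i), len(s))  # filters upsampled by 2**i
        G = np.fft.fft(upsample(g, i), len(s))
        S = np.fft.fft(s)
        details.append(np.real(np.fft.ifft(S * G)))  # detail W_{i+1}
        s = np.real(np.fft.ifft(S * H))              # approximation S_{i+1}
    return s, details
```

Because no decimation is performed, shifting the input simply shifts every subband by the same amount, which is exactly the shift-invariance property discussed above.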

**Generalized gamma distribution (GΓD):** The GΓD was introduced in 1962 (Stacy, 1962) and has been applied in different fields, such as speech spectrum characterization, texture retrieval and texture classification (Maliani *et al*., 2010). Given the complex structure and enormous variation of images, Choy and Tong (2010) demonstrated that it is better than the GGD for modeling the histogram of image variations. This study uses GΓDs to model the coefficient distributions of the interesting regions. The Probability Density Function (PDF) of the GΓD is:

f(x; α, β, λ) = (λ / (α^{λβ} Γ(β))) x^{λβ-1} exp(-(x/α)^{λ}), x ≥ 0 (2)

where, α and β are scale and shape parameters, λ is the index shape parameter and Γ(.) is the Gamma function. The GΓD model covers many distributions such as the generalized Gaussian, Gamma and Weibull distribution.

Given the wavelet coefficients x_{i} of the interesting region, the GΓD parameters can be estimated either with the Method of Moments (MM) or with Maximum Likelihood Estimation (MLE). Since MLE is known to outperform MM for parameter estimation, we use the MLE approach (Choy and Tong, 2010) to estimate the parameters α and λ as follows:

α = ((1/(Nβ)) Σ_{i=1}^{N} x_{i}^{λ})^{1/λ} (3)

N/λ + β Σ_{i=1}^{N} log(x_{i}/α) - Σ_{i=1}^{N} (x_{i}/α)^{λ} log(x_{i}/α) = 0 (4)

where N is the number of wavelet coefficients. The shape estimator β can be computed from the sample Scale-Independent Shape Estimation (SISE) equation, defined by:

ψ(β) - log β = (λ/N) Σ_{i=1}^{N} log x_{i} - log((1/N) Σ_{i=1}^{N} x_{i}^{λ}) (5)

where ψ(.) is the digamma function.
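In place of hand-coded estimating equations, a numerical MLE fit is available in SciPy's generalized-Gamma implementation (`scipy.stats.gengamma`, whose `a`, `c` and `scale` arguments play the roles of β, λ and α). The parameter values below are illustrative assumptions only:

```python
import numpy as np
from scipy.stats import gengamma

# synthetic "wavelet coefficient magnitudes" drawn from a known GammaD
# (beta = a = 2.0, lambda = c = 1.5, alpha = scale = 0.5 -- assumed values)
samples = gengamma.rvs(a=2.0, c=1.5, scale=0.5, size=4000,
                       random_state=np.random.default_rng(0))

# numerical MLE; the location is pinned to zero so only the three GammaD
# parameters (shape beta, index shape lambda, scale alpha) remain free
beta_hat, lam_hat, loc, alpha_hat = gengamma.fit(samples, floc=0)
```

The fitted triple maximizes the sample likelihood numerically, so its log-likelihood should match or exceed that of the generating parameters up to optimizer tolerance.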

To measure the similarity between two GΓD models of the interesting regions, we adopt the Symmetrized Kullback-Leibler Distance (SKLD):

SKLD(f_{1}, f_{2}) = KL(f_{1}||f_{2}) + KL(f_{2}||f_{1}), with

KL(f_{1}||f_{2}) = log(λ_{1}Γ(β_{2})/(λ_{2}Γ(β_{1}))) + λ_{2}β_{2} log(α_{2}/α_{1}) + ((λ_{1}β_{1} - λ_{2}β_{2})/λ_{1}) ψ(β_{1}) + (α_{1}/α_{2})^{λ_{2}} Γ(β_{1} + λ_{2}/λ_{1})/Γ(β_{1}) - β_{1} (6)

where α_{i}, β_{i} and λ_{i} (i = 1, 2) are the parameters of the two GΓDs and ψ(.) is the digamma function.

**PROPOSED ALGORITHM**

The discriminant center-surround hypothesis, introduced by Gao *et al*. (2008), states that the most salient locations of the visual field are those that enable discrimination between the feature responses of center and surround with the smallest expected probability of error. Discriminant saliency based algorithms have been shown to achieve state-of-the-art performance in several image processing problems, such as interest point detection, object recognition and background subtraction.

The discriminant center-surround saliency can be formulated as a binary classification problem (Gao *et al*., 2008; Zhang *et al*., 2008). At each position l of the video, there are two classes: a class of stimuli of interest and a surround class. The stimuli of interest are the observations within a neighborhood W^{0}_{l} of location l, referred to as the center (with class label c = 1). The surround (with label c = 0) is composed of all the stimuli that are not salient, i.e., the observations within a window W^{1}_{l} surrounding the neighborhood W^{0}_{l}. Based on the features extracted from the center and surround regions, the saliency of location l is determined by the discriminant power of the features between the two regions. An interesting object in the center window implies a large disparity between the feature responses of the center and surround windows. Therefore, we use this center-surround formulation to detect the moving target.

Specifically, to detect the moving target in a video, we perform a k-level SWT on each frame, which produces 3k detail subbands with the same size as the original frame. At position l of the current frame, the center window W^{0}_{l} and the surround window W^{1}_{l} are shown in Fig. 2. These windows have corresponding regions W^{0}_{l,i} and W^{1}_{l,i} in each detail subband. The wavelet coefficients in the windows W^{0}_{l,i} and W^{1}_{l,i} are bandpass features, which are modeled with the GΓD. Mathematically, the bandpass features in the center windows W^{0}_{l,i} are assumed to be drawn from the class-conditional density p_{Y|C}(y|1), modeled as a GΓD, and those in the surround windows W^{1}_{l,i} from p_{Y|C}(y|0).

The saliency S(l) is defined as the power with which the features Y discriminate between the two classes. Based on the minimum probability of error of the Bayes classifier, S(l) is quantified by the mutual information between the feature response Y and the class label C:

S(l) = I(Y; C) = Σ_{c=0}^{1} P_{C}(c) KL(p_{Y|C}(y|c) || p_{Y}(y)) (7)

where KL(.||.) is the Kullback-Leibler divergence between two probability distributions. A larger S(l) implies a larger disparity between center and surround, which indicates (part of) a moving target in the center.

Fig. 2: Center and surround windows

Because the SWT is performed on the image sequence, the feature at a location is multi-dimensional. Treating the 3k subband features independently, the saliency S(l) can be approximated as follows (Mahadevan and Vasconcelos, 2010):

S(l) ≈ Σ_{i=1}^{3k} Σ_{c=0}^{1} P_{C}(c) KL(p_{Y_{i}|C}(y|c) || p_{Y_{i}}(y)) (8)

The GΓD model is quite sensitive to its parameters, i.e., a small change in a model parameter may lead to a noticeable difference in the distribution. Therefore, we replace KL(.||.) in the saliency S(l) with the Symmetrized Kullback-Leibler Distance (SKLD):

SKLD(p, q) = KL(p||q) + KL(q||p) (9)

Then S(l) is rewritten as:

S(l) = Σ_{i=1}^{3k} Σ_{c=0}^{1} P_{C}(c) SKLD(p_{Y_{i}|C}(y|c), p_{Y_{i}}(y)) (10)

In general, when the size of the center window is set to n×n×t, the size of the surround window is set to 6n×6n×t. In order to generate S(l), we should estimate the GΓD parameters of each window and compute the marginal density p_{Y_{i}}(y). However, due to the imbalance of the sample sizes (the numbers of wavelet coefficients in the windows), the estimated parameters of the center window differ from those of the whole window W^{0}_{l,i}∪W^{1}_{l,i}, even when they belong to the same category. Therefore, this study estimates the parameters from the samples y'_{i} ∈ W^{0}_{l,i}∪W^{1}_{l,i}, obtains the corresponding PDF and uses it in place of the marginal density p_{Y_{i}}(y).

Fig. 3: Moving target detection algorithm

Then we have:

S(l) = Σ_{i=1}^{3k} Σ_{c=0}^{1} P_{C}(c) SKLD(p_{Y_{i}|C}(y|c), p̃_{Y_{i}}(y)) (11)

where p̃_{Y_{i}}(y) denotes the GΓD estimated from the whole window W^{0}_{l,i}∪W^{1}_{l,i}.

Though Eq. 11 differs from Eq. 8, it still describes the discriminant power of the features extracted at location l in differentiating the two classes.

In summary, the discriminant center-surround saliency maps are generated using the SWT and GΓD models. When a moving target enters the center window, the saliency of that location becomes larger. Based on this fact, the target can be detected from the saliency map. The entire detection algorithm is shown in Fig. 3 and summarized as follows:

• Step 1: Perform the k-level SWT on each frame of the image sequence, generating 3k detail subbands

• Step 2: For each position l:

  • Identify the two corresponding windows in each detail subband: the center window W^{0}_{l,i} and the surround window W^{1}_{l,i}

  • Estimate the GΓD model parameters of the center window W^{0}_{l,i}, the surround window W^{1}_{l,i} and the whole window W^{0}_{l,i}∪W^{1}_{l,i} with the MLE method

  • Compute the saliency S(l) using Eq. 11

• Step 3: Determine a threshold T; the regions whose saliency values are larger than T constitute the moving target
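Steps 2 and 3 can be prototyped compactly. The sketch below is a simplifying stand-in, not the paper's method: it replaces the GΓD/SKLD machinery with a zero-mean Gaussian variance model and its closed-form symmetrized KL, works on a single 2D subband (no temporal extent), and scores each location by comparing the center window against the whole center-plus-surround window, in the spirit of the union-window replacement of Eq. 11:

```python
import numpy as np

def gauss_skld(v1, v2):
    """Symmetrized KL between two zero-mean Gaussians with variances v1, v2."""
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0)

def saliency_map(subband, n=8, m=24, stride=4):
    """Center-surround saliency on one detail subband.

    n and m are the half-sizes of the center and whole windows; the
    Gaussian variance is an assumed stand-in for the GammaD model.
    """
    rows, cols = subband.shape
    sal = np.zeros((rows, cols))
    for r in range(m, rows - m, stride):
        for c in range(m, cols - m, stride):
            center = subband[r - n:r + n, c - n:c + n]
            whole = subband[r - m:r + m, c - m:c + m]
            sal[r, c] = gauss_skld(np.var(center) + 1e-9,
                                   np.var(whole) + 1e-9)
    return sal

def detect(sal, t):
    """Step 3: threshold the saliency map into a binary target mask."""
    return sal > t
```

On a synthetic subband whose background is low-variance noise, locations in and around a high-variance patch score far above the homogeneous background, which is the behavior the full GΓD/SKLD pipeline relies on.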

**EXPERIMENTS AND RESULTS**

To evaluate the performance of the proposed algorithm, we chose two image sequences with dynamic backgrounds, downloaded from the Internet (Mahadevan and Vasconcelos, 2008).

Fig. 4(a-c): Image sequence “bottle” and the detection results

Fig. 5(a-d): Image sequence “boat” and the detection results

Each image sequence is converted to grayscale. Figure 4 shows a bottle floating on waving water. Figure 5 shows a moving boat with its sweeping wake. First, each frame of the image sequence is decomposed with the 2-level SWT. In each detail subband, we use a center window of size 16×16×5 pixels and a surround window of size 96×96×5 pixels and adopt MLE to estimate the GΓD model parameters. The experimental results are shown in Fig. 4 and 5. In both cases, the background is dynamic; the saliency maps clearly show the foreground objects and adapt well to the dynamic scene. The results show that the proposed algorithm can almost completely ignore the dynamic background and clearly detect the moving target.

In this study, we propose a moving target detection method based on the discriminant center-surround saliency, generated using the Stationary Wavelet Transform (SWT) and the generalized Gamma distribution (GΓD). The center-surround saliency has several advantages: it is completely unsupervised, robust to complex dynamic backgrounds and invariant to camera motion. Meanwhile, the GΓD provides a good model for the stationary wavelet coefficients. Therefore, the method can effectively detect the moving target from the saliency map. The experimental results demonstrate the performance of the proposed method.

This study was partially supported by the National Natural Science Foundation of China (60902064 and 61172159) and the Postdoctoral Science-Research Developmental Foundation of Heilongjiang Province (LBH-Q09128). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

- Chen, T.H., G. Horng and S.H. Wang, 2003. A robust wavelet-based watermarking scheme using quantization and human visual system model. Inform. Technol. J., 2: 213-230.
- Choy, S.K. and C.S. Tong, 2010. Statistical wavelet subband characterization based on generalized Γ density and its application in texture retrieval. IEEE Trans. Image Process., 19: 281-289.
- Do, M.N. and M. Vetterli, 2002. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Trans. Image Process., 11: 146-158.
- Gao, D., V. Mahadevan and N. Vasconcelos, 2008. On the plausibility of the discriminant center-surround hypothesis for visual saliency. J. Vision, Vol. 8.
- Hui, S., S. Kai and H. Qing-Yu, 2011. Fast moving small target tracking based on local background gaussian mixture model. Inform. Technol. J., 10: 1381-1387.
- Kaawaase, K.S., F. Chi, J. Shuhong and Q.B. Ji, 2011. A review on selected target tracking algorithms. Inform. Technol. J., 10: 691-702.
- Logashanmugam, E. and R. Ramachandran, 2008. An improved algorithm for image compression using wavelets and contrast based quantization technique. Inform. Technol. J., 7: 180-184.
- Loum, G., C.T. Haba, J. Lemoine and P. Provent, 2007. Texture characterisation and classification using full wavelet decomposition. J. Applied Sci., 7: 1566-1573.
- Mahadevan, V. and N. Vasconcelos, 2010. Spatiotemporal saliency in dynamic scenes. IEEE Trans. Pattern Anal. Mach. Intell., 32: 171-177.
- Maliani, A.D.E., M.E. Hassouni, N. Lasmar and Y. Berthoumieu, 2010. Texture classification based on the generalized Γ distribution and the dual tree complex wavelet transform. Proceedings of the 5th International Communications and Mobile Network, September 30-October 2, 2010, Rabat, Morocco, pp: 1-4.
- Nagarajan, B. and P. Balasubramanie, 2008. Neural classifier for object classification with cluttered background using spectral texture based features. J. Artif. Intell., 1: 61-69.
- Song, K.S., 2008. Globally convergent algorithms for estimating generalized γ distributions in fast signal and image processing. IEEE Trans. Image Process., 17: 1233-1250.
- Stacy, E.W., 1962. A generalization of the γ distribution. Ann. Math. Statist., 33: 1187-1192.
- Sulaiman, S., A. Hussain, N. Tahir, S.A. Samad and M.M. Mustafa, 2008. Human silhouette extraction using background modeling and subtraction techniques. Inform. Technol. J., 7: 155-159.
- Van de Wouwer, G., P. Scheunders and D. Van Dyck, 1999. Statistical texture characterization from discrete wavelet representations. IEEE Trans. Image Process., 8: 592-598.
- Venkateswaran, N. and Y.V.R. Rao, 2007. K-means clustering based image compression in wavelet domain. Inform. Technol. J., 6: 148-153.
- Zhang, M., Z. Lu and J. Shen, 2008. Salient region: Presentations of image main contents and its exaction algorithms. Inform. Technol. J., 7: 992-1000.