In the era of digital multimedia, the development of image/video editing tools makes the tampering or forgery of digital media much easier. Even ordinary users can produce forged digital media and spread it over the Internet maliciously. This leads to increasing concern about the trustworthiness of public digital media, so there is an urgent need to verify the originality and integrity of digital video. In response to this challenge, digital media forensics (Bing et al., 2011) has emerged. Forensic techniques can be divided into two categories: active and passive. Since active forensics depends on auxiliary data such as digital watermarks or signatures (Jin and Peng, 2006), passive forensics is becoming a hot research topic in the field of information security (Abdulfetah et al.).
In the domain of digital image forensics, a large body of related work has been conducted. In general, the tampering of digital video is more sophisticated and time-consuming, but with an increasing number of video editing tools, such as Video Edit Magic, video tampering is becoming easier. Correspondingly, research on digital video forensics is still in its infancy. The most representative works can be summarized as follows: (1) forensics based on inconsistent traces left during the imaging process of digital video and (2) forensics based on traces of the forgery process, such as ghost shadows (Zhang et al., 2009), block artefacts (Abboud, 2006; Luo et al., 2008), noise residue (Shitong et al., 2005; Kobayashi et al., 2010), similarity between image regions (Wang et al., 2007) and GOP periodicity (Su et al., 2009). These methods are effective in detecting traditional forgery operations, including copy-paste, double compression and frame-based manipulation.
Object-based manipulations are usually malicious in digital video. For example, if an object is added to or deleted from a video, it may directly influence viewers' understanding of the video content (Renuga and Sadasivam, 2009). To the best of our knowledge, no work on the passive forensics of object-based forgery in digital video has yet been reported in the literature (Xiang-Wei et al., 2009).
In this study, a passive forensic method for object-based video forgery is proposed. It analyses the statistical properties of a video object and a surrounding area of adjustable width, then digs out small-scale traces of forgery by extracting non-subsampled contourlet coefficients and gradient information, which effectively exploit image edges localized in both location and direction while producing significant coefficients. These features are formed into vectors and fed to a Support Vector Machine (SVM), which classifies natural objects and forged ones. The procedure of the proposed method is shown in Fig. 1.
VIDEO OBJECT DETECTION
Usually, video forgery based on object consists of four steps: object detection,
object manipulation, motion interpolation and background in-painting.
Fig. 1: The procedure of the proposed method
Fig. 2: Detection of video object and AWOB, (a) Original frame, (b) Background subtraction and (c) Contour of AWOB
Fig. 3: Filter bank structure of the contourlet transform
For digital video forensics, the first step is likewise object detection. Then the object contour and its bounding area can be located and statistical features extracted in order to verify the originality and integrity of the digital video.
For moving object detection, the three typical methods are: optical flow (Yao et al., 2012), frame difference (Xue et al., 2011) and background subtraction (Sulaiman et al., 2008; Jodoin et al., 2007). The optical flow method gives accurate results when tracking fast-moving objects but is computationally complex. The frame difference method is computationally efficient but too sensitive to scene changes such as lighting and easily leads to ghost effects. Therefore, it is relatively reasonable to utilize background subtraction, especially for detecting moving objects against a static background.
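As a minimal sketch of this choice, a static background can be estimated as the temporal median of the frame stack and foreground obtained by thresholding the deviation from it (the threshold value here is an assumption, not taken from the paper):

```python
import numpy as np

def background_subtraction(frames, thresh=25):
    """Detect moving objects against a static background.

    frames: (T, H, W) greyscale stack from a static camera.
    The temporal median is a simple static-background estimate;
    `thresh` is an assumed intensity threshold.
    Returns a per-frame binary foreground mask.
    """
    frames = np.asarray(frames, dtype=np.float64)
    background = np.median(frames, axis=0)   # static background model
    diff = np.abs(frames - background)       # per-pixel deviation
    return diff > thresh                     # binary foreground masks
```

For a truly static background this avoids the ghost effects of frame differencing, at the cost of buffering enough frames for the median to suppress the moving object.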
Since the traces left by object-based video forgery always exist near the object boundary, a new concept of Adjustable Width Object Boundary (AWOB) is introduced via mathematical morphology. Let I be the extracted binary object, ⊕ the dilation operation and δs the symmetric structuring element; AWOB is then defined as follows:
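Eq. 1 itself is not reproduced in this copy. One plausible reading, consistent with the statement below that the area grows with n, is that the AWOB is the n-fold dilation of the object minus the object itself, leaving a boundary band of adjustable width:

```python
import numpy as np

def _dilate_once(mask):
    """One binary dilation with the symmetric 3x3 structuring element."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def awob(mask, n=2):
    """Adjustable Width Object Boundary (sketch; Eq. 1 assumed to be
    (I dilated n times by delta_s) minus I, i.e. a band of width ~n
    pixels around the object contour)."""
    mask = np.asarray(mask, dtype=bool)
    dilated = mask
    for _ in range(n):            # n-fold dilation with delta_s
        dilated = _dilate_once(dilated)
    return dilated & ~mask        # boundary band only
```

With n = 2, as used in Fig. 2, the band is roughly two pixels wide around the detected contour.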
From Eq. 1, it is obvious that the object area grows as the number of dilations n increases. In Fig. 2, two examples are given for the above-mentioned steps, in which the object in the original frames is obtained by background subtraction and the contour of its AWOB is then extracted with n = 2.
NON-SUBSAMPLED CONTOURLET TRANSFORM
The contourlet transform (Xin et al., 2011)
is a new two-dimensional extension of the wavelet transform using multi-scale
and directional filter banks. The contourlet expansion is composed of basis
images oriented at various directions in multiple scales, with flexible aspect
ratios. Figure 3 shows a flow graph of the contourlet transform.
It consists of two major stages: the first is the Laplacian Pyramid (LP) and the second is the Directional Filter Bank (DFB).
Figure 4 shows an example of the frequency decomposition achieved by the DFB. It offers a flexible multi-resolution and directional decomposition for images, since it allows a different number of directions at each scale. For the contourlet transform to satisfy the anisotropy scaling law, as in the curvelet transform, the number of directions is doubled at every other finer scale of the pyramid (Do and Vetterli, 2002).
Figure 5 shows an example of the contourlet transform on the Zoneplate image. The image is decomposed into a lowpass subband and several bandpass directional subbands.
Fig. 4: Frequency partitioning by the contourlet transform
It is noticeable that only contourlets that match both the location and direction of image contours produce significant coefficients. Thus, the contourlet transform effectively exploits the fact that image edges are localized in both location and direction.
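The non-subsampled variant (NSCT) used in this study keeps every subband at full image size. The LP stage above can be sketched as follows; this is a simplified illustration (Gaussian lowpass with a doubling scale is an assumed filter schedule, and the DFB stage is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nonsubsampled_lp(image, levels=3):
    """Nonsubsampled Laplacian-pyramid stage of NSCT (sketch).

    No downsampling is performed; instead the lowpass filter is
    widened at each level (sigma doubling is an assumed schedule).
    Returns [detail_1, ..., detail_L, lowpass], all the same size
    as `image`, summing back to the input by telescoping.
    """
    image = np.asarray(image, dtype=np.float64)
    subbands, current, sigma = [], image, 1.0
    for _ in range(levels):
        low = gaussian_filter(current, sigma)   # lowpass approximation
        subbands.append(current - low)          # bandpass (detail) subband
        current, sigma = low, sigma * 2         # widen filter, no subsampling
    subbands.append(current)                    # final lowpass residual
    return subbands
```

In the full NSCT, each detail subband would additionally pass through a nonsubsampled directional filter bank; here the detail subbands alone illustrate the multi-scale decomposition whose coefficients are modelled in the next section.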
PARAMETER ESTIMATION OF GENERAL GAUSSIAN DISTRIBUTION
In this study, the distribution of detail subband coefficients is described by the General Gaussian Distribution (GGD) model, whose parameters are estimated by Maximum Likelihood Estimation (MLE) (Lee, 2010).
For the given subband coefficients of the contourlet transform (N indicates the number of coefficients), the likelihood function, assuming independent components, is defined as:
where α and β are the parameters to be estimated. In this case the following likelihood equations have a unique root in probability, which is indeed the maximum likelihood estimator:
where Ψ is the digamma function, i.e., Ψ(z) = Γ′(z)/Γ(z). Fixing β > 0, Eq. 3 has a unique, real and positive solution:
Fig. 5: Contourlet transform of the zoneplate image
Fig. 6: Gradients in X-direction and Y-direction, (a) Original frame, (b) Gradients in X-direction and (c) Gradients in Y-direction
Fig. 7: Average gradients of the RGB colour channels
Substituting this into Eq. 4, the shape parameter is the solution of the following transcendental equation:
which can be solved numerically; the maximum likelihood estimate of the shape parameter β is obtained using the Newton-Raphson iterative procedure, with the initial guess taken from the moment method.
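Since the transcendental equation is not reproduced in this copy, the sketch below uses the standard GGD maximum-likelihood equation from the texture-retrieval literature (Do and Vetterli, 2002), and a bracketed root-finder in place of the Newton-Raphson iteration described above:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def ggd_mle(x):
    """ML fit of a General Gaussian Distribution to subband coefficients.

    Solves the standard transcendental ML equation for the shape
    parameter beta (via bracketed root-finding rather than the paper's
    Newton-Raphson) and recovers the scale alpha in closed form.
    Zero coefficients are dropped to keep log|x| finite.
    """
    x = np.abs(np.asarray(x, dtype=np.float64))
    x = x[x > 0]
    n = x.size

    def g(beta):
        # Stationarity condition of the profile log-likelihood in beta.
        s = np.sum(x ** beta)
        return (1.0 + digamma(1.0 / beta) / beta
                - np.sum(x ** beta * np.log(x)) / s
                + np.log(beta * s / n) / beta)

    beta = brentq(g, 0.2, 6.0)                              # shape
    alpha = (beta * np.sum(x ** beta) / n) ** (1.0 / beta)  # scale
    return alpha, beta
```

For a Laplacian sample (a GGD with β = 1, α = scale), the fit recovers both parameters, which is the sanity check used below.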
The average gradient is the accumulation of local luminance contrast in an image. Generally, the larger the gradient value, the stronger the contrast among local parts and the clearer the image; in a word, the average gradient sensitively reflects the details of an image. With the rows and columns of the image f(x, y) denoted row and col, the expression is:
Figure 6 presents gradients of the frames in both the X-direction and Y-direction. Original frames are listed in column (a), with their gradients in the X-direction shown in column (b) and in the Y-direction in column (c). It is apparent that the accumulation of local luminance contrast successfully reflects the details of the frames.
Let z = (z1, ..., zk) (k indicates the number of gradient values) denote the gradient components of the AWOB.
As shown in Fig. 7, the average gradients of the RGB colour channels are computed respectively to reflect edge intensities, which present the same trend. It can be seen that the edge intensities fit a Rayleigh distribution (Jing et al., 2011), whose probability density function is described in Eq. 8:
And the parameters of Rayleigh distribution are estimated by MLE:
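The average-gradient and Rayleigh-MLE expressions (Eqs. 7-9) are not reproduced in this copy; the sketch below uses the common definitions (root-mean of the two directional differences for the average gradient, and the closed-form Rayleigh ML estimator sigma^2 = sum(z_i^2)/(2k)), which may differ in detail from the paper's:

```python
import numpy as np

def average_gradient(img):
    """Average gradient of a greyscale frame (common definition:
    mean of sqrt((Gx^2 + Gy^2)/2) over the valid interior)."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.diff(img, axis=1)[:-1, :]   # X-direction differences
    gy = np.diff(img, axis=0)[:, :-1]   # Y-direction differences
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def rayleigh_mle(z):
    """ML estimate of the Rayleigh scale sigma from gradient
    magnitudes z_1..z_k: sigma = sqrt(sum(z_i^2) / (2k))."""
    z = np.asarray(z, dtype=np.float64)
    return np.sqrt(np.sum(z ** 2) / (2.0 * z.size))
```

Applied per colour channel over the AWOB gradient magnitudes, `rayleigh_mle` yields one scale parameter per channel, giving the three Rayleigh features used later in the feature vector.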
THE CLASSIFICATION BASED ON SVM
SVM (Hung and Liao, 2008) is a machine learning algorithm based on statistical learning theory; it has special advantages in solving small-sample, non-linear and high-dimensional pattern recognition problems. The basic idea behind SVM is to map the sample space into a high- or infinite-dimensional space (a Hilbert space) by a non-linear mapping φ, so that a non-linear problem in the original sample space is converted into a linear problem in the feature space.
Assuming that videos with natural objects are positive samples and video sequences with object-based forgery are negative samples, there are four possible judgments for a two-class (positive and negative) classification problem: (1) True Positive (TP): positive samples predicted as positive; (2) False Negative (FN): positive samples predicted as negative; (3) False Positive (FP): negative samples predicted as positive; (4) True Negative (TN): negative samples predicted as negative. The performance of the classifier can be described by metrics such as Accuracy, Precision and Recall, defined as follows:
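The defining formulas are the standard ones and can be written directly from the four counts:

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, Precision, Recall plus TPR/FPR from the four
    counts of a two-class confusion matrix (standard definitions)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # identical to the true positive rate
    tpr = recall
    fpr = fp / (fp + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "tpr": tpr, "fpr": fpr}
```

Sweeping the SVM decision threshold and plotting TPR against FPR yields the ROC curve evaluated below.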
Two methods are used for the performance evaluation of the classifier. One is the pair of True Positive Rate (TPR) and False Positive Rate (FPR); the other is the ROC (Receiver Operating Characteristic) curve, which describes the trade-off between TP and FP. The AUC (Area Under the ROC Curve) reflects how well the classifier works: the larger the AUC, the better the performance of the classifier.
EXPERIMENTAL RESULTS AND DISCUSSIONS
The test videos are taken directly from Shih et al. (2006) without processing. A small-sample data set is obtained by the object-based forgery and manipulation algorithm. Nine typical static video sequences are chosen; four are natural videos and the remaining five are forged. Their compressed file formats are AVI and WMV, respectively. The resolution of the video frames is 320×240. The experiment is performed with Matlab R2009a on a PC with an Intel(R) Core(TM) i3 CPU at 2.53 GHz and 2 GB RAM. Fifty frames are selected from every video sequence. Given the frame images, the forensics scheme shown in Fig. 1 proceeds as follows:
(1) The video object in the original frames is obtained by background subtraction, and so is the contour of its AWOB, which is the target of the experiments
(2) Detail subband coefficients of the contourlet transform are computed by a 3-layer NSCT decomposition and fitted to the General Gaussian Distribution model; the corresponding parameters are then estimated by MLE
(3) Edge intensities are analysed; after fitting to a Rayleigh distribution, the parameters are estimated for the RGB channels respectively, forming the next three feature vectors
(4) Different weights are assigned to these features to form a higher-dimensional vector. The weights are obtained by training with the step size gradually increased from 0.5 to 0.9; each group of weights is used for feature combination (Sasikala and Kumaravel, 2007) and the group with the highest classification accuracy is selected
(5) Every component of the feature vector is normalized into (0, 1) on the basis of the following mapping formula, so that every component plays a balanced role:
where x and x′ represent a sample feature before and after normalization, respectively
(6) The library libsvm is utilized, with the Radial Basis Function (RBF) as the kernel function. The penalty parameter c and the RBF kernel parameter g are obtained by cross-validation. To achieve better classification, the above process is repeated 10 times and the average results are reported
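The normalization mapping in step (5) is not reproduced in this copy; a common choice with exactly this effect is per-component min-max scaling, sketched below (the column-wise convention, one column per feature, is an assumption):

```python
import numpy as np

def minmax_normalise(features):
    """Map every feature component into [0, 1] column-wise.

    Assumed form of the paper's mapping: x' = (x - min) / (max - min),
    computed per feature over the training samples so that each
    component contributes on a comparable scale to the SVM.
    """
    features = np.asarray(features, dtype=np.float64)
    lo = features.min(axis=0)   # per-feature minimum
    hi = features.max(axis=0)   # per-feature maximum
    return (features - lo) / (hi - lo)
```

In practice the training-set minima and maxima would be stored and reused to scale test samples, so that both sets pass through the same mapping before being handed to libsvm.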
Table 1: Classification results of extracted features
Fig. 8: ROC curves of different circumstances, (a) AUC = 0.91935 and (b) AUC = 0.97419
The classification performances of the five features and their combination are summarized in Table 1. It is apparent that the various features have different effects on the classification of real and forged videos, with the highest accuracy of 99.13% (G-channel average gradient) and the lowest of 88.73% (means of NSCT coefficients), though all features achieve accuracies above 88%. Moreover, the gradient features are relatively more effective than the NSCT coefficient features. After feature combination, both Accuracy and Precision reach stable results of over 96%.
Figure 8 shows the ROC curves when the classification accuracies are 88.73 and 94.37%, respectively. The right-hand curve clearly performs better, since it lies closer to the upper-left corner of the plot; the AUC is 0.9194 on the left and 0.9742 on the right.
CONCLUSION
A passive forensic method for object-based video forgery has been proposed, which utilises the statistical features of the object contour and its surrounding area. Using small-scale non-subsampled contourlet transform coefficients and gradient information from each colour channel, natural and forged objects are classified by an SVM. The experimental results show that the proposed approach achieves the desired forensic results. However, due to the diversity and content complexity of digital videos, the method still needs the support of a sample database in the training process. Future investigation will search for features that are not too heavily dependent on the training samples. Furthermore, a comprehensive data set similar to the Columbia Image Splicing Detection Evaluation Dataset (DVMM, 2008) is also needed for video forensics.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (Grant No. 61072122), the Special Prophase Project on the National Basic Research Program of China (Grant No. 2010CB334706) and the Program for New Century Excellent Talents in University (Grant No. NCET-11-0134).