HOME JOURNALS CONTACT

Information Technology Journal

Year: 2012 | Volume: 11 | Issue: 9 | Page No.: 1316-1321
DOI: 10.3923/itj.2012.1316.1321
Rate-distortion Estimation for Wavelet Video Coding Based on Multi-dimensional Gaussian Distribution
Xuesong Jin, Maoliu Lin, Zhaojie Zhao and Zhipeng Fan

Abstract: The rate allocation proposed a problem that exists in wavelet transformation based video coding. The goal of this paper is to design a solution for estimating the rate-distortion for wavelet subband faced by wavelet-based video coders. We proposed a new technique for establishing rate-distortion function using correlation between wavelet coefficients. In order to minimize the overall error of the reconstructed video, a rate allocation algorithm based on rate-distortion function was implemented. Experiment results show that the proposed rate-distortion model can predict the real rate-distortion performance of video encoder, can improve the reconstructed video quality significantly.

Fulltext PDF Fulltext HTML

How to cite this article
Xuesong Jin, Maoliu Lin, Zhaojie Zhao and Zhipeng Fan, 2012. Rate-distortion Estimation for Wavelet Video Coding Based on Multi-dimensional Gaussian Distribution. Information Technology Journal, 11: 1316-1321.

Keywords: multivariate normal distribution, Wavelet video coding, optimal rate-allocation and rate-distortion function

INTRODUCTION

With the drastic developments in computer and network technologies, highly efficient compression and transmission for video information are always the primary targets. When develops a video Codec system, high coding efficiency, namely covering more information by using less bits, is not the only goal but also the adaptive ability to meet various of channel bandwidth, terminal capabilities and user demands. Thus in order to meet the new video coding challenges, Scalable Video Coding (SVC) was developed which provides the scalability of temporal, spatial and quality, etc. (Schwarz et al., 2007).

The wavelet-based image compression algorithms, such as JPEG 2000 (Taubman and Marcellin, 2002), are known to outperform the discrete cosine transform (DCT)-based compression techniques in still image coding. It is natural to extend wavelet-based image processing techniques for video coding. The 3D wavelet transforms decompress video sequence into independent wavelet subbands. The embedded bit-stream is generated for each subband by performing the entropy coding. In order to produce the bit-stream with a given bit rate, the amount of bytes be allocated for each subband in the output embedded bit stream should be determined. This decision procedure raises a competition among all the subbands. Finally, the rate allocation is arbitrated by equal-slope criterion of rate-distortion (R-D) curve which plays a crucial role in rate allocation (Xu et al., 2001).

According to the characteristic of the existing main R-D models, these R-D can be classified into three categories: analytic, empirical and semi-analytic (Xie et al., 2009). Many R-D modeling methods in literatures have been proposed to estimate accurate R-D functions for different signals. Mallat and Falzon (1998) suggested a thorough analysis of R-D performance of wavelet transform coding in the low bit-rate region and presented an accurate model for still image coding. Their evaluation results revealed that the R-D function takes great different forms for low and high bit rate. Dai et al. (2004) combined Mallat’s distortion calculation with the bit-rate estimation which is proposed by He and Mitra (2001), this approach achieve a close-form operational R-D model. Mingshi and Sachaar (2006) proposed a unified mathematical model to describe wavelet coefficient operational rate-distortion behavior based on their statistical investigation of a typical three-dimensional wavelet Codec. Their model can capture sequence characteristics accurately and reveal the relationship between wavelet decomposition levels and the overall R-D performance. Foo et al. (2008) analysed the encoding process and method of the encoder, thus established analytical models based on wavelet sub-band. This method is more complex. Meng-Ping and Nguyen (2010) constructed a linear distortion model, implemented the rate extractor based on optimize motion quality level. The PCRD-opt rate allocation method is main method in image coding which is also being used in wavelet-based video coding systems (Taubman and Marcellin, 2002).

Xu et al. (2001) obtained operational R-D point pairs, through utilizing the rate of the actual output of coder and the estimation distortion as quantization step decrease. They used the binary search algorithm to find the optimal bit-rate cut-off point. The proposed method estimated operational rate-distortion discrete points and rate allocation algorithm was implemented. This study presents a new method to establish the rate-distortion function of wavelet subband and then introduced a novel algorithm to optimize the rate allocation using the rate-distortion function.

THE ANALYSIS OF SUB-BAND COEFFICIENTS CHARACTERISTICS

There was several system schemes can be chosen in wavelet based scalable video coding system. This paper is based on T+2D structure. Motion Compensation Temporal Filtering (MCTF) was applied to remove inter-frame correlation along the time axis to implement temporal scalability. Then DWT is performed on the obtained temporal subband to finally generate T-S subband. We have found certain relationship between the variance and correlation of subband and the statistical property of original pixel. The relationship will be explained consequently.

The 3D wavelet video coding schemes is to perform MCTF (Ohm, 1994) on the input video sequence directly before spatial transform. The multi-level MCTF decomposes the video frames into several temporal subbands, such as temporal high-pass subbands and temporal low-pass subbands. After temporal decomposition, a spatial decomposition is applied to each temporal subband to further decompose the frames spatially. After temporal and spatial subband transforms, the coefficients will be coded with 3D Embedded Subband Coding with Optimal Truncation (3D-ESCOT) (Xu et al. 2001). Basically, 3D-ESCOT follows the idea and coding procedure of EBCOT in JPEG 2000 (Taubman, 2000). The one-dimensional signal is adopted to illustrate because the convenience of analyzing subband coefficient characteristics. Supposed that signal is one-dimensional Gauss random process {x(n), n∈z}. Variance is σ2, autocorrelation function of random process R(k) = r|k|σ2, o≤r≤1, k∈Z. The signal decomposed with wavelet. The wavelet decomposition is equivalently two channel filter. The filter include of high-pass h0 and low-pass h1. As the result, subbands B0 and B1 are generated.

Fig. 1: One-dimensional signal wavelet decomposition

As in Fig. 1, high-pass and low-pass subbands coefficients are obtained by following formula, respectively:

(1)

(2)

The variance of high-pass subband B0 and low-pass subband B1(n) are depicted by D(B0) and D(B1), respectively. D(B0) and D(B1) are calculated from formula Eq. 3 and 4:

(3)

(4)

The notation ∑pixels’ summation of corresponding matrix and ⊗ means the element-multiply between corresponding matrix. Matrix C is as following:


(5)

(6)

It shows that the subband coefficients’ variance is determined by filter coefficients and autocorrelation of original signal.

Although wavelet filter can remove the temporal and spatial correlation of video signal, there still are some relations between wavelet coefficients. Entropy coding can utilize the relation to improve the coding efficiency. The autocorrelation of coefficients in sub-band B0 and B1 can be calculated from formula Eq. 5 and 6.

RATE-DISTORTION FUNCTION

The same frequency subband coefficients have similar statistical properties. Each subband, generated by temporal and spatial transforms, is divided into 3D coding blocks which will be coded independently, shown in Fig. 2. In the coding block, the correlation between the coefficients whose coordinate is (i1, j1, t1) and (i2, j2, t2) will be obtained by the following formula:

(7)

In the coding block, the coefficients correlation can be approximated exponential function of the corresponding coefficients spatial location. The correlation of coefficient of coarser subbands is calculated by using Eq. 5 and 6, recursively.

For each coding block, bit-plane coding and context-based arithmetic coding are used. Coder encodes the current coefficients x exploiting the correlation between current coefficient and the adjacent coefficients in the position of the 3D coding block. Let Nx = (xu, xd, xl, xr, xf, xb) denotes the context of current coefficients. The adjacent coefficients of current coefficient consist of up, down, left, right, before and behind coefficients, respectively, shown in Fig. 3. The coefficient x and the context Nx constitute a seven-dimensional random vector X = (x, Nx) which obeys following Gaussian distribution:

(8)

which:

In the covariance matrix Σ of different sub-bands, ρ can be obtained by Eq. 5 and 6.

Fig. 2: Three-dimensional wavelet coefficients code block

Fig. 3: The relationship between the current coding coefficients and the context correlation

Since the covariance matrix Σ is positive semi-definite, there is always a positive eigenvalues λ12,..., λn. The corresponding eigenvectors q1,q2,..., qn are easy to calculate. The eigenvectors constitute orthogonal matrix of Q. Covariance matrix Σ can be constituted by the following formula:

and:


(9)

Let X' = Q-1X, the probability density function of vector X' becomes:

Each component of random vector X' is completely independent. Therefore, the entropy of random vector X' can be obtained by the following formula:

The same transformation is performed on Y, equally, Y' = Q-1Y. The transformation will not affect average mutual information will not since the transform Q-1 is orthogonal transform and reversible. In addition, the Q is real matrix, so the above transformation is unitary transformations. Then, in the transformation, Euclidean distance remains the same length. The above transformation always uses the mean squared error to calculated distortion and ensures that the rate-distortion function of both sides is the equal.

In Eq. 9, each independent component x'k of random vector X' obeys Gaussian distribution, whose variance is λk. The rate-distortion function of X' can be derived from the rate-distortion function of independent components. Let θ is the specified distortion parameters. The average distortion and bit rate of component can be calculated as follows:

(10)

(11)

The values of rate and distortion will be calculated based Eq. 10 and 11 as the parameter θ varies (Berger, 1972). In this study, n is 7.

RATE-ALLOCATION ALGORITHM

The rate allocation problem for a video coder can be roughly stated as the determination of proper coding parameters so that the decoded video quality is optimized with respect to a certain fixed rate. After fully wavelet transformation, each sub-band is coded into bit-stream by entropy coder. The bit rate of each subband can be directly controlled to achieve the required distortion. For simplicity, rate-allocation is limited to a GOP. Let Rt be the rate assigned to the GOP. The rate allocation problem can be formulated as:

(12)

where, , Rα is the rate of coding motion information, such as macro-block partition mode, motion vectors, etc. Given the rate RT for the GOP, we want to allocate the rate such that the overall distortion is minimized. There are two methods to solve the optimization problems of (12): Lagrangian optimization and dynamic programming. The Lagrange optimization method is widely used because of its simple and easy to implement. Restricted optimization problem used Lagrange multipliers can be converted into a non-limited form, cost function of construct as following:

(13)

where, λ is the Lagrange multiplier. Since DA and R are monotone continuous curve of smooth, rate-distortion function is derivative everywhere. Solve the partial derivatives of cost function (13) and set partial derivatives to zero:

(14)

For each sub-band, determined λ, you can obtain the all bit-rate Ri and distortion Di. The coding parameter λ must be carefully selected so that condition R≤RT is guaranteed.

In fact, it is not easy to obtain the function expression of specific R-D function. We can get a number of R-D point pairs on the rate-distortion function. The bit allocation algorithm of using R-D point pairs can be described as follows:

Step 1: Initialization: given a λ0≤0, step size dλ>0 and threshold ε>0, make m = 0
Step 2: For each temporal subband, calculate the first derivative of the rate-distortion function:


Step 3: Calculated and allocated the rate Ri of each temporal subband
Step 4: Calculated
Step 5: If (Ri-R)/RT≤ε, then the algorithm will stop, otherwise, it will continue
Step 6: Make m = m+1, λm = λm-1+dλ, transferred to the step 2)

Fig. 4: Comparing the test results, (a) Bus (b) Foreman and (c) Coastguard

EXPERIMENTAL RESULTS AND ANALYSIS

To verify the performance of the proposed rate-distortion function estimation algorithm, we choose the exponential rate-distortion model (Po-Yuen et al., 1997; Taubman and Marcellin, 2002) for the temporal subband distortion. Then the temporal subband distortion is given by:

where, σ2n is the source variance and γn is the coding efficiency parameter. For each temporal subband n, the coding efficiency parameter γn and the variance σ2n have to be determined.

Video coding system platform based on discrete wavelet transform is built. The test sequences is CIF format with resolution size 352x288, frame rate is 30 fps. In the testing of the proposed method, the temporal decomposition level is set to four and using LG 5/3 filters. After decomposition, A GOP generates 16 sub-band in five types (one l4 and h4, two h3, four h2, eight h1). Then, each temporal sub-band is decomposed using Daubechies 9/7 filters. The generated wavelet coefficients are coded into bit-stream by entropy Codec.

We compare the performance of proposed method to rate-distortion model of exponential function. The test results for the CIF sequences are plotted in Fig. 4. It can be seen from the Fig. 4, for the content rapid changes video sequence, such as bus, the result of the two algorithms is more consistent. For the moderate changes video sequence, such as foreman and coastguard, the PSNR of the decoded video has increased about 0.3 dB~0.5 dB. This means that using the proposed rate-distortion model and rate allocation algorithm, the quality of the reconstruction image is improved compared to exponential model.

Owing to the calculation of correlation coefficient of each sub-band coefficients and diagonalization calculation of seven-dimensional covariance matrix, the computational complexity of proposed algorithm is higher.

CONCLUSION

A rate-distortion estimation algorithm based on multi-dimensional Gaussian distribution has been proposed for wavelet-based video coding system. Using the wavelet coefficient characteristic statistics, the encoder can derive the rate-distortion function. Experimental results show that this algorithm can be applied in wavelet video coding and achieve a better performance.

ACKNOWLEDGEMENTS

This study is partly supported by the National Natural Science Foundation of China (Grant No. 61171062) and the Natural Science Foundation of Heilongjiang Province, China (Grant No. F201114).

REFERENCES

  • Schwarz, H., D. Marpe and T. Wiegand, 2007. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Trans. Circuit Syst. Video Technol., 17: 1103-1120.


  • Mallat, S. and F. Falzon, 1998. Analysis of low bit rate image transform coding. IEEE Trans Signal Process., 46: 1027-1042.
    CrossRef    Direct Link    


  • Dai, M., D. Loguinov and H. Radha, 2004. Rate-distortion modeling of scalable video coders. IEEE Int. Conf. Image Process., 2: 1093-1096.
    CrossRef    Direct Link    


  • He, Z. and S.K. Mitra, 2001. A unified rate-distortion analysis framework for transform coding. IEEE Trans Circuits Syst. Video Technol., 11: 1221-1236.
    CrossRef    Direct Link    


  • Wang, M. and M. van der Schaar, 2006. Operational rate distortion modeling for wavelet video coders. IEEE Trans. Signal Process., 54: 3505-3517.
    CrossRef    Direct Link    


  • Foo, B., Y. Andreopoulos and M. van der Schaar, 2008. Analytical rate-distorition-complexity modeling of wavelet-based video coders. IEEE Trans. Signal Process., 56: 797-815.
    CrossRef    Direct Link    


  • Meng-Ping, K. and T.Q. Nguyen, 2010. Rate-Distortion optimized bitstream extractor for motion scalability in wavelet-based scalable video coding. IEEE Trans. Image Process., 19: 1214-1223.
    CrossRef    Direct Link    


  • Taubman, D.S. and M.W. Marcellin, 2002. JPEG2000: Image Compression Fundamentals, Standards and Practice. Vol. 1, Spronger, Boston, MA., USA., ISBN-13: 9780792375197, Pages: 773


  • Xu, J., Z. Xiong, S. Li and Y.Q. Zhang, 2001. Three-dimensional embedded subband coding with optimized truncation (3-D ESCOT). Applied Comput. Harmonic Anal., 10: 290-315.
    CrossRef    Direct Link    


  • Ohm, J.R., 1994. Three-dimensional sub-band coding with motion compensation. IEEE Trans. Image Process., 3: 559-571.
    CrossRef    Direct Link    


  • Taubman, D., 2000. High performance scalable image compression with EBCOT. IEEE Trans. Image Process., 9: 1158-1170.
    CrossRef    Direct Link    


  • Berger, T., 1972. Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall, Englewood Cliffs, USA., ISBN: 9780137531035


  • Po-Yuen, C., J. Li, J. Kuo, 1997. Rate control for an embedded wavelet video coder. IEEE Trans. Circuits Syst. Video Technol., 7: 696-702.
    CrossRef    Direct Link    


  • Xie, M., G. Wei and Y. Ling, 2009. Analysis and application on rate-distortion model oriented scalable video sequences. Inform. Technol. J., 8: 188-194.
    CrossRef    Direct Link    

  • © Science Alert. All Rights Reserved