In some critical application areas, such as the military, legal and literary fields, modifying the contents of a text is not allowed. Restoring the original contents of the text therefore becomes a practical and important issue for text watermarking, and this study addresses it. We first present the concept of reversible text watermarking and then propose an effective scheme to achieve it. Based on the synonym substitution method, the proposed algorithm applies an invertible transform to embed the watermark, extract the watermark and recover the original contents of the text. With the reversible watermarking scheme, one can not only protect the interests of the author of the text but also recover the original contents, provided the receiver has the wish and the right to do so. Moreover, the scheme improves the payload capacity by using a high embedding level, compressing the watermark or repeating the algorithm more than once.
With the rapid development of networks, a great deal of multimedia content is now available online. However, text is still the main medium for transferring and distributing information on the internet (Topkara et al., 2005). Protecting text from piracy is therefore a key task, for which text watermarking is very useful, and the technique has been studied extensively in recent years (Qadir and Ahmad, 2006).
Currently, there are three main kinds of text watermarking schemes: schemes that treat texts as binary images (Ge et al., 2002), schemes that use characteristics of the text (Brassil et al., 1994) and schemes that use natural language processing (Topkara et al., 2006). The first two do not change the contents of the text, but they are easily defeated by retyping; the third, called natural language watermarking, is considerably more robust than the first two and is therefore the most prominent approach to text watermarking. Its drawback is that it may distort the meaning of words, sentences or even the whole text (we call this meaning distortion). One typical method of natural language watermarking is synonym substitution, in which original words are replaced by their synonyms, and this often changes the semantic meaning.
The meaning distortion generated by natural language watermarking is often quite small and imperceptible, but in the military, legal and literary fields, even a very slight change to the contents of a text is undesirable. For example, if a legal paper uses natural language watermarking to embed the author's information, the resulting meaning distortion may mislead a judge into a wrong verdict. These practical demands motivate research on a watermarking scheme that can restore the contents of the text as well as embed and extract the watermark. Such a scheme exists for images and is called reversible watermarking (Feng et al., 2006). Reversible watermarking is a watermarking technique that enables the encoder to embed secret information into the image and the decoder to extract the embedded information exactly and restore the original image completely. Reversible watermarking for images has been researched for several years and many successful results have been obtained. Up to now, three categories of reversible watermarking algorithms are available: the lossless compression approaches (Celik et al., 2005), the histogram approaches (Yang et al., 2004) and the expansion approaches (Tian, 2002; Thodi et al., 2007; Coltuc and Chassery, 2006; Coltuc, 2007; Coltuc and Chassery, 2007; Chrysochos et al., 2009). Yet, to the best of our knowledge, no research has been done on reversible text watermarking. In this study, we first put forward the concept of reversible text watermarking and then offer an effective method for it, because the algorithms designed for images are not suitable for text, owing to the differences between text and images. In our reversible watermarking scheme, we use an invertible transform that exploits a property of the floor function to embed the watermark, extract the watermark and restore the contents of the text. The experimental results show that the embedding capacity can be improved by using a high embedding level, compressing the watermark or repeating the scheme more than once.
REVERSIBLE TEXT WATERMARKING AND THE INVERTIBLE TRANSFORM
Here, we present three matters: the definition of reversible text watermarking, the three main differences between image and text and the details of the invertible transform used in our proposed scheme.
Reversible text watermarking:
In accordance with the concept of reversible watermarking for images, we define reversible text watermarking as follows: reversible text watermarking is the technology for embedding covert information into a text, extracting the embedded information from the text and restoring the original contents of the text. Reversible text watermarking is based on natural language watermarking, which alters the contents of the text. Here, restoring the original contents of the text refers to recovering the original words and sentences that were changed by the watermarking process.
The technique of reversible watermarking meets the requirements of robustness, imperceptibility and ease of embedding and extraction. In addition, it has the following features that distinguish it from traditional nonreversible watermarking:
- Blind embedding and extracting, which means the original contents are restored while extracting the watermark
- High embedding capacity, which indicates adequate space in the text to embed covert information
The procedures of embedding and extracting are almost the same as in traditional nonreversible watermarking schemes, except that the original contents can be restored while extracting the watermark.
Differences between text and image:
An image is a matrix made up of pixels organized in a predefined order, while a text is a list of words and punctuation marks. There are some differences between them that matter for reversible text watermarking:
- The range of pixel values is fixed for a given gray level, e.g., for an 8-bit gray level, the range of pixel values is [0, 255]. However, the range of synonym indexes (we call the index of a word in its synonym set its synonym index) follows no such rule, e.g., for a word having 4 synonyms the range is [0, 3], while for a word having 8 synonyms it is [0, 7]
- Large changes to pixel values impair the imperceptibility of the watermark in an image, whereas alterations of synonym indexes are not noticed by human readers
- The threshold of pixel values is much larger than that of synonym indexes
The method described in the following sections is based on the above three differences.
Inspired by Coltuc and Chassery (2006, 2007), we choose to use the following invertible transform for the proposed algorithm.
Let (x1, x2) be a pair of integer values whose domain is:
where L is the embedding level, which controls the length of an embedding unit, and n is a positive integer constant. Then, our transform is defined as:
Here, we should make sure (x1, x2) is in the specified domain D. Therefore, we have the following equations as our constraints:
The inverse transform of Eq. 1 is defined as follows:
where ⌊x⌋ is the floor function, which returns the greatest integer less than or equal to x.
In what follows, we set n to 1 to illustrate the proposed algorithm.
to replace (x1', x2') in Eq. 3 and assume (x1', x2') ∈ D: if LSB(x1') + LSB(x2') = 0, then (x1, x2) is calculated exactly from Eq. 3; if LSB(x1') + LSB(x2') ≠ 0, then (x1, x2) is decreased by (1, 1) from Eq. 3.
Thereby, we can use the LSBs of (x1, x2) to embed information. First, we transform (x1, x2) into (x1', x2') by Eq. 1. If (x1', x2') ∈ D and LSB(x1') + LSB(x2') = 0, then:
(y→x means to assign y to x) and let LSB(x2') be available for coding one bit of information; if (x1', x2') ∈ D and LSB(x1') + LSB(x2') ≠ 0, then:
and let LSB(x2') be available for coding one bit of information; if (x1', x2') ∉ D, then record LSB(x2') for recovering the original x1 and
There may be some successive un-embeddable pairs that do not satisfy Eq. 2. In order to recover the original pairs, we need to embed the LSBs of the first elements of all successive un-embeddable pairs as well. To improve the embedding capacity, we can embed only the LSBs of the first elements of the successive un-embeddable pairs at odd positions and leave the pairs at even positions unchanged. By doing so, we save c/2 bits for embedding covert information, where c is the number of successive un-embeddable pairs.
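To make the idea of an integer pair transform with an exact inverse concrete, the following minimal sketch uses the reversible contrast mapping of Coltuc and Chassery (2007), one of the works that inspired the transform above. It is an assumption for illustration only and is not claimed to match Eq. 1 and Eq. 3: the function names, the domain [0, 2^L - 1] and the ceiling-based inverse belong to that mapping, not necessarily to the present scheme.

```python
def rcm_forward(x1, x2):
    """Forward reversible contrast mapping on an integer pair."""
    return 2 * x1 - x2, 2 * x2 - x1


def ceil_div(a, b):
    """Ceiling of a / b for integers, with b > 0."""
    return -(-a // b)


def rcm_inverse(y1, y2):
    """Inverse mapping; the ceiling absorbs the loss caused by cleared LSBs."""
    return ceil_div(2 * y1 + y2, 3), ceil_div(y1 + 2 * y2, 3)


def in_domain(x1, x2, level):
    """True if the transformed pair stays in [0, 2**level - 1] (assumed domain)."""
    top = 2 ** level - 1
    y1, y2 = rcm_forward(x1, x2)
    return 0 <= y1 <= top and 0 <= y2 <= top


def clear_lsbs(y1, y2):
    """Set the least significant bit of both elements to 0."""
    return y1 & ~1, y2 & ~1


if __name__ == "__main__":
    pair = (6, 5)                      # hypothetical input pair, level 4
    moved = rcm_forward(*pair)
    print(moved)                       # (7, 4)
    print(rcm_inverse(*moved))         # (6, 5)
    # The pair is still recovered after both LSBs are overwritten with 0,
    # which is what frees the LSBs for carrying information.
    print(rcm_inverse(*clear_lsbs(*moved)))  # (6, 5)
```

For this mapping, the recovery after clearing the LSBs holds for every in-domain pair except those in which both original values are odd, which is why such pairs need separate handling in a full embedding procedure.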
As stated earlier, reversible text watermarking builds on natural language watermarking, and the most popular natural language watermarking scheme is synonym substitution. For a word w having synonyms and watermark bits b in binary representation, the synonym substitution method simply replaces w with the synonym whose synonym index is b. Assuming the synonym number of word w, i.e., the number of words in the synonym set of w, is n, the synonym substitution method can embed ⌊log2 n⌋ bits of information at the position of w.
Our proposed method is based on the synonym substitution scheme.
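The plain synonym substitution method can be sketched as follows. This is a minimal illustration under stated assumptions: the synonym sets are hypothetical toy data (a real system would draw them from a lexicon such as WordNet), the function names are invented here, and the watermark is zero-padded when it runs out.

```python
# Hypothetical toy synonym sets; the position of a word in its set is its synonym index.
SYNONYM_SETS = [
    ["big", "large", "huge", "vast"],
    ["quick", "fast"],
]


def find_set(word):
    """Return the synonym set containing the word, or None."""
    for syn_set in SYNONYM_SETS:
        if word in syn_set:
            return syn_set
    return None


def capacity(syn_set):
    """Number of bits a set of n synonyms can carry: floor(log2(n))."""
    return len(syn_set).bit_length() - 1


def embed(words, bits):
    """Replace each substitutable word by the synonym whose index encodes the next bits."""
    out, pos = [], 0
    for word in words:
        syn_set = find_set(word)
        k = capacity(syn_set) if syn_set else 0
        if k == 0:
            out.append(word)
            continue
        chunk = bits[pos:pos + k].ljust(k, "0")  # assumed zero padding
        out.append(syn_set[int(chunk, 2)])
        pos += k
    return out


def extract(words):
    """Read the synonym index of every substitutable word back as bits."""
    bits = ""
    for word in words:
        syn_set = find_set(word)
        k = capacity(syn_set) if syn_set else 0
        if k:
            bits += format(syn_set.index(word), f"0{k}b")
    return bits


if __name__ == "__main__":
    marked = embed(["a", "big", "dog", "runs", "fast"], "101")
    print(marked)           # ['a', 'huge', 'dog', 'runs', 'fast']
    print(extract(marked))  # 101
```

Note that the original word "big" cannot be recovered from the marked text in this plain form, which is the limitation the reversible scheme is meant to remove.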
Despite the three differences between text and image, we still treat the synonym indexes as pixel values through a simulation process described in the following subsection. After that, we can embed the watermark, extract the watermark and recover the original contents of the text.
The general procedure of our scheme consists of the following 3 steps:
- Convert the synonym indexes of words in the text into pixel values
- Use the above-mentioned invertible transform and the synonym substitution scheme to embed the watermark, extract the watermark and restore the original synonym indexes of words
- Convert the pixel values back into synonym indexes and replace the synonym indexes with words
In the following subsections, we expand on the 3 steps in detail.
As required by the general process of our scheme, we perform the following pixel value simulation procedure before embedding or extracting the watermark: concatenate all synonym indexes of the words in the text into a single string and divide the concatenated string into groups whose length is the embedding level L before performing step 2. Algorithm 1 describes the procedure.
Algorithm 1: Pixel value simulation
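One way to read the pixel value simulation step is sketched below. The function name, the fixed-width binary encoding of each synonym index and the zero padding of the last group are assumptions made for this illustration; they are not taken from Algorithm 1.

```python
def simulate_pixels(indexes, widths, level):
    """Concatenate synonym indexes as bits and cut the string into level-bit values.

    indexes: synonym index of each substitutable word
    widths:  number of bits used for each index (assumed floor(log2(n)) per word)
    level:   embedding level L, i.e. the number of bits per simulated pixel
    """
    bitstring = "".join(format(i, f"0{w}b") for i, w in zip(indexes, widths))
    pad = (-len(bitstring)) % level
    bitstring += "0" * pad  # assumed zero padding so the last group is complete
    pixels = [int(bitstring[p:p + level], 2)
              for p in range(0, len(bitstring), level)]
    return pixels, pad


if __name__ == "__main__":
    # Four words with 4, 2, 4 and 8 synonyms: widths 2, 1, 2 and 3 bits.
    # The bit string is "10" + "1" + "11" + "000" = "10111000".
    print(simulate_pixels([2, 1, 3, 0], [2, 1, 2, 3], 4))  # ([11, 8], 0)
```

The padding length is returned so that the conversion can be undone exactly when the pixel values are turned back into synonym indexes.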
After embedding the watermark into the simulated pixel values, or after recovering the original simulated pixel values, we convert the pixel values back into synonym indexes and obtain the text by substituting the word that corresponds to each synonym index (Algorithm 2).
Algorithm 2: Synonym indexes recovering
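The reverse conversion can be sketched in the same spirit. As before, the function names, the fixed bit width per word and the handling of padding are assumptions for illustration rather than a transcription of Algorithm 2, and the synonym sets are hypothetical toy data.

```python
def recover_indexes(pixels, widths, level, pad=0):
    """Turn level-bit simulated pixel values back into per-word synonym indexes."""
    bitstring = "".join(format(p, f"0{level}b") for p in pixels)
    if pad:
        bitstring = bitstring[:-pad]  # drop the assumed zero padding
    indexes, pos = [], 0
    for w in widths:
        indexes.append(int(bitstring[pos:pos + w], 2))
        pos += w
    return indexes


def indexes_to_words(indexes, synonym_sets):
    """Pick the word with the given synonym index from each word's synonym set."""
    return [syn_set[i] for i, syn_set in zip(indexes, synonym_sets)]


if __name__ == "__main__":
    sets = [["big", "large", "huge", "vast"], ["quick", "fast"]]  # hypothetical
    # Pixel value 12 at level 4 is "1100"; one padding bit leaves "110",
    # i.e. index 3 on two bits followed by index 0 on one bit.
    idx = recover_indexes([12], [2, 1], 4, pad=1)
    print(idx)                          # [3, 0]
    print(indexes_to_words(idx, sets))  # ['vast', 'quick']
```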
Algorithms 1 and 2 correspond to the first and last steps of the aforementioned general procedure of our scheme.
As described in the invertible transform subsection, for a pair (x1, x2), we use the LSB of the first element to indicate whether one bit of information is embedded into the LSB of the second element. The watermark embedding procedure is given in Algorithm 3.
Algorithm 3: Watermark embedding
Watermark extraction and recovery of the original contents of the text:
In the phase of extracting the watermark and recovering the original contents of the text, we perform the inverse of the operations used in the watermark embedding phase (Algorithm 4).
Algorithm 4: Watermark extracting and original contents recovering
Our scheme was examined in several tests. We chose six groups of texts containing about 100, 140, 180, 220, 260 and 300 words with synonyms, respectively. The synonyms used in our tests were extracted from the WordNet synonym library.
Fig. 1: Average embedded bits of the proposed scheme with embedding level 4, embedding level 6 and embedding level 8
The results obtained by applying the proposed method to the test texts with different embedding levels are shown in Fig. 1. The figure demonstrates that the average number of embedded bits increases with the embedding level: at embedding level 6 it is about 1.2 times that at embedding level 4, and at embedding level 8 it is 1.7 times that at embedding level 4 and 1.2 times that at embedding level 6.
Taking embedding level 8 as an example, we compare the average embedded bits obtained by performing the proposed algorithm once, by performing it twice and by adopting Golomb coding to compress the watermark. We also compare the results of the proposed algorithm with those of the traditional synonym substitution method. All the test results are shown in Fig. 2.
The increase is obvious when compression is used. With Golomb coding, the average embedded bits at embedding level 8 increase by 1.5 times compared with the uncompressed result at the same level. Moreover, the embedding capacity can be improved by repeating the proposed scheme. At embedding level 8, the average embedded bits obtained by performing the algorithm twice are about 1.7 times those obtained by performing it once.
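For readers unfamiliar with Golomb coding, the sketch below shows the Rice code, the special case of Golomb coding whose parameter is a power of two (2^k, with k ≥ 1 here). How the watermark is mapped to the integers being coded is not specified in the text, so the input list (for instance, run lengths in the watermark bit string) and the function names are assumptions.

```python
def rice_encode(values, k):
    """Encode non-negative integers: unary quotient, a 0, then a k-bit remainder."""
    out = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        out.append("1" * q + "0" + format(r, f"0{k}b"))
    return "".join(out)


def rice_decode(bits, k):
    """Decode a bit string produced by rice_encode with the same k."""
    values, pos = [], 0
    while pos < len(bits):
        q = 0
        while bits[pos] == "1":  # unary part
            q += 1
            pos += 1
        pos += 1                 # skip the terminating 0
        r = int(bits[pos:pos + k], 2)
        pos += k
        values.append((q << k) | r)
    return values


if __name__ == "__main__":
    code = rice_encode([5], 2)
    print(code)                  # 1001
    print(rice_decode(code, 2))  # [5]
```

Small values get short codewords, so the code compresses well when the integers being coded are mostly small.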
The essential feature of our proposed scheme is to recover the original contents of the text. To achieve this purpose, we need to embed extra information used to do the recovery. Hence, the embedding capacity is much lower than that of the traditional synonym substitution method. However, it can still embed some information to identify the author of the text and we can improve the embedding capacity by using a lossless compression algorithm to compress the watermark or executing the algorithm more than one time.
Fig. 2: Average embedded bits at the embedding level 8 by performing the proposed scheme once, performing the proposed scheme twice, adopting Golomb coding and the traditional synonym substitution method
This study described the concept and features of reversible text watermarking and proposed a practical scheme to achieve it. In the proposed scheme, we used the synonym substitution scheme as the cornerstone and introduced an invertible transform to perform the embedding and extracting procedures.
Although the embedding capacity of our proposed scheme is lower than that of the synonym substitution scheme, our scheme, as a reversible watermarking scheme, can restore the original contents of the text. Hence, it is of great significance for domains such as the legal, military and literary fields.
In the future, we will provide a more flexible invertible transform to improve the payload capacity.
The research project was partially or fully sponsored by National Basic Research Program 973 (Grant No. 2006CB303000), special for National Basic Research Program 973 (Grant No. 2009CB326202, 2010CB334706), National Natural Science Foundation of China (Grant No. 60736016, 60873198, 60973128, 60973113), and Science Program of Institutes of Higher Education of Hunan Province (Grant No. 09w023).
- Brassil, J., S. Low, N. Maxemchuk and L. O'Gorman, 1994. Electronic marking and identification techniques to discourage document copying. Proceedings of the IEEE INFOCOM Networking for Global Communications, Jun. 12-16, IEEE Press, Piscataway, New Jersey, pp: 1278-1287.
- Coltuc, D. and J.M. Chassery, 2007. Very fast watermarking by reversible contrast mapping. IEEE Signal Process. Lett., 14: 255-258.
- Feng, J.B., I.C. Lin, C.S. Tsai and Y.P. Chu, 2006. Reversible watermarking: Current status and key issues. Int. J. Netw. Secur., 2: 161-171.
- Topkara, M., C.M. Taskiran and E.J. Delp, 2005. Natural language watermarking. Proc. Int. Conf. Secur. Steganogr. Watermarking Multimedia Contents, 5681: 441-452.
- Yang, B., M. Schmucker, X. Niu, C. Busch and S. Sun, 2004. Reversible image watermarking by histogram modification for integer DCT coefficients. Proceedings of the IEEE 6th Workshop on Multimedia Signal Processing, Sept. 29-Oct. 1, IEEE, Siena, Italy, pp: 143-146.