Vector Quantization (VQ) has been widely used for data compression due to its theoretical advantages over scalar quantization schemes. However, the computational complexity of both the codebook design and the vector lookup during the encoding phase is a clear burden on its realization. Since the codebook can be generated before the encoding process, the efficiency of the vector lookup is comparatively more significant (Chan et al., 1994; Hwang et al., 2001). VQ has been successfully used in speech and image data compression (Liang et al., 1995).
One of the emerging technologies for lossy data (image in this research) compression is Vector Quantization (VQ). Vector Quantization can be used to take advantage of the correlation between neighboring pixels by quantizing pixels in groups (or vectors) rather than individually and symbolically representing the vector with a codeword. The appropriate codeword is chosen from the available codebook by minimizing a given distortion measure. Often, the distortion measure is simply the mean squared error between the quantized pixels and the codevector (Cabral, 1994; Lin and Chang, 2006).
Vector Quantization (VQ) is a source coding methodology with a provable rate-distortion optimality. However, despite more than two decades of intensive research, VQ's theoretical promise is yet to be fully realized in image compression practice (Wu and Wen, 1999). Data compression using VQ has received great attention in the last decade because of its promising compression ratio and relatively simple structure. In its simplest implementation, VQ requires breaking the signal to be compressed into vectors (which may be referred to as blocks). Each vector of the signal to be compressed is compared to the entries of a codebook containing representative vectors. The address of the codebook entry most similar to the signal vector is then transmitted to the receiver, where it is used to fetch the same entry from an identical codebook, thus reconstructing an approximation of the original signal. Compression is obtained because transmitting the address of a codebook entry requires fewer bits than transmitting the vector itself (Midanda-Trigueros et al., 1999; Huang et al., 1992; Nasrabadi and King, 1988; Kumar, 1999).
A surprising number of everyday problems are difficult to solve by traditional algorithms. A problem may qualify as difficult for a number of different reasons; for example, the data may be too noisy or irregular, the problem may be difficult to model, or it may simply take too long to solve. It's easy to find examples: finding the shortest path connecting a set of cities, dividing a set of different tasks among a group of people to meet a deadline, or fitting a set of various-sized boxes into the fewest trucks. In the past, programmers might have crafted a special-purpose program for each problem; now they can reduce their time significantly by using a genetic algorithm (GA) (Al-Rawi and Stephan, 1999; Grant, 1995; Ryu and Eick, 1995; Louis, 1997; Ou and Chen, 2006).
Genetic Algorithms (GAs) have been successfully applied to codebook design for Vector Quantization (VQ). However, most conventional GA-based codebook design methods need a long runtime because candidate solutions must be fine-tuned by LBG. Codebook design is the key problem of VQ and the generated codebook has a major effect on the compression performance. The traditional codebook design method, the LBG algorithm, is affected by the choice of initial codebook, often generates a locally optimal codebook and needs intensive computation. Research efforts in codebook design have been concentrated in two directions: generating a better codebook that approaches the globally optimal solution and reducing the computational complexity (Huang et al., 2001).
VQ using GA (GAVQ): A proposed algorithm: In this section, a new algorithm for image compression is suggested. This algorithm tries to exploit the capabilities of the Genetic Algorithm (GA) and uses these capabilities to enhance the Vector Quantization method. The main steps of the proposed algorithm can be stated as follows:
||Problem Representation (choosing a suitable representation of the problem).
||Clustering (grouping the most similar input instances into sets that share common characteristics).
||Genetic Operations (performing the crossover and mutation operations on the elements of the sets produced in Step 2).
||Merging (merging those sets that become nearest to each other after performing the genetic operations in Step 3).
||Performance Evaluation and Termination Criteria (testing the performance of the algorithm at each generation against the termination criteria; if the termination criteria are satisfied then STOP, otherwise GOTO Step 2).
Problem representation: The first step of the GAVQ algorithm is the preparation step, which involves representing the problem in such a way that it becomes suitable for processing by the GAVQ algorithm.
Vector representation: Initially, the image is divided into non-overlapping blocks of dimension HxW (for example 2x2, ...). Each block is then represented as a vector of dimension D (i.e., D = HxW). Now, consider the case that one has to represent a given D-dimensional input vector X = (x1, x2, x3, ..., xD) with a particular codevector C = (c1, c2, c3, ..., cD) selected as the best representative of the vector within a codebook (i.e., the codevectors are extracted from the input vectors). The selection is based on the minimum Euclidean distance criterion. The VQ approach is the process that finds a mapping from the set of N input vectors to the M output vectors (codewords or codevectors); these M codevectors represent the codebook, as shown in Fig. 1.
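This minimum-distance mapping can be sketched as follows (a NumPy illustration under assumed array shapes and names, not the paper's Visual C++ code):

```python
import numpy as np

def nearest_codevector(x, codebook):
    """Return the index of the codevector closest to input vector x
    under the squared Euclidean distance criterion."""
    # codebook has shape (M, D); x has shape (D,)
    dists = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(dists))

# Example: 2x2 blocks flattened to D = 4 (toy values)
codebook = np.array([[10, 10, 10, 10],
                     [200, 200, 200, 200]], dtype=float)
x = np.array([12, 9, 11, 10], dtype=float)
print(nearest_codevector(x, codebook))  # -> 0 (first codevector is closest)
```

Only the index of the winning codevector needs to be transmitted, which is the source of the compression.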
Chromosomal representation: For any GA, a chromosome representation is needed to describe each individual in the population. The representation scheme determines how the problem is structured in the GA and also determines the genetic operators that are used. Each individual or chromosome is made up of a sequence of genes from a certain alphabet (i.e., letters, integer numbers, floating-point numbers or bytes). Consider an input vector X = (x1, x2, x3, ..., xD) to be an individual (chromosome) in the population that consists of N individuals (input vectors). Each chromosome consists of D bytes (genes), so the chromosome contains a string of (D*8) binary digits or bits (i.e., 0 or 1), where (D*8) represents the chromosome length, as shown in Fig. 1.
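The byte-to-bit-string encoding can be illustrated as below (a minimal sketch; the helper names are assumptions, not the paper's code):

```python
import numpy as np

def to_chromosome(vec):
    """Encode a D-byte vector (values 0-255) as a string of D*8 bits."""
    return ''.join(format(int(b), '08b') for b in vec)

def from_chromosome(bits):
    """Decode a bit string back into the D-byte vector."""
    return np.array([int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)],
                    dtype=np.uint8)

vec = np.array([3, 255, 0, 128], dtype=np.uint8)  # D = 4 genes
bits = to_chromosome(vec)
print(len(bits))  # -> 32, i.e., the D*8-bit chromosome length
```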
Genetic algorithms use a number of parameters to control their evolutionary search for the solution to their given problem. These parameters include selection operator, crossover operator, mutation operator, crossover rate, mutation rate and maximum number of generations.
At the start of the proposed algorithm, the population size is equal to N, which contains all input vectors. Also, the codebook itself initially contains all input vectors as the codevectors (i.e., M = N).
Clustering: This operation assigns each vector Xi in the population to a codevector Cj (for j = 0, 1, 2, ..., M-1) according to the similarity between Xi and Cj. In other words, a vector Xi is assigned to the codevector Cj if Cj is the nearest codevector to Xi, where d(Xi, Cj) means the distortion measure between two vectors of dimension D, given by the squared Euclidean distance

d(Xi, Cj) = Σ (k = 1, ..., D) (xik - cjk)^2

The selection of the nearest codevector requires a full sequential search through all the individuals in the population. This may take a long time (for a large number of input vectors) at the first generation, but it becomes faster as the codebook shrinks over the generations.
The clustering operation passes to the next step (i.e., Step 3) a codebook of M codevectors, where M ≤ N. Each codevector Cj carries a list of indexes (members) that identifies all input vectors which belong to this codevector.
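The clustering step above can be sketched as follows (a NumPy illustration; the member-list representation is an assumption consistent with the description):

```python
import numpy as np

def cluster(vectors, codebook):
    """Assign every input vector Xi to its nearest codevector Cj by a
    full sequential search and return one member-index list per
    codevector, as described in the clustering step."""
    members = [[] for _ in range(len(codebook))]
    for i, x in enumerate(vectors):
        d = np.sum((codebook - x) ** 2, axis=1)  # distortion to every Cj
        members[int(np.argmin(d))].append(i)
    return members

vectors = np.array([[0., 0.], [1., 1.], [10., 10.], [11., 9.]])
codebook = np.array([[0., 0.], [10., 10.]])
print(cluster(vectors, codebook))  # -> [[0, 1], [2, 3]]
```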
Genetic operations: Genetic operations (crossover and mutation) are applied to each element of the codebook to obtain a more suitable codevector that represents its members. Initially, before the start of the algorithm, the parameters that control the working of the GA must be set. Random selection of individuals (parents) for the genetic operations is used as the selection operator. The crossover operator is a single point (byte) in each individual (parent). The complement method is used as the mutation operator. The crossover and mutation operations are each applied to 50% of the individuals in the population in each generation. The maximum number of generations is set to 20.
Crossover operation: For each codevector Cj in the codebook, select randomly from its members list one index, for example i (i.e., one input vector Xi), as the first parent and let the codevector Cj itself be the second parent. In the next step, one point (byte) in each parent is randomly selected (xk and ck for k = 0, 1, 2, ..., D-1) and these points are exchanged between the two parents to produce two children (i.e., two new vectors). After that, the checking step determines the most adequate vector among the three vectors (the original codevector and the two children), sets it as the new codevector that represents the members list and ignores the two others. This check is performed by calculating, for each of the three vectors, the total distortion over the members (vectors) that belong to the codevector Cj. The Total Distortion (TD) of a candidate vector Y can be calculated as follows:

TD(Y) = Σ (i = 1, ..., E) Σ (j = 1, ..., D) (xij - yj)^2 ... (3)

where yj in Eq. (3) represents the jth byte (gene) of the candidate vector, xij represents the jth byte of the ith input vector (member) Xi that belongs to the codevector, E is the number of members (vectors) in the members list of the codevector and D is the vector dimension (i.e., the chromosome length in bytes). The crossover operation is shown in Fig. 2.
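The crossover step and the TD check of Eq. (3) can be sketched as below (a plain reading of the description above; function names are assumptions):

```python
import numpy as np

def total_distortion(y, member_vectors):
    """Eq. (3): total squared difference between candidate vector y
    and every member vector that belongs to the codevector."""
    return float(np.sum((member_vectors - y) ** 2))

def crossover(codevector, member_vectors, rng):
    """Single-point (byte) crossover between the codevector and one
    randomly chosen member vector; keep whichever of the original
    codevector and the two children has the lowest total distortion."""
    parent1 = member_vectors[rng.integers(len(member_vectors))].copy()
    parent2 = codevector.copy()
    k = rng.integers(len(codevector))              # random crossover point k
    child1, child2 = parent1.copy(), parent2.copy()
    child1[k], child2[k] = parent2[k], parent1[k]  # exchange byte k
    candidates = [codevector, child1, child2]
    tds = [total_distortion(c, member_vectors) for c in candidates]
    return candidates[int(np.argmin(tds))]

rng = np.random.default_rng(0)
members = np.array([[10., 20., 30., 40.], [12., 18., 33., 41.]])
cj = np.array([0., 0., 0., 0.])
new_cj = crossover(cj, members, rng)
# the replacement never increases the total distortion over the members
print(total_distortion(new_cj, members) <= total_distortion(cj, members))
```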
Mutation operation: For each codevector Cj in the codebook, select randomly one point (gene) and complement it. Then, place the new gene in the position of the old one to produce a new vector C'j. Now, calculate the total distortion for the new vector, TD(C'j), as in Eq. (3) and compare it with the total distortion TD(Cj) of the original codevector. The new vector C'j is set as the new codevector in place of the original codevector if TD(C'j) < TD(Cj); otherwise the new vector C'j is ignored. The mutation operation is shown in Fig. 3.
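The complement-one-gene mutation with its TD check can be sketched as follows (an illustration under the D*8-bit chromosome encoding described earlier):

```python
import numpy as np

def mutate(codevector, member_vectors, rng):
    """Complement one randomly chosen bit of the codevector's D*8-bit
    chromosome; keep the mutated vector only if its total distortion
    (Eq. 3) over the member list is lower than the original's."""
    bit = int(rng.integers(len(codevector) * 8))   # random gene position
    new = codevector.copy()
    new[bit // 8] ^= np.uint8(1 << (7 - bit % 8))  # complement that bit
    td_old = np.sum((member_vectors.astype(float) - codevector) ** 2)
    td_new = np.sum((member_vectors.astype(float) - new) ** 2)
    return new if td_new < td_old else codevector

rng = np.random.default_rng(1)
members = np.array([[100, 100], [102, 98]], dtype=np.uint8)
cj = np.array([0, 0], dtype=np.uint8)
new_cj = mutate(cj, members, rng)
```

Because the mutated vector is accepted only when TD decreases, the codebook quality never degrades across generations.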
Merging: The goal of this step is to merge each pair of codevectors Ci and Cj for which the distortion per gene between the two codevectors, d(Ci, Cj)/D, is ≤ ε, where ε is a small number less than one (0.05 in this algorithm), D is the vector dimension and i, j = 0, 1, 2, ..., M-1 (where M is the number of codevectors in the codebook).
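A greedy reading of this merging rule is sketched below; note that the text does not specify how the merged codevector is formed, so taking the mean of the pair is an assumption:

```python
import numpy as np

def merge_codebook(codebook, eps=0.05):
    """Greedily merge codevector pairs whose per-gene distortion
    d(Ci, Cj) / D is <= eps. The merged codevector is taken as the
    mean of the pair (an assumed merge rule, not from the paper)."""
    D = codebook.shape[1]
    merged, used = [], [False] * len(codebook)
    for i in range(len(codebook)):
        if used[i]:
            continue
        ci = codebook[i].astype(float)
        for j in range(i + 1, len(codebook)):
            if not used[j] and np.sum((ci - codebook[j]) ** 2) / D <= eps:
                ci = (ci + codebook[j]) / 2.0  # assumed merge rule: mean
                used[j] = True
        merged.append(ci)
    return np.array(merged)

codebook = np.array([[10., 10.], [10., 10.2], [200., 200.]])
print(len(merge_codebook(codebook)))  # -> 2: the near-identical pair merges
```

This step is what lets M shrink below N over the generations, which in turn raises the compression ratio.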
Performance evaluation and termination criteria: After each generation, some performance evaluation and termination measures are calculated to decide whether the algorithm proceeds to the next generation or stops. The Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR) and Normalized Mean Squared Error (NMSE) should be calculated between each input vector and the reproduction codevector within the codebook to which these input vectors belong, as shown in Eq. 4-6. In practice, this means that these measures are calculated between each pixel in the source image file and the reconstructed image file after the decompression operation is done. For simplicity, let b(i, j) and b'(i, j) be the pixels in row i and column j of the source and decompressed image files, respectively.
The peak value of b(i, j) is the total dynamic range of the input image and N is the source image dimension.
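These objective measures can be computed as below (a standard NumPy sketch of PSNR and NMSE; the exact forms of the paper's Eq. 4-6 are not reproduced here):

```python
import numpy as np

def psnr_nmse(original, reconstructed, peak=255.0):
    """PSNR and NMSE between the source pixels b(i, j) and the
    decompressed pixels b'(i, j); peak is the total dynamic range of
    the input image (255 for 8-bit gray-scale)."""
    err = original.astype(float) - reconstructed.astype(float)
    mse = np.mean(err ** 2)
    nmse = np.sum(err ** 2) / np.sum(original.astype(float) ** 2)
    psnr = float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
    return psnr, nmse

b = np.full((4, 4), 100, dtype=np.uint8)
b2 = np.full((4, 4), 105, dtype=np.uint8)  # every pixel off by 5
psnr, nmse = psnr_nmse(b, b2)
print(round(nmse, 4))  # -> 0.0025
```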
To check the compression performance, the values of Compression Ratio (CR) and Bit Per Pixel Rate (BR) are calculated. The compression ratio is the amount of compression, while the BR rate is the number of bits required to represent each pixel value of the compressed image. They are calculated in two forms, On-line (when the codebook is taken into consideration) and Off-line (when the codebook is not taken into consideration):
Better compression performance of the algorithm corresponds to a higher compression ratio (i.e., a lower BR) together with a higher PSNR.
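The two forms of CR and the off-line BR can be illustrated as follows (a plain reading of the definitions above; the assumption that each block is coded as a ceil(log2 M)-bit index is mine, not stated in the paper):

```python
import math

def compression_stats(n_pixels, block_dim, codebook_size, bits_per_pixel=8):
    """Off-line and on-line compression ratios and the off-line bit
    rate for a VQ-coded image. Each block is assumed to be replaced
    by a ceil(log2(M))-bit codebook index; the on-line form adds the
    cost of transmitting the codebook itself."""
    n_blocks = n_pixels // block_dim
    index_bits = math.ceil(math.log2(codebook_size))
    original_bits = n_pixels * bits_per_pixel
    offline_bits = n_blocks * index_bits                        # codebook excluded
    online_bits = offline_bits + codebook_size * block_dim * bits_per_pixel
    return (original_bits / offline_bits,   # off-line CR
            original_bits / online_bits,    # on-line CR
            offline_bits / n_pixels)        # off-line BR (bits per pixel)

# 512x512 image, 2x2 blocks (D = 4), a hypothetical 256-entry codebook
cr_off, cr_on, br = compression_stats(512 * 512, 4, 256)
print(cr_off, br)  # -> 4.0 2.0
```

Counting the codebook (on-line form) always lowers the ratio slightly, which is why the two forms are reported separately.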
The termination criterion in this algorithm is that the algorithm stops when at least one of the following conditions is satisfied:
||SNR ≥ 22.0 and CR ≥ 4.0, or:
||The number of generations has reached Max (where Max is the maximum number of generations allowed in the algorithm; Max = 20 in this algorithm).
Table 1 shows the results obtained after applying the proposed (GAVQ) algorithm and one of the classical VQ algorithms (k-means in this table) to a set of 256 gray-scale-level image data files (Images 1-4 of size 512x512 pixels; Fig. 4). All programs are written in Visual C++ (version 6.0) and are executed on a Pentium III (500 MHz) personal computer.
From the comparison in Table 1, we can note:
||The GAVQ algorithm succeeds in increasing the CR while keeping the SNR within the acceptable range.
||The number of codevectors is smaller than with the classical k-means algorithm.
||The crossover and mutation counts in the table represent only the successful crossover and mutation operations, i.e., those that contribute to increasing the CR, because the algorithm discards any crossover or mutation operation that does not increase the CR.
||In spite of the major drawback of using a GA in any application (i.e., the long runtime), the time required for compression when mixing the GA with the VQ method (in this work) is close to that required by k-means. Certainly, this algorithm is not suitable for real-time systems, because the genetic algorithm requires a long time (especially when the population becomes very large).
(Table 1: Comparison between GAVQ and classical k-means VQ compression on the image data files.)
||Table 1 presents some examples of image files of small size. Certainly, when a file of large size is used, the algorithm first divides the file into a number of segments and then applies the steps of the algorithm to these segments, to reduce the time and memory space required to complete the compression operation.
||Obviously, using the SNR and PSNR measures to calculate the distortion (noise) that appears in the image data during the compression operation is not enough to ensure that this noise does not badly affect the appearance of the image. The effect of this noise could be verified by adding a subjective test to the algorithm; however, for comparing the performance of the proposed algorithm against a classical algorithm, it is natural to use objective tests such as SNR and PSNR.
The Vector Quantization (VQ) compression method was first studied and implemented; then the new method, GAVQ, which makes use of the GA to design the VQ codebook, was proposed. It exploits the crossover and mutation operations of the genetic algorithm. The GAVQ algorithm was tested on several image files. The recorded results showed that the idea of combining the genetic algorithm with the vector quantization compression method supports well the use of the VQ method.