On Skew Estimation of Persian/Arabic Printed Documents
In this study, we propose two methods especially designed to detect skew
in Persian/Arabic prints. The first one is based on vertical Black Line
Segments (BLSs) which are the baselines of horizontal projection profiles
of vertical strips. This method is fast and presents good accuracy against
practical skews. The second method, which is based on vertical White Line
Segments (WLSs), offers superior performance in recognizing large skews.
The first step in processing a document by computer is to create
a digitized version of it, which is usually done with a scanner.
However, human error (or a fault in the document feeder) in placing the document
on the platen may introduce some skew into the document image. The skew
should therefore be estimated and corrected before document
recognition. Skew detection may be performed at one of
the following levels:
||Page/Text blocks level: In most machine printed documents, the
entire page is skewed. Therefore, skew detection may be done before
page segmentation. In some documents such as advertisements, however,
text blocks are intentionally printed in different orientations. Hence,
skew correction should be postponed until blocks are classified. Almost
all methods using this strategy rely on the fact that text lines in a
document image remain parallel even in the presence of skew.
||Group level: In this case, the slant should be corrected
for each group (a connected component, a character, a word or part
of a word). It is generally applicable to documents printed in a cursive
language or handwritten ones. It is suitable, albeit slow, to detect
||Feature level: Document recognizers, using this strategy,
utilize some rotation invariant features for recognition, so it is
not necessary to correct the skews. However, it is rarely used in
practice because the number of rotation invariant features is not
high enough to lead to strong results.
The proposed methods estimate skew at page level. Researchers have introduced
many methods for this purpose. These methods are usually categorized based
on the techniques they utilize, such as projection profiles, Hough Transform,
cross correlation or gradient analysis (Cattoni et al., 1998; Hull,
1998; O'Gorman, 1993; Okun et al., 1999). Considering the main
idea behind them, we classify these methods into the following four classes:
||Methods that choose a set of representative points (mostly related
to text lines) and identify, explicitly or implicitly, a few parallel
straight lines using some line fitting method (Antonacopoulos, 1997;
Hashizume et al., 1986; Hinds et al., 1990; Nakano et
al., 1990; Smith, 1995; Yu and Jain, 1996).
||Methods that extract some orientation sensitive features and use
them to calculate the skew (Li et al., 2007; Chou et al.,
2007; Dong et al., 2005; Amin and Wu, 2005; Kapoor et al.,
2004; Kapogiannopoulos and Kalouptsidis, 2002; Rundle, 1974; Sauvola
and Pietikäinen, 1995; Sun and Si, 1989).
||Methods that define some target criteria which are maximized or
minimized to estimate the skew angle (Baird, 1987; Ishitani, 1993;
Postl, 1986; Srihari and Govindaraju, 1989).
||Other methods that are not placed in the above classes such as ones
that use hardware to detect page properties (Aghajan et al.,
1994; Yamada, 1989).
Each of the above classes has its advantages and disadvantages. Methods
of class I usually use bounding boxes of connected components to select
representative points. As locating connected components is time-consuming,
they are generally not time efficient, especially when the Hough Transform
is used for line fitting. However, they usually achieve accurate skew
estimation. Although calculating orientation sensitive features may not be cost effective,
class II methods are typically faster than class I methods. One
disadvantage of this class is that the results are inaccurate in most
cases. Class III requires a search mechanism to find the minimum or maximum
of the target criteria, which is generally time-consuming.
Detecting skew in a document image written in a cursive language is
so complex that many existing methods may not perform adequately.
In such documents, words may consist of scattered pieces that are located
above or below the baselines. Moreover, the widths and heights of connected
components vary considerably. Furthermore, documents written
in these languages are usually less structured than others. Class I methods
usually give the worst results on cursive text, as they classify blocks
(into text and non-text) based on the size of bounding boxes or other
features that are not appropriate for cursive languages.
Basically, skewed cursive language documents are treated in one of two ways:
as generic documents or as cursive handwritten ones. Treating these
documents as handwritten and detecting skew below the page level is not
favorable, since the structure of these documents, such as the presence
of a baseline for each text line, is ignored. As such, there is a need
to design new skew detection algorithms for such documents.
Persian and Arabic are among the most widely used cursive languages and are similar
in many aspects, such as writing style. Readers may refer to Hadjar
and Ingold (2003) for detailed information about the Arabic language.
Although many papers have been published on skew estimation, few of them
deal with the Arabic or Persian languages. Amin (2000) has introduced
a method that estimates skew at group level. First, connected components
are extracted from the document image and grouped based
on some criteria. Afterwards, each group is divided into vertical segments
of approximately the width of one connected component and only the bottom
rectangle in each segment is stored. Finally, the Hough Transform is utilized
to detect the skew angle.
Sarfraz et al. (2003) have proposed a procedure named Drift Correction,
which first determines the rotation angle of the text by computing the
tangents of all the line segments that can be constructed between any
pair of black pixels in the image. The angle that has the highest number
of occurrences is assumed to be the skew angle of the image. Sarfraz
et al. (2005) have utilized the Haar wavelet to decompose the image into
detail sub-images and have used multi-scale properties of the image along
with principal component analysis to estimate the orientation of the principal
axis of the clustered data.
To deal with documents printed in these two languages, we propose two
methods which belong to class I, but are quite fast and accurate. These
methods use the horizontal projection profiles of vertical strips rather than
features extracted from connected components.
THE FIRST PROPOSED METHOD
The first proposed method is based on vertical black line segments.
This section describes the method in detail.
A well-known technique in document image analysis is splitting the document
page into narrow horizontal/vertical strips and then calculating the vertical/horizontal
projection profiles of these strips, respectively (Akiyama and Hagita,
1990; Min et al., 1996; Pavlidis and Zou, 1991; Schlang, 1985).
Here, we employ vertical strips and their horizontal projection profiles
(HPPs). The strips should be wide enough to contain useful information.
Experience shows that the width of two small-sized characters or one medium-sized character
(roughly 24 pixels in 100 ppi images) yields acceptable results. The advantage
of using the HPPs of these strips rather than the HPP of the entire page
is that the parts of the profile related to one text line do not merge with those of the adjacent
lines, provided the strips are not too wide and the skew is not severe.
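The strip decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `strip_profiles`, the list-of-rows image representation (1 for black, 0 for white) and the tiny test image are assumptions made for the example; only the 24-pixel default width comes from the text.

```python
def strip_profiles(image, strip_width=24):
    """Split a binary image into vertical strips and return each strip's HPP.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels
           (an assumed representation, chosen for the example).
    Returns a list of profiles; profiles[s][y] is the number of black
    pixels in row y of strip s.
    """
    if not image:
        return []
    width = len(image[0])
    profiles = []
    for x0 in range(0, width, strip_width):
        x1 = min(x0 + strip_width, width)  # the last strip may be narrower
        profiles.append([sum(row[x0:x1]) for row in image])
    return profiles


# Tiny 4x6 image split into two strips of width 3.
demo = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(strip_profiles(demo, strip_width=3))  # [[0, 3, 1, 0], [0, 2, 0, 0]]
```

Each profile is computed independently, so a skewed text line shows up as a vertical shift between the profiles of neighbouring strips rather than as a smeared peak in one page-wide profile.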
The baselines of the HPPs of the strips, which we refer to as vertical Black
Line Segments (BLSs), are the building blocks of our method. In Fig.
1, a typical Persian document and its extracted BLSs are shown.
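One plausible way to obtain BLSs from a strip's HPP is sketched below. The source does not spell out the extraction rule, so the sketch assumes that a BLS is a maximal run of consecutive rows whose profile value reaches a threshold; the function name `black_line_segments` and the parameter `min_count` are hypothetical.

```python
def black_line_segments(profile, min_count=1):
    """Return (top, bottom) row ranges, inclusive, where profile >= min_count.

    Assumption: a BLS is a maximal vertical run of rows of the strip
    that contain at least `min_count` black pixels.
    """
    segments = []
    start = None
    for y, value in enumerate(profile):
        if value >= min_count and start is None:
            start = y                        # a run begins
        elif value < min_count and start is not None:
            segments.append((start, y - 1))  # the run ended on the previous row
            start = None
    if start is not None:                    # run reaches the bottom of the strip
        segments.append((start, len(profile) - 1))
    return segments


print(black_line_segments([0, 3, 1, 0, 0, 2, 2]))  # [(1, 2), (5, 6)]
```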
Filtering the BLSs: We need to identify the BLSs corresponding
to the text part of the document image. For this purpose, the following conditions
are checked for each BLS:
where $T_{BLS}$, $D$ and $T$ denote the thickness of the text line corresponding
to a BLS, the dominant interline distance and the dominant text line thickness,
respectively. If all the conditions are satisfied, the BLS is considered
to belong to the text parts; otherwise, it is discarded.
T is estimated as the peak of the histogram of the lengths of the BLSs.
D is obtained approximately from the histogram of the
distances between consecutive peaks of the HPPs corresponding to the BLSs.
It is worth mentioning that the HPP of text written in Arabic has a salient
peak related to the baseline (even in the presence of a little
||A typical Persian document and its black line segments
||The BLSs of Fig. 1 after filtering
Applying these rules leaves the BLSs that are
most likely related to text lines. Figure 2 is obtained
from the BLSs of Fig. 1 using this method.
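The estimates of T and D used by the filter can be sketched as histogram peaks. This is an illustrative sketch only: the helper names are hypothetical, BLSs are assumed to be `(top, bottom)` row pairs within one strip, and the filtering conditions themselves are not reproduced because they are not available in the text.

```python
from collections import Counter


def histogram_peak(values):
    """Most frequent value; ties are resolved toward the smaller value."""
    if not values:
        return None
    counts = Counter(values)
    return min(counts, key=lambda v: (-counts[v], v))


def estimate_t_and_d(profile, segments):
    """Estimate dominant line thickness T and interline distance D for one strip.

    profile:  HPP of the strip (black pixel count per row).
    segments: BLSs of the strip as (top, bottom) row pairs, top to bottom.
    """
    lengths = [bottom - top + 1 for top, bottom in segments]
    # Row of the HPP peak inside each BLS (first row on ties).
    peaks = [max(range(top, bottom + 1), key=lambda y: profile[y])
             for top, bottom in segments]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return histogram_peak(lengths), histogram_peak(gaps)


profile = [0, 1, 5, 1, 0, 0, 1, 6, 1, 0, 0, 2, 7, 1, 0]
segments = [(1, 3), (6, 8), (11, 13)]
print(estimate_t_and_d(profile, segments))  # (3, 5)
```

In practice the histograms would be accumulated over all strips of the page rather than over a single strip, so that the peaks reflect the dominant values of the whole document.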
Assigning the BLSs to text lines: The next step is to identify
the text lines from the remaining BLSs. This is performed by assigning
an incremental number to each BLS as its text line number and then treating
the BLSs with the same number as one text line. The idea behind this procedure
is that if two BLSs in two adjacent strips are the nearest ones to each
other, they belong to the same text line. Note that the word nearest is
ambiguous. In this research, the Manhattan distance between the middle points of
two BLSs is used as the measure of their nearness. Assume A and B are two BLSs
in two adjacent strips.
||The extracted text lines for the document of Fig.
Suppose that $A_{mid}$ and $B_{mid}$ are the middle points of A and B, respectively. Then
the distance between A and B is calculated as:
$$d(A,B) = |x_{A_{mid}} - x_{B_{mid}}| + |y_{A_{mid}} - y_{B_{mid}}|$$
Since A and B are located in adjacent strips, the first term of the
above equation is equal to the strip width, which is a constant.
As this function is used only for comparison, one can ignore the first
term and obtain:
$$d'(A,B) = |y_{A_{mid}} - y_{B_{mid}}|$$
The above procedure links the BLSs with the same text line numbers together
to obtain the extracted text lines, as shown in Fig. 3.
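The linking step can be sketched as a mutual nearest neighbour search between adjacent strips, using only the vertical distance between BLS midpoints. The function name `assign_text_lines` and the data layout (one list of `(top, bottom)` pairs per strip, left to right) are assumptions for illustration; the authors' exact bookkeeping may differ.

```python
def assign_text_lines(strips):
    """Give each BLS a text line number.

    strips: list (one entry per vertical strip, left to right) of lists of
            (top, bottom) BLSs. Returns a parallel structure of line numbers.
    Two BLSs in adjacent strips share a number when each is the other's
    nearest neighbour by vertical midpoint distance.
    """
    def mid(seg):
        return (seg[0] + seg[1]) / 2

    def nearest(seg, candidates):
        return min(range(len(candidates)),
                   key=lambda k: abs(mid(seg) - mid(candidates[k])))

    labels = []
    next_label = 0
    for s, segs in enumerate(strips):
        current = [None] * len(segs)
        if s > 0 and strips[s - 1] and segs:
            prev = strips[s - 1]
            for j, seg in enumerate(segs):
                i = nearest(seg, prev)
                if nearest(prev[i], segs) == j:   # mutual nearest neighbours
                    current[j] = labels[s - 1][i]
        for j in range(len(segs)):
            if current[j] is None:                # start a new text line
                current[j] = next_label
                next_label += 1
        labels.append(current)
    return labels


strips = [
    [(10, 14), (30, 34)],
    [(11, 15), (31, 35)],
    [(12, 16), (32, 36), (60, 62)],
]
print(assign_text_lines(strips))  # [[0, 1], [0, 1], [0, 1, 2]]
```

Requiring the match to be mutual keeps an isolated BLS, such as the third one in the last strip, from being attached to a line that already has a closer partner.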
Extracting representative points: In the next stage, we need to
select a number of representative points for each text line. The midpoints
of the BLSs related to the same text line may be chosen for this purpose.
However, a better choice is the intersection points of the diagonals of the trapezoids
formed by consecutive BLSs that belong to the same text line.
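The diagonal intersection has a simple closed form, sketched below. The sketch assumes each BLS is a vertical segment located at a single x coordinate (for instance the centre of its strip), which is an assumption of this example rather than a detail stated in the text. In a trapezoid with parallel sides of lengths a and b, the diagonals meet at a point that divides each of them in the ratio a : b.

```python
def diagonal_intersection(x1, top1, bottom1, x2, top2, bottom2):
    """Intersection of the diagonals of the trapezoid formed by two BLSs.

    The first BLS runs from (x1, top1) to (x1, bottom1), the second from
    (x2, top2) to (x2, bottom2). Returns the point (x, y).
    """
    a = bottom1 - top1            # length of the first parallel side
    b = bottom2 - top2            # length of the second parallel side
    if a + b == 0:                # both BLSs are single points
        return ((x1 + x2) / 2, (top1 + top2) / 2)
    t = a / (a + b)               # fraction of the way from side 1 to side 2
    # Walk along the diagonal from (x1, top1) to (x2, bottom2).
    return (x1 + t * (x2 - x1), top1 + t * (bottom2 - top1))


print(diagonal_intersection(0, 10, 14, 24, 12, 16))  # (12.0, 13.0)
```

Compared with a plain midpoint, this point is pulled toward the shorter of the two BLSs, so an unusually tall BLS has less influence on the fitted line.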
Once the representative points are identified, the least squares method
is used to fit a straight line to the points of each text line.
The majority vote of the slopes of these lines identifies the skew angle.
Since the slopes are not necessarily integers, quantization
is unavoidable. Experiments show that a quantization step
of 0.5° gives acceptable results.
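The fitting and voting step can be sketched as follows. The function names and sample points are hypothetical; the sketch uses an ordinary least squares fit of y on x and quantizes the resulting angle with the 0.5° step mentioned in the text. The sign of the angle depends on the image coordinate convention, which is left open here.

```python
import math
from collections import Counter


def fit_slope(points):
    """Least squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx if sxx else None   # None when all x values coincide


def vote_skew(text_lines, step=0.5):
    """Majority vote over the quantized angles (degrees) of the fitted lines."""
    votes = Counter()
    for points in text_lines:
        if len(points) < 2:
            continue
        slope = fit_slope(points)
        if slope is None:
            continue
        angle = math.degrees(math.atan(slope))
        votes[round(angle / step) * step] += 1
    return votes.most_common(1)[0][0] if votes else None


lines = [
    [(0, 0.0), (100, 3.5), (200, 7.0)],       # about 2 degrees
    [(0, 50.0), (100, 53.5), (200, 57.0)],    # about 2 degrees
    [(0, 100.0), (100, 101.0), (200, 102.0)], # outlier, about 0.6 degrees
]
print(vote_skew(lines))  # 2.0
```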
THE SECOND METHOD
Although the BLS method is fast, the range of skews it can detect
is too small for some applications. This limitation
has two causes:
||The way BLSs are built: As the skew increases, the strips must be made
narrower to prevent the HPPs of adjacent text lines from merging. But as mentioned earlier,
the strips cannot be too narrow.
||The way text lines are extracted: Because the skew is determined from the
extracted text lines, any error in text line identification
leads to a significant error in the estimate.
To overcome these limitations, this research proposes a second method,
called WLS. In this method, white pixels are used rather than black
ones to detect representative points. Furthermore, this study introduces
a new method to detect straight lines from the representative points.
The WLS method: The WLS method is based on the following
facts about Persian/Arabic text:
||Many parts of the text are placed on baselines. These parts are
related to some letters (mostly initial or medial letters) or to
the linkage between them.
||The baselines are parallel and they remain parallel even in the presence
of skew. Therefore, the vertical distances between two text lines are
almost constant and are equal to the Dominant Interline Distance (DID).
||DID is much larger than the pen thickness. So, if we use vertical White
Line Segments (WLSs) as new building blocks, there is no need to find
a way to get rid of dots and diacritics. Note that the BLS method uses HPPs
to avoid this problem.
||The white line segments (WLSs) for the document of
DID is calculated as the most frequent length of the WLSs. Moreover, to
remove false WLSs, those that are much shorter or longer than DID
are ignored (Fig. 4).
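The WLS extraction and DID estimate can be sketched as follows. The sketch assumes that a WLS is a maximal run of white pixels in a single pixel column, and the relative `tolerance` used to decide what counts as much shorter or longer than DID is a hypothetical parameter, since the text gives no numeric bound.

```python
from collections import Counter


def white_line_segments(column):
    """(top, bottom) inclusive runs of white (0) pixels in one pixel column."""
    segments = []
    start = None
    for y, pixel in enumerate(column):
        if pixel == 0 and start is None:
            start = y
        elif pixel != 0 and start is not None:
            segments.append((start, y - 1))
            start = None
    if start is not None:
        segments.append((start, len(column) - 1))
    return segments


def filter_by_did(segments, tolerance=0.5):
    """Return (DID, kept segments); DID is the most frequent WLS length.

    tolerance is an assumed relative bound: a WLS is kept when its length
    differs from DID by at most tolerance * DID.
    """
    if not segments:
        return None, []
    lengths = [bottom - top + 1 for top, bottom in segments]
    did = Counter(lengths).most_common(1)[0][0]
    kept = [seg for seg, n in zip(segments, lengths)
            if abs(n - did) <= tolerance * did]
    return did, kept


column = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
print(filter_by_did(white_line_segments(column)))
# (4, [(0, 3), (6, 9), (11, 14)])
```

On a real page the length histogram would be built from the WLSs of all columns before filtering, so that DID reflects the whole document.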
It is worth noting that both the BLS and WLS methods produce blank
blocks for the non-text parts of a document image. So, one may use these two
methods to categorize the text and non-text image blocks.
Extracting representative points: As in the BLS method, the representative
points are a set of pixels that generally lie on the baselines.
To define the representative points, the midpoints between the bottom
points of the WLSs related to a text line and the top points of the WLSs related
to the line below are selected.
To determine whether a line B is below a selected text line A, the
following conditions are checked:
The first condition guarantees that B is one of the text lines
below A. The second condition restricts the distance between A and B to
ensure that B is exactly the text line that follows A.
To attain higher speed, the number of representative points related to
a WLS can be reduced to one. Experiments show that such reduction still
leads to acceptable results (Fig. 5).
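A minimal sketch of this point selection for a single pixel column is given below. Because the two conditions are not reproduced in the text, the sketch substitutes an assumed rule: two WLSs are paired when the second starts below the end of the first and the black gap between them is no larger than DID. The function name and the threshold are both hypothetical.

```python
def representative_points(x, segments, did):
    """Midpoints of the black gaps between vertically consecutive WLSs.

    x:        column index of the WLSs.
    segments: filtered WLSs of this column as (top, bottom) row pairs.
    did:      dominant interline distance, used here as an assumed upper
              bound on the gap between two paired WLSs.
    """
    points = []
    ordered = sorted(segments)
    for (_, bottom_a), (top_b, _) in zip(ordered, ordered[1:]):
        gap = top_b - bottom_a
        if 0 < gap <= did:          # B lies directly below A
            points.append((x, (bottom_a + top_b) / 2))
    return points


print(representative_points(7, [(0, 3), (6, 9), (11, 14)], did=4))
# [(7, 4.5), (7, 10.0)]
```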
Skew detection: To detect skew from the selected points, we
propose a procedure which we call Slop Seeker (SS).
||The representative points taken from WLS of the document
in Fig. 1
This procedure is as accurate as the Hough Transform, yet faster.
To formulate the procedure, suppose that $P_1, \ldots, P_n$
denote the n representative points, let $L_{i,j}$ denote the
line passing through $P_i$ and $P_j$ and let $\theta_{i,j}$
be its slope. With $P_i = (x_i, y_i)$, $\theta_{i,j}$ can be calculated as follows:
$$\theta_{i,j} = \arctan\left(\frac{y_j - y_i}{x_j - x_i}\right)$$
Let $C_{i,\alpha}$ be the number of lines passing through $P_i$
whose quantized slopes are all equal to the integer $|\alpha|$.
Let M stand for the maximum angle, in degrees, that can be detected by our method. Now,
to intensify the score of collinear slope values, the energy function
of $C_{i,\alpha}$ is calculated as:
The maximum value of that energy function gives the estimate of skew:
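The overall flow of such a pairwise slope vote can be sketched as follows. This is not the authors' energy function, which is not reproduced in the text: the sketch assumes, purely for illustration, that the energy of an angle is the sum over all points of the squared count of lines through that point with that quantized angle, which rewards points that collect many collinear partners. The function name `slope_vote` and the parameter `max_angle` (playing the role of M) are hypothetical.

```python
import math
from collections import defaultdict


def slope_vote(points, max_angle=45):
    """Estimate the skew (integer degrees) from pairwise slopes.

    Assumed energy: E(alpha) = sum over i of C[i][alpha] ** 2, where
    C[i][alpha] counts the lines through point i whose angle rounds to alpha.
    """
    counts = [defaultdict(int) for _ in points]
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if xj == xi:
                continue                      # vertical pair, no finite slope
            alpha = round(math.degrees(math.atan((yj - yi) / (xj - xi))))
            if abs(alpha) <= max_angle:
                counts[i][alpha] += 1
                counts[j][alpha] += 1
    energy = defaultdict(int)
    for per_point in counts:
        for alpha, c in per_point.items():
            energy[alpha] += c * c            # squaring favours collinear sets
    return max(energy, key=energy.get) if energy else None


# Two parallel baselines, 40 pixels apart, both skewed by 10 degrees.
t = math.tan(math.radians(10))
pts = [(x, t * x) for x in (0, 50, 100, 150)]
pts += [(x, 40 + t * x) for x in (0, 50, 100, 150)]
print(slope_vote(pts))  # 10
```

The pairwise loop is quadratic in the number of points, which is one reason reducing the representative points to one per WLS helps the running time.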
To evaluate the performance of the proposed methods, 455 document images
were collected from different sources such as magazines, newspapers,
advertisements and books. To reduce the run time, the images were down-sampled
to 100 ppi. Then, the images were rotated from -45° to +45° in 0.5°
steps. The performance of BLS and WLS was evaluated against
the imposed skew values.
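The sweep over imposed skews can be sketched as a small harness. The names are hypothetical, and `estimate_for_angle` stands for any callable that rotates a test image by the given angle and returns the detected skew; image rotation itself is outside the scope of this sketch.

```python
def imposed_angles(limit=45.0, step=0.5):
    """Angles from -limit to +limit (degrees) in increments of step."""
    count = int(round(2 * limit / step))
    return [-limit + k * step for k in range(count + 1)]


def mean_absolute_error(estimate_for_angle, angles):
    """Average |estimated - imposed| over all imposed angles."""
    errors = [abs(estimate_for_angle(a) - a) for a in angles]
    return sum(errors) / len(errors)


angles = imposed_angles()
print(len(angles), angles[0], angles[-1])  # 181 -45.0 45.0
```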
||Experimental results for WLS schemes
Furthermore, to compare the proposed Slop Seeker method with the
Hough Transform, both were used in conjunction with the WLS method
(Table 1). The experiments show that BLS, although faster,
fails for skews larger than 15°. Both combinations, WLS with Slop Seeker
and WLS with the Hough Transform, perform satisfactorily and are suitable
for documents containing mixed text, graphics or line drawings. However,
Slop Seeker runs much faster, especially for large skews.
Compared with a recent Arabic skew estimation method, the WLS method detects
skews up to 45° with 1° error, which is much superior to the algorithm of Sarfraz et al. (2005), which detects a 10.2627° skew in an Arabic
document with -0.2627° error.
In this study, two methods for skew estimation of printed documents
in the Persian or Arabic languages are proposed. The first method (BLS), which
is based on detecting black line segments, is fast, simple and accurate
for practical skews, but it fails for skews larger than
15°. The second method is based on the White Line Segments (WLSs) of the
printed document and performs well for skews up to 45°.
Moreover, we proposed Slop Seeker, an algorithm that estimates the skew
from white line segments. Though as accurate as the Hough Transform, Slop
Seeker runs faster, especially for large skews. Both of the proposed
methods can be applied to document images with a resolution as low as 100
ppi. These methods can also be used to classify the blocks of a document
image into text and non-text classes.
1: Aghajan, H.K., B.H. Khalaj and T. Kailath, 1994. Estimation of skew angle in text-image analysis by SLIDE: Subspace-based line detection. Mach. Vision Applied, 7: 267-276.
2: Akiyama, T. and N. Hagita, 1990. Automatic entry system for printed documents. Pattern Recog., 23: 1141-1154.
3: Amin, A., 2000. Recognition of printed Arabic text based on global features and decision tree learning techniques. Pattern Recog., 33: 1309-1323.
4: Amin, A. and S. Wu, 2005. Robust skew detection in mixed text/graphics documents. Proceeding of 8th International Conference on Document Analysis and Recognition, August 29-September 1, 2005, IEEE Computer Society Press, USA., pp: 247-251.
5: Antonacopoulos, A., 1997. Local skew angle estimation from background space in text regions. Proceeding of the 4th International Conference on Document Analysis and Recognition, August 18-20, 1997, Ulm, Germany, pp: 684-688.
6: Baird, H.S., 1987. The skew angle of printed documents. Proceeding of the SPSE 40th Symposium on Hybrid Imaging Systems, May 20-21, 1987, Rochester, New York, USA., pp: 204-208.
7: Cattoni, R., T. Coianiz, S. Messelodi and C.M. Modena, 1998. Geometric layout analysis techniques for document Image understanding: A review. Technical Report, IRST, Trento, Italy, pp: 1-68.
8: Chou, C.H., S.Y. Chu and F. Chang, 2007. Estimation of skew angles for scanned documents based on piecewise covering by parallelograms. Pattern Recog., 40: 443-455.
9: Dong, J., P. Dominique, A. Krzyzak and C.Y. Suen, 2005. Cursive word skew/slant corrections based on Radon transform. Proceeding of the 8th International Conference on Document Analysis and Recognition, August 31-September 1, 2005, IEEE Computer Society Press, pp: 478-483.
10: Hadjar, K. and R. Ingold, 2003. Arabic newspaper page segmentation. Proceeding of the 7th International Conference on Document Analysis and Recognition, August 3-6, 2003, Edinburgh, pp: 895-899.
11: Hashizume, A., P.S. Yeh and A. Rosenfeld, 1986. A method of detecting the orientation of aligned components. Patt. Recog. Lett., 4: 125-132.
12: Hinds, S.C., J.L. Fisher and D.P. D’Amato, 1990. A document skew detection method using run-length encoding and the Hough transform. Proceeding of 10th International Conference on Pattern Recognition, June 16-21, 1990, Atlantic City, N.J. USA., pp: 464-468.
13: Hull, J.J., 1998. Document Image Skew Detection: Survey and Annotated Bibliography. In: Document Analysis Systems II, Hull, J.J. and S.L. Taylor (Eds.). World Scientific, Singapore, pp: 40-64.
14: Ishitani, Y., 1993. Document skew detection based on local region complexity. Proceeding of 2nd International Conference on Document Analysis and Recognition, October 20-22, 1993, Tsukuba, Japan, pp: 49-52.
15: Kapogiannopoulos, G. and N. Kalouptsidis, 2002. A fast high precision algorithm for the estimation of skew angle using moments. Proceeding of the IASTED International Conference Signal Processing, Pattern Recognition and Applications, SPPRA, June 25, 2002, Crete, Greece, pp: 275-279.
16: Kapoor, R., D. Bagai and T.S. Kamal, 2004. A new algorithm for skew detection and correction. Pattern Recog. Lett., 25: 1215-1229.
17: Li, S., Q. Shen and J. Sun, 2007. Skew detection using wavelet decomposition and projection profile analysis. Pattern Recog. Lett., 28: 555-562.
18: Min, Y., S.B. Cho and Y. Lee, 1996. A data reduction method for efficient document skew estimation based on hough transformation. Proceeding of the 13th International Conference on Pattern Recognition, August 25-29, 1996, IEEE Press, Vienna, Austria, pp: 732-736.
19: Nakano, Y., Y. Shima, H. Fujisawa, J. Higashino and M. Fujinawa, 1990. An algorithm for the skew normalization of document images. Proceeding of 10th International Conference on Pattern Recognition, June 16-21, 1990, Atlantic City, N.J. USA., pp: 8-13.
20: Okun, O., M. Pietikainen and J. Sauvola, 1999. Document skew estimation without angle range restriction. IJDAR, 2: 132-144.
21: Pavlidis, T. and J. Zou, 1991. Page segmentation by white streams. Proceeding of 1st International Conference on Document Analysis and Recognition, September 30-October 2, 1991, France, pp: 945-953.
22: Postl, W., 1986. Detection of linear oblique structures and skew scan in digitized documents. Proceeding of the 8th International Conference on Pattern Recognition, October 27-31, 1986, Paris, France, pp: 687-689.
23: Rundle, A., 1974. Optimum Scan Angle Determining Means. International Business Machines, Inc., U.S. Patent 3,831,146.
24: Sarfraz, M., S. Nazim and A. Al-Khuraidly, 2003. Offline Arabic text recognition system. Proceeding of International Conference on Geometric Modeling and Graphics, July 16-18, 2003, IEEE Computer Society, pp: 30-35.
25: Sarfraz, M., A. Zidouri and S.A. Shahab, 2005. A novel approach for skew estimation of document images in OCR system. Proceeding of International Conference on Computer Graphics, Imaging and Vision: New Trends, July 26-29, 2005, IEEE Computer Society Press, pp: 30-35.
26: Sauvola, J. and M. Pietikäinen, 1995. Skew angle detection using texture direction analysis. Proceeding of the 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, pp: 1099-1106.
27: Schlang, A., 1985. Text line bounding system. Litton Systems Inc., U.S. Patent 4,558,461.
28: Smith, R., 1995. A simple and efficient skew detection algorithm via text row accumulation. Proceeding of the 3rd International Conference on Document Analysis and Recognition, August 14-16, 1995, Montreal, Canada, pp: 1145-1151.
29: Srihari, S.N. and V. Govindaraju, 1989. Analysis of textual image using the Hough transform. Mach. Vision Applied, 2: 141-153.
30: Sun, C. and D. Si, 1989. Skew and slant correction for document images using gradient direction. Proceeding of the 4th International Conference on Document Analysis and Recognition, August 18-20, 1989, Ulm, Germany, pp: 142-146.
31: Yamada, M., 1989. Image processing system. Canon, U.S. Patent 4,802,229.
32: Yu, B. and A. Jain, 1996. A robust and fast skew detection algorithm for generic documents. Patt. Recog., 29: 1599-1629.