
Information Technology Journal

Year: 2014 | Volume: 13 | Issue: 12 | Page No.: 1961-1968
DOI: 10.3923/itj.2014.1961.1968
Identification of 12+18+206+1 Characters of Dravidian Semmozhi-image Processing Trip to South India
C. Lakshmi, K. Thenmozhi, J.B.B. Rayappan and Rengarajan Amirtharajan

Abstract: This study proposes an approach for recognizing handwritten Tamil characters, that is, for converting handwritten Tamil script into printed text. The task is tedious because handwriting varies in style, size, orientation angle and so on. Scanned images are pre-processed and segmented first into paragraphs, then into lines, then into words and finally into individual glyphs. The study combines structural analysis with classification analysis and is found to be more efficient for large and complex character sets. Recognition efficiency is improved and the proposed approach produces better results than existing methods. The approach can also be extended to other Indian languages.


How to cite this article
C. Lakshmi, K. Thenmozhi, J.B.B. Rayappan and Rengarajan Amirtharajan, 2014. Identification of 12+18+206+1 Characters of Dravidian Semmozhi-image Processing Trip to South India. Information Technology Journal, 13: 1961-1968.

Keywords: Tamil character, optical character recognition and artificial neural network

INTRODUCTION

In today's evolving technology, language remains a barrier that keeps many software technologies from reaching literate people who know only their regional languages. To be effective, evolving technologies should also support these languages. Handwritten character recognition is gaining importance these days (Raj and Abirami, 2012). Historical events of Tamilnadu and the importance of Tamil culture are recorded in hand-filled forms. The formulas of Siddha medicine, preserved in the Olai chuvadi (palm-leaf manuscripts), can likewise be digitized. An automatic interactive system may also be developed to convert a written document into a sound file so that blind persons can learn what is on the pages.

In one approach, different local regions are randomly chosen and chromosome strings are constructed using a genetic algorithm. Local features for every chromosome are extracted, and a global feature is generated and given to trained SVM classifiers for pattern recognition. It is hard to create local regions of appropriate size that contain precise discriminating information. This is overcome by taking numerous sets of local regions of varying sizes (Das et al., 2012). A combination of statistical feature extraction, a feed-forward neural network with multilayer back propagation and lexical matching has provided a way forward in recognizing Malay handwritten cheque words (Noori et al., 2011). The discrete wavelet transform is used to represent characters in terms of pixels and then in binary form, which provides quality information about the character for recognition. The binary sequences are then classified using the Hamming distance, which counts the bit changes between two binary sequences, and this value is used to recognize a character (Razak et al., 2009). Simplified SIFT locates the geometric centre of a license plate in four steps, namely detection of scale-space extrema, localization of key points, orientation assignment and key point description; the main direction is then found by the PCA algorithm. SVM is also used to make the simplified SIFT as robust as the original SIFT (Yang et al., 2011).
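As an illustration of the Hamming-distance classification mentioned above, the following minimal sketch assigns a binary sequence to the nearest stored template. The function names and the 8-bit template codes are hypothetical and are not taken from Razak et al. (2009).

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify(sample: str, templates: dict) -> str:
    """Return the label of the template with the smallest Hamming distance."""
    return min(templates, key=lambda label: hamming_distance(sample, templates[label]))


# Hypothetical 8-bit codes, invented purely for illustration.
templates = {"A": "11001010", "B": "00110101"}
print(classify("11001110", templates))
```

The sample differs from template "A" in a single bit, so it is assigned to "A".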

In another method, pixel information is extracted by averaging a trained set of character images. A Bayesian fusion process over all pixel probabilities helps in character recognition. To differentiate between similar sized characters, an SVM classifier is also used (Al-Hmouz, 2012). A further approach to character recognition uses Adaptive Resonance Theory 1 (ART1), a vector classifier that belongs to a class of self-organizing neural architectures (unsupervised learning). The basic idea behind ART1 is that it accepts an input vector and assigns it to one of the classes depending on which stored pattern it resembles most (Vishwakarma and Deshmukh, 2010).

Another work focuses on the implementation of character recognition using logic neural networks. Logic neural networks are closely related to Random Access Memory (RAM) and are therefore easy to implement. They offer high functionality and very short training and operating times. The two networks discussed are WISARD and SOLNN. WISARD is the first commercially available neural network in which the content of the address determined by the input can be changed. It has 2^n memory locations and is formed from a set of discriminator nodes. SOLNN, which is similar to WISARD, is a modified, self-organizing network whose structure can be changed by varying several bits rather than a single bit at each location (Tambouratzis, 1996). The individual characters thus obtained are passed through feature extraction to obtain traits such as the character’s height, width, total number of horizontal and vertical strokes, bends, circles, slopes, centroids and special points (Banumathi and Nasira, 2011).
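The RAM-based idea behind such logic networks can be sketched as follows. This is a simplified illustration only: it assumes that each RAM node is addressed by n consecutive input bits and stores one bit per location, whereas practical WISARD systems typically use a random input mapping. The class name and the example bit patterns are hypothetical.

```python
class Discriminator:
    """A bank of RAM nodes; each node has 2**n one-bit memory locations."""

    def __init__(self, input_bits: int, n: int):
        if input_bits % n != 0:
            raise ValueError("input_bits must be a multiple of n")
        self.n = n
        # One RAM (a list of 2**n cells, initially 0) per group of n input bits.
        self.rams = [[0] * (2 ** n) for _ in range(input_bits // n)]

    def _addresses(self, bits):
        # Assumption: consecutive n-bit groups form the address of each RAM.
        for i in range(0, len(bits), self.n):
            yield int("".join(str(b) for b in bits[i:i + self.n]), 2)

    def train(self, bits):
        # Write a 1 into the location addressed by each n-bit group.
        for ram, addr in zip(self.rams, self._addresses(bits)):
            ram[addr] = 1

    def response(self, bits):
        # Number of RAM nodes that have seen this address during training.
        return sum(ram[addr] for ram, addr in zip(self.rams, self._addresses(bits)))


d = Discriminator(input_bits=8, n=2)
d.train([1, 1, 0, 0, 1, 0, 1, 0])
print(d.response([1, 1, 0, 0, 1, 0, 1, 0]))  # all four RAM nodes respond
print(d.response([1, 1, 0, 0, 1, 0, 0, 1]))  # the last RAM node does not respond
```

With one discriminator per class, an input would be assigned to the class whose discriminator gives the highest response.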

Strokes and stroke sequences are extracted to represent a two-dimensional input character in one dimension for Korean character recognition. The stroke sequence is evaluated using stroke codes and the stroke relations between two consecutive sequences (Kim and Kim, 1996). In another proposed system, images are digitized in binary form using a scanner at 600 dpi and thinned using a parallel thinning algorithm. The skeleton of the image is then traced from right to left to build a graph. Features such as strokes, loops and curves are extracted from the graph. Characters are classified by a five-layer neural network (Amin et al., 1996). In on-line Chinese character recognition, the strokes in the input script determine the range of candidate radicals and characters. Candidate characters are restricted according to whether or not they contain the recognized radicals. Candidates with high matching costs are discarded to reduce recognition time (Lay et al., 1996).

Model-based structural matching finds the correct stroke correspondence while also ensuring a structural interpretation. An attributed relational graph (ARG) describes the reference character in terms of line segments and points. Structural interpretation is achieved in two steps, candidate stroke extraction and consistent matching. The former is carried out by line following and the latter by heuristic search (Liu et al., 2001). In Arabic handwriting recognition, text is divided into words and sub-words, after which dots are extracted. An adaptive method is proposed for finding the slant angles of the various components in a text line. A distinct segmentation procedure, derived from the Arabic writing style, is integrated into the recognition phase; it uses polygonal approximation, and the segments are subsequently identified by a fuzzy matching algorithm. Dynamic programming selects the best hypothesis of recognized characters for each word or sub-word (Tanvir Parvez and Mahmoud, 2013). Further research on Tamil character recognition is available in Banumathi and Nasira (2011), Gandhi and Iyakutti (2009), Kannan and Prabhakar (2008a, b) and Kumar et al. (2008).

Optical Character Recognition (OCR) is a vast and prominent field within pattern recognition (Al-Hmouz, 2012). Its history dates back to 1870 and it was later developed as an aid for the visually impaired. More recently, digital technologies have used OCR in postal services and for data processing. The principal categories of OCR include handwritten, on-line, fixed-font and script recognition (Raj and Abirami, 2012). Each recognition method is unique and has deep roots in this field. Conventional techniques applied to OCR include point-by-point global comparison, transformations, local feature extraction, template matching, and structural and curvature analysis. Moreover, OCR aims at a paperless environment by converting printed documents into machine-readable text. With the help of Unicode and SVM, versatile methods and strategies for pattern recognition have been presented. At present, OCR is used to manage student records and for data entry in business, because it reduces the cost, time delay and errors that are common in manual entry. Neural networks, in general, make recognition more accurate and efficient. The neural approach involves three distinct steps.

In step 1, the character’s binary pattern is converted into a suitable form. With this as the input, the next step trains the network using back propagation, producing data about the network and the resulting weights. The final step takes the results of the second and eventually creates the network. Neural networks play a major role in both online and offline Tamil character recognition, where convolutional neural networks (CNN) are used for visual tasks that depend on classification and identification. In contrast to fully connected NNs, CNNs are easy to build and to use. Besides neural networks, subspace methods, matching schemes, hidden Markov models (HMM) and star-based features also provide recognition procedures. A BPN is also an option for a character recognizer.

Character recognition can be divided into online and offline recognition. The former deals with machine recognition, while the latter addresses the analysis of scanned images. Machine recognition is an area worth experimenting in, and researchers have recently shown a lot of interest in discovering new avenues of recognition. Online recognition identifies a script as it is written with a pen or digital stylus. It has particular significance in the context of Indian languages and offers a convincing solution to existing problems in word processing. Stroke sequences can be used to construct handwritten characters, with the strokes represented as shape feature strings. Unknown strokes are then recognized by comparing them against a stroke database.

Thus an entire character is recognized through all of its constituent strokes. The octal graph is also an effective way to approach off-line handwritten Tamil characters and to improve slant correction. It represents a letter’s fundamental form irrespective of the writing style, so a letter is recognized through the graph’s weights. The octal graph provides segmentation as well as normalization, which improves slant correction (Kannan and Prabhakar, 2008a). Improved results can be obtained by combining various disciplines of mathematics and engineering to study pattern recognition in detail, because such a combination leads to a higher recognition rate, lower computation time and greater accuracy. The scope of this study is as follows: (a) feature extraction is carried out by line and word segmentation and (b) the recognition stage consists of finding the centroid and geometric moments of the images.
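The centroid and geometric moments mentioned in the scope can be illustrated with a short sketch. It uses the standard definition of the raw moment m_pq as the sum of x^p y^q I(x, y) over all pixels of a binary image I and takes the centroid as (m10/m00, m01/m00). The function names and the small test glyph are assumptions made for illustration, not the authors' implementation.

```python
def raw_moment(image, p, q):
    """Raw geometric moment m_pq of an image given as a list of pixel rows."""
    return sum(
        (x ** p) * (y ** q) * pixel
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
    )


def centroid(image):
    """Centroid (x, y) of the foreground pixels: (m10 / m00, m01 / m00)."""
    m00 = raw_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00


# Hypothetical 3 x 4 binary glyph: a vertical bar two pixels wide.
glyph = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(raw_moment(glyph, 0, 0))  # area in pixels
print(centroid(glyph))
```

For this glyph the area m00 is 6 pixels and the centroid is (1.5, 1.0), the centre of the bar.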

MATERIALS AND METHODS

An improved ANN-based optical character recognition system is proposed and its block diagram is given in Fig. 1. The document is scanned and saved as an image. The image is segmented from document into paragraphs, then lines, then words and finally individual characters. After segmentation, each letter is normalized to a certain height, length and slant angle. The normalized shapes are pre-processed to remove noise. Using edge detection, the cropped image is converted into binary format. Features such as height, length and the number of horizontal lines, vertical lines, curves and circles are extracted and used to train the ANN for pattern recognition. Back propagation gives higher accuracy in recognition.
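The text does not state how the segmentation is carried out. One common choice, assumed here purely for illustration, is the projection profile: rows with no foreground pixels separate lines, and empty columns within a line separate words or characters. The function names and the small test page are hypothetical.

```python
def segment_runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def split_lines(image):
    """Split a binary image into text lines using the horizontal projection."""
    return segment_runs([sum(row) for row in image])


def split_columns(image, top, bottom):
    """Split rows top..bottom-1 into column runs using the vertical projection."""
    width = len(image[0])
    profile = [sum(image[y][x] for y in range(top, bottom)) for x in range(width)]
    return segment_runs(profile)


# Hypothetical 5 x 5 binary page with two text lines.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
for top, bottom in split_lines(page):
    print((top, bottom), split_columns(page, top, bottom))
```

A real document would additionally need a minimum gap width to distinguish the spaces between words from the spaces between characters.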

Multilayer BPN: A gradient descent BPN, shown in Fig. 2, is well suited to pattern recognition; it is a multilayer network and can confine the result. A BPN can converge to the correct pattern even if the pattern is inexact or noisy. The multilayer Back Propagation Network has three layers, namely input, hidden and output, and it can be a multi-class recognizer. Input patterns are applied to layer 1, the weighted inputs are summed and a sigmoid activation is applied to obtain the intermediate cluster. A bias unit is usually added in all layers to provide a minimum input. Using threshold activation, the matching error is estimated in layer 2 from the intermediate cluster input. The estimated error signal is back propagated from layer 2 to layer 1 and the weighted paths are modified to drive the error towards zero.
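The forward pass described above (weighted sum, bias, sigmoid activation) can be sketched as follows. This is a minimal illustration under the assumption that a sigmoid is used in both the hidden and output layers; the function names and weight layout are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weights, biases):
    """Each unit sums its weighted inputs, adds its bias and applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """Propagate a feature vector through the hidden layer and the output layer."""
    hidden = layer_output(features, w_hidden, b_hidden)
    return hidden, layer_output(hidden, w_out, b_out)


# Two input features, two hidden units, one output unit; all weights are zero,
# so every unit outputs sigmoid(0) = 0.5.
hidden, output = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[0.0, 0.0]], [0.0])
print(hidden, output)
```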

Fig. 1: Block diagram representation of the proposed system

Fig. 2: Multilayer BPN

Fig. 3: Flow chart for training

PHASE I: Train the ANN for the character recognition application: Flowchart representation is given in Fig. 3 and the major steps are as follows:

Obtain the training characters to build the knowledge of the BPN
Assign the Unicode of the printed text as the target
Assign the features extracted from the character as the input vector and the corresponding Unicode as the output vector
Find the activations of the hidden units and output units
If the calculated value matches the assigned Unicode, go to the testing phase; otherwise modify the synaptic strengths of the hidden units and output units using error signal generation
Continue the procedure until training is complete

Error signal generation:

Find the deviation between the test pattern and the training pattern
Back propagate the error information signal from the output layer to the hidden layer
Update the weights of the hidden and output layers using the error information signal

The synaptic strengths of the multilayer network are adjusted by the error signal so that the trained pattern converges.
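The training and error signal steps above can be sketched as a small back propagation loop. This is a minimal sketch, not the authors' implementation: it assumes sigmoid units, online gradient descent and targets encoded as values in [0, 1] rather than raw Unicode values. The function name, the hyperparameters and the two toy feature vectors are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bpn(samples, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network; return weights and per-epoch MSE.

    samples: list of (feature_vector, target_vector) pairs with values in [0, 1].
    """
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # The last weight in each row is the bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, t in samples:
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
            # Error signal at the output layer: deviation times sigmoid slope.
            d_o = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
            # Back propagate the error signal to the hidden layer.
            d_h = [
                hj * (1 - hj) * sum(d_o[k] * w_o[k][j] for k in range(n_out))
                for j, hj in enumerate(h)
            ]
            # Update the output-layer weights, then the hidden-layer weights.
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_o[k][j] += rate * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += rate * d_h[j] * xb[i]
            squared_error += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
        history.append(squared_error / (len(samples) * n_out))
    return w_h, w_o, history


# Two hypothetical feature vectors with targets 1 and 0.
samples = [([1.0, 0.0, 1.0], [1.0]), ([0.0, 1.0, 0.0], [0.0])]
_, _, history = train_bpn(samples)
print(history[0], history[-1])
```

The per-epoch MSE returned in `history` is the quantity one would plot against the epoch number to observe convergence.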

PHASE II: Test the ANN with noisy characters (different handwriting styles):

Compare the features of the test pattern with the trained patterns one by one, as given in Fig. 4
If the MSE is minimum, display the character
Otherwise output the error signal

Sample training patterns are given in Fig. 5.
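The testing steps can be sketched as a nearest-pattern search by mean squared error. The acceptance threshold `tolerance`, the function names and the feature values below are assumptions made for illustration; the text does not specify how a minimum MSE is judged acceptable.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def recognize(test_features, trained, tolerance=0.05):
    """Compare the test pattern with every trained pattern and pick the best.

    Returns (character, error); character is None when even the best match
    exceeds the tolerance, which stands in for the error signal.
    """
    best = min(trained, key=lambda label: mse(test_features, trained[label]))
    error = mse(test_features, trained[best])
    if error <= tolerance:
        return best, error
    return None, error


# Hypothetical feature vectors keyed by Unicode character:
# "\u0b85" is TAMIL LETTER A and "\u0b89" is TAMIL LETTER U.
trained = {"\u0b85": [0.9, 0.1, 0.4], "\u0b89": [0.2, 0.8, 0.6]}
print(recognize([0.85, 0.15, 0.45], trained))  # close to the first pattern
print(recognize([0.5, 0.5, 0.0], trained))     # no acceptable match
```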

Fig. 4: Flow chart for testing

For performance analysis, some sample patterns are predefined with indices. The experiment is run for a number of epochs. The sample patterns and the three different graphs are shown in Fig. 5-9.

The illustration involves three patterns, 1, 2 and 3, and the test and trained characters are compared against each other. Here, patterns 1 and 3 are handwritten samples. For example, trained sample 1 is compared with the three test samples. Even though it is a completely different letter from the others, the algorithm produces some numerical results because of common features. Trained sample 1 (‘o’) shares curvature and circular features with test pattern 1 (‘a’), but since the latter is handwritten it is distorted, which lowers the level of similarity. So, for the 1st pattern the resulting value is 0.0421. The remaining two patterns (‘u’, ‘ee’) contrast with it completely and give negative values, i.e., -0.0161 and -0.5139, respectively.

The pattern matching results are given in Table 1. The matched results are better visualized in the pattern matching graphs given in Fig. 10, 11 and 12. For recognition, this study takes ten sample patterns comprising different Tamil letters. Each of them is unique in its writing style and characteristics. Each input character is compared against the samples and judged correct depending on how closely it matches them.

Fig. 5: Sample training patterns

Fig. 6: Performance analysis of ANN while training using MSE

Fig. 7: Patterns are converged at the gradient = 0.1485 and 955th epoch

Fig. 8: Patterns are trained with t = 0.9409 when t = 1

Table 1: Level of pattern matching-test and trained characters with actual output y = 1

Fig. 9: Testing the BPN with and without noisy patterns

Fig. 10: Pattern matching for test pattern 1

Fig. 11: Pattern matching for test pattern 2

Fig. 12: Pattern matching for test pattern 3

In matching the 2nd sample (‘u’), it does not match pattern 1 (‘a’) or pattern 3 (‘ee’), and so gives contrasting results. On closer inspection, since it differs completely from pattern 1 the result is negative (-0.0225). In the case of pattern 3, the line segment feature partly coincides and thus produces a positive result (0.1922). The best result of 1.0051 is obtained with pattern 2 (‘u’) since the test and trained patterns are identical. Thus, experimental values closer to 1 signify an exact or appreciable match and values far from 1 signify a mismatch. Similarly, for the chosen test patterns, the expected results are obtained, namely 0.9779 (‘a’), 1.0051 (‘u’) and 0.3776 (‘ee’). The results vary with the writing style (i.e., handwritten or perfect shape). With this as a starting point, the algorithm can be extended to words, then to lines and finally to paragraphs. This is a useful construct that provides the expected results quickly.

Apart from the numerical results, graphical results are also presented by plotting the matching level against the index of the trained characters. Test pattern 1 almost matches the 3rd trained sample, pattern 2 matches the 2nd character and the last test pattern matches the 10th trained character. The third test character also matches some other samples because of similar characteristics such as line segments, dots, curves and circles. Moreover, the relation between MSE and epoch is studied, from which it is inferred that beyond 900 iterations the MSE reduces to 0.01. This parameter is used to update the weights. In this algorithm, the patterns converge at the 955th epoch with a gradient value of 0.1485. Here, the gradient is the relation between the weights assigned in successive iterations. In the final graph, the performance obtained in this study is compared with the expected performance; the two agree closely, confirming the effectiveness of the approach.

CONCLUSION

Character recognition, a deterministic approach, is a newly emerged domain of pattern recognition and image processing that has interested many researchers. This study presents one more possible, and probably the most effective, practice for recognizing characters. It uses the conventional stages of character recognition, namely preprocessing, feature extraction, edge detection and BPN. It is basically a neural network scheme in which the algorithm blends structural analysis and classification analysis, giving good results even for intricate and large sets. It is highly efficient and has an enhanced recognition rate, which sets it apart from the other methods. The strength of this study is the ease with which the concept can be extended to the recognition of other languages.

REFERENCES

  • Amin, A., H. Al-Sadoun and S. Fischer, 1996. Hand-printed Arabic character recognition system using an artificial network. Pattern Recognit., 29: 663-675.

  • Banumathi, P. and G.M. Nasira, 2011. Handwritten Tamil character recognition using artificial neural networks. Proceedings of the International Conference on Process Automation, Control and Computing, July 20-22, 2011, Coimbatore, pp: 1-5.

  • Das, N., R. Sarkar, S. Basu, M. Kundu, M. Nasipuri and D.K. Basu, 2012. A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application. Applied Soft Comput., 12: 1592-1606.

  • Gandhi, R.I. and K. Iyakutti, 2009. An attempt to recognize handwritten Tamil character using Kohonen SOM. Int. J. Adv. Networking Appl., 1: 188-192.

  • Kannan, R.J. and R. Prabhakar, 2008a. An improved handwritten Tamil character recognition system using octal graph. J. Comput. Sci., 4: 509-516.

  • Kannan, J.R. and R. Prabhakar, 2008b. Accuracy augmentation of Tamil OCR using algorithm fusion. Int. J. Comput. Sci. Network Secur., 8: 51-56.

  • Kumar, J.J., R. Prabhakar and R.M. Suresh, 2008. Off-line cursive handwritten Tamil characters recognition. Proceedings of the International Conference on Security Technology, December 13-15, 2008, Hainan Island, pp: 159-164.

  • Kim, H.J. and P.K. Kim, 1996. Recognition of off-line handwritten Korean characters. Pattern Recognit., 29: 245-254.

  • Lay, S.C., C.H. Lee, N.J. Cheng, C.C. Tseng and B.S. Jeng et al., 1996. On-line Chinese character recognition with effective candidate radical and candidate character selections. Pattern Recognit., 29: 1647-1659.

  • Liu, C.I., I.J. Kim and J.H. Kim, 2001. Model-based stroke extraction and matching for handwritten Chinese character recognition. Pattern Recognit., 34: 2339-2352.

  • Noori, O., S.M.S. Ahmad and A. Shakil, 2011. Offline Malay handwritten cheque words recognition using artificial neural network. J. Applied Sci., 11: 86-95.

  • Al-Hmouz, R., 2012. OCR based pixel fusion. J. Applied Sci., 12: 2319-2325.

  • Razak, Z., N.A. Ghani, E.M. Tamil, M.Y.I. Idris and N.M. Noor et al., 2009. Off-line Jawi handwriting recognition using hamming classification. Inform. Technol. J., 8: 971-981.

  • Tambouratzis, G., 1996. Applying logic neural networks to hand-written character recognition tasks. Proceedings of the 8th International Conference on Tools with Artificial Intelligence, November 16-19, 1996, Greece, pp: 268-271.

  • Tanvir Parvez, M. and S.A. Mahmoud, 2013. Arabic handwriting recognition using structural and syntactic pattern attributes. Pattern Recognit., 46: 141-154.

  • Vishwakarma, A. and A.Y. Deshmukh, 2010. A design approach for hand written character recognition using adaptive resonance theory network I. Proceedings of the 3rd International Conference on Emerging Trends in Engineering and Technology, November 19-21, 2010, Goa, India, pp: 624-.

  • Yang, M.N., X.J. Li and X.H. Zhang, 2011. Robust description method of SIFT for features of license plate characters. Inform. Technol. J., 10: 2189-2195.

  • Raj, M.A.R. and S. Abirami, 2012. A survey on Tamil handwritten character recognition using OCR techniques. Comput. Sci. Inform. Technol., 5: 115-127.
