Abstract: The reading skills of computers still lag far behind those of human beings. Most character recognition systems cannot read degraded documents or handwritten characters and words. Devanagari, an alphabetic script, is used by over 500 million people all over the world. In this study, we present a Devanagari handwritten character recognition system based on an artificial neural network. A recognition rate of up to 96% is achieved for certain characters.
INTRODUCTION
India is a multilingual country of more than 1 billion people, with 18 constitutional languages and 10 different scripts. Devanagari, an alphabetic script, is used by a number of Indian languages. It was developed to write Sanskrit but was later adapted to write many other languages such as Marathi, Hindi, Konkani and Nepali. Many other Indian languages use close variants of this script (Masica, 1991). Although Sanskrit is an ancient language and is no longer spoken, written material still exists. Hindi is the world's third most commonly used language after English and Chinese, and approximately 500 million people all over the world speak and write it. Devanagari has about 11 vowels and 34 consonants. Figure 1 shows the Devanagari characters. The script serves as the writing system for over 28 languages, including Sanskrit, Hindi, Kashmiri, Marathi and Nepali, and is used by more than 500 million people. Thus, Devanagari handwritten character recognition is an open field of research with considerable scope for further development (Bahlmann et al., 2004).
Fig. 1: | Devanagari characters |
Fig. 2: | Selected Devanagari characters |
Models that have been applied to handwritten character recognition include structure-based models (Aparna et al., 2004; Chan and Yeung, 1998), stochastic models (Li et al., 1998) and learning-based models (Manke and Bodenhausen, 1994). Learning-based models have received wide attention for pattern recognition problems. Neural network models have been reported to achieve better performance than other existing models in many recognition tasks. Support vector machines have also been observed to achieve reasonable generalization accuracy, especially in handwritten digit recognition (Bin et al., 2000) and in character recognition for the Roman (Bahlmann et al., 2004), Thai (Sanguansat et al., 2004) and Arabic (Bentounsi and Batouche, 2004) scripts. We have also successfully recognized Marathi numerals using an artificial neural network (Khanale, 2010a).
As many Indian languages have a similar character set, developing a recognition engine for one Indian language serves as a framework for others as well. Handwritten character recognition for any Indian writing system is rendered complex because of the presence of composite characters.
THE PROBLEM
The goal is to develop a recognition system for handwritten Devanagari characters using an artificial neural network. The ten selected Devanagari characters are given in Fig. 2. Any particular character from the data sheet can be selected. The selected character is preprocessed and converted into a 5 by 7 matrix of Boolean values. It is then assigned to a class by the artificial neural network on the basis of its unique feature value. The input characters may contain noise, or they may differ in shape according to the writer's style. We expect the system to classify them reasonably well.
DESIGN OF RECOGNITION SYSTEM
Data collection: A special sheet was designed for data collection. Ten samples of each character were collected from about 40 people of different fields and ages. Data acquisition was done manually: each writer was provided with a plain A4 sheet and asked to write Devanagari characters from
Block diagram of recognition system: Figure 4 shows the basic block diagram of the recognition system.
The handwritten Devanagari characters are scanned to obtain a digitized document. A particular character is then selected from the available characters by segmentation. The character image is cropped and resized to a fixed number of rows and columns.
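The cropping and resizing step can be illustrated with a minimal sketch. It assumes the segmented character is already a binary image stored as a list of rows of 0/1 values, that the target grid has 7 rows and 5 columns, and that a cell is set when at least half of its pixels are ink. The function names `crop_to_ink` and `resize_to_grid` and the `threshold` parameter are hypothetical and are not taken from the system described here.

```python
def crop_to_ink(image):
    """Crop a binary image (list of rows of 0/1) to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def resize_to_grid(image, out_rows=7, out_cols=5, threshold=0.5):
    """Downsample to out_rows x out_cols by averaging each cell and thresholding.

    The 7 x 5 orientation and the 0.5 threshold are assumptions of this sketch.
    """
    in_rows, in_cols = len(image), len(image[0])
    grid = []
    for i in range(out_rows):
        r0 = i * in_rows // out_rows
        r1 = max((i + 1) * in_rows // out_rows, r0 + 1)  # at least one source row
        out_row = []
        for j in range(out_cols):
            c0 = j * in_cols // out_cols
            c1 = max((j + 1) * in_cols // out_cols, c0 + 1)  # at least one source column
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(1 if sum(cell) / len(cell) >= threshold else 0)
        grid.append(out_row)
    return grid
```

Thin pen strokes may vanish under an averaging threshold, so a lower `threshold` may be a better fit for real scans.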
Fig. 3: | Data collection |
Fig. 4: | Basic block diagram of recognition system |
Fig. 5: | Representation of character |
The result is that each character is represented by a 5 by 7 grid of Boolean values. For example, the character
Fig. 6: | Image of handwritten character |
However, images of handwritten characters differ from the ideal ones and vary in shape according to the style of writing. The image of handwritten
For recognition, a trained artificial neural network is used. The network is trained on a set of ideal and handwritten characters. It determines a unique feature value for each character, which is then compared with that of the ideal character to determine recognition. Reasonably good recognition is expected from the network.
To determine the feature value, the character image is decomposed into directional planes. Each directional plane is partitioned into equal-sized zones, and the sum of the pixel values in each zone is taken as a feature value.
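The zoning step alone can be sketched as follows. The decomposition into directional planes is not specified in the text, so this sketch does not reproduce it: it assumes a single plane is already available as a list of rows of pixel values and that its size divides evenly into the requested zones. The name `zone_sums` and its parameters are hypothetical.

```python
def zone_sums(plane, zones_down, zones_across):
    """Partition a plane into equal-sized zones and return the pixel sum of each zone.

    Zones are listed row by row, left to right.
    """
    rows, cols = len(plane), len(plane[0])
    if rows % zones_down or cols % zones_across:
        raise ValueError("plane size must be divisible by the zone counts")
    zone_h, zone_w = rows // zones_down, cols // zones_across
    return [
        sum(plane[r][c]
            for r in range(zi * zone_h, (zi + 1) * zone_h)
            for c in range(zj * zone_w, (zj + 1) * zone_w))
        for zi in range(zones_down)
        for zj in range(zones_across)
    ]
```

For a 4 by 4 plane split into 2 by 2 zones, the function returns four sums, one per quadrant.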
NEURAL NETWORK AND ITS TRAINING
The neural network receives 35 Boolean values as a 35-element input vector. It responds with a 10-element output vector that has a 1 in the position of the recognized character and 0 elsewhere. The network should also recognize handwritten characters; that is, it must make few mistakes in the presence of noise and varying handwriting styles.
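The input and output encoding can be sketched as below. The sketch assumes the 5 by 7 grid is stored as 7 rows of 5 values and is flattened row by row; the original ordering is not stated, so this is an assumption. The helper names `grid_to_vector`, `one_hot` and `decode` are hypothetical.

```python
NUM_CLASSES = 10  # ten selected characters


def grid_to_vector(grid):
    """Flatten a 7 x 5 Boolean grid into a 35-element 0/1 vector (row-major, assumed)."""
    vector = [int(bool(value)) for row in grid for value in row]
    if len(vector) != 35:
        raise ValueError("expected a grid with 35 cells")
    return vector


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Target vector with 1 at the character's class position and 0 elsewhere."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class index out of range")
    return [1 if i == class_index else 0 for i in range(num_classes)]


def decode(output):
    """Pick the class whose output element is largest."""
    return max(range(len(output)), key=output.__getitem__)
```

Decoding by the largest output element is one common way to turn a real-valued network response back into a class index.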
The selected architecture is a two-layer feed-forward network with 10 neurons in each layer. The transfer function is log-sigmoid. The network is trained with the backpropagation algorithm in batch mode with an adaptive learning rate. The performance function is the sum squared error, and the goal was set to 0.1. Figure 7 shows the training of the network. For more details about the network, see our publication on Marathi numerals (Khanale, 2010b).
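A network of this shape and its training loop can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study. The initial weight range, the starting learning rate, the growth and shrink factors (1.05 and 0.7) and the rule of rejecting any step that does not lower the error are assumptions of the sketch, and all function names are hypothetical.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """One row per neuron: n_in weights followed by a bias (range is an assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(layer, inputs):
    # zip stops at len(inputs), so the trailing bias is added separately.
    return [logsig(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
            for neuron in layer]


def forward(net, x):
    hidden = layer_forward(net[0], x)
    return hidden, layer_forward(net[1], hidden)


def sse(net, samples):
    """Sum squared error over all samples and output elements."""
    return sum((t - o) ** 2
               for x, target in samples
               for t, o in zip(target, forward(net, x)[1]))


def batch_gradients(net, samples):
    """Backpropagate the SSE gradient, accumulated over the whole batch."""
    grads = [[[0.0] * len(neuron) for neuron in layer] for layer in net]
    for x, target in samples:
        hidden, out = forward(net, x)
        d_out = [-2.0 * (t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        d_hid = [h * (1.0 - h) * sum(d_out[k] * net[1][k][j] for k in range(len(out)))
                 for j, h in enumerate(hidden)]
        for layer_grads, deltas, inputs in ((grads[0], d_hid, x), (grads[1], d_out, hidden)):
            for g, d in zip(layer_grads, deltas):
                for i, v in enumerate(inputs):
                    g[i] += d * v
                g[-1] += d  # bias gradient
    return grads


def train(net, samples, goal=0.1, lr=0.01, max_epochs=5000):
    """Batch gradient descent; lr grows after an accepted step and shrinks after a rejected one."""
    error = sse(net, samples)
    for _ in range(max_epochs):
        if error <= goal:
            break
        grads = batch_gradients(net, samples)
        trial = [[[w - lr * g for w, g in zip(neuron, gn)] for neuron, gn in zip(layer, gl)]
                 for layer, gl in zip(net, grads)]
        trial_error = sse(trial, samples)
        if trial_error < error:
            net, error, lr = trial, trial_error, lr * 1.05
        else:
            lr *= 0.7  # step too large: discard it and retry with a smaller rate
    return net, error
```

A network for this task would be built as `[init_layer(35, 10, rng), init_layer(10, 10, rng)]` with `rng = random.Random(seed)`, and `train` returns the updated weights together with the final sum squared error.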
The network is first trained with ideal vectors until the sum squared error reaches 0.1. It is then trained with 10 sets of ideal and noisy vectors; for the noisy vectors the goal was set to 0.2. Figure 8 shows the training of the network with noise. After training, the network is ready to use.
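One way to assemble such a training set is sketched below. The text does not state how the noise is generated, so the sketch assumes zero-mean Gaussian noise added to each element of an ideal vector. The noise levels of 0.1 and 0.2 and the names `add_noise` and `make_training_sets` are likewise illustrative assumptions.

```python
import random


def add_noise(vector, level, rng):
    """Add zero-mean Gaussian noise with standard deviation `level` (assumed noise model)."""
    return [v + rng.gauss(0.0, level) for v in vector]


def make_training_sets(ideal_samples, sets=10, levels=(0.1, 0.2), seed=0):
    """Build `sets` repetitions, each holding the ideal vectors plus one noisy copy per level.

    ideal_samples is a list of (input_vector, target_vector) pairs.
    The default levels are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    training = []
    for _ in range(sets):
        for x, target in ideal_samples:
            training.append((list(x), target))  # clean copy
            for level in levels:
                training.append((add_noise(x, level, rng), target))
    return training
```

Keeping the clean vectors in every set is meant to stop the network from forgetting the ideal characters while it adapts to noisy ones.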
Testing of network: A GUI application was developed in which the scanned image of handwritten characters can be loaded. Any character from the set can be cropped and preprocessed to obtain its binary 5 by 7 representation. The trained network is then used to evaluate the feature value of the character.
Figure 9 shows feature value of ideal
Fig. 7: | Training of the network |
Fig. 8: | Training of the network with noise |
Table 1: | Feature values of printed ideal characters |
RESULTS AND DISCUSSION
The performance of the network is given in Fig. 11. Up to a noise level of 0.25, recognition is 100%. Table 1 shows the feature values of ideal printed characters. Table 2 shows the feature values of handwritten characters. Table 3 gives the recognition rate, which is based on the difference between the feature values of ideal and handwritten characters. The last row of Table 3 gives the average recognition rate.
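A performance curve of this kind can be estimated by classifying noisy copies of the ideal vectors at each noise level and counting correct answers. The sketch below assumes Gaussian noise whose standard deviation is the noise level, which is an assumption about how the level is defined. It accepts any classifier as a function. The nearest-template classifier is only a stand-in so that the example runs on its own and is not the neural network of this study. All names are hypothetical.

```python
import random


def recognition_rate(classify, templates, level, trials=100, seed=0):
    """Percentage of noisy template copies that `classify` maps back to the right index."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(trials):
        for label, vector in enumerate(templates):
            noisy = [v + rng.gauss(0.0, level) for v in vector]
            correct += int(classify(noisy) == label)
            total += 1
    return 100.0 * correct / total


def nearest_template(templates):
    """Stand-in classifier: index of the template closest in squared Euclidean distance."""
    def classify(vector):
        distances = [sum((a - b) ** 2 for a, b in zip(vector, t)) for t in templates]
        return distances.index(min(distances))
    return classify
```

Calling `recognition_rate` over a range of levels, for example 0.0 to 0.5 in steps of 0.05, gives the points of a curve comparable in form to the one in Fig. 11.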
Fig. 9: | Feature value of ideal character |
Fig. 10: | Feature value of handwritten character |
Fig. 11: | Performance of the network |
Table 2: | Feature values of handwritten characters |
Table 3: | Recognition rate of handwritten characters |
Up to 96% recognition is achieved for handwritten Devanagari characters. The neural network method described here can also be extended to other Indian scripts.