Off-line Sinhala Handwritten Postal City Name Recognition Using Segmentation Free Algorithms

A.G.A.V. Anuradha and N.D. Kodikara

ABSTRACT

Offline handwriting recognition remains an open research area, and it attracts many researchers because it is linked to a wide range of applications. However, very little research has addressed the recognition of Sinhala handwriting. In this research, we present a system towards Sri Lankan postal automation based on recognizing the postal city name written within a restricted area on postal envelopes. The proposed system uses segmentation-free algorithms, so the whole word is considered instead of isolated characters. The research comprises four phases: preprocessing, feature detection, recognition and postprocessing. The preprocessing phase has seven stages: noise removal, thresholding, detection of the rectangular area, skew detection, skew correction, underline removal and thinning. We propose the use of a Gabor filter to produce effective features. The other feature extraction methods used are the Horizontal Projection Profile histogram and the Vertical Projection Profile histogram. Together, these techniques yield 13 features for each image. A feed-forward neural network with one hidden layer is used in the recognition process.
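The projection profile histograms named above can be illustrated with a minimal sketch. It assumes a binarized word image in which ink pixels are 1 and background pixels are 0; the function name `projection_profiles` and the toy image are hypothetical, and the sketch does not reproduce how the 13 features are actually derived from the profiles.

```python
import numpy as np


def projection_profiles(binary_image):
    """Return (horizontal, vertical) projection profiles of a binary image.

    Assumption: ink pixels are 1 and background pixels are 0.
    """
    img = np.asarray(binary_image, dtype=int)
    horizontal = img.sum(axis=1)  # number of ink pixels in each row
    vertical = img.sum(axis=0)    # number of ink pixels in each column
    return horizontal, vertical


# Toy 3x4 "word image" used only for illustration.
word = np.array([
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])
h_profile, v_profile = projection_profiles(word)
```

The horizontal profile has one entry per image row and the vertical profile one entry per image column, so a fixed-length feature vector would need some further reduction step that is not specified here.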

The network is trained using a supervised learning algorithm. It has 6 output nodes to classify 50 different postal cities. Data for the training and testing sets were obtained from the National Science Foundation (NSF) database, which consists of Sinhala postal city names written by people for real postal purposes, as well as from some other writers. The trained neural network outputs a binary number that corresponds to the recognized city name. The postprocessing step uses a Microsoft Access database that maps the decimal value of the binary number to the city name, giving the recognized city name in a user-readable form. Testing and evaluation were carried out comparatively using several neural networks with 50, 20 and 10 cities. The accuracy of postal city name recognition was about 41%. The reasons for this low recognition rate are a lack of training data and weaknesses in the skew detection and underline removal algorithms. Some of the features used are also not very effective, which further contributed to the low accuracy.
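The output decoding described above can be sketched as follows. Six binary outputs give 2^6 = 64 distinct codes, which is enough to cover 50 cities. This is only an illustration under stated assumptions: the 0.5 threshold, the most-significant-bit-first ordering, the zero-based indexing and the placeholder city labels are all hypothetical, and a plain Python list stands in for the Microsoft Access lookup table.

```python
def decode_city(outputs, city_names, threshold=0.5):
    """Map six network outputs to a city name via their decimal value.

    Assumptions: outputs are thresholded at `threshold`, the first
    output is the most significant bit, and codes are zero-based
    indices into `city_names`.
    """
    bits = [1 if value >= threshold else 0 for value in outputs]
    index = 0
    for bit in bits:
        index = index * 2 + bit  # accumulate the binary number, MSB first
    if index >= len(city_names):
        return None  # code does not correspond to any known city
    return city_names[index]


# Placeholder labels; the actual city list is not given in the text.
cities = [f"city_{i}" for i in range(50)]
recognized = decode_city([0.1, 0.9, 0.2, 0.8, 0.1, 0.9], cities)
unmapped = decode_city([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], cities)
```

Codes 50 to 63 have no city assigned in this sketch, so they are reported as `None` rather than forced onto a city.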