Abstract—The need to edit scanned text documents has driven research into optical character recognition (OCR). OCR is the process of recognizing a segmented part of a scanned image as a character. The OCR process consists of three major sub-processes: preprocessing, segmentation, and recognition. Of these three, segmentation is the most important phase, because if its output is incorrect the recognition stage cannot produce correct results: garbage in, garbage out. At the same time, segmentation is also complex. Handwritten documents make it harder still, because they offer only a few reliable points at which the text can be segmented. In this paper, we formulate an algorithm to segment a scanned document image into characters. As in our earlier published work, the information about the lines and the words within each line is written to a data file. In the proposed algorithm, a part is extracted from a word in a line and checked for whether it contains a meaningful symbol of the Gurumukhi script. If it does, the extracted part is marked and written to the file; otherwise, the extracted part is readjusted to find a symbol. For classification, we use a hybrid approach that combines the water reservoir and feature extraction approaches. The approach was implemented and gave reasonably good results.
Index Terms—OCR, segmentation, Gurumukhi, handwritten, feature, water reservoir.
Rajiv Kumar, Thapar University, (email: email@example.com)
Amardeep Singh, Pbi University, (email: firstname.lastname@example.org)
Cite: Rajiv Kumar and Amardeep Singh, "Character Segmentation in Gurumukhi Handwritten Text using Hybrid Approach," International Journal of Computer Theory and Engineering, vol. 3, no. 4, pp. 499-501, 2011.
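
The extract, check, and readjust loop outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0/1 pixel-grid representation of a word image, and the `min_width`/`max_width` parameters are all assumptions made for illustration. In particular, `is_symbol` is a hypothetical stand-in for the paper's hybrid classifier (water reservoir plus feature extraction), and "readjusting" is modeled here simply as widening or shifting a column window.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background (assumed format)


def crop(word: Image, start: int, end: int) -> Image:
    """Return columns [start, end) of the word image."""
    return [row[start:end] for row in word]


def segment_word(
    word: Image,
    is_symbol: Callable[[Image], bool],  # hypothetical classifier stand-in
    min_width: int = 1,
    max_width: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Extract parts of a word and keep those accepted as symbols.

    A part starting at `start` is widened one column at a time
    (the readjustment step) until `is_symbol` accepts it. If no width
    up to `max_width` is accepted, the window shifts right by one column.
    """
    width = len(word[0]) if word else 0
    if max_width is None:
        max_width = width
    segments: List[Tuple[int, int]] = []
    start = 0
    while start < width:
        limit = min(width, start + max_width)
        for end in range(start + min_width, limit + 1):
            if is_symbol(crop(word, start, end)):
                segments.append((start, end))  # mark the accepted part
                start = end
                break
        else:
            start += 1  # nothing accepted here; readjust the start column
    return segments


def toy_is_symbol(part: Image) -> bool:
    """Toy rule for the demo only: exactly two columns, each containing ink."""
    if not part or len(part[0]) != 2:
        return False
    return all(any(row[c] for row in part) for c in range(2))


# Columns: ink, ink, blank, ink, ink
toy_word = [
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
print(segment_word(toy_word, toy_is_symbol))  # [(0, 2), (3, 5)]
```

In a real system the accepted `(start, end)` ranges would be written to the data file alongside the line and word information, and the toy rule would be replaced by an actual Gurumukhi symbol classifier.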