RECOGNITION AND VERIFICATION OF HANDWRITTEN AND HAND-PRINTED BRITISH POSTAL ADDRESSES

Author(s):  
A. C. DOWNTON ◽  
R. W. S. TREGIDGO ◽  
E. KABIR

An algorithmic architecture for a high-performance optical character recognition (OCR) system for hand-printed and handwritten addresses is proposed. The architecture integrates syntactic and contextual post-processing with character recognition to optimise postcode recognition performance, and verifies the postcode against simple features extracted from the remainder of the address to ensure a low error rate. An enhanced version of the characteristic loci character recognition algorithm was chosen for the system to make it tolerant of variations in writing style. Feature selection for the classifier is performed automatically using the B/W algorithm. Syntactic and contextual information for hand-printed British postcodes has been integrated into the system by combining low-level postcode syntax information with a dictionary trie structure. A full implementation of the postcode dictionary trie is described. Features which define the town name effectively and can easily be extracted from a handwritten or hand-printed town name are used for postcode verification. A database totalling 3473 postcode/address images has been used to evaluate the performance of the complete postcode recognition process. The basic character recognition rate for the full unconstrained alphanumeric character set is 63.1%, compared with an expected maximum attainable rate of 75–80%. The addition of the syntactic and contextual knowledge stages produces an overall postcode recognition rate equivalent to an alphanumeric character recognition rate of 86–90%. Separate verification experiments on a subset of 820 address images show that, with the first-order features chosen, an overall correct address feature code extraction rate of around 35% is achieved.
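The dictionary trie idea described above can be sketched as follows: valid postcodes are stored character by character, so a candidate string from the recogniser can be checked against legal sequences. This is a minimal illustration, not the paper's implementation; the class names and sample postcodes are invented for the example.

```python
# Hypothetical sketch of a postcode dictionary trie: each node holds a map
# from the next character to a child node, and a flag marking where a valid
# postcode ends. The sample postcodes are illustrative only.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a valid postcode ends at this node

def insert(root, postcode):
    node = root
    for ch in postcode:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def is_valid(root, candidate):
    node = root
    for ch in candidate:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.terminal

root = TrieNode()
for pc in ["SW1A1AA", "SW1A2AA", "EH11BB"]:
    insert(root, pc)

print(is_valid(root, "SW1A1AA"))  # True
print(is_valid(root, "SW1A9ZZ"))  # False
```

In a full system, traversal of such a trie can also be interleaved with per-character confidence scores so that syntactically impossible readings are pruned early.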

Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method. Different existing OCR systems use different feature extraction methods because of the diversity of the scripts' natures. One major contribution of the work in this paper is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, this paper assumes the documents have been successfully segmented into characters, and extracts features from these isolated Myanmar characters. These features are extracted using structural analysis of the Myanmar script. The experimental results, obtained using a Support Vector Machine (SVM) classifier, are compared with the previously proposed feature extraction method.


Author(s):  
SHENG-LIN CHOU ◽  
WEN-HSIANG TSAI

The problem of handwritten Chinese character recognition is solved by matching character stroke segments using an iteration scheme. Length and orientation similarity properties, and coordinate overlapping ratios, are used to define a measure of similarity between any two stroke segments. The initial measures of similarity between the stroke segments of the input and template characters are used to set up a match network which includes all the match relationships between the input and template stroke segments. Based on the concept of at-most-one-to-one mapping, an iteration scheme is employed to adjust the match relationships, using the contextual information implicitly contained in the match network, so that the match relationships can reach a stable state. From the final match relationships, matched stroke-segment pairs are determined by a mutually-best match strategy, and the degree of similarity between the input and each template character is evaluated accordingly. Certain structural information of Chinese characters is also used in the evaluation process. The experimental results show that the proposed approach is effective. For recognition of Chinese characters written by a specific person, the recognition rate is about 96%. If the characters of the first three ranks are checked in counting the recognition rate, the rate rises to 99.6%.
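A pairwise stroke-segment similarity of the kind described above could look like the sketch below, combining length and orientation similarity (the paper's coordinate overlapping ratios and exact weighting are omitted here; the functions and weights are illustrative assumptions).

```python
import math

# Illustrative similarity between two stroke segments, using two of the
# ingredients named in the abstract: length similarity and orientation
# similarity. Each segment is stored as its two endpoints.

def segment(p, q):
    return {"p": p, "q": q}

def length(s):
    return math.dist(s["p"], s["q"])

def orientation(s):
    return math.atan2(s["q"][1] - s["p"][1], s["q"][0] - s["p"][0])

def similarity(s1, s2):
    # Ratio of shorter to longer length: 1.0 when lengths are equal.
    len_sim = min(length(s1), length(s2)) / max(length(s1), length(s2))
    # Smallest angle between the (undirected) segments, mapped to [0, 1].
    ang = abs(orientation(s1) - orientation(s2)) % math.pi
    ang = min(ang, math.pi - ang)
    ori_sim = 1.0 - ang / (math.pi / 2)
    # Equal weighting is an assumption for this sketch.
    return 0.5 * len_sim + 0.5 * ori_sim
```

Scores like these would populate the initial match network, which the iteration scheme then refines toward a stable at-most-one-to-one assignment.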


Author(s):  
Teddy Surya Gunawan ◽  
Abdul Mutholib ◽  
Mira Kartiwi

Automatic Number Plate Recognition (ANPR) is an intelligent system with the capability to recognise the characters on a vehicle number plate. Previous research implemented ANPR systems on personal computers (PCs) with high-resolution cameras and high computational capability. On the other hand, little research has been conducted on the design and implementation of ANPR on smartphone platforms, which have limited camera resolution and processing speed. In this paper, various steps to optimise ANPR, including pre-processing, segmentation, and optical character recognition (OCR) using an artificial neural network (ANN) and template matching, are described. The proposed ANPR algorithm is based on the Tesseract and Leptonica libraries. For comparison, the template-matching-based OCR is compared with the ANN-based OCR. The performance of the proposed algorithm was evaluated on a database of Malaysian number plate images captured by a smartphone camera. Results showed that the accuracy and processing time of the proposed algorithm using template matching were 97.5% and 1.13 seconds, respectively, whereas the traditional algorithm using template matching obtained only an 83.7% recognition rate with a 0.98-second processing time. This shows that the proposed ANPR algorithm improves the recognition rate with negligible additional processing time.
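The template-matching OCR stage used as a baseline above can be sketched in miniature: a candidate glyph is scored against each stored character template by normalised correlation and labelled with the best match. The 3×3 "templates" below are toy stand-ins for real binarised glyph images, not data from the paper.

```python
# Toy template-matching classifier: score a glyph against each template with
# normalised (zero-mean) correlation and return the best-matching label.

def _flat(img):
    return [float(v) for row in img for v in row]

def correlate(a, b):
    a, b = _flat(a), _flat(b)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def classify(glyph, templates):
    return max(templates, key=lambda label: correlate(glyph, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(classify(glyph, templates))  # I
```

Real plate recognisers of course operate on larger, normalised glyph crops and many more templates per character, but the scoring principle is the same.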


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognising character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and any large-font or tiny-font scenarios can be handled in the preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN), having very few layers and a small number of filters, can be used to detect individual subregion classes. For detection of several hundred classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of pools of subregion-specific models is the ability to deal with the incremental addition of hundreds of newer classes over time, without disturbing the previous models, in the continual learning scenario. Such an approach has a distinctive advantage over a single monolithic model, in which subregion classes share, and interfere via, a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
The training data for the proposed CNN requires engineering synthetic data points that consider both patterns of interest and non-patterns. We propose and validate the hypothesis that an image canvas with an optimal amount of pattern and non-pattern can be formulated, using a mean squared error loss function to influence filter training from the data. The CNN so trained can identify the character-object in the presence of several other objects in a generalised test image of a scanned document. In this setting, a key observation is that, in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some key observations: (i) a pattern cannot be over-expressed in isolation; (ii) a pattern cannot be under-expressed either; (iii) a non-pattern can be salt-and-pepper noise; and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual sub-region class models. We have carried out studies and report mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) EMNIST capital letters (81.26), (3) EMNIST small letters (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), and also on medical prescriptions, where we observed high performance with mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios, such as the annotation of ancient manuscripts and handwritten health records.
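The pooled, subregion-specific architecture described above can be illustrated conceptually: each character class gets its own small detector, and new classes are added incrementally without retraining the existing ones. The scoring functions below are trivial placeholders standing in for the authors' lightweight per-class CNNs; the class and method names are invented for this sketch.

```python
# Conceptual sketch of a pool of per-class detectors. Adding a class never
# touches the other detectors, which is the continual-learning property the
# abstract contrasts with a single monolithic shared network.

class DetectorPool:
    def __init__(self):
        self.detectors = {}  # class label -> scoring function

    def add_class(self, label, scorer):
        # Incremental addition: existing detectors are left untouched.
        self.detectors[label] = scorer

    def detect(self, region, threshold=0.5):
        # Every detector scores the region independently and in parallel
        # (sequentially here, for simplicity); labels above threshold fire.
        scores = {lbl: f(region) for lbl, f in self.detectors.items()}
        return sorted(lbl for lbl, s in scores.items() if s >= threshold)

pool = DetectorPool()
pool.add_class("0", lambda r: 0.9 if "closed-loop" in r else 0.1)
pool.add_class("1", lambda r: 0.8 if "vertical-stroke" in r else 0.2)
print(pool.detect({"closed-loop"}))  # ['0']
```

In the paper's setting each `scorer` would be a small trained CNN applied over document subregions rather than a rule on hand-named features.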


Author(s):  
Mohammed Erritali ◽  
Youssef Chouni ◽  
Youssef Ouadid

The main difficulty in developing a successful optical character recognition (OCR) system lies in confusion between characters. In the case of Amazigh writing (the Tifinagh alphabet), some characters differ only by rotation or scale. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increases the recognition rate but at the expense of processing time, which becomes prohibitive. Thus, reducing both the confusion between characters and their recognition times is the major challenge for OCR systems. In this chapter, the authors present an off-line OCR system for Tifinagh characters.


Author(s):  
Zhiwei Jiang ◽  
Xiaoqing Ding ◽  
Liangrui Peng ◽  
Changsong Liu

The Hidden Markov Model (HMM) is an effective method for describing sequential signals in many applications. Regarding model estimation, common training algorithms focus only on the optimization of model parameters; however, model structure influences system performance as well. Although some structure optimization methods have been proposed, they are usually implemented as an independent module before parameter optimization. In this paper, the clustering feature of states in an HMM is discussed by comparing the mechanisms of the Quadratic Discriminant Function (QDF) classifier and the HMM. Then, through the clustering effect of Viterbi training and Baum–Welch training, a novel clustering-based model pre-training approach is proposed. It optimizes model parameters and model structure in turn, until the representative states of all models are found. Finally, the proposed approach is evaluated on two typical OCR applications, printed and handwritten Arabic text line recognition, and compared with other optimization methods. The improvement in character recognition performance proves that the proposed approach makes a more precise state allocation and that the representative states benefit HMM decoding.
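Since the approach above hinges on Viterbi training of HMM states, a minimal Viterbi decoder may help fix ideas. The two-state weather model below is the standard textbook toy, not the paper's Arabic text-line models.

```python
# Viterbi decoding for a tiny discrete HMM: find the most probable state
# sequence for an observation sequence.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best probability of any path ending in state s at time t,
    #            the predecessor state on that best path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]], prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return path[::-1]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p))
```

Viterbi training iterates this decoding step, re-estimating parameters from the best state alignment; Baum–Welch instead uses expected (soft) state occupancies.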


Handwritten character recognition (HCR) mainly entails optical character recognition, but HCR also involves formatting and segmentation of the input. HCR is still an active area of research because of the numerous variations in writing style, shape, and size among individuals. The main difficulty in Indian handwritten recognition is the overlap between characters: overlapping character shapes are hard to recognise, which can lead to low recognition rates. These factors also increase the complexity of handwritten character recognition. This paper proposes a new approach to recognising handwritten characters of the Telugu language using Deep Learning (DL). The proposed work enhances the recognition rate of individual characters and achieves an overall accuracy of 94%.


Author(s):  
Binod Kumar Prasad

Purpose of the study: The purpose of this work is to present an offline Optical Character Recognition system to recognise handwritten English numerals, to help automate document reading. It helps to avoid the tedious and time-consuming manual typing needed to key important information into a computer system to preserve it for a longer time. Methodology: This work applies Curvature Features of English numeral images by encoding them in terms of distance and slope. The finer local details of the images have been extracted using Zonal features. The feature vectors obtained from the combination of these features have been fed to the KNN classifier. The whole work has been executed using the MATLAB Image Processing toolbox. Main Findings: The system produces an average recognition rate of 96.67% with K=1, whereas with K=3 the rate increases to 97%, with corresponding errors of 3.33% and 3% respectively. Out of all ten numerals, some, like '3' and '8', have shown comparatively lower recognition rates because of the similarity between their structures. Applications of this study: The proposed work is related to the recognition of English numerals. The model can be used widely for recognition of any pattern, such as signature verification, face recognition, and character or word recognition in other languages under Natural Language Processing. Novelty/Originality of this study: The novelty of the work lies in the process of feature extraction. Curves present in the structure of a numeral sample have been encoded based on distance and slope, thereby presenting Distance features and Slope features. Vertical Delta Distance Coding (VDDC) and Horizontal Delta Distance Coding (HDDC) encode a curve from the vertical and horizontal directions to reveal concavity and convexity from different angles.
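The classification stage described above (feature vectors fed to a KNN classifier) can be sketched as follows; the 2-D vectors stand in for the paper's curvature-plus-zonal feature vectors, and the training points are invented for the example.

```python
import math
from collections import Counter

# Minimal K-nearest-neighbour classification: label a sample by majority
# vote among its K closest training samples in feature space.

def knn_classify(sample, training, k=3):
    neighbours = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((0.10, 0.20), "3"), ((0.15, 0.25), "3"),
            ((0.80, 0.90), "8"), ((0.85, 0.95), "8")]
print(knn_classify((0.12, 0.22), training, k=3))  # 3
```

With K=1 the classifier returns the single nearest sample's label, matching the two settings compared in the findings above.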


Author(s):  
Yasir Babiker Hamdan ◽  
Sathish

Many applications of the handwritten character recognition (HCR) approach still exist. Reading postal addresses across various states involves different languages in a union government like India, and bank cheque amount and signature verification is one of the important applications of HCR in automated banking systems in all developed countries. Optical character recognition of documents replaces human reading of handwritten documents; this OCR is used to translate characters from various types of files, such as images and word-processing documents. The main aim of this research article is to provide solutions for various handwriting recognition approaches, such as touch input from a mobile screen and picture files. The recognition approaches employ various methods, chosen from artificial neural networks, statistical methods, and so on, to address nonlinearly separable problems. This research article compares several approaches for recognising handwritten characters from image documents. In particular, it compares a statistical support vector machine (SVM) classifier against statistical, template matching, structural pattern recognition, and graphical methods. The statistical SVM, configured with a machine learning approach, is shown to provide good results for the OCR system; its recognition rate is higher than those of the other methods mentioned in this article. The proposed model was trained on a set containing letters and digits in various styles to learn with a higher accuracy level, and test results showed 91% accuracy in recognising characters from documents. Finally, several future tasks of this research are discussed.


Author(s):  
Soumya De ◽  
R. Joe Stanley ◽  
Beibei Cheng ◽  
Sameer Antani ◽  
Rodney Long ◽  
...  

Images in biomedical publications often convey important information related to an article's content. When referenced properly, these images aid in clinical decision support. Annotations such as text labels and symbols, as provided by medical experts, are used to highlight regions of interest within the images. These annotations, if extracted automatically, could be used in conjunction with either the image caption text or the image citations (mentions) in the articles to improve biomedical information retrieval. In the current study, automatic detection and recognition of text labels in biomedical publication images was investigated. This paper presents both image analysis and feature-based approaches to extract and recognize specific regions of interest (text labels) within images in biomedical publications. Experiments were performed on 6515 characters extracted from text labels present in 200 biomedical publication images. These images are part of the data set from ImageCLEF 2010. Automated character recognition experiments were conducted using geometry-, region-, exemplar-, and profile-based correlation features and Fourier descriptors extracted from the characters. Correct recognition as high as 92.67% was obtained with a support vector machine classifier, compared to a 75.90% correct recognition rate with a benchmark Optical Character Recognition technique.

