RECOGNITION AND VERIFICATION OF HANDWRITTEN AND HAND-PRINTED BRITISH POSTAL ADDRESSES

Author(s):  
A. C. DOWNTON ◽  
R. W. S. TREGIDGO ◽  
E. KABIR

An algorithmic architecture for a high-performance optical character recognition (OCR) system for hand-printed and handwritten addresses is proposed. The architecture integrates syntactic and contextual post-processing with character recognition to optimise postcode recognition performance, and verifies the postcode against simple features extracted from the remainder of the address to ensure a low error rate. An enhanced version of the characteristic loci character recognition algorithm was chosen for the system to make it tolerant of variations in writing style. Feature selection for the classifier is performed automatically using the B/W algorithm. Syntactic and contextual information for hand-printed British postcodes has been integrated into the system by combining low-level postcode syntax information with a dictionary trie structure. A full implementation of the postcode dictionary trie is described. Features which define the town name effectively and can easily be extracted from a handwritten or hand-printed town name are used for postcode verification. A database totalling 3473 postcode/address images has been used to evaluate the performance of the complete postcode recognition process. The basic character recognition rate for the full unconstrained alphanumeric character set is 63.1%, compared with an expected maximum attainable rate of 75–80%. The addition of the syntactic and contextual knowledge stages produces an overall postcode recognition rate equivalent to an alphanumeric character recognition rate of 86–90%. Separate verification experiments on a subset of 820 address images show that, with the first-order features chosen, an overall correct address feature code extraction rate of around 35% is achieved.
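The dictionary trie idea described above can be sketched as follows: valid postcodes are stored character by character, so a candidate string from the recogniser can be checked against legal sequences. This is a minimal illustration, not the paper's implementation; the class names and sample postcodes are invented for the example.

```python
# Hypothetical sketch of a postcode dictionary trie: each node holds a map
# from the next character to a child node, and a flag marking where a valid
# postcode ends. The sample postcodes are illustrative only.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a valid postcode ends at this node

def insert(root, postcode):
    node = root
    for ch in postcode:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def is_valid(root, candidate):
    node = root
    for ch in candidate:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.terminal

root = TrieNode()
for pc in ["SW1A1AA", "SW1A2AA", "EH11BB"]:
    insert(root, pc)

print(is_valid(root, "SW1A1AA"))  # True
print(is_valid(root, "SW1A9ZZ"))  # False
```

In a full system, traversal of such a trie can also be interleaved with per-character confidence scores so that syntactically impossible readings are pruned early.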

Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method. Different existing OCR systems use different feature extraction methods because of the diversity of the scripts' natures. One major contribution of the work in this paper is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, this paper assumes the documents have been successfully segmented into characters, and extracts features from these isolated Myanmar characters. These features are extracted using structural analysis of the Myanmar script. The experimental results, obtained using a Support Vector Machine (SVM) classifier, are compared with the previously proposed feature extraction method.


Author(s):  
SHENG-LIN CHOU ◽  
WEN-HSIANG TSAI

The problem of handwritten Chinese character recognition is solved by matching character stroke segments using an iteration scheme. Length and orientation similarity properties, and coordinate overlapping ratios, are used to define a measure of similarity between any two stroke segments. The initial measures of similarity between the stroke segments of the input and template characters are used to set up a match network which includes all the match relationships between the input and template stroke segments. Based on the concept of at-most-one-to-one mapping, an iteration scheme is employed to adjust the match relationships, using the contextual information implicitly contained in the match network, so that the match relationships can reach a stable state. From the final match relationships, matched stroke-segment pairs are determined by a mutually-best match strategy, and the degree of similarity between the input and each template character is evaluated accordingly. Certain structural information of Chinese characters is also used in the evaluation process. The experimental results show that the proposed approach is effective. For recognition of Chinese characters written by a specific person, the recognition rate is about 96%. If the characters of the first three ranks are checked in counting the recognition rate, the rate rises to 99.6%.
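A pairwise stroke-segment similarity of the kind described above could look like the sketch below, combining length and orientation similarity (the paper's coordinate overlapping ratios and exact weighting are omitted here; the functions and weights are illustrative assumptions).

```python
import math

# Illustrative similarity between two stroke segments, using two of the
# ingredients named in the abstract: length similarity and orientation
# similarity. Each segment is stored as its two endpoints.

def segment(p, q):
    return {"p": p, "q": q}

def length(s):
    return math.dist(s["p"], s["q"])

def orientation(s):
    return math.atan2(s["q"][1] - s["p"][1], s["q"][0] - s["p"][0])

def similarity(s1, s2):
    # Ratio of shorter to longer length: 1.0 when lengths are equal.
    len_sim = min(length(s1), length(s2)) / max(length(s1), length(s2))
    # Smallest angle between the (undirected) segments, mapped to [0, 1].
    ang = abs(orientation(s1) - orientation(s2)) % math.pi
    ang = min(ang, math.pi - ang)
    ori_sim = 1.0 - ang / (math.pi / 2)
    # Equal weighting is an assumption for this sketch.
    return 0.5 * len_sim + 0.5 * ori_sim
```

Scores like these would populate the initial match network, which the iteration scheme then refines toward a stable at-most-one-to-one assignment.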


Author(s):  
Teddy Surya Gunawan ◽  
Abdul Mutholib ◽  
Mira Kartiwi

Automatic Number Plate Recognition (ANPR) is an intelligent system with the capability to recognise the characters on a vehicle number plate. Previous research implemented ANPR systems on personal computers (PCs) with high-resolution cameras and high computational capability. On the other hand, little research has been conducted on the design and implementation of ANPR on smartphone platforms, which have limited camera resolution and processing speed. In this paper, various steps to optimise ANPR, including pre-processing, segmentation, and optical character recognition (OCR) using an artificial neural network (ANN) and template matching, are described. The proposed ANPR algorithm is based on the Tesseract and Leptonica libraries. For comparison, the template-matching-based OCR is compared with the ANN-based OCR. The performance of the proposed algorithm was evaluated on a database of Malaysian number plate images captured by a smartphone camera. Results showed that the accuracy and processing time of the proposed algorithm using template matching were 97.5% and 1.13 seconds, respectively, whereas the traditional algorithm using template matching obtained only an 83.7% recognition rate with a 0.98-second processing time. This shows that the proposed ANPR algorithm improves the recognition rate with negligible additional processing time.
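The template-matching OCR stage used as a baseline above can be sketched in miniature: a candidate glyph is scored against each stored character template by normalised correlation and labelled with the best match. The 3×3 "templates" below are toy stand-ins for real binarised glyph images, not data from the paper.

```python
# Toy template-matching classifier: score a glyph against each template with
# normalised (zero-mean) correlation and return the best-matching label.

def _flat(img):
    return [float(v) for row in img for v in row]

def correlate(a, b):
    a, b = _flat(a), _flat(b)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def classify(glyph, templates):
    return max(templates, key=lambda label: correlate(glyph, templates[label]))

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(classify(glyph, templates))  # I
```

Real plate recognisers of course operate on larger, normalised glyph crops and many more templates per character, but the scoring principle is the same.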


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognising character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and any large-font or tiny-font scenarios can be handled in the preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN), having very few layers and a small number of filters, can be used to detect individual subregion classes. For detection of several hundred classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of pools of subregion-specific models is the ability to deal with the incremental addition of hundreds of newer classes over time, without disturbing the previous models, in the continual learning scenario. Such an approach has a distinctive advantage over a single monolithic model, in which subregion classes share, and interfere via, a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
The training data for the proposed CNN requires engineering synthetic data points that consider both patterns of interest and non-patterns. We propose and validate the hypothesis that an image canvas with an optimal amount of pattern and non-pattern can be formulated, using a mean squared error loss function to influence filter training from the data. The CNN so trained can identify the character-object in the presence of several other objects in a generalised test image of a scanned document. In this setting, a key observation is that, in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some key observations: (i) a pattern cannot be over-expressed in isolation; (ii) a pattern cannot be under-expressed either; (iii) a non-pattern can be salt-and-pepper noise; and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual sub-region class models. We have carried out studies and report mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) EMNIST capital letters (81.26), (3) EMNIST small letters (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), and also on medical prescriptions, where we observed high performance with mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios, such as the annotation of ancient manuscripts and handwritten health records.
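The pooled, subregion-specific architecture described above can be illustrated conceptually: each character class gets its own small detector, and new classes are added incrementally without retraining the existing ones. The scoring functions below are trivial placeholders standing in for the authors' lightweight per-class CNNs; the class and method names are invented for this sketch.

```python
# Conceptual sketch of a pool of per-class detectors. Adding a class never
# touches the other detectors, which is the continual-learning property the
# abstract contrasts with a single monolithic shared network.

class DetectorPool:
    def __init__(self):
        self.detectors = {}  # class label -> scoring function

    def add_class(self, label, scorer):
        # Incremental addition: existing detectors are left untouched.
        self.detectors[label] = scorer

    def detect(self, region, threshold=0.5):
        # Every detector scores the region independently and in parallel
        # (sequentially here, for simplicity); labels above threshold fire.
        scores = {lbl: f(region) for lbl, f in self.detectors.items()}
        return sorted(lbl for lbl, s in scores.items() if s >= threshold)

pool = DetectorPool()
pool.add_class("0", lambda r: 0.9 if "closed-loop" in r else 0.1)
pool.add_class("1", lambda r: 0.8 if "vertical-stroke" in r else 0.2)
print(pool.detect({"closed-loop"}))  # ['0']
```

In the paper's setting each `scorer` would be a small trained CNN applied over document subregions rather than a rule on hand-named features.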


Author(s):  
Mohammed Erritali ◽  
Youssef Chouni ◽  
Youssef Ouadid

The main difficulty in developing a successful optical character recognition (OCR) system lies in confusion between characters. In the case of Amazigh writing (the Tifinagh alphabet), some characters differ only by rotation or scale. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increases the recognition rate but at the expense of processing time, which becomes prohibitive. Thus, reducing both the confusion between characters and their recognition times is the major challenge for OCR systems. In this chapter, the authors present an off-line OCR system for Tifinagh characters.


Author(s):  
Zhiwei Jiang ◽  
Xiaoqing Ding ◽  
Liangrui Peng ◽  
Changsong Liu

The Hidden Markov Model (HMM) is an effective method for describing sequential signals in many applications. Regarding model estimation, common training algorithms focus only on the optimization of model parameters; however, model structure influences system performance as well. Although some structure optimization methods have been proposed, they are usually implemented as an independent module before parameter optimization. In this paper, the clustering feature of states in an HMM is discussed by comparing the mechanisms of the Quadratic Discriminant Function (QDF) classifier and the HMM. Then, through the clustering effect of Viterbi training and Baum–Welch training, a novel clustering-based model pre-training approach is proposed. It optimizes model parameters and model structure in turn, until the representative states of all models are found. Finally, the proposed approach is evaluated on two typical OCR applications, printed and handwritten Arabic text line recognition, and compared with other optimization methods. The improvement in character recognition performance proves that the proposed approach makes a more precise state allocation and that the representative states benefit HMM decoding.
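Since the approach above hinges on Viterbi training of HMM states, a minimal Viterbi decoder may help fix ideas. The two-state weather model below is the standard textbook toy, not the paper's Arabic text-line models.

```python
# Viterbi decoding for a tiny discrete HMM: find the most probable state
# sequence for an observation sequence.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best probability of any path ending in state s at time t,
    #            the predecessor state on that best path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]], prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return path[::-1]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p))
```

Viterbi training iterates this decoding step, re-estimating parameters from the best state alignment; Baum–Welch instead uses expected (soft) state occupancies.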


Handwritten character recognition (HCR) mainly entails optical character recognition, but HCR also involves formatting and segmentation of the input. HCR is still an active area of research because of the numerous variations in writing style, shape, and size among individuals. The main difficulty in Indian handwritten recognition is the overlap between characters: overlapping character shapes are hard to recognise, which can lead to low recognition rates. These factors also increase the complexity of handwritten character recognition. This paper proposes a new approach to recognising handwritten characters of the Telugu language using Deep Learning (DL). The proposed work enhances the recognition rate of individual characters and achieves an overall accuracy of 94%.


Author(s):  
Binod Kumar Prasad

Purpose of the study: The purpose of this work is to present an offline Optical Character Recognition system to recognise handwritten English numerals, to help automate document reading. It helps to avoid the tedious and time-consuming manual typing needed to key important information into a computer system to preserve it for a longer time. Methodology: This work applies Curvature Features of English numeral images by encoding them in terms of distance and slope. The finer local details of the images have been extracted using Zonal features. The feature vectors obtained from the combination of these features have been fed to the KNN classifier. The whole work has been executed using the MATLAB Image Processing toolbox. Main Findings: The system produces an average recognition rate of 96.67% with K=1, whereas with K=3 the rate increases to 97%, with corresponding errors of 3.33% and 3% respectively. Out of all ten numerals, some, like '3' and '8', have shown comparatively lower recognition rates because of the similarity between their structures. Applications of this study: The proposed work is related to the recognition of English numerals. The model can be used widely for recognition of any pattern, such as signature verification, face recognition, and character or word recognition in other languages under Natural Language Processing. Novelty/Originality of this study: The novelty of the work lies in the process of feature extraction. Curves present in the structure of a numeral sample have been encoded based on distance and slope, thereby presenting Distance features and Slope features. Vertical Delta Distance Coding (VDDC) and Horizontal Delta Distance Coding (HDDC) encode a curve from the vertical and horizontal directions to reveal concavity and convexity from different angles.
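The classification stage described above (feature vectors fed to a KNN classifier) can be sketched as follows; the 2-D vectors stand in for the paper's curvature-plus-zonal feature vectors, and the training points are invented for the example.

```python
import math
from collections import Counter

# Minimal K-nearest-neighbour classification: label a sample by majority
# vote among its K closest training samples in feature space.

def knn_classify(sample, training, k=3):
    neighbours = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((0.10, 0.20), "3"), ((0.15, 0.25), "3"),
            ((0.80, 0.90), "8"), ((0.85, 0.95), "8")]
print(knn_classify((0.12, 0.22), training, k=3))  # 3
```

With K=1 the classifier returns the single nearest sample's label, matching the two settings compared in the findings above.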


Author(s):  
Yasir Babiker Hamdan ◽  
Sathish

Many applications of the handwritten character recognition (HCR) approach still exist. Reading postal addresses across various states involves different languages in a union government like India, and bank cheque amount and signature verification is one of the important applications of HCR in automated banking systems in all developed countries. Optical character recognition of documents replaces human reading of handwritten documents; this OCR is used to translate characters from various types of files, such as images and word-processing documents. The main aim of this research article is to provide solutions for various handwriting recognition approaches, such as touch input from a mobile screen and picture files. The recognition approaches employ various methods, chosen from artificial neural networks, statistical methods, and so on, to address nonlinearly separable problems. This research article compares several approaches for recognising handwritten characters from image documents. In particular, it compares a statistical support vector machine (SVM) classifier against statistical, template matching, structural pattern recognition, and graphical methods. The statistical SVM, configured with a machine learning approach, is shown to provide good results for the OCR system; its recognition rate is higher than those of the other methods mentioned in this article. The proposed model was trained on a set containing letters and digits in various styles to learn with a higher accuracy level, and test results showed 91% accuracy in recognising characters from documents. Finally, several future tasks of this research are discussed.


Author(s):  
Soumya De ◽  
R. Joe Stanley ◽  
Beibei Cheng ◽  
Sameer Antani ◽  
Rodney Long ◽  
...  

Images in biomedical publications often convey important information related to an article's content. When referenced properly, these images aid in clinical decision support. Annotations such as text labels and symbols, as provided by medical experts, are used to highlight regions of interest within the images. These annotations, if extracted automatically, could be used in conjunction with either the image caption text or the image citations (mentions) in the articles to improve biomedical information retrieval. In the current study, automatic detection and recognition of text labels in biomedical publication images was investigated. This paper presents both image analysis and feature-based approaches to extract and recognize specific regions of interest (text labels) within images in biomedical publications. Experiments were performed on 6515 characters extracted from text labels present in 200 biomedical publication images. These images are part of the data set from ImageCLEF 2010. Automated character recognition experiments were conducted using geometry-, region-, exemplar-, and profile-based correlation features and Fourier descriptors extracted from the characters. Correct recognition as high as 92.67% was obtained with a support vector machine classifier, compared to a 75.90% correct recognition rate with a benchmark Optical Character Recognition technique.

