Deep Extreme Learning Machine-Based Optical Character Recognition System for Nastalique Urdu-Like Script Languages

2020 ◽  
Author(s):  
Syed Saqib Raza Rizvi ◽  
Muhammad Adnan Khan ◽  
Sagheer Abbas ◽  
Muhammad Asadullah ◽  
Nida Anwer ◽  
...  

Abstract Optical character recognition systems convert printed or handwritten scripts into digital text formats like ASCII or UNICODE. Urdu-like script languages like Urdu, Punjabi and Sindhi are widely spoken languages of the world, especially in Asia. An enormous amount of printed and handwritten text of such languages exist, which needs to be converted into computer-understandable formats for knowledge extraction. In this study, extreme learning machine’s (ELM’s) most recently proposed variant called deep extreme learning machine (DELM)-based optical character recognition (OCR) system is proposed to enhance Urdu-like script language’s character recognition rate. The proposed DELM-based character recognition model is optimizing the OCR process by reducing the overhead of Pre-processing, Segmentation and Feature Extraction Layer. The proposed system evaluations accomplished 98.75% training accuracy with 1.492 × 10−3 RMSE and 98.12% testing accuracy with 1.587 × 10−3 RMSE, with six DELM hidden layers. The results show that the proposed system has attained the foremost recognition rate as compared to any previously proposed Urdu-like script language OCR system. This technique is applicable for machine-printed text and fractionally useful for handwritten text as well. This study will aid in the advancement of more accurate Urdu-like script OCR’s software systems in the future.

Author(s):  
Binod Kumar Prasad

Purpose of the study: The purpose of this work is to present an offline Optical Character Recognition system to recognise handwritten English numerals to help automation of document reading. It helps to avoid tedious and time-consuming manual typing to key in important information in a computer system to preserve it for a longer time. Methodology: This work applies Curvature Features of English numeral images by encoding them in terms of distance and slope. The finer local details of images have been extracted by using Zonal features. The feature vectors obtained from the combination of these features have been fed to the KNN classifier. The whole work has been executed using the MatLab Image Processing toolbox. Main Findings: The system produces an average recognition rate of 96.67% with K=1 whereas, with K=3, the rate increased to 97% with corresponding errors of 3.33% and 3% respectively. Out of all the ten numerals, some numerals like ‘3’ and ‘8’ have shown respectively lower recognition rates. It is because of the similarity between their structures. Applications of this study: The proposed work is related to the recognition of English numerals. The model can be used widely for recognition of any pattern like signature verification, face recognition, character or word recognition in another language under Natural Language Processing, etc. Novelty/Originality of this study: The novelty of the work lies in the process of feature extraction. Curves present in the structure of a numeral sample have been encoded based on distance and slope thereby presenting Distance features and Slope features. Vertical Delta Distance Coding (VDDC) and Horizontal Delta Distance Coding (HDDC) encode a curve from vertical and horizontal directions to reveal concavity and convexity from different angles.


Optical Character Recognition or Optical Character Reader (OCR) is a pattern-based method consciousness that transforms the concept of electronic conversion of images of handwritten text or printed text in a text compiled. Equipment or tools used for that purpose are cameras and apartment scanners. Handwritten text is scanned using a scanner. The image of the scrutinized document is processed using the program. Identification of manuscripts is difficult compared to other western language texts. In our proposed work we will accept the challenge of identifying letters and letters and working to achieve the same. Image Preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a machine with a learning machine and Python that is best to work with more accurate than OCR's pre-built machines with unique technologies such as MatLab, Artificial Intelligence, Neural networks, etc.


2019 ◽  
pp. 2067-2079
Author(s):  
Waleed Noori Hussein ◽  
Haider N. Hussain

     The growing relevance of printed and digitalized hand-written characters has necessitated the need for convalescent automatic recognition of characters in Optical Character Recognition (OCR). Among the handwritten characters, Arabic is one of those with special attention due to its distinctive nature, and the inherent challenges in its recognition systems. This distinctiveness of Arabic characters, with the difference in personal writing styles and proficiency, are complicating the effectiveness of its online handwritten recognition systems. This research, based on limitations and scope of previous related studies, studied the recognition of Arabic isolated characters through the identification of its features and dots in view of producing an efficient online Arabic handwriting isolated character recognition system. It proposes a hybrid of decision tree and Artificial Neural Network (ANN), as against being combined with other algorithms as found in previous studies. The proposed recognition process has four main steps with associated sub-steps. The results showed that the proposed method achieved the highest performance at 96.7%, whereas the benchmark methods which are EDMS and Naeimizaghiani had 68.88% and 78.5 % respectively. Based on this, ANN has the best performance recognition rate at 98.8%, while the best rate for decision tree was obtained at 97.2%.


An Optical Character Recognition (OCR) is the process of converting an image representation of a document into an editable format. In addition, people have the ability to recognize characters without difficulty as reading papers or books. However, developing an OCR system that has the ability to read and recognized Arabic diacritics characters as human still, remain a problem. More, specifically, poor recognition rate in most of optical diacritics characters recognition is mainly attributed to failing in segmenting a handwritten text correctly. To overcome this problem, we perform develop a method based on seven operations; it starts with searching the text-line height followed by reading words from the line. Then identify the diacritics regions. The segmentation is also applied during this operation by converting the text-line into a grayscale and binary image. Moreover, we introduced a new model based on k-nearest neighbors (KNN) algorithm to identify diacritics and characters segmentation. KNN is trained to directly predict the diacritic from the text-line. Finally, we offer an evaluation discussion on optical diacritics characters recognition.


Author(s):  
Bassam Alqaralleh ◽  
Malek Zakarya Alksasbeh ◽  
Tamer Abukhalil ◽  
Harbi Almahafzah ◽  
Tawfiq Al Rawashdeh

This paper brings into discussion the problem of recognizing Arabic numbers using a monocular camera as the only sensor. When a digital image is presented in this application, optical character recognition (OCR) can be exploited to comprehend numerical data. However, there has been a limited success when applied to the handwritten Arabic (Indian) numbers. This paper aims to overcome this limitation and introduces optical character recognition system based on skeleton matching. The proposed approach is used for handwritten Arabic numbers only. The experimental results indicate the effectiveness of the proposed optical character recognition system even for numbers written in worst case. The right system achieves a recognition rate of 99.3 %.


Sign in / Sign up

Export Citation Format

Share Document