Robust Character Recognition Using Connected-Component Extraction

Author(s):  
Wai-Lin Chan ◽  
Chi-Man Pun

Segmentation, the division of something into smaller parts, is one of the components of a character recognition system. In segmentation, characters, words, and lines are separated from text documents. Character recognition is a process that allows computers to recognize written or printed characters, such as numbers or letters, and to convert them into a form that the computer can use. The accuracy of an OCR system is measured by taking the output of an OCR run on an image and comparing it to the original version of the same text. The main aim of this paper is to examine text line segmentation methods, including projection profiles and the Weighted Bucket Method. The proposed method applies the horizontal projection profile and connected component methods to handwritten Kannada text. These methods are evaluated experimentally, and their accuracy and results are compared.
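As a rough illustration of text line segmentation with a horizontal projection profile, the sketch below counts ink pixels per row of a binary image and treats each run of non-empty rows as one line. The function names, the `min_ink` parameter, and the toy `page` array are assumptions made for this example, not details taken from the paper.

```python
def horizontal_projection(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A row belongs to a line when it has at least `min_ink` foreground
    pixels (hypothetical parameter for this sketch).
    """
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a new line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the current line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy binary page: two text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

A plain profile like this only separates lines cleanly when they do not overlap vertically. Handwritten text often violates that assumption, which is presumably why it is paired with a connected component step.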


2000 ◽  
Author(s):  
Jun Zhao ◽  
Shahram Latifi ◽  
Dongsheng Yao ◽  
Emma Regentova

2015 ◽  
Vol 4 (3) ◽  
pp. 1-29 ◽  
Author(s):  
P. Sudir ◽  
M. Ravishankar

Video text greatly helps video indexing and retrieval systems because it often carries significant semantic information. Video text analysis is challenging due to varying backgrounds, multiple orientations, and low contrast between text and non-text regions. The proposed approach explores a new framework for curved video text detection and recognition. Based on the observation that curved text regions can be well defined by edge size and uniform texture, probable curved text edges are detected by processing wavelet sub-bands, followed by text localization using the fast texture descriptor LU-transform. Binarization is achieved by the maximal H-transform. A connected component filtering method followed by B-spline curve fitting on the centroid of each character vertically aligns each oriented character. The aligned text string is then recognized by optical character recognition (OCR). Experiments on various curved video frames show that the proposed method is efficacious and robust in detecting and recognizing curved video text.


Author(s):  
Khairun Saddami ◽  
Khairul Munadi ◽  
Yuwaldi Away ◽  
Fitri Arnia

Ancient documents usually contain multiple kinds of noise, such as uneven background, show-through, water spills, spots, and blurred text. This noise affects the binarization process. Binarization is an extremely important process in image processing, especially for character recognition. This paper presents an improvement to the Nina binarization technique. The improvements were achieved by reducing the number of processing steps and replacing median filtering with Wiener filtering. First, the document background was approximated using a Wiener filter, and then image subtraction was applied. The manuscript contrast was then adjusted by mapping the image intensity values using an intensity transformation method. Next, local Otsu thresholding was applied. To remove spotting noise, labeled connected components were used. The proposed method was tested on H-DIBCO 2014 and on degraded ancient Jawi handwritten documents. It performed better in terms of recall and precision than Otsu, Niblack, Sauvola, Lu, Su, and Nina, especially on documents with show-through, water spills, and combined noise.
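To make the thresholding step concrete, here is a minimal sketch of Otsu's method, which picks the gray level that maximizes the between-class variance. It is a simplification: it computes a single global threshold over a flat list of 8-bit pixel values, whereas the abstract describes local Otsu thresholding applied after Wiener-based background subtraction and contrast adjustment, none of which is reproduced here. The function names and sample values are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form the dark class and pixels > t the bright class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0                 # weight and intensity sum of dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue                # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break                   # bright class empty from here on
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Mark dark pixels (ink) as 1 and bright pixels (paper) as 0."""
    return [1 if p <= t else 0 for p in pixels]


# Hypothetical gray values: dark ink near 10-15, bright paper near 190-210.
sample = [12, 15, 10, 200, 210, 190, 14, 205]
t = otsu_threshold(sample)
print(binarize(sample, t))  # [1, 1, 1, 0, 0, 0, 1, 0]
```

A local variant would run the same computation on a window around each pixel or on each image tile, which copes better with uneven backgrounds.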


2021 ◽  
Vol 11 (6) ◽  
pp. 7968-7973
Author(s):  
M. Kazmi ◽  
F. Yasir ◽  
S. Habib ◽  
M. S. Hayat ◽  
S. A. Qazi

Urdu Optical Character Recognition (OCR) based on character-level recognition (the analytical approach) is less popular than ligature-level recognition (the holistic approach) because of its added complexity and the overlapping of characters and strokes. This paper presents a holistic Urdu ligature extraction technique. The proposed Photometric Ligature Extraction (PLE) technique is independent of font size and column layout and is capable of handling non-overlapping ligatures as well as all inter- and intra-overlapping ligatures. It uses a customized photometric filter together with X-shearing and padding with connected component analysis to extract complete ligatures, instead of extracting primary and secondary ligatures separately. A total of approximately 267,800 ligatures were extracted from scanned images of printed Urdu Nastaliq text with an accuracy of 99.4%. The proposed framework thus outperforms existing Urdu Nastaliq text extraction and segmentation algorithms. The PLE framework can also be applied to other languages that use the Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi.


2020 ◽  
Vol 9 (2) ◽  
pp. 249
Author(s):  
Audini Nifira Putri ◽  
I Putu Gede Hendra Suputra

Recognition of Arabic (Hijaiyah) letters is challenging because one letter consists of more than one character: the main character, companion characters such as dots and lines, and punctuation called harakat. Image segmentation is the most important process in a character recognition system because it determines how objects in an image are separated. In this research, Hijaiyah letter segmentation aims to separate the letters according to the characters of each letter using the Connected Component Labeling (CCL) method. Labels belonging to the same letter are merged by computing the Euclidean distance between adjacent centroids. The experiment succeeded in segmenting each Hijaiyah character with an accuracy of 86%.
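The two steps described above, labeling connected components and then merging labels whose centroids are close, can be sketched as follows. This is an illustrative approximation only: it uses 4-connectivity, a breadth-first flood fill, and a greedy merge with a hypothetical `max_dist` threshold, none of which is confirmed by the abstract.

```python
from collections import deque
from math import dist


def label_components(image):
    """4-connected component labeling; returns one pixel list per component."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                queue, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def centroid(pixels):
    """Mean (row, column) position of a set of pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def merge_by_centroid(components, max_dist):
    """Greedily merge each component into the first group whose centroid
    lies within max_dist (Euclidean); otherwise start a new group."""
    groups = []
    for comp in components:
        for group in groups:
            if dist(centroid(group), centroid(comp)) <= max_dist:
                group.extend(comp)
                break
        else:
            groups.append(list(comp))
    return groups


# Toy image: a letter body with a dot above it, plus a separate letter.
glyphs = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1],
]
parts = label_components(glyphs)
letters = merge_by_centroid(parts, max_dist=3.0)
print(len(parts), [len(g) for g in letters])  # 3 [7, 4]
```

The dot and the body are labeled separately and then merged because their centroids are 2.5 pixels apart, while the second letter stays in its own group.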


Author(s):  
Qaiser Abbas

This paper presents a technique for optical recognition of Urdu characters using template matching combined with a probabilistic N-gram language model. The dataset contains both printed and typed text. The model performs three types of segmentation (line, ligature, and character) using horizontal projection, connected component labeling, and corner and pointer techniques, respectively. A separate stochastic lexicon, which contains the probability values of the grams, is built from a collected corpus. Using template matching and the N-gram language model, the study predicts complete segmented words with promising results, particularly for bigrams. It outperforms three out of four existing models, with an accuracy of 97.33%. The results achieved on the test dataset are encouraging but also indicate directions for further improvement of the model.
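As a hedged sketch of what a stochastic bigram lexicon might look like, the code below estimates P(current | previous) by relative frequency from a tiny corpus and uses it to choose between candidate words. The corpus tokens are placeholders, and the function names and the candidate-selection step are assumptions; the paper's actual lexicon format and its coupling to template matching are not described in the abstract.

```python
from collections import Counter


def build_bigram_lexicon(sentences):
    """Map (prev, cur) to the maximum-likelihood estimate of P(cur | prev)."""
    context_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1          # times prev starts a bigram
            bigram_counts[(prev, cur)] += 1
    return {pair: n / context_counts[pair[0]]
            for pair, n in bigram_counts.items()}


def pick_candidate(lexicon, prev, candidates):
    """Choose the candidate with the highest bigram probability after prev."""
    return max(candidates, key=lambda w: lexicon.get((prev, w), 0.0))


# Placeholder corpus; real input would be tokenized Urdu text.
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
lexicon = build_bigram_lexicon(corpus)
print(pick_candidate(lexicon, "a", ["c", "b"]))  # b
```

In a recognition pipeline, the candidates would come from the template matcher, and the language model score would be combined with the matching score rather than used alone.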


Author(s):  
Ikhwan Ruslianto ◽  
Agus Harjoko

License plate recognition in Indonesia is typically used in parking systems that are still operated manually, with a parking attendant recording the plate characters. However, license plate recognition is useful not only for parking systems but also for finding vehicles that violate traffic rules on the highway in real time, such as hit-and-run offenders and vehicles that ignore traffic signs. This study provides an alternative approach to car license plate character recognition using connected component analysis and template matching, so that it can handle complex backgrounds and cars moving on the road. The connected component analysis method correctly performed plate segmentation and character segmentation under complex background conditions on 67 sample images, with success rates of 95.52% for plate segmentation and 94.98% for character segmentation. The template matching method performed character recognition with a success rate of 87.45%.
Keywords: real time, connected component analysis, template matching

