A Clustering Backed Deep Learning Approach for Document Layout Analysis

Lecture Notes in Computer Science - Machine Learning and Knowledge Extraction

10.1007/978-3-030-57321-8_23

2020

pp. 423-430

Author(s):

Rhys Agombar

Max Luebbering

Rafet Sifa

Keyword(s):

Deep Learning

Learning Approach

Layout Analysis

Document Layout Analysis

Document Layout

Document Layout Analysis Using Detection Transformers

10.2118/207266-ms

2021

Author(s):

Prashanth Pillai

Purnaprajna Mangsuli

Keyword(s):

Deep Learning

Object Detection

Superior Performance

Layout Analysis

Bounding Box

Document Layout Analysis

Document Layout

Bounding Boxes

Abstract:

In the O&G (oil and gas) industry, unstructured data sources such as technical reports on hydrocarbon production, daily drilling, and well construction contain valuable information. This information, however, is conveyed through various formats such as tables, forms, text, and figures. Detecting these different entities in documents is essential for building a structured representation of the information they contain and for processing documents automatically at scale. Our work presents a document layout analysis workflow that detects and localizes different entities using a deep learning framework. The workflow comprises a transformer-based object-detection framework that identifies the spatial location of entities on a document page. The key elements of the object-detection pipeline are a residual network backbone for feature extraction and an encoder-decoder transformer, based on the recently proposed detection transformer (DETR), that predicts object bounding boxes and category labels. Object detection is formulated as a direct set prediction task using bipartite matching, which eliminates conventional operations such as anchor box generation and non-maximum suppression. A major challenge is the lack of sufficient publicly available document layout datasets that incorporate the artifacts observed in historical O&G technical reports. We attempt to address this challenge with a novel training data augmentation methodology. The dense occurrence of elements on a page can introduce uncertainties that result in bounding boxes cutting through text content. We adopt a bounding box post-processing methodology that refines the bounding box coordinates to minimize these undercuts. The proposed document layout analysis pipeline was trained to detect entity types such as headings, text blocks, tables, forms, and images/charts on a document page.
A wide range of pages from lithology, stratigraphy, drilling, and field development reports were used for model training, including a considerable number of historical scanned reports. The trained object-detection model was evaluated on a test dataset prepared from the O&G reports. On this dataset, DETR demonstrated superior performance compared with Mask R-CNN.
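
DETR-style set prediction replaces anchor boxes and non-maximum suppression with a one-to-one assignment between predicted boxes and ground-truth boxes. Below is a minimal sketch of that matching step, assuming boxes in normalized (x0, y0, x1, y1) corner format and a matching cost built from a class-probability term, an L1 box distance, and an IoU term (DETR itself uses generalized IoU). The weights and the names `match_cost` and `bipartite_match` are illustrative assumptions, not details from the paper. For clarity the sketch searches all assignments exhaustively, which is only practical for a handful of boxes; practical implementations solve the same minimum-cost assignment with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(probs: Sequence[float], pred: Box, label: int, gt: Box,
               w_class: float = 1.0, w_l1: float = 5.0, w_iou: float = 2.0) -> float:
    """Cost of assigning one prediction to one ground-truth object (lower is better).

    The weights are illustrative assumptions, not values reported in the paper.
    """
    class_cost = -probs[label]  # high probability for the true class -> low cost
    l1_cost = sum(abs(p - g) for p, g in zip(pred, gt))
    return w_class * class_cost + w_l1 * l1_cost - w_iou * iou(pred, gt)


def bipartite_match(preds: List[Tuple[Sequence[float], Box]],
                    targets: List[Tuple[int, Box]]) -> List[Tuple[int, int]]:
    """Return (prediction index, target index) pairs with minimum total cost.

    Assumes len(preds) >= len(targets); unmatched predictions would be trained
    toward the "no object" class. Exhaustive search, so O(n!) time.
    """
    cost = [[match_cost(p, pb, lbl, gb) for lbl, gb in targets] for p, pb in preds]
    best_total, best_perm = float("inf"), ()
    # perm[j] is the index of the prediction assigned to target j
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost[perm[j]][j] for j in range(len(targets)))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(pred_idx, tgt_idx) for tgt_idx, pred_idx in enumerate(best_perm)]


# Usage: three query predictions, two ground-truth objects.
# Hypothetical class indices: 0 = heading, 1 = text block, 2 = table.
preds = [
    ([0.1, 0.8, 0.1], (0.10, 0.20, 0.90, 0.50)),  # looks like a text block
    ([0.7, 0.2, 0.1], (0.10, 0.05, 0.90, 0.12)),  # looks like a heading
    ([0.3, 0.3, 0.4], (0.50, 0.60, 0.60, 0.70)),  # spurious query
]
targets = [(0, (0.10, 0.05, 0.90, 0.10)), (1, (0.10, 0.20, 0.90, 0.55))]
print(bipartite_match(preds, targets))  # [(1, 0), (0, 1)]
```

Because each ground-truth object is matched to exactly one query, duplicate detections are penalized during training instead of being pruned afterwards, which is why no non-maximum suppression step is needed at inference.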
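
The abstract does not say how its bounding box post-processing works. One simple way to reduce undercuts, offered here only as a hypothetical sketch, is to snap each predicted region outward so that it fully contains any text box (for example, a word box from OCR) that it already mostly covers. The function `refine_box`, the `min_inside` threshold of 0.5, and the availability of word boxes are all assumptions, not details from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def area(b: Box) -> float:
    # Inverted boxes (which arise when two boxes do not overlap) get zero area
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def refine_box(region: Box, words: List[Box],
               min_inside: float = 0.5, max_iter: int = 10) -> Box:
    """Grow `region` to swallow words it cuts through but mostly covers.

    A word is cut when only part of it lies inside the region. If at least
    `min_inside` of its area is inside, the region is enlarged to contain it;
    words that are mostly outside are left alone. Growing can create new
    partial overlaps, so passes repeat until nothing changes (or max_iter).
    """
    x0, y0, x1, y1 = region
    for _ in range(max_iter):
        changed = False
        for w in words:
            w_area = area(w)
            if w_area == 0:
                continue
            inside = intersection_area((x0, y0, x1, y1), w) / w_area
            if min_inside <= inside < 1.0:
                x0, y0 = min(x0, w[0]), min(y0, w[1])
                x1, y1 = max(x1, w[2]), max(y1, w[3])
                changed = True
        if not changed:
            break
    return (x0, y0, x1, y1)


# Usage: a predicted text block whose right edge slices through a word.
words = [(12, 12, 40, 20), (90, 12, 105, 20), (200, 200, 220, 210)]
print(refine_box((10, 10, 100, 50), words))  # (10, 10, 105, 50)
```

Expanding rather than shrinking is a deliberate choice in this sketch: a slightly oversized text block loses no characters, whereas a shrunken one could drop content from downstream OCR.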

Chinese document layout analysis based on texture features

Proceedings. International Conference on Machine Learning and Cybernetics

10.1109/icmlc.2002.1175330

2003

Author(s):

Yu Wang

Xue-Dong Tian

Bao-Lan Guo

Keyword(s):

Texture Features

Layout Analysis

Document Layout Analysis

Document Layout

Investigation of feature selection for historical document layout analysis

2014 4th International Conference on Image Processing Theory, Tools and Applications (IPTA)

10.1109/ipta.2014.7001961

2014

Author(s):

Hao Wei

Kai Chen

Anguelos Nicolaou

Marcus Liwicki

Rolf Ingold

Keyword(s):

Feature Selection

Historical Document

Layout Analysis

Document Layout Analysis

Document Layout

A Document Layout Analysis Method Based on Morphological Operators and Connected Components

2018 XLIV Latin American Computer Conference (CLEI)

10.1109/clei.2018.00080

2018

Author(s):

Sebastian Wilde Alarcon Arenas

Yessenia Yari

Graciela Meza-Lovon

Keyword(s):

Connected Components

Analysis Method

Layout Analysis

Morphological Operators

Document Layout Analysis

Document Layout

A Survey of Texture-Based Methods for Document Layout Analysis

Series in Machine Perception and Artificial Intelligence - Texture Analysis in Machine Vision

10.1142/9789812792495_0012

2000

pp. 165-177

Author(s):

Oleg Okun

Matti Pietikäinen

Keyword(s):

Layout Analysis

Document Layout Analysis

Document Layout

Local Descriptors for Document Layout Analysis

Advances in Visual Computing - Lecture Notes in Computer Science

10.1007/978-3-642-17277-9_4

2010

pp. 29-38

Author(s):

Angelika Garz

Markus Diem

Robert Sablatnig

Keyword(s):

Layout Analysis

Local Descriptors

Document Layout Analysis

Document Layout

Hybrid Feature Selection for Historical Document Layout Analysis

2014 14th International Conference on Frontiers in Handwriting Recognition

10.1109/icfhr.2014.22

2014

Author(s):

Hao Wei

Kai Chen

Rolf Ingold

Marcus Liwicki

Keyword(s):

Feature Selection

Historical Document

Layout Analysis

Document Layout Analysis

Document Layout

Arabic document layout analysis

Pattern Analysis and Applications

10.1007/s10044-017-0595-x

2017

Vol 20 (4)

pp. 1275-1287

Author(s):

Amany M. Hesham

Mohsen A. A. Rashwan

Hassanin M. Al-Barhamtoshy

Sherif M. Abdou

Amr A. Badr

...

Keyword(s):

Layout Analysis

Document Layout Analysis

Document Layout

Document Layout Analysis Based on Fuzzy Energy Matrix

International Journal of Contents

10.5392/ijoc.2015.11.2.001

2015

Vol 11 (2)

pp. 1-8

Author(s):

KangHan Oh

SooHyung Kim

Keyword(s):

Energy Matrix

Layout Analysis

Document Layout Analysis

Document Layout

Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter

KSII Transactions on Internet and Information Systems

10.3837/tiis.2015.10.017

2015

Vol 9 (10)

Keyword(s):

Layout Analysis

Recursive Filter

Document Layout Analysis

Document Layout