An algorithm for rule-based layout pattern matching

Author(s):  
Sheng-Hao Wang ◽  
Yen-Jong Chen ◽  
Ting-Chi Wang ◽  
Oscar Chen
1995 ◽  
Vol 04 (03) ◽  
pp. 301-321 ◽  
Author(s):  
S.E. MICHOS ◽  
N. FAKOTAKIS ◽  
G. KOKKINAKIS

This paper addresses the problems that arise when parsing long sentences in quasi-free word order languages. Because a large class of languages, including Greek, allows considerable word order freedom, and because rule-based grammar parsers are limited when parsing unrestricted texts in such languages, we propose a flexible and effective method for parsing long sentences that combines heuristic information and pattern-matching techniques at early processing levels. The method is characterized by its simplicity and robustness. Although it has been developed and tested for Greek, its theoretical background, implementation algorithm and results are language independent and can be of considerable value for many practical natural language processing (NLP) applications that involve parsing unrestricted texts.


A Romanization system converts text from a source script to the Roman script through word-by-word mapping. The phonological characteristics of the source word are preserved: only the writing script changes, not the spoken language. This paper presents a rule-based approach to the Romanization of proper nouns written in Gurmukhi script. The aim is to develop a lightweight Romanization system that may produce multiple possible results for the same input word. The algorithm uses a list of Gurmukhi characters along with their equivalent character combinations in Roman script. Mapping Gurmukhi characters directly to these combinations does not produce satisfactory results, so rules are applied to obtain the correct mappings. The rules essentially insert or remove the letter ‘a’ between the mapped consonants. Three different sets of rules are applied to obtain three different Romanized outputs, all of which are acceptable for information extraction using pattern matching. In Gurmukhi, some words are written differently from the way they are pronounced. To handle them, these words, or parts of them, are stored in a database table with the Romanized form in a second column, and the Romanization is picked directly from the table. The result of the system is a set of possible Roman-script words generated from the source word, which an application can pattern match against text or a database to retrieve the required information.
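The mapping-plus-rules idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the character table is a tiny hand-picked subset, the two rule variants stand in for the paper's three rule sets (which the abstract does not specify), and the exception entry and every function name are hypothetical.

```python
# Illustrative subset of a Gurmukhi-to-Roman table (assumed, not the paper's list).
CONSONANTS = {"ਕ": "k", "ਮ": "m", "ਲ": "l", "ਰ": "r", "ਸ": "s", "ਨ": "n"}
VOWEL_SIGNS = {"ਾ": "a", "ਿ": "i", "ੀ": "ee"}

# Hypothetical exception table: source word -> stored Romanization.
EXCEPTIONS = {"ਸਿੰਘ": "singh"}


def romanize(word, final_a=False):
    """Map characters one by one, inserting 'a' between adjacent consonants.

    final_a is a hypothetical rule variant that also appends 'a' after a
    word-final consonant.
    """
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]  # picked directly from the table
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in CONSONANTS:
                out.append("a")  # place 'a' between two mapped consonants
            elif nxt is None and final_a:
                out.append("a")
        else:
            out.append(ch)  # unmapped character: pass through unchanged
    return "".join(out)


def candidates(word):
    """Return the set of Romanized forms produced by the rule variants."""
    return {romanize(word, final_a=False), romanize(word, final_a=True)}
```

An application would then match each candidate against its text or database, for example with `any(c in text.lower() for c in candidates(word))`.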


Author(s):  
Bradley J. Falch ◽  
Tony Hu ◽  
Terry Hsuan ◽  
Elvis Yang ◽  
T.H. Yang ◽  
...  

VLSI Design ◽  
1999 ◽  
Vol 10 (1) ◽  
pp. 117-125 ◽  
Author(s):  
Wonjong Kim ◽  
Hyunchul Shin

A new hierarchical layout vs. schematic (LVS) comparison system for layout verification has been developed. The schematic hierarchy is first restructured to remove ambiguities so that hierarchical matching is consistent. The circuit hierarchy is then reconstructed from the layout netlist by applying a modified SubGemini algorithm recursively in a bottom-up fashion. For efficiency, simple gates are found during preprocessing by a fast rule-based pattern matching algorithm. Experimental results show that our hierarchical netlist comparison technique is effective and efficient in both CPU time and memory usage, especially when the circuit is large and hierarchically structured.
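As a rough illustration of what a rule-based pattern match for a simple gate might look like, the sketch below finds CMOS inverters in a flat transistor list. It is not the authors' algorithm: the `Transistor` record, the net names `VDD` and `GND`, and the single inverter rule are all assumptions, and source/drain symmetry is ignored for brevity.

```python
from collections import namedtuple

# Hypothetical flat netlist record: one MOS transistor and its terminal nets.
Transistor = namedtuple("Transistor", "name kind gate source drain")


def find_inverters(transistors, vdd="VDD", gnd="GND"):
    """Return (pmos, nmos, input_net, output_net) for each inverter found.

    Rule: a PMOS sourced at vdd and an NMOS sourced at gnd that share
    both their gate net and their drain net form an inverter.
    """
    # Index the pull-down candidates by (gate net, drain net) for fast lookup.
    nmos_index = {}
    for t in transistors:
        if t.kind == "nmos" and t.source == gnd:
            nmos_index.setdefault((t.gate, t.drain), []).append(t)

    found, used = [], set()
    for p in transistors:
        if p.kind != "pmos" or p.source != vdd:
            continue
        for n in nmos_index.get((p.gate, p.drain), []):
            if n.name not in used:  # each NMOS belongs to at most one match
                used.add(n.name)
                found.append((p.name, n.name, p.gate, p.drain))
                break
    return found


netlist = [
    Transistor("M1", "pmos", "in", "VDD", "out"),  # inverter pull-up
    Transistor("M2", "nmos", "in", "GND", "out"),  # inverter pull-down
    Transistor("M3", "pmos", "a", "VDD", "y"),     # NAND2 pull-ups
    Transistor("M4", "pmos", "b", "VDD", "y"),
    Transistor("M5", "nmos", "a", "n1", "y"),      # NAND2 series pull-downs
    Transistor("M6", "nmos", "b", "GND", "n1"),
]
inverters = find_inverters(netlist)
```

Replacing matched transistor groups with gate-level devices before the graph comparison shrinks the netlist, which is presumably where the efficiency gain mentioned above comes from.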


2009 ◽  
Vol 35 (5) ◽  
pp. 563-570 ◽  
Author(s):  
Bolanle Adefowoke Ojokoh ◽  
Olumide Sunday Adewale ◽  
Samuel Oluwole Falaki

Web documents are available in various forms, most of which do not carry additional semantics. This paper presents a model for general document metadata extraction. The model, which combines segmentation by keywords with pattern matching techniques, was implemented using PHP, MySQL, JavaScript and HTML. The system was tested on 40 randomly selected PDF documents (mainly theses) and evaluated using standard measures, namely precision, recall, accuracy and F-measure. The results show that the model is relatively effective for metadata extraction, especially for theses and dissertations. Combining machine learning with these rule-based methods will be explored in the future for better results.
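For reference, the four measures named above can be computed from confusion counts as in the sketch below. The formulas are the standard definitions; how the paper tallies true and false positives for extracted metadata fields is not stated in the abstract, so the counts and the function name here are assumptions.

```python
def evaluation_measures(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F-measure from raw counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
```

For example, `evaluation_measures(tp=8, fp=2, fn=2, tn=8)` gives a value of about 0.8 for every measure.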

