From Graphematics to Phrasal, Sentential, and Textual Semantics Through Morphosyntax by Means of Corpus-Driven Grammar and Ontology: A Case Study on One Tibetan Text

2021 ◽  
Vol 72 (2) ◽  
pp. 319-329
Author(s):  
Aleksei Dobrov ◽  
Maria Smirnova

Abstract This article presents the current results of an ongoing study of the possibilities of fine-tuning automatic morphosyntactic and semantic annotation by improving the underlying formal grammar and ontology, using one Tibetan text as an example. The ultimate purpose of this stage of the work was to improve linguistic software developed for natural-language processing and understanding, so as to achieve complete annotation of a specific text and a state of the formal model in which all linguistic phenomena observed in the text are explained. This purpose includes the following tasks: analyzing annotation errors in the corpus text; eliminating these errors in automatic annotation; and developing the formal grammar and updating the dictionaries. Along with morphosyntactic analysis, the current approach involves simultaneous semantic analysis. The article describes the semantic annotation of the corpus, required for grammar revision and development, which was carried out with the use of a computer ontology. The work is performed on one of the corpus texts, the grammatical poetic treatise Sum-cu-pa (7th century).

2019 ◽  
Author(s):  
Abhisek Chowdhury

Social media feeds are rapidly emerging as a novel avenue for the contribution and dissemination of geographic information. Among these, Twitter, a popular micro-blogging service, has recently gained tremendous attention for its real-time nature. For instance, during floods, people usually tweet about them, which enables flood events to be detected promptly by observing Twitter feeds. In this paper, we propose a framework to investigate the real-time interplay between a catastrophic event and people's reaction to it, such as a flood and the resulting tweets, in order to identify disaster zones. We demonstrate our approach using tweets following a flood in the state of Bihar, India, in 2017 as a case study. We construct a classifier for semantic analysis of the tweets that sorts them into flood and non-flood categories. Subsequently, we apply natural language processing methods to extract information on flood-affected areas and use elevation maps to identify potential disaster zones.
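The paper does not specify the classifier here, so as a minimal illustration of the flood/non-flood step, the sketch below trains a multinomial Naive Bayes bag-of-words classifier on a few invented example tweets (not the authors' Bihar 2017 data):

```python
from collections import Counter
import math

# Toy labeled tweets standing in for the training corpus (hypothetical
# examples, not drawn from the authors' dataset).
train = [
    ("water entered our house after heavy rain", "flood"),
    ("river overflowing near the bridge", "flood"),
    ("roads submerged people stranded", "flood"),
    ("great match today", "non-flood"),
    ("new phone launch event", "non-flood"),
    ("traffic jam on the highway", "non-flood"),
]

def train_nb(data):
    """Count words per class for multinomial Naive Bayes."""
    word_counts = {"flood": Counter(), "non-flood": Counter()}
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

wc, cc, vocab = train_nb(train)
print(classify("river water rising near our house", wc, cc, vocab))  # flood
```

Any supervised text classifier could fill this role; the interesting part of the pipeline is downstream, where flood tweets are combined with elevation maps.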


Author(s):  
Paula M Mabee ◽  
Wasila M Dahdul ◽  
James P Balhoff ◽  
Hilmar Lapp ◽  
Prashanti Manda ◽  
...  

The study of how the observable features of organisms, i.e., their phenotypes, result from the complex interplay between genetics, development, and the environment, is central to much research in biology. The varied language used in the description of phenotypes, however, impedes the large scale and interdisciplinary analysis of phenotypes by computational methods. The Phenoscape project (www.phenoscape.org) has developed semantic annotation tools and a gene–phenotype knowledgebase, the Phenoscape KB, that uses machine reasoning to connect evolutionary phenotypes from the comparative literature to mutant phenotypes from model organisms. The semantically annotated data enables the linking of novel species phenotypes with candidate genes that may underlie them. Semantic annotation of evolutionary phenotypes further enables previously difficult or novel analyses of comparative anatomy and evolution. These include generating large, synthetic character matrices of presence/absence phenotypes based on inference, and searching for taxa and genes with similar variation profiles using semantic similarity. Phenoscape is further extending these tools to enable users to automatically generate synthetic supermatrices for diverse character types, and use the domain knowledge encoded in ontologies for evolutionary trait analysis. Curating the annotated phenotypes necessary for this research requires significant human curator effort, although semi-automated natural language processing tools promise to expedite the curation of free text. As semantic tools and methods are developed for the biodiversity sciences, new insights from the increasingly connected stores of interoperable phenotypic and genetic data are anticipated.
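The "similar variation profiles using semantic similarity" idea can be illustrated with the simplest possible similarity measure, Jaccard overlap between sets of ontology annotations. The profiles and term IDs below are invented for illustration; the Phenoscape KB uses richer, ontology-aware similarity measures than plain set overlap:

```python
# Hypothetical annotation profiles: each taxon or gene is annotated with a
# set of ontology term IDs (illustrative labels, not real Phenoscape data).
profiles = {
    "taxon_A": {"UBERON:fin", "PATO:absent", "UBERON:dorsal_spine"},
    "taxon_B": {"UBERON:fin", "PATO:absent"},
    "gene_X":  {"UBERON:fin", "PATO:absent", "PATO:decreased_size"},
}

def jaccard(a, b):
    """Set-overlap similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

# Rank the other entries by similarity to taxon_A's variation profile.
query = profiles["taxon_A"]
ranked = sorted(
    (name for name in profiles if name != "taxon_A"),
    key=lambda n: jaccard(query, profiles[n]),
    reverse=True,
)
print(ranked)  # taxon_B shares 2 of 3 annotated terms, so it ranks first
```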


2021 ◽  
Vol 336 ◽  
pp. 06017
Author(s):  
Mabao Ban ◽  
Zhijie Cai ◽  
Rangzhuoma Cai ◽  
Rangjia Cai

The recognition of Tibetan interrogative sentences is a basic task in natural language processing, with wide application in Tibetan syntactic analysis, semantic analysis, intelligent question answering, search engines, and other research fields. Taking interrogative pronouns as an entry point and analyzing the phrase features before and after them, this paper proposes a method for Tibetan interrogative sentence recognition and classification by designing a recognition and classification model based on phrase features. Experimental results show that the recognition precision, recall, and F value of this method are 98.21%, 100.00%, and 99.10%, respectively, and the average classification precision, recall, and F value are 96.98%, 100.00%, and 98.39%, respectively.
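The F value here is the usual harmonic mean of precision and recall, which can be checked directly against the reported recognition figures:

```python
def f_measure(precision, recall):
    """F value: harmonic mean of precision and recall, 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# Reported recognition figures: P = 98.21%, R = 100.00%.
f = f_measure(0.9821, 1.0)
print(round(f * 100, 2))  # 99.1, matching the reported F value of 99.10%
```

Note that the averaged classification F value (98.39%) is an average over per-class F values, so it need not equal the F of the averaged precision and recall.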


Author(s):  
Kristopher Doll ◽  
Conrad S. Tucker

The United States generates more than 250 million tons of municipal solid waste (trash/garbage), of which only 34% is recycled. In the broader global environment, the problem of waste management is becoming increasingly relevant, demanding innovative solutions. Traditional End-of-Life (EOL) approaches to managing waste include recycling, reuse, remanufacturing, and disposal. Recently, resynthesis was proposed as an alternative to traditional EOL options; it combines multiple products to create a new product distinct from its parent assemblies. Resynthesis employs data mining and natural language processing algorithms to quantify assembly/subassembly combinations suitable for new products. However, existing resynthesis methodologies proposed in the design community have been limited to exploring subassembly combinations, failing to explore potential combinations at the materials level. The authors of this paper propose a material resynthesis methodology that combines the materials of multiple EOL products, using conventional manufacturing processes, to generate candidate resynthesized materials that satisfy the needs of existing domains/applications. Appropriate applications for a resynthesized material are discovered by comparing the properties of the new material to the functional requirements of application classes, which are found using clustering and latent semantic analysis. The authors present a case study that demonstrates the feasibility of the proposed material resynthesis methodology in the construction materials domain.


Author(s):  
Anastasiia Yu. Zinoveva ◽  
Svetlana O. Sheremetyeva ◽  
Ekaterina D. Nerucheva

Properly annotated text corpora are an essential prerequisite for constructing effective and efficient tools for natural language processing (NLP), which provide an operational solution to both theoretical and applied linguistic and informational problems. One of the main and most complex problems of corpus annotation is resolving tag ambiguities at a specific level of annotation (morphological, syntactic, semantic, etc.). This paper addresses the issue of ambiguity that emerges on the conceptual level, which is the most relevant text annotation level for solving informational tasks. Conceptual annotation is a special type of semantic annotation usually applied to domain corpora to address specific informational problems such as automatic classification, content and trend analyses, machine learning, machine translation, etc. In conceptual annotation, text corpora are annotated with tags reflecting the content of a certain domain, which leads to a type of ambiguity that is different from general semantic ambiguity and has both universal and language- and domain-specific peculiarities. This paper investigates conceptual ambiguity in a case study of a Russian-language corpus on terror attacks. The research methodology combines automated and manual steps, comprising a) statistical and qualitative corpus analysis, b) the use of pre-developed annotation resources (a terrorism domain ontology, a Russian ontolexicon and a computer platform for conceptual annotation), c) ontological-analysis-based conceptual annotation of the corpus chosen for the case study, d) corpus-based detection and investigation of conceptual ambiguity causes, and e) development and experimental study of possible disambiguation methods for some types of conceptual ambiguity. The findings obtained in this study are specific to Russian-language terrorism-domain texts, but the conceptual annotation technique and the approaches to conceptual disambiguation developed here are applicable to other domains and languages.
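The core of dictionary-based conceptual annotation, and the ambiguity it produces, can be sketched in a few lines. The lexicon entries, concept labels, and the context rule below are all invented for illustration and are far simpler than the authors' terrorism ontology and ontolexicon:

```python
# Invented ontolexicon: a surface term may map to several domain concepts,
# which is exactly the conceptual ambiguity the paper studies.
lexicon = {
    "charge": ["EXPLOSIVE_DEVICE", "LEGAL_ACCUSATION"],  # ambiguous term
    "suspect": ["PERPETRATOR"],
    "detonated": ["EXPLOSION_EVENT"],
}

def annotate(tokens, lexicon):
    """Tag each token with its candidate concepts; >1 candidate = ambiguity."""
    return [(t, lexicon.get(t, [])) for t in tokens]

def disambiguate(annotated):
    """Toy context rule: a nearby EXPLOSION_EVENT selects the device sense."""
    concepts = {c for _, cands in annotated for c in cands}
    result = []
    for token, cands in annotated:
        if len(cands) > 1 and "EXPLOSION_EVENT" in concepts:
            cands = [c for c in cands if c == "EXPLOSIVE_DEVICE"] or cands
        result.append((token, cands))
    return result

tokens = "suspect detonated charge".split()
print(disambiguate(annotate(tokens, lexicon)))
```

Real conceptual disambiguation, as the paper describes, relies on ontological analysis rather than a single hand-written context rule, but the ambiguous-candidate-set representation is the same.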


Author(s):  
Khaoula Mrhar ◽  
Mounia Abik

Explicit Semantic Analysis (ESA) is an approach to measuring the semantic relatedness between terms or documents based on their similarity to documents of a reference corpus, usually Wikipedia. ESA has received tremendous attention in the fields of natural language processing (NLP) and information retrieval. However, ESA relies on a huge Wikipedia index matrix in its interpretation step, multiplying a large matrix by a term vector to produce a high-dimensional vector. Consequently, the interpretation and similarity steps of ESA are computationally expensive, and much time is lost in unnecessary operations. This paper proposes an enhancement to ESA, called optimize-ESA, that reduces the dimensionality at the interpretation stage by computing semantic similarity within a specific domain. The experimental results show clearly that our method correlates much better with human judgement than the full version of the ESA approach.
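The two ESA steps the abstract refers to, interpretation (mapping a text to a weighted vector of concepts) and similarity (comparing two such vectors), can be sketched as follows. The concept space and weights are toy values, not real Wikipedia statistics; optimize-ESA would additionally restrict the concept columns to a specific domain before this computation:

```python
import math

# Toy term-to-concept weights: each concept plays the role of a Wikipedia
# article column in the ESA index matrix (values invented for illustration).
term_concept = {
    "bank":  {"Finance": 0.9, "River": 0.6},
    "loan":  {"Finance": 0.8},
    "water": {"River": 0.9},
}

def esa_vector(text, term_concept):
    """Interpretation step: sum the concept vectors of the text's terms."""
    vec = {}
    for term in text.split():
        for concept, w in term_concept.get(term, {}).items():
            vec[concept] = vec.get(concept, 0.0) + w
    return vec

def cosine(u, v):
    """Similarity step: cosine between two sparse concept vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sim = cosine(esa_vector("bank loan", term_concept),
             esa_vector("bank water", term_concept))
print(round(sim, 3))
```

Dropping concept columns irrelevant to the target domain shrinks the vectors these two steps operate on, which is the source of the speedup the paper aims for.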


Author(s):  
P. SENGUPTA ◽  
B.B. CHAUDHURI

A lexical subsystem that contains a morphological-level parser is necessary for processing natural languages in general and inflectional languages in particular. Such a subsystem should be able to generate the surface form of a word (i.e. as it appears in a natural sentence), given the sequence of morphemes constituting the word. Conversely, and more importantly, the subsystem should be able to parse a word into its constituent morphemes. A formalism which enables the lexicon writer to specify the lexicon of an inflectional language is discussed. The specifications are used to build up a lexical description in the form of a lexical database on the one hand and a formulation of derivational morphology, called Augmented Finite State Automata (AFSA), on the other. A compact lexical representation has been achieved, in which both the generation of the surface forms of a word and the parsing of a word are performed in a computationally attractive manner. The output produced as a result of parsing is suitable as input to the next stage of analysis in a Natural Language Processing (NLP) environment, which, in our case, is based on a generalization of Lexical Functional Grammar (LFG). The application of the formalism to inflectional Indian languages is considered, with Bengali, a modern Indian language, as a case study.
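The two directions the abstract describes, generation (morphemes to surface form) and parsing (surface form to morphemes), can be sketched with a minimal suffix-stripping lexicon. The English-like stems and suffix rules below are invented for illustration and are far simpler than the paper's AFSA formalism or its Bengali lexicon:

```python
# Toy lexicon: stems plus inflectional suffixes with their feature tags
# (illustrative English-like data, not the paper's Bengali examples).
STEMS = {"walk", "talk"}
SUFFIXES = {"": [], "s": ["3SG"], "ed": ["PAST"], "ing": ["PROG"]}

def parse(surface):
    """Parse a surface form into (stem, features) analyses by suffix trial."""
    analyses = []
    for suffix, features in SUFFIXES.items():
        if surface.endswith(suffix):
            stem = surface[:len(surface) - len(suffix)] if suffix else surface
            if stem in STEMS:
                analyses.append((stem, features))
    return analyses

def generate(stem, suffix):
    """Inverse direction: morpheme sequence -> surface form."""
    assert stem in STEMS and suffix in SUFFIXES
    return stem + suffix

print(parse("walked"))        # [('walk', ['PAST'])]
print(generate("talk", "ing"))  # talking
```

A real inflectional language needs stem alternations and morphophonemic rules at the morpheme boundary, which is what the finite-state machinery (and its "augmented" conditions) handles in the paper's formalism.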


2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Mourad Gridach ◽  
Noureddine Chenfour

We present an XML approach to the production of a morphological database for the Arabic language, to be used in morphological analysis of Modern Standard Arabic (MSA). Optimizing the production, maintenance, and extension of a morphological database is one of the crucial aspects impacting natural language processing (NLP). For Arabic, producing a morphological database is not an easy task, because the language has particularities such as agglutination and a high degree of morphological ambiguity. The method presented can be exploited by NLP applications such as syntactic analysis, semantic analysis, information retrieval, and orthographic correction.
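As a sketch of what an XML-encoded morphological database entry might look like, the snippet below defines one hypothetical verb record and reads it back with Python's standard XML parser. The element and attribute names are invented for illustration; the abstract does not disclose the authors' actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical entry for the root k-t-b ("write"): the lemma kataba with
# two inflected surface forms (schema invented, not the authors' format).
entry = """
<lexicon>
  <entry lemma="kataba" pos="verb" root="ktb">
    <form surface="yaktubu" features="IPFV.3SG.M"/>
    <form surface="kutiba" features="PASS.PFV.3SG.M"/>
  </entry>
</lexicon>
"""

root = ET.fromstring(entry)
for form in root.iter("form"):
    print(form.get("surface"), form.get("features"))
```

Storing the root and pattern-derived forms explicitly like this is one way an XML database can make agglutination and ambiguity tractable for downstream analyzers, since each surface form carries its full feature bundle.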


2017 ◽  
Vol 7 (5) ◽  
pp. 2014-2016
Author(s):  
M. Madhukar ◽  
S. Verma

Social networks have become one of the major and important parts of daily life. Besides sharing one's views, social networking sites can also be used very efficiently to judge the behavior and attitude of individuals towards posts. The mood of the public on a particular social issue can be judged by several methods; this paper investigates the analysis of society's mood towards particular news items in the form of tweets. The key objective of this research is to increase the accuracy and effectiveness of classification through Natural Language Processing (NLP) techniques, focusing on semantics and word sense disambiguation. The classification process combines the effects of various independent classifiers on one particular classification problem. The data available in the form of tweets on Twitter can readily frame insight into the public attitude towards a particular tweet. The proposed work implements a hybrid method that includes Hybrid K, clustering, and boosting. A comparison of this scheme against a K-means/SVM approach is provided, and the results are shown and discussed.



