Lexical resources: recently published documents

Total documents: 252 (five years: 71)
H-index: 15 (five years: 3)

Author(s): Dhanashree S. Kulkarni, Sunil S. Rodd

Sentiment Analysis (SA) has been a core interest in text mining research; it deals with the computational processing of sentiments, views, and the subjective nature of text. Given the extensive web-based data available in Indian languages such as Hindi, Marathi, Kannada, and Tamil, it has become highly important to analyze this data and extract valuable, relevant information. Because Hindi is the most widely spoken first language in India, SA in Hindi has become a critical task, particularly for companies and government organizations. This research presents a systematic review focused specifically on Hindi SA. Its major contribution is a categorization of numerous articles according to the techniques researchers have used to perform SA tasks in Hindi. The survey classifies these state-of-the-art computational intelligence techniques into four major categories: lexicon-based techniques, machine learning techniques, deep learning techniques, and hybrid techniques. It discusses the importance of these techniques from different aspects, such as their impact on SA issues, levels of analysis, and performance evaluation measures. The review provides a comprehensive overview of most of the work done in Hindi SA and will help researchers find resources such as annotated datasets, linguistic resources, and lexical resources. Finally, the survey reports significant findings and outlines future research directions in Hindi SA.
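As a rough illustration of the lexicon-based category named above, the sketch below scores a Hindi sentence by summing word polarities from a lexicon and inverting a word's polarity when a negation marker sits directly next to it. The four-word lexicon, the negation set, the one-token negation window, and the function names are hypothetical toy assumptions, not taken from any surveyed paper; a real system would load a curated Hindi polarity lexicon and use a proper tokenizer.

```python
# Minimal lexicon-based sentiment scorer (toy sketch under stated assumptions).

# Hypothetical polarity lexicon: word -> score (positive > 0, negative < 0).
LEXICON = {
    "अच्छा": 1,     # good
    "शानदार": 2,    # excellent
    "बुरा": -1,     # bad
    "खराब": -2,     # poor / bad
}

# Hypothetical negation markers. In Hindi, "नहीं" often follows the adjective,
# as in "अच्छा नहीं है" ("is not good"), so both neighbours are checked.
NEGATIONS = {"नहीं", "न"}


def lexicon_score(text: str) -> int:
    """Sum lexicon polarities, inverting a word's polarity if a negation
    marker appears immediately before or after it."""
    tokens = text.split()  # naive whitespace tokenization
    total = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token)
        if polarity is None:
            continue
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        if any(n in NEGATIONS for n in neighbours):
            polarity = -polarity
        total += polarity
    return total


def classify(text: str) -> str:
    """Map the summed score to a three-way polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["खाना अच्छा है", "खाना अच्छा नहीं है", "सेवा बहुत खराब थी"]:
        print(sentence, "->", classify(sentence))
```

The fixed negation window is the simplest possible choice; the surveyed work discusses richer handling of negation, intensifiers, and morphology, which this sketch does not attempt.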


2022, Vol 12
Author(s): Zhong Wang, Weiwei Fan, Alex Chengyu Fang

Previous research on the INTRODUCTORY IT PATTERN has revealed various lexical and grammatical aspects of its use as a grammatical stance device, including the range of the most frequently used adjectival and verbal stance lexemes, the associated stance meanings, the most frequent sub-patterns, and the pattern's distinct uses in various contextual settings. However, the stance meanings of the pattern, which are deeply rooted in its associated lexical resources, remain understudied. This study explores the meanings of the INTRODUCTORY IT PATTERN through the stance meanings associated with the adjectival and verbal lexemes that are statistically attracted to it. The research samples were extracted from the British component of the International Corpus of English (ICE-GB) and manually annotated for stance type, and a collexeme analysis was performed to identify the full range of stance lexemes statistically associated with the pattern (collexemes). The results show that both adjectival and verbal collexemes are statistically and functionally significant for the delivery of discrete stance types/subtypes. Adjectival collexemes are frequently deployed for all four stance types (Epistemic, Evaluation, Dynamic, and Deontic stance), while verbal collexemes are valuable lexical resources for Epistemic stance, since their use entails modalized evidentiality, pointing to the writer-speaker's epistemic judgment of events/propositions. Close examination of the adjectival and verbal collexemes identified three fundamental meanings of the INTRODUCTORY IT PATTERN. First, the pattern is inherently evaluative: it tends to attract lexemes with evaluative meanings and associates evaluative meanings with superficially non-evaluative lexemes. Second, it expresses diversified stance types/subtypes on a scale and is thus especially reflective of the scalar semantic feature of stance expression. Third, it connotes an overwhelmingly positive likelihood judgment. The article concludes by discussing the limitations of the study.
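Collexeme analysis, as used in the study above, ranks lexemes by how strongly they are attracted to a construction. The sketch below follows the commonly used setup in which a 2×2 table of token counts is tested with a one-tailed Fisher-Yates exact test and collostruction strength is reported as -log10(p). The function names and the example counts are hypothetical and are not figures from ICE-GB; repulsion (the lower tail) is deliberately left out to keep the example short.

```python
from math import comb, log10


def _upper_tail(a, b, c, d):
    """Return (numerator, denominator) of P(X >= a) under the hypergeometric
    distribution implied by a 2x2 table with fixed margins:
        a = lexeme inside the construction
        b = other lexemes inside the construction
        c = lexeme outside the construction
        d = other lexemes outside the construction
    """
    construction_total = a + b   # all tokens of the construction
    lexeme_total = a + c         # all tokens of the lexeme
    n = a + b + c + d            # sample size in the chosen counting unit
    upper = min(construction_total, lexeme_total)
    numerator = sum(
        comb(construction_total, k) * comb(n - construction_total, lexeme_total - k)
        for k in range(a, upper + 1)
    )
    return numerator, comb(n, lexeme_total)


def attraction_p_value(a, b, c, d):
    """One-tailed Fisher-Yates exact p-value for attraction."""
    numerator, denominator = _upper_tail(a, b, c, d)
    return numerator / denominator


def collostruction_strength(a, b, c, d):
    """-log10 of the attraction p-value, or 0.0 when the lexeme occurs no more
    often than expected (repulsion would need the lower tail, not shown)."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    if a <= expected:
        return 0.0
    numerator, denominator = _upper_tail(a, b, c, d)
    # Take log10 of the exact integers so tiny p-values do not underflow.
    return log10(denominator) - log10(numerator)


if __name__ == "__main__":
    # Hypothetical counts: the lexeme fills the pattern 30 times out of 1,000
    # pattern tokens and occurs 100 times in a 100,000-token sample.
    print(round(collostruction_strength(30, 970, 70, 98_930), 1))
```

Computing each lexeme's strength this way and sorting in descending order yields the kind of ranked collexeme list the study builds on before its manual stance annotation is brought in.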


Author(s): Anna M. Plekhanova, …, Tsymzhit P. Vanchikova

The article analyzes the principal directions of activity of the Buryat-Mongolian State Institute of Culture (1929–1936), later the Buryat-Mongolian State Institute of Language, Literature and History (1936–1944), the successor of the first scientific organization in Buryatia, the Buryat-Mongolian Scientific Committee (1922–1929). It focuses on the achievements and problems in organizing and conducting research in the humanities in the 1930s. Materials. The sources are unpublished documents of the Center for Oriental Manuscripts and Xylographs of the IMBT SB RAS, such as annual research plans and reports, minutes of Directorate meetings, expedition reports, presentations, conference abstracts and minutes, correspondence with various organizations and offices, and other materials. These were instrumental in reconstructing the history of the institute’s reorganizations, tracing changes in its research program, and showing its effectiveness and efficiency. Results. In the 1930s–1940s, the history, language, literature, and arts of the Buryat-Mongolian people were the principal research directions of the Institute. Archaeological expeditions helped draw a general picture of the ancient history of Buryatia and construct the first cultural-historical schemes. Historians published a significant number of documents on the history of the Buryat-Mongolian people; these publications covered pre-revolutionary Buryat-Mongolia, the revolutionary movement and the Civil War period, and culture and education, and included monographs on the history of Buryatia that are recognized today as classic scientific works.
Within the established ideological framework, controversial issues of the history of Buryat-Mongolia were discussed, and the discussion accepted a one-line (unilinear) view of the historical process in Buryat studies. Advances in Buryat linguistics made it possible to reform the Buryat-Mongolian writing system, first on the basis of the Latin alphabet and then of the Cyrillic alphabet. The Institute’s linguists made a decisive contribution to developing the literary Buryat language, enriching its lexical resources and standardizing its spelling and grammar. The collection, systematization, and study of oral folk art and musical folklore; the acquisition, for the Institute’s Manuscript Department, of manuscripts and woodcuts in Tibetan, Mongolian, and Buryat-Mongolian, as well as uligers, chronicles, and other historical and literary monuments; and translation work: these and other research areas shaped the development of the humanities in Buryatia in the 1930s–1940s. Throughout the period of persecution and repression, despite staff shortages and everyday hardships, the Institute’s team continued its work, conducting large-scale studies of the socio-political and economic history, culture, and art of Buryat-Mongolia.


2021, Vol 21, pp. 149-186
Author(s): Jamila Oueslati, Agata Wolarska

The large number of Arabic words found in modern Spanish is proof of the deep influence Arabic has had on the Spanish language. Historical sociolinguistic processes that have lasted to the present day indicate that the influence of Arabic culture has been neither brief nor superficial; on the contrary, it has had, and continues to have, great significance for the linguistic situation of Spain. Much linguistic research has shown how loans from Arabic were assimilated as they became part of the lexical resources of modern Spanish. Arabic culture and civilization in the Iberian Peninsula (711-1492) involved above all the sciences, literature, art, architecture, engineering, agriculture, military affairs, and medicine. At that time, Al-Andalus was one of the most influential centers of science and cultural exchange in Europe. Contact between Arabic and the Romance languages of the Iberian Peninsula resulted in numerous loans in both directions, from Arabic into the Romance languages and from the Romance languages into Arabic. These topics have been studied extensively from historical, cultural, and linguistic points of view. Despite the many existing works on Arabic loans, this area still requires further, deeper research. This article analyzes selected issues concerning Arabic loans in Spanish, the adaptation processes they have undergone, and their degree of integration into Spanish. The analysis is based on oral and written texts collected in the Corpus del Español del Siglo XXI [CORPES XXI, RAE], a corpus of 21st-century Spanish.


Author(s): Ирина Анатольевна Мартыненко (Irina Anatolyevna Martynenko)

Introduction. As the world’s largest Spanish-speaking country by area, Argentina has always played an important role in the historical, economic, and cultural development of the South American continent. Over the centuries, Argentina’s toponymic corpus has been formed from European (mainly Spanish) and autochthonous layers. The Spanish-language components of this Latin American country’s toponymy are a kind of cartographic form of existence of the Spanish language, a semiotic marker of the presence of Spanish culture in this corner of the world. However, Argentine geographic nomenclature has not yet been studied in detail. The purpose of the article is to attempt a comprehensive linguistic description of the Hispanic toponymy of Argentina and to contribute to the study of the country’s historical and cultural profile. Material and methods. The conclusions are based on a linguopragmatic analysis carried out with modern electronic tools: maps of the country and the electronic systems Google Maps and GeoNames were used as materials and tools. Results and discussion. The author groups place names of Spanish origin, giving multiple examples, explaining their etymology, and identifying metonymic chains.
Along with Hispanic anthropotoponyms, religious allusions, zoo- and phytotoponyms, emotionally colored toponyms, and geographical names containing numerals, separate attention is given to military toponymy, namesake toponyms, and doublet toponyms. In addition to Hispanic place names, the study determines the share of hybrid and indigenous toponymy in the region’s total number of place names. The heterogeneity of the forms of geographical names here reflects the clash of civilizations, the heterogeneity of language contacts, and the richness of the lexical resources of the local toponymic system. Hispanic toponyms were found to make up the largest share compared with the autochthonous and hybrid toponyms of the region. The study divides the Hispanic toponymic layer into groups, the most numerous of which consists of anthropotoponyms. Conclusion. The structured and described research data contribute to the advancement of digital technologies in onomastic research and allow the results to be used in courses on language theory, toponymy, language-contact theory, migration studies, linguistic area studies, lexicology, dialectology, and the theory of normativity, as well as in teaching Spanish.


2021, pp. 123-138
Author(s): Krister Lindén, Jyrki Niemi, Lars Borin, Markus Forsberg, Bolette S. Pedersen, ...

2021, pp. 30-41
Author(s): Tijana Balek

The paper presents 29 relative caritive constructions identified in Serbian through a search of the Serbian National Corpus, from which they were excerpted. Detailed analysis has shown that most of the excerpted constructions consist of a preposition, a noun in a particular case, the conjunction-preposition complex ‘ili bez’ and, often, a noun or pronoun in the genitive. The analysis has also shown that the 29 excerpted construction types contain 9 different prepositions on their left side, which give information about the absens, and that the inclusion and action of the absens in concrete situations are relativized. In addition, there are 5 non-prepositional constructions and 2 constructions with the complex prepositional expressions ‘u suprotnosti sa’ and ‘u saglasnosti sa’ on the left side. Regular caritive constructions negate the inclusion of the absens in the right part of the construction, but the constructions analyzed here are specific because the conjunction ‘ili’ slightly neutralizes the caritive semantics of the preposition ‘bez’. Among the excerpted constructions, 11 have anaphoric features, which are also discussed in the paper. In general, the analysis showed that caritive semantics in Slavic languages (here, Serbian and Russian) is not grammaticalized, which means that there are many grammatical and lexical resources for expressing it that need to be registered and described.


2021, Vol 82 (5), pp. 74-79
Author(s): I. V. Yakushevich

This article presents a linguopoetic analysis of Boris Pasternak’s poem "Wind" ("Veter") from the standpoint of how the duality of mythological worlds is embodied in language. The research focuses on the symbol of "the wind as a spirit", on which the poem’s whole mystical idea relies. The purpose of the article is to reveal which linguistic means convey the duality of mythological worlds, and how this cognition merges with the author’s experience and determines the poem’s figurative system and idea. Understanding the duality of mythological worlds requires the law of participation (L. Lévy-Bruhl), that is, the identification of the mental, emotional, and physical properties of a person with those of nature. In Pasternak’s poem, the suffering and rushing "I" of the deceased lyrical hero becomes the wind. The word-symbol "wind" is studied here in its semantic and semiotic aspects as a sign. Its signifier is the lexeme wind, meaning 'perceptual idea of an air flow'; its signified is the symbolic meaning 'spirit, soul, immortality', which derives from the word’s etymological meaning and from pagan mythology. The results reveal that the symbol "wind" carries the duality of mythological worlds and programs the fictional world of the poem: on the one hand, the actual world of the lyrical heroine, the house, and the wind that sways the pine trees; on the other, the imaginary world of the spirit of the dead lyrical hero. The lexical resources of the poetic text convey this opposition through the correlation of the words I and wind, the personal pronouns I and you, and the words ended and alive. At the grammatical level, the duality is expressed by the contrast between past and present tense verb forms, by the passage from indirect thought (the lyrical hero’s mental monologue) to a third-person narrative about the wind and the pine trees, and by the poem’s return to the lyrical hero’s indirect thought at the end.
This is how Pasternak implements one of the main ideas of his novel "Doctor Zhivago", the idea of immortality, which the article confirms by referring to the novel’s macro context and to biographical materials.


2021, Vol 35 (4), pp. 307-314
Author(s): Redouane Karsi, Mounia Zaim, Jamila El Alami

Traditionally, pharmacovigilance data are collected during clinical trials on small samples of patients and are therefore insufficient to assess drugs adequately. Nowadays, consumers use online drug forums to share their opinions and experiences about medication. This feedback, widely available on the web, is analyzed automatically to extract information relevant to decision-making. Sentiment analysis methods are currently being put forward to leverage consumers' opinions and produce useful drug-monitoring indicators. However, the effectiveness of these methods depends on the quality of the word representation, which is a real challenge because the information in user reviews is noisy and highly subjective. Historically, many sentiment classification approaches have used machine learning methods based on the traditional bag-of-words model, sometimes enhanced with lexical resources. In recent years, word embedding models have significantly improved classification performance thanks to their ability to capture the syntactic and semantic properties of words. However, these models are weak at sentiment classification because they do not encode sentiment information in the word representation: two words with opposite polarities can receive close embeddings because they appear in the same contexts. To overcome this drawback, some studies have refined pre-trained word embeddings with lexical resources or learned word embeddings from training data, but these models depend on external resources and are complex to implement. This work proposes using ELMo, a deep contextual word embedding model that inherently captures sentiment information by providing separate vectors for words with opposite polarities. Different variants of the proposed model are compared with a benchmark of pre-trained word embedding models, using an SVM classifier trained on the Drug Review Dataset.
Experimental results show that ELMo embeddings improve classification performance in sentiment analysis tasks in the pharmaceutical domain.
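The polarity problem described above, where opposite-polarity words end up with close static vectors because they share contexts, can be reproduced with a minimal count-based sketch. Here simple co-occurrence vectors stand in for trained static embeddings such as word2vec; the toy review snippets, the window size, and the function names are invented for illustration, and this is not the authors' ELMo model or their SVM pipeline.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Count-based word vectors: for each word, count the words occurring
    within `window` tokens of it (a stand-in for static embeddings)."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


# Invented drug-review snippets: the opposite-polarity words share contexts.
REVIEWS = [
    "this drug was effective for my pain",
    "this drug was ineffective for my pain",
    "the medication was effective for my headache",
    "the medication was ineffective for my headache",
    "my headache was gone after two days",
]

if __name__ == "__main__":
    vectors = context_vectors(REVIEWS)
    print(round(cosine(vectors["effective"], vectors["ineffective"]), 3))
```

A classifier fed such vectors (or their averages per review) sees nearly identical features for "effective" and "ineffective". A contextual model like ELMo instead computes a vector for each occurrence from the whole sentence, which, per the abstract, is what lets it separate the two.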


2021
Author(s): Katharina Allgaier, Susana Veríssimo, Sherry Tan, Matthias Orlikowski, Matthias Hartung

We describe the use of Linguistic Linked Open Data (LLOD) to support a cross-lingual transfer framework for concept detection in online health communities. Our goal is to develop multilingual text analytics as an enabler for analyzing health-related quality of life (HRQoL) from self-reported patient narratives. The framework capitalizes on supervised cross-lingual projection methods, so that labeled training data are needed only for a source language and not for the target languages. Cross-lingual supervision is provided by LLOD lexical resources, which are used to learn bilingual word embeddings that are simultaneously tuned to represent an inventory of HRQoL concepts based on the World Health Organization’s quality of life surveys (WHOQOL). We demonstrate that lexicon induction from LLOD resources is a powerful method that yields rich and informative lexical resources for cross-lingual concept detection, which can outperform existing domain-specific lexica. Furthermore, a comparative evaluation shows that our models based on bilingual word embeddings are highly complementary to an approach that integrates machine translation and rule-based extraction algorithms. In a combined configuration, our models rival the performance of state-of-the-art cross-lingual transformers despite having considerably lower model complexity.
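The idea of inducing a target-language concept lexicon from LLOD can be illustrated, in a much simpler form, by projecting a labeled source-language lexicon through bilingual dictionary entries. This sketch deliberately swaps in plain dictionary-based projection for the authors' bilingual word embeddings. The concept labels (loosely modeled on WHOQOL facet names), the English-German entries, and all function names are hypothetical; the `BILINGUAL` dictionary merely stands in for translation links that could be retrieved from LLOD resources.

```python
# Hypothetical source-language (English) concept lexicon: term -> concept.
SOURCE_LEXICON = {
    "pain": "pain_and_discomfort",
    "ache": "pain_and_discomfort",
    "sleep": "sleep_and_rest",
    "tired": "energy_and_fatigue",
}

# Hypothetical English -> German entries, standing in for LLOD translation links.
BILINGUAL = {
    "pain": ["schmerz", "schmerzen"],
    "ache": ["schmerzen"],
    "sleep": ["schlaf", "schlafen"],
    "tired": ["müde"],
}


def project_lexicon(source_lexicon, bilingual):
    """Induce a target-language lexicon: every translation of a source term
    inherits that term's concept label (a word may map to several concepts)."""
    target = {}
    for term, concept in source_lexicon.items():
        for translation in bilingual.get(term, []):
            target.setdefault(translation, set()).add(concept)
    return target


def detect_concepts(text, target_lexicon):
    """Return the set of concepts whose induced terms occur in the text."""
    found = set()
    for token in text.lower().split():
        token = token.strip(".,;:!?")
        found |= target_lexicon.get(token, set())
    return found


if __name__ == "__main__":
    german_lexicon = project_lexicon(SOURCE_LEXICON, BILINGUAL)
    print(sorted(detect_concepts("Ich bin müde und kann nachts nicht schlafen.", german_lexicon)))
```

Exact string matching misses any word form the dictionary lacks, such as other inflections or synonyms; embedding-based approaches like the one in the abstract aim to generalize beyond the listed entries.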

