Learning Lexical Subspaces in a Distributional Vector Space

2020, Vol. 8, pp. 311-329
Author(s): Kushal Arora, Aishik Chakraborty, Jackie C. K. Cheung

In this paper, we propose LexSub, a novel approach to unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronymy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model. Code available at https://github.com/aishikchakraborty/LexSub.
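The attract/repel mechanism can be illustrated with a short sketch. The code below is a minimal, illustrative rendering of the idea, not the authors' implementation (see the linked repository for that): each relation gets its own learned linear projection, synonym pairs are pulled together inside their subspace, and antonym pairs are pushed apart by a margin. The embedding size, margin, and toy word indices are assumptions, and the asymmetric relations are omitted for brevity.

```python
# Minimal sketch of the LexSub idea (an assumption of how it could look, not
# the authors' code): learn a linear projection per lexical relation and
# impose the relation as a distance constraint inside the projected subspace.
import torch
import torch.nn as nn

dim, sub_dim = 300, 50
emb = nn.Embedding(1000, dim)                    # pretrained distributional embeddings in practice
proj_syn = nn.Linear(dim, sub_dim, bias=False)   # subspace where synonymy should hold
proj_ant = nn.Linear(dim, sub_dim, bias=False)   # subspace where antonymy should hold

def attract_loss(u, v, proj):
    # pull related words together inside the relation's subspace
    return (proj(u) - proj(v)).pow(2).sum(-1).mean()

def repel_loss(u, v, proj, margin=1.0):
    # push opposed words at least `margin` apart in the relation's subspace
    d = (proj(u) - proj(v)).pow(2).sum(-1)
    return torch.relu(margin - d).mean()

syn_u, syn_v = emb(torch.tensor([1, 2])), emb(torch.tensor([3, 4]))  # toy synonym pairs
ant_u, ant_v = emb(torch.tensor([5])), emb(torch.tensor([6]))        # toy antonym pair
loss = attract_loss(syn_u, syn_v, proj_syn) + repel_loss(ant_u, ant_v, proj_ant)
loss.backward()
```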

2021, Vol. 11 (15), pp. 6896
Author(s): Padraig Corcoran, Geraint Palmer, Laura Arman, Dawn Knight, Irena Spasić

Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings, taking into account the syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and, therefore, require a large training corpus. However, Welsh is a minoritised language, so significantly less Welsh language data are publicly available in comparison to English. Consequently, assembling a sufficiently large text corpus is not a straightforward endeavour. Nonetheless, we compiled a corpus of 92,963,671 words from 11 sources, which represents the largest corpus of Welsh. The complexity of Welsh punctuation made the tokenisation of this corpus challenging, as punctuation could not be used for boundary detection. We considered several tokenisation methods, including one designed specifically for Welsh. To account for rich inflection, we used a method for learning word embeddings that is based on subwords and, therefore, can more effectively relate different surface forms during the training phase. We conducted both qualitative and quantitative evaluations of the resulting word embeddings, which outperformed previously described Welsh word embeddings produced as part of a larger study covering 157 languages. Our study was the first to focus specifically on Welsh word embeddings.
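As a rough illustration of the subword-based training the study describes, the sketch below uses gensim's FastText implementation, whose character n-grams let different surface forms of the same lemma (e.g., mutated or inflected Welsh words) share representation. The toy sentences and hyperparameters are assumptions, not the authors' configuration.

```python
# A minimal sketch of subword-aware embedding training with gensim's FastText;
# the corpus, tokenisation, and hyperparameters are illustrative assumptions.
from gensim.models import FastText

# In practice this would be the tokenised 92M-word Welsh corpus;
# a toy list of pre-tokenised sentences stands in for it here.
sentences = [
    ["mae", "hi", "yn", "braf", "heddiw"],
    ["roedd", "y", "tywydd", "yn", "braf", "ddoe"],
]

model = FastText(
    sentences=sentences,
    vector_size=300,   # embedding dimensionality
    window=5,          # context window
    min_count=1,       # keep rare words in this toy corpus
    min_n=3, max_n=6,  # character n-gram range: lets the model relate
                       # inflected or mutated surface forms of one lemma
    epochs=10,
)

# Subword n-grams also yield vectors for unseen surface forms.
print(model.wv.most_similar("braf", topn=3))
```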


2007, Vol. 19 (8), pp. 1259-1274
Author(s): Dietmar Roehm, Ina Bornkessel-Schlesewsky, Frank Rösler, Matthias Schlesewsky

We report a series of event-related potential experiments designed to dissociate the functionally distinct processes involved in the comprehension of highly restricted lexical-semantic relations (antonyms). We sought to differentiate between influences of semantic relatedness (which are independent of the experimental setting) and processes related to predictability (which differ as a function of the experimental environment). To this end, we conducted three ERP studies contrasting the processing of antonym relations (black-white) with that of related (black-yellow) and unrelated (black-nice) word pairs. Whereas the lexical-semantic manipulation was kept constant across experiments, the experimental environment and the task demands varied: Experiment 1 presented the word pairs in a sentence context of the form The opposite of X is Y and used a sensicality judgment. Experiment 2 used a word pair presentation mode and a lexical decision task. Experiment 3 also examined word pairs, but with an antonymy judgment task. All three experiments revealed a graded N400 response (unrelated > related > antonyms), thus supporting the assumption that semantic associations are processed automatically. In addition, the experiments revealed that, in highly constrained task environments, the N400 gradation occurs simultaneously with a P300 effect for the antonym condition, thus leading to the superficial impression of an extremely “reduced” N400 for antonym pairs. Comparisons across experiments and participant groups revealed that the P300 effect is not only a function of stimulus constraints (i.e., sentence context) and experimental task, but that it is also crucially influenced by individual processing strategies used to achieve successful task performance.


2015, Vol. 24 (02), pp. 1540010
Author(s): Patrick Arnold, Erhard Rahm

We introduce a novel approach to extract semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. These relations are used to build up a large and up-to-date thesaurus providing background knowledge for tasks such as determining semantic ontology mappings. Our automatic approach uses a comprehensive set of semantic patterns, finite state machines and NLP techniques to extract millions of relations between concepts. An evaluation for different domains shows the high quality and effectiveness of the proposed approach. We also illustrate the value of the newly found relations for improving existing ontology mappings.
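A stripped-down illustration of pattern-based relation extraction follows. The paper employs a comprehensive set of semantic patterns, finite state machines, and NLP techniques; the two regular expressions below are simplified Hearst-style stand-ins, intended only to show the general shape of such a pipeline.

```python
# Simplified sketch of pattern-based extraction of is-a and part-of relations;
# the two regexes are illustrative assumptions, far cruder than the paper's
# pattern set and finite state machines.
import re

IS_A = re.compile(r"(\w+(?:\s\w+)?) is an? (\w+(?:\s\w+)?)")
PART_OF = re.compile(r"(\w+(?:\s\w+)?) is (?:a )?part of (\w+(?:\s\w+)?)")

def extract_relations(text):
    relations = []
    for m in IS_A.finditer(text):
        relations.append((m.group(1), "is-a", m.group(2)))
    for m in PART_OF.finditer(text):
        relations.append((m.group(1), "part-of", m.group(2)))
    return relations

text = "A violin is a string instrument. The fingerboard is part of the violin."
print(extract_relations(text))
# [('A violin', 'is-a', 'string instrument'), ('The fingerboard', 'part-of', 'the violin')]
```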


Author(s): Cyril Belica, Holger Keibel, Marc Kupietz, Rainer Perkuhn

Author(s): Jane Morris

Preliminary results from an experimental study of readers’ perceptions of lexical cohesion and lexical semantic relations in text are presented. Readers agree on a common “core” of groups of related words, while also exhibiting individual differences. The majority of the relations reported are “non-classical” (i.e., not hyponymy, meronymy, synonymy, or antonymy). A group of commonly used relations is presented. These preliminary results indicate potential for improving both the relations existing in lexical resources and the methods that depend on lexical cohesion analysis.
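For readers unfamiliar with the “classical” relations the study contrasts with readers’ judgments, the sketch below (an illustration using NLTK's WordNet interface, not part of the study) enumerates them for sample words.

```python
# Illustration (not the study's method) of the "classical" lexical relations
# (synonymy, hypernymy, meronymy, antonymy) as encoded in WordNet.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

car = wn.synsets("car")[0]  # first noun sense, car.n.01
print("synonyms:", car.lemma_names())
print("hypernyms:", [h.name() for h in car.hypernyms()])
print("part meronyms:", [m.name() for m in car.part_meronyms()][:5])

good = wn.synset("good.a.01")
print("antonyms of 'good':", [a.name() for l in good.lemmas() for a in l.antonyms()])
```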

