Distributional Semantics
Recently Published Documents

Total documents: 201 (five years: 72)
H-index: 12 (five years: 3)

2021
Author(s): Stefan Hartmann, Tobias Ungerer

The concept of ‘snowclones’ has gained interest in recent research on linguistic creativity and in studies on extravagance and expressiveness in language. However, no clear criteria for identifying snowclones have yet been established, and detailed corpus-based investigations of the phenomenon are still lacking. This paper addresses this research gap in a twofold way: On the one hand, we develop an operational definition of snowclones, arguing that three criteria are decisive: (i) the existence of a lexically fixed source construction; (ii) partial productivity; (iii) “extravagant” formal and/or functional characteristics. On the other hand, we offer an empirical investigation of two snowclones that can be considered ‘prototypical’ on the basis of previous literature, namely [the mother of all X] and [X BE the new Y]. We use collostructional analysis and distributional semantics to explore the partial productivity of both patterns’ slot fillers. In sum, we argue that the concept of snowclones, if properly defined, can contribute substantially to our understanding of creative language use, especially regarding the question of how social, cultural, and interpersonal factors influence the choice of more or less salient linguistic constructions.
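
As a rough illustration of the distributional side of this kind of analysis, the sketch below (not the authors' code) computes the mean pairwise cosine similarity of a few hypothetical slot fillers for [the mother of all X], assuming pretrained GloVe vectors loaded through gensim's downloader:

```python
from itertools import combinations
import gensim.downloader as api

# Assumption: generic English GloVe vectors, not the corpus-specific model of the study.
vectors = api.load("glove-wiki-gigaword-100")

# Hypothetical slot fillers observed in [the mother of all X].
fillers = ["battles", "bombs", "storms", "parties", "hangovers"]

# Average pairwise cosine similarity as a rough proxy for the semantic
# coherence of the slot: higher values suggest a more restricted slot.
pairs = [(a, b) for a, b in combinations(fillers, 2) if a in vectors and b in vectors]
coherence = sum(vectors.similarity(a, b) for a, b in pairs) / len(pairs)
print(f"Mean pairwise similarity of slot fillers: {coherence:.3f}")
```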


Author(s): Katerina Mandenaki, Catherine Sotirakou, Constantinos Mourlas, Spiros Moschonas

This paper examines the notions of neoliberalism and the financialization and marketisation of public life by applying computational tools, such as sentence embeddings, to a novel corpus of neoliberal articles. More specifically, we experimented with distributional semantics along with several Natural Language Processing (NLP) techniques and machine learning algorithms in order to extract conceptual dictionaries and “seed” words. Our findings show that sentence embeddings reveal repetitive patterns constructed around the given concepts and highlight the mechanical character of an ideology in its function of providing solutions and policies and of constructing stereotypes. This work introduces a novel pipeline for computer-assisted research in discourse analysis and ideology.
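
A minimal sketch of the general idea of retrieving repetitive patterns around a "seed" formulation with sentence embeddings; the model name and example sentences are illustrative assumptions, not the corpus or pipeline used in the paper:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: a generic off-the-shelf sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [  # hypothetical corpus sentences
    "Structural reforms will restore market confidence.",
    "Public services must become more competitive and efficient.",
    "The festival attracted thousands of visitors this year.",
]
seed = "Deregulation and privatisation drive economic growth."

corpus_emb = model.encode(corpus, convert_to_tensor=True)
seed_emb = model.encode(seed, convert_to_tensor=True)

# Rank corpus sentences by cosine similarity to the seed sentence;
# recurring high-similarity sentences point to repetitive discursive patterns.
hits = util.semantic_search(seed_emb, corpus_emb, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```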


2021, Vol. 14 (3), pp. 354-391
Author(s): Richard Huyghe, Marine Wauquier

The formation of French agent nouns (ANs) involves a large variety of morphological constructions, and particularly of suffixes. In this study, we focus on the semantic counterpart of agentive suffix diversity and investigate whether the morphological variety of ANs correlates with different agentive subtypes. We adopt a distributional semantics approach and combine manual, computational and statistical analyses applied to French ANs ending in -aire, -ant, -eur, -ien, -ier and -iste. Our methodology allows for a large-scale study of ANs and involves both top-down and bottom-up procedures. We first characterize agentive suffixes with respect to their morphosemantic and distributional properties, outlining their specificities and similarities. Then we automatically cluster ANs into distributionally relevant subsets and examine their properties. Based on quantitative analysis, our study provides a new perspective on agentive suffix rivalry in French that both confirms existing claims and sheds light on previously unseen phenomena.
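
The bottom-up clustering step could, in principle, look like the sketch below; the toy vectors and noun list are invented for illustration, whereas the study derives its vectors distributionally from French corpus data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical distributional vectors for six French agent nouns:
# three "profession-like" and three "artist-like" nouns, simulated as
# noisy copies of two centres (the real study uses corpus-derived vectors).
rng = np.random.default_rng(42)
centres = {"profession": rng.normal(size=50), "artist": rng.normal(size=50)}
nouns = ["banquier", "dentiste", "boulanger", "violoniste", "chanteur", "guitariste"]
X = np.array([
    centres["profession" if i < 3 else "artist"] + rng.normal(scale=0.2, size=50)
    for i in range(len(nouns))
])

# Bottom-up step: cluster agent nouns into distributionally coherent subsets,
# whose members can then be examined for shared semantic properties.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for noun, label in zip(nouns, labels):
    print(label, noun)
```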


Author(s): Anna Giabelli, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica, Andrea Seveso

In this paper, we present Skills2Graph, a tool that, starting from a set of users’ professional skills, identifies the most suitable jobs as they emerge from a large corpus of 2.5M+ Online Job Vacancies (OJVs) posted in three different countries (the United Kingdom, France, and Germany). To this aim, we rely both on co-occurrence statistics, computing a count-based measure of skill relevance named Revealed Comparative Advantage (rca), and on distributional semantics, generating several embeddings on the OJV corpus and performing an intrinsic evaluation of their quality. Results, evaluated through a user study with 10 labor market experts, show a high P@3 for the recommendations provided by Skills2Graph and a high nDCG (0.985 and 0.984 on a [0,1] scale), which indicates a strong correlation between the experts’ scores and the rankings generated by Skills2Graph.
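
For readers unfamiliar with the count-based measure, the sketch below computes Revealed Comparative Advantage in its standard form over a hypothetical skill-by-occupation count matrix; the paper's exact formulation and normalisation may differ:

```python
import numpy as np

# Hypothetical counts: rows = skills, columns = occupations,
# entry [s, j] = number of OJVs for occupation j that mention skill s.
counts = np.array([
    [120, 10,  5],
    [ 30, 80, 15],
    [  5, 20, 90],
], dtype=float)

# Revealed Comparative Advantage, in its standard form:
# rca(s, j) = (share of skill s within occupation j) /
#             (share of skill s within the whole corpus).
skill_share_in_occ = counts / counts.sum(axis=0, keepdims=True)
skill_share_overall = counts.sum(axis=1, keepdims=True) / counts.sum()
rca = skill_share_in_occ / skill_share_overall

# rca > 1 means the skill is over-represented for that occupation.
print(np.round(rca, 2))
```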


2021, Vol. 11 (15), pp. 6896
Author(s): Padraig Corcoran, Geraint Palmer, Laura Arman, Dawn Knight, Irena Spasić

Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings, taking into account the syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and therefore require a large training corpus. However, Welsh is a minoritised language, so significantly less Welsh language data are publicly available in comparison to English. Consequently, assembling a sufficiently large text corpus is not a straightforward endeavour. Nonetheless, we compiled a corpus of 92,963,671 words from 11 sources, which represents the largest corpus of Welsh. The relative complexity of Welsh punctuation made the tokenisation of this corpus relatively challenging, as punctuation could not be used for boundary detection. We considered several tokenisation methods, including one designed specifically for Welsh. To account for rich inflection, we used a method for learning word embeddings that is based on subwords and can therefore relate different surface forms more effectively during training. We conducted both qualitative and quantitative evaluations of the resulting word embeddings, which outperformed previously described word embeddings in Welsh that were developed as part of a larger study including 157 languages. Our study was the first to focus specifically on Welsh word embeddings.
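
A minimal sketch of subword-based training of the kind described, using gensim's FastText implementation on a couple of hypothetical tokenised Welsh sentences (the real corpus is roughly 93M words); the hyperparameters are illustrative, not those of the study:

```python
from gensim.models import FastText

# Hypothetical tokenised Welsh sentences standing in for the full corpus.
sentences = [
    ["mae", "hi", "'n", "braf", "heddiw"],
    ["dw", "i", "'n", "hoffi", "coffi"],
]

# Subword (character n-gram) embeddings relate different surface forms of
# richly inflected words even when some forms are rare in the corpus.
model = FastText(
    vector_size=300, window=5, min_count=1,
    min_n=3, max_n=6, sg=1,  # skip-gram with 3-6 character n-grams
)
model.build_vocab(corpus_iterable=sentences)
model.train(corpus_iterable=sentences, total_examples=len(sentences), epochs=10)

# Out-of-vocabulary forms still receive vectors via their subwords.
print(model.wv.most_similar("coffi", topn=3))
```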


Morphology, 2021
Author(s): Rossella Varvara, Gabriella Lapesa, Sebastian Padó

We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first one, cosine, detects nominalizations which are semantically similar to their bases; the second one, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
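
The two transparency measures can be illustrated on toy count vectors as below; the inclusion measure is implemented here as a simple inclusion ratio in the spirit of Weeds precision, which may differ from the authors' exact formulation:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two context-count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def distributional_inclusion(noun, verb):
    """Share of the noun's context mass that also occurs with the verb
    (a simple inclusion measure; the paper's exact formulation may differ)."""
    shared = verb > 0
    return float(noun[shared].sum() / noun.sum())

# Hypothetical context-count vectors over the same context vocabulary
# for a base verb and its -ung nominalization.
evaluieren = np.array([10, 4, 0, 7, 3], dtype=float)
evaluierung = np.array([6, 2, 0, 5, 0], dtype=float)

print("cosine:", round(cosine(evaluieren, evaluierung), 3))
print("inclusion:", round(distributional_inclusion(evaluierung, evaluieren), 3))
```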


Author(s): Matthew Purver, Mehrnoosh Sadrzadeh, Ruth Kempson, Gijs Wijnholds, Julian Hough

Despite the incremental nature of Dynamic Syntax (DS), its semantic grounding remains that of predicate logic, itself grounded in set theory, and is therefore poorly suited to expressing the rampantly context-relative nature of word meaning and related phenomena such as the incremental judgements of similarity needed for modelling disambiguation. Here, we show how DS can be assigned a compositional distributional semantics which enables such judgements and makes it possible to incrementally disambiguate language constructs using vector space semantics. Building on a proposal in our previous work, we implement and evaluate our model on real data, showing that it outperforms a commonly used additive baseline. In conclusion, we argue that these results set the ground for an account of the non-determinism of lexical content, in which the nature of word meaning is its dependence on surrounding context for its construal.
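
For reference, the additive baseline mentioned above composes a phrase vector by summing its word vectors; the sketch below illustrates it with generic GloVe vectors and an invented disambiguation example, not the DS model or evaluation data of the paper:

```python
import numpy as np
import gensim.downloader as api

# Assumption: generic English GloVe vectors stand in for the paper's vector spaces.
vectors = api.load("glove-wiki-gigaword-100")

def additive_compose(words):
    """Additive baseline: a phrase vector is the sum of its word vectors."""
    return np.sum([vectors[w] for w in words if w in vectors], axis=0)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Incremental disambiguation, schematically: as more context words arrive,
# the composed vector moves away from the irrelevant sense of "bank".
prefix = ["the", "bank"]
continuation = ["approved", "the", "loan"]
print(cos(additive_compose(prefix), vectors["river"]))
print(cos(additive_compose(prefix + continuation), vectors["river"]))
```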


2021
Author(s): Guanghao You, Moritz M. Daum, Sabine Stoll

Children acquire their first language while interacting with adults in a highly adaptive manner. While adaptation occurs at many linguistic levels, such as syntax and speech complexity, semantic adaptation remains poorly understood because of the difficulty of extracting meaning efficiently. In this study, we examine the adaptation of semantics with a computational approach based on distributional information. We show that adults, in their speech addressed to children, adapt their distributional semantics to that of the speech children produce. By analyzing semantic representations modeled from the Manchester corpus, a large longitudinal acquisition corpus of English, we find a striking similarity in semantic development between child and child-directed speech, with a slight time lag in the latter. These findings provide strong evidence for semantic adaptation in first language acquisition and suggest an important role for child-directed speech in semantic learning.
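
One generic way to compare semantic spaces trained on different speech samples is a second-order (similarity-of-similarities) analysis, sketched below with random toy vectors; the paper's actual modelling of the Manchester corpus may differ:

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_profile(vectors):
    """Flatten the pairwise cosine-similarity matrix over a shared vocabulary."""
    norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = norm @ norm.T
    upper = np.triu_indices(len(vectors), k=1)
    return sims[upper]

# Hypothetical embeddings for the same 50 words, trained separately on
# child speech and child-directed speech from one age bin.
rng = np.random.default_rng(0)
child = rng.normal(size=(50, 100))
cds = child + rng.normal(scale=0.3, size=(50, 100))  # CDS as a noisy copy, for illustration

# Second-order comparison: correlate the two similarity profiles.
rho, _ = spearmanr(similarity_profile(child), similarity_profile(cds))
print(f"Spearman correlation between semantic spaces: {rho:.2f}")
```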


2021, Vol. 11 (12), pp. 5743
Author(s): Pablo Gamallo

This article describes a compositional model based on syntactic dependencies that has been designed to build contextualized word vectors by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in the current work has been evaluated on a syntactically controlled and multilingual dataset and compared with Transformer BERT-like models, such as Sentence BERT, the state of the art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of the one defined for English, containing expressions with noun-verb-noun transitive constructions. The results show that the linguistically based compositional approach is competitive with Transformer models.
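
The Transformer baseline named in the abstract can be queried as sketched below for a hypothetical noun-verb-noun pair; the dependency-based compositional model itself is not reproduced here, and the model name and test items are assumptions:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: a generic multilingual Sentence-BERT-style model, not necessarily
# the checkpoint used in the paper's comparison.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical noun-verb-noun test items (not taken from the actual datasets).
pair = ("the committee approved the proposal", "the board accepted the plan")
emb = model.encode(list(pair), convert_to_tensor=True)

# Cosine similarity between the two transitive constructions.
print(float(util.cos_sim(emb[0], emb[1])))
```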

