distributional semantics Latest Research Papers

Attack of the snowclones: A corpus-based analysis of extravagant formulaic patterns

10.31234/osf.io/y6a8g ◽

2021 ◽

Author(s):

Stefan Hartmann ◽

Tobias Ungerer

Keyword(s):

Language Use ◽

Empirical Investigation ◽

Operational Definition ◽

Previous Literature ◽

The Other ◽

Distributional Semantics ◽

Interpersonal Factors ◽

Research Gap ◽

The One ◽

Definition Of

The concept of ‘snowclones’ has gained interest in recent research on linguistic creativity and in studies on extravagance and expressiveness in language. However, no clear criteria for identifying snowclones have yet been established, and detailed corpus-based investigations of the phenomenon are still lacking. This paper addresses this research gap in a twofold way: On the one hand, we develop an operational definition of snowclones, arguing that three criteria are decisive: (i) the existence of a lexically fixed source construction; (ii) partial productivity; (iii) “extravagant” formal and/or functional characteristics. On the other hand, we offer an empirical investigation of two snowclones that can be considered ‘prototypical’ on the basis of previous literature, namely [the mother of all X] and [X BE the new Y]. We use collostructional analysis and distributional semantics to explore the partial productivity of both patterns’ slot fillers. In sum, we argue that the concept of snowclones, if properly defined, can contribute substantially to our understanding of creative language use, especially regarding the question of how social, cultural, and interpersonal factors influence the choice of more or less salient linguistic constructions.


Neural Embeddings for Text Analysis: A Case Study in Neoliberal Discourse

Journal of Education Society and Behavioural Science ◽

10.9734/jesbs/2021/v34i1130379 ◽

2021 ◽

pp. 196-204

Author(s):

Katerina Mandenaki ◽

Catherine Sotirakou ◽

Constantinos Mourlas ◽

Spiros Moschonas

Keyword(s):

Machine Learning ◽

Language Processing ◽

Text Analysis ◽

Public Life ◽

Machine Learning Algorithms ◽

Computer Assisted ◽

Distributional Semantics ◽

Computational Tools ◽

The Given

This paper examines the notions of neoliberalism and the financialization and marketisation of public life by using computational tools such as sentence embeddings on a novel corpus of neoliberal articles. More specifically, we experimented with distributional semantics along with several Natural Language Processing (NLP) techniques and machine learning algorithms in order to extract conceptual dictionaries and “seed” words. Our findings show that sentence embeddings reveal repetitive patterns constructed around the given concepts and highlight the mechanical character of an ideology in its function of providing solutions, policies and constructing stereotypes. This work introduces a novel pipeline for computer-assisted research in discourse analysis and ideology.


Distributional semantics insights on agentive suffix rivalry in French

WORD Structure ◽

10.3366/word.2021.0194 ◽

2021 ◽

Vol 14 (3) ◽

pp. 354-391

Author(s):

Richard Huyghe ◽

Marine Wauquier

Keyword(s):

Quantitative Analysis ◽

Large Scale ◽

Statistical Analyses ◽

Distributional Semantics ◽

Top Down ◽

Bottom Up ◽

Distributional Properties ◽

Large Scale Study ◽

New Perspective

The formation of French agent nouns (ANs) involves a large variety of morphological constructions, and particularly of suffixes. In this study, we focus on the semantic counterpart of agentive suffix diversity and investigate whether the morphological variety of ANs correlates with different agentive subtypes. We adopt a distributional semantics approach and combine manual, computational and statistical analyses applied to French ANs ending in -aire, -ant, -eur, -ien, -ier and -iste. Our methodology allows for a large-scale study of ANs and involves both top-down and bottom-up procedures. We first characterize agentive suffixes with respect to their morphosemantic and distributional properties, outlining their specificities and similarities. Then we automatically cluster ANs into distributionally relevant subsets and examine their properties. Based on quantitative analysis, our study provides a new perspective on agentive suffix rivalry in French that both confirms existing claims and sheds light on previously unseen phenomena.


Constructional Change and Distributional Semantics

10.1163/9789004446793_011 ◽

2021 ◽

pp. 248-269

Keyword(s):

Distributional Semantics


Skills2Graph: Processing million Job Ads to face the Job Skill Mismatch Problem

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/708 ◽

2021 ◽

Author(s):

Anna Giabelli ◽

Lorenzo Malandri ◽

Fabio Mercorio ◽

Mario Mezzanzanica ◽

Andrea Seveso

Keyword(s):

United Kingdom ◽

Labor Market ◽

Comparative Advantage ◽

Strong Correlation ◽

User Study ◽

Distributional Semantics ◽

Skill Mismatch ◽

The United Kingdom ◽

Job Ads ◽

Large Corpus

In this paper, we present Skills2Graph, a tool that, starting from a set of users’ professional skills, identifies the most suitable jobs as they emerge from a large corpus of 2.5M+ Online Job Vacancies (OJVs) posted in three different countries (the United Kingdom, France, and Germany). To this aim, we rely both on co-occurrence statistics - computing a count-based measure of skill-relevance named Revealed Comparative Advantage (rca) - and distributional semantics - generating several embeddings on the OJVs corpus and performing an intrinsic evaluation of their quality. Results, evaluated through a user study of 10 labor market experts, show a high P@3 for the recommendations provided by Skills2Graph, and a high nDCG (0.985 and 0.984 in a [0,1] range), that indicates a strong correlation between the experts’ scores and the rankings generated by Skills2Graph.
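The two quantities named in this abstract can be illustrated with a minimal sketch. It assumes the standard Balassa-style form of Revealed Comparative Advantage and the linear-gain variant of nDCG; the abstract does not spell out either formula, so both are assumptions. The function names `rca` and `ndcg` and the toy `counts` table are hypothetical and are not taken from Skills2Graph.

```python
import math

def rca(counts, occupation, skill):
    """Balassa-style RCA (assumed form): the skill's share within one
    occupation divided by the skill's share across all occupations."""
    occ_total = sum(counts[occupation].values())
    skill_total = sum(row.get(skill, 0) for row in counts.values())
    grand_total = sum(sum(row.values()) for row in counts.values())
    if occ_total == 0 or skill_total == 0:
        return 0.0
    share_in_occupation = counts[occupation].get(skill, 0) / occ_total
    share_overall = skill_total / grand_total
    return share_in_occupation / share_overall

def ndcg(relevances):
    """nDCG with linear gain: DCG of the given ranking divided by the
    DCG of the same scores sorted from most to least relevant."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy skill frequencies per occupation (illustrative numbers only).
counts = {
    "data analyst": {"sql": 8, "python": 6, "welding": 0},
    "welder": {"sql": 0, "python": 0, "welding": 6},
}

sql_rca = rca(counts, "data analyst", "sql")  # above 1: over-represented
perfect = ndcg([3, 2, 1])    # expert scores already in ideal order
reversed_ = ndcg([1, 2, 3])  # best item ranked last
```

An RCA above 1 means the skill appears more often in that occupation than in the corpus overall; an nDCG of 1 means the system's ranking matches the ordering implied by the expert scores.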


Creating Welsh Language Word Embeddings

Applied Sciences ◽

10.3390/app11156896 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6896

Author(s):

Padraig Corcoran ◽

Geraint Palmer ◽

Laura Arman ◽

Dawn Knight ◽

Irena Spasić

Keyword(s):

Vector Space ◽

Quantitative Evaluation ◽

Training Phase ◽

Distributional Semantics ◽

Word Embeddings ◽

Semantic Relationships ◽

Qualitative And Quantitative ◽

Welsh Language ◽

Language Data ◽

Large Corpus

Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings, taking into account the syntactic and morphological idiosyncrasies of this language. These methods exploit the principles of distributional semantics and, therefore, require a large corpus to be trained on. However, Welsh is a minoritised language, hence significantly less Welsh language data are publicly available in comparison to English. Consequently, assembling a sufficiently large text corpus is not a straightforward endeavour. Nonetheless, we compiled a corpus of 92,963,671 words from 11 sources, which represents the largest corpus of Welsh. The relative complexity of Welsh punctuation made the tokenisation of this corpus relatively challenging, as punctuation could not be used for boundary detection. We considered several tokenisation methods, including one designed specifically for Welsh. To account for rich inflection, we used a method for learning word embeddings that is based on subwords and, therefore, can more effectively relate different surface forms during the training phase. We conducted both qualitative and quantitative evaluation of the resulting word embeddings, which outperformed Welsh word embeddings previously described as part of a larger study including 157 languages. Our study was the first to focus specifically on Welsh word embeddings.
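The subword idea mentioned in this abstract can be sketched as follows. The sketch assumes fastText-style character n-grams (the word wrapped in `<` and `>` boundary markers, n-gram lengths 3 to 6) and only shows how two surface forms come to share units; it is not the authors' training pipeline. The helper `char_ngrams` and the example pair "cath"/"gath" (the Welsh word for "cat" and its soft-mutated form) are illustrative choices, not taken from the paper.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary
    markers, in the style used by subword embedding models."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams

# Two surface forms of the same Welsh lexeme (illustrative example).
base = set(char_ngrams("cath"))
mutated = set(char_ngrams("gath"))

# N-grams present in both forms; a subword model represents each word
# through its n-grams, so shared n-grams tie the two forms together.
shared = base & mutated
print(sorted(shared))
```

Because the two forms differ only in their first consonant, the n-grams covering the end of the word are identical, which is what lets a subword model relate them even if one form is rare in the corpus.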


Grounding semantic transparency in context

Morphology ◽

10.1007/s11525-021-09382-w ◽

2021 ◽

Author(s):

Rossella Varvara ◽

Gabriella Lapesa ◽

Sebastian Padó

Keyword(s):

Large Scale ◽

Point Of View ◽

Distributional Semantics ◽

Semantic Transparency ◽

Inclusion Measure ◽

The Difference ◽

Semantic Point ◽

The Many ◽

The Relationship

We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first one, cosine, detects nominalizations which are semantically similar to their bases; the second one, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation with the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
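The contrast between the two transparency measures can be made concrete with a small sketch. The abstract does not say which inclusion measure is used, so the code below assumes a Weeds-precision-style score as a stand-in; the function names and the toy context vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def inclusion(narrow, broad):
    """Weeds-precision-style inclusion (assumed stand-in): the share of
    the narrow term's context weight that falls on contexts the broad
    term also occurs in."""
    total = sum(narrow.values())
    if total == 0:
        return 0.0
    shared = sum(w for f, w in narrow.items() if broad.get(f, 0.0) > 0)
    return shared / total

# Toy context weights: the nominalization occurs in a subset of the
# contexts of its base verb (illustrative numbers only).
verb = {"ctx_a": 2.0, "ctx_b": 1.0, "ctx_c": 1.0}
nominal = {"ctx_a": 1.0, "ctx_b": 1.0}

similarity = cosine(nominal, verb)         # symmetric
nominal_in_verb = inclusion(nominal, verb)  # directional
verb_in_nominal = inclusion(verb, nominal)
```

Cosine returns the same value in both directions, while the inclusion score is asymmetric: here every context of the nominalization is also a context of the verb, but not the other way round, which is the kind of difference a symmetric similarity cannot express.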


Incremental Composition in Distributional Semantics

Journal of Logic Language and Information ◽

10.1007/s10849-021-09337-8 ◽

2021 ◽

Author(s):

Matthew Purver ◽

Mehrnoosh Sadrzadeh ◽

Ruth Kempson ◽

Gijs Wijnholds ◽

Julian Hough

Keyword(s):

Vector Space ◽

Set Theory ◽

Predicate Logic ◽

Word Meaning ◽

Real Data ◽

Distributional Semantics ◽

Language Constructs ◽

Dynamic Syntax ◽

Lexical Content ◽

Compositional Distributional Semantics

Despite the incremental nature of Dynamic Syntax (DS), the semantic grounding of it remains that of predicate logic, itself grounded in set theory, so is poorly suited to expressing the rampantly context-relative nature of word meaning, and related phenomena such as incremental judgements of similarity needed for the modelling of disambiguation. Here, we show how DS can be assigned a compositional distributional semantics which enables such judgements and makes it possible to incrementally disambiguate language constructs using vector space semantics. Building on a proposal in our previous work, we implement and evaluate our model on real data, showing that it outperforms a commonly used additive baseline. In conclusion, we argue that these results set the ground for an account of the non-determinism of lexical content, in which the nature of word meaning is its dependence on surrounding context for its construal.


Adults adapt to child speech in semantic use

10.31234/osf.io/78epj ◽

2021 ◽

Author(s):

Guanghao You ◽

Moritz M. Daum ◽

Sabine Stoll

Keyword(s):

First Language ◽

Time Lag ◽

Striking Similarity ◽

First Language Acquisition ◽

Distributional Semantics ◽

Semantic Learning ◽

Distributional Information ◽

Semantic Adaptation ◽

Child Speech

Children acquire their first language while interacting with adults in a highly adaptive manner. While adaptation occurs at many linguistic levels such as syntax and speech complexity, semantic adaptation remains unclear due to the difficulty of efficient meaning extraction. In this study, we examine the adaptation of semantics with a computational approach based on distributional information. We show that adults, in their speech addressed to children, adapt their distributional semantics to that in the speech children produce. By analyzing semantic representations modeled from the Manchester corpus, a large longitudinal acquisition corpus of English, we find striking similarity of semantic development between child and child-directed speech, with a slight time lag in the latter. These findings provide strong evidence for the semantic adaptation in first language acquisition and suggest the important role of child-directed speech in semantic learning.


Compositional Distributional Semantics with Syntactic Dependencies and Selectional Preferences

Applied Sciences ◽

10.3390/app11125743 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5743

Author(s):

Pablo Gamallo

Keyword(s):

English Language ◽

State Of The Art ◽

Current Work ◽

Distributional Semantics ◽

Compositional Model ◽

Compositional Approach ◽

Sentence Similarity ◽

Selectional Preferences ◽

Syntactic Dependencies ◽

Compositional Distributional Semantics

This article describes a compositional model based on syntactic dependencies which has been designed to build contextualized word vectors, by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in the current work has been evaluated on a syntactically controlled and multilingual dataset, and compared with Transformer BERT-like models, such as Sentence BERT, the state-of-the-art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of that defined for the English language, containing expressions with noun-verb-noun transitive constructions. The results we have obtained show that the linguistic-based compositional approach turns out to be competitive with Transformer models.
