Constructional Change and Distributional Semantics

2021 ◽  
Vol 0 (0) ◽  
pp. 248-269 ◽  
Author(s):  
Kristel Van Goethem ◽  
Muriel Norde

Abstract: Dutch features several morphemes with “privative” semantics that occur as left-hand members in compounds (e.g., imitatieleer ‘imitation leather’, kunstgras ‘artificial grass’, nepjuwelen ‘fake jewels’). Some of these “fake” morphemes display great categorical flexibility and innovative adjectival uses. Nep, for instance, is synchronically attested as an inflected adjective (e.g., neppe cupcake ‘fake cupcake’). In this paper, we combine an extensive corpus study of eight Dutch “fake” morphemes with statistical methods in distributional semantics and collexeme analysis in order to compare their semantic and morphological properties and to find out which factors are the driving forces behind their exceptional “extravagant” morphological behavior. Our analyses show that debonding and adjectival reanalysis are triggered by an interplay of two factors, i.e., type frequency and semantic coherence, which allow us to rank the eight morphemes along a cline from more schematic to more substantive “fake” constructions.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Tian Shen ◽  
R. Harald Baayen

Abstract: In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
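The productivity measures named in this abstract can be sketched in a few lines. The compound tokens below are invented for illustration; the category-conditioned degree of productivity is assumed here to be Baayen's P = V1/N (hapax count over token count), which is the standard formulation, though the paper's exact operationalization may differ.

```python
from collections import Counter

def productivity_measures(tokens):
    """Corpus-based productivity measures for one morphological category.

    `tokens` is the list of all compound tokens attested for the category.
    Returns the type count V (realized productivity), the hapax count V1,
    and P = V1 / N, the estimated probability that the next token
    sampled from the category is a previously unseen type.
    """
    counts = Counter(tokens)
    N = len(tokens)                                  # token count
    V = len(counts)                                  # type count
    V1 = sum(1 for c in counts.values() if c == 1)   # hapax legomena
    P = V1 / N if N else 0.0                         # category-conditioned productivity
    return V, V1, P

# toy tokens for one adjective-noun compound category
tokens = ["redwood", "redwood", "blackbird", "greenhouse",
          "greenhouse", "greenhouse", "bluebell", "whiteboard"]
V, V1, P = productivity_measures(tokens)   # V=5, V1=3, P=0.375
```

A high V with a low P indicates an entrenched but closed category; a high P signals that the category is still actively recruiting new members.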


2021 ◽  
Vol 11 (12) ◽  
pp. 5743
Author(s):  
Pablo Gamallo

This article describes a compositional model based on syntactic dependencies, designed to build contextualized word vectors by following linguistic principles related to the concept of selectional preferences. The proposed compositional strategy has been evaluated on a syntactically controlled, multilingual dataset and compared with Transformer BERT-like models such as Sentence BERT, the state of the art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of the English one, containing expressions with noun-verb-noun transitive constructions. The results we have obtained show that the linguistic-based compositional approach is competitive with Transformer models.


Morphology ◽  
2021 ◽  
Author(s):  
Rossella Varvara ◽  
Gabriella Lapesa ◽  
Sebastian Padó

Abstract: We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many event nominalization patterns available in German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import.

The key notion of our comparison is semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first, cosine, detects nominalizations which are semantically similar to their bases; the second, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
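The two transparency measures can be illustrated on sparse context-count vectors. The German context counts below are invented toy data, and the inclusion measure is implemented here in one common formulation (Weeds-style precision over context weights), assumed for illustration rather than taken from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def inclusion(u, v):
    """Distributional inclusion of u in v: the share of u's context
    weight that falls on contexts also attested for v. A value of 1.0
    means u occurs only in a subset of v's contexts."""
    covered = sum(w for k, w in u.items() if k in v)
    total = sum(u.values())
    return covered / total if total else 0.0

# toy context vectors: a nominalization and its base verb
base = {"den": 4, "Bericht": 3, "sorgfältig": 2, "Ergebnis": 2}
nominal = {"den": 3, "Bericht": 2}   # attested only in a subset of the base's contexts
sim = cosine(nominal, base)          # high, but below 1.0
inc = inclusion(nominal, base)       # 1.0: full inclusion in the base's contexts
```

The two measures deliberately pull apart: a nominalization can be highly similar to its base without being included in its contexts (and vice versa), which is what makes inclusion informative where cosine is not.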


2021 ◽  
pp. 1-11
Author(s):  
V.S. Anoop ◽  
P. Deepak ◽  
S. Asharaf

Online social networks are considered to be one of the most disruptive platforms where people communicate with each other on any topic ranging from funny cat videos to cancer support. The widespread diffusion of mobile platforms such as smart-phones causes the number of messages shared in such platforms to grow heavily, thus more intelligent and scalable algorithms are needed for efficient extraction of useful information. This paper proposes a method for retrieving relevant information from social network messages using a distributional semantics-based framework powered by topic modeling. The proposed framework combines the Latent Dirichlet Allocation and distributional representation of phrases (Phrase2Vec) for effective information retrieval from online social networks. Extensive and systematic experiments on messages collected from Twitter (tweets) show this approach outperforms some state-of-the-art approaches in terms of precision and accuracy and better information retrieval is possible using the proposed method.


2015 ◽  
Vol 41 (1) ◽  
pp. 165-173 ◽  
Author(s):  
Fabio Massimo Zanzotto ◽  
Lorenzo Ferrone ◽  
Marco Baroni

Distributional semantics has been extended to phrases and sentences by means of composition operations. We look at how these operations affect similarity measurements, showing that similarity equations of an important class of composition methods can be decomposed into operations performed on the subparts of the input phrases. This establishes a strong link between these models and convolution kernels.
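For the simplest member of this class, additive composition, the decomposition is easy to verify numerically: the (unnormalized) similarity of two composed phrase vectors expands into a sum of pairwise operations on the subparts, which is the kernel-like structure the paper points to. The vectors below are toy data:

```python
def dot(u, v):
    """Dot product of two dense vectors."""
    return sum(x * y for x, y in zip(u, v))

def add(u, v):
    """Additive composition of two word vectors."""
    return [x + y for x, y in zip(u, v)]

# word vectors for two two-word phrases "a b" and "c d"
a, b = [1.0, 2.0, 0.0], [0.0, 1.0, 3.0]
c, d = [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]

# similarity of the composed phrase vectors ...
composed = dot(add(a, b), add(c, d))
# ... equals the sum of all pairwise subpart similarities,
# i.e. the form of a convolution kernel over phrase parts
pairwise = dot(a, c) + dot(a, d) + dot(b, c) + dot(b, d)
assert composed == pairwise   # 12.0 == 12.0
```

The same kind of expansion goes through for other linear composition methods, though with weights on the pairwise terms rather than a plain sum.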


2010 ◽  
Vol 16 (4) ◽  
pp. 469-491 ◽  
Author(s):  
Yves Peirsman ◽  
Dirk Geeraerts ◽  
Dirk Speelman

Abstract: Languages are not uniform. Speakers of different language varieties use certain words differently – more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for the investigation of such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. Second, we address the problem of automatically identifying words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more lectal markers than traditional keyword methods, and that they are biased towards a different type of variation. In summary, our results demonstrate how distributional semantics can help research in variational linguistics, with possible future applications in lexicography or terminology extraction.
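The synonym-retrieval setup can be sketched as nearest-neighbor search by cosine over context vectors, where the target vector comes from one variety's corpus and the candidates from the other's. The Dutch words and context counts below are toy data, not the paper's:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_synonyms(target_vec, candidate_vecs):
    """Rank candidate words by cosine similarity to the target vector."""
    return sorted(candidate_vecs,
                  key=lambda w: cosine(target_vec, candidate_vecs[w]),
                  reverse=True)

# toy context vector for Belgian Dutch 'hesp' ('ham'), built from a Belgian corpus
hesp = {"broodje": 5, "kaas": 3, "beleg": 2}

# candidate vectors built from a Netherlandic corpus
netherlandic = {
    "ham":   {"broodje": 4, "kaas": 3, "beleg": 1},
    "fiets": {"wiel": 6, "zadel": 2},
}
ranking = retrieve_synonyms(hesp, netherlandic)   # 'ham' ranks first
```

Because both vectors live in the same context space, cross-variety comparison reduces to ordinary nearest-neighbor search; the lect-marker question then amounts to asking for which words this cross-variety similarity is unexpectedly low.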


2016 ◽  
Vol 42 (4) ◽  
pp. 637-660 ◽  
Author(s):  
Germán Kruszewski ◽  
Denis Paperno ◽  
Raffaella Bernardi ◽  
Marco Baroni

Logical negation is a challenge for distributional semantics, because predicates and their negations tend to occur in very similar contexts, and consequently their distributional vectors are very similar. Indeed, it is not even clear what properties a “negated” distributional vector should possess. However, when linguistic negation is considered in its actual discourse usage, it often performs a role that is quite different from straightforward logical negation. If someone states, in the middle of a conversation, that “This is not a dog,” the negation strongly suggests a restricted set of alternative predicates that might hold true of the object being talked about. In particular, other canids and middle-sized mammals are plausible alternatives, birds are less likely, and skyscrapers and other large buildings are virtually impossible. Conversational negation acts like a graded similarity function, of the sort that distributional semantics might be good at capturing. In this article, we introduce a large data set of alternative plausibility ratings for conversationally negated nominal predicates, and we show that simple similarity in distributional semantic space provides an excellent fit to subject data. On the one hand, this fills a gap in the literature on conversational negation, proposing distributional semantics as the right tool to make explicit predictions about potential alternatives of negated predicates. On the other hand, the results suggest that negation, when addressed from a broader pragmatic perspective, far from being a nuisance, is an ideal application domain for distributional semantic methods.

