corpus data Latest Research Papers

A Corpus-Based Study on China English in the English Translation of Tao Te Ching

International Journal of Linguistics Literature & Translation ◽ 10.32996/ijllt.2022.5.1.8 ◽ 2022 ◽ Vol 5 (1) ◽ pp. 59-65

Author(s): Jiaqi Jiao

Keyword(s): English Translation ◽ Quantitative Research ◽ The Other ◽ Research Gaps ◽ Tao Te Ching ◽ China English ◽ Corpus Data

This study examines the features of China English in the translation of Chinese classics by comparing two versions of Tao Te Ching on the basis of corpus data. Of the two English versions, one was translated by a well-known Chinese translator, Xu Yuanchong, and the other by an American sinologist, Arthur Waley. The study found that Xu's translation exhibits more features of China English than Waley's translation in three major respects. First, Xu's translation is more concise, employing fewer words to translate Tao Te Ching. Second, Xu's version features fewer clauses and clearer sentences. Third, the paratactic nature of China English is reflected in Xu's translation, which has more content words and less cohesiveness. This study reveals the characteristics of China English in translated texts and partly fills the research gap regarding quantitative research in this field.
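The measures named in this abstract (total word count and the share of content words) can be illustrated with a minimal sketch. It is not the study's procedure: the `FUNCTION_WORDS` list, the `profile` helper, and the sample sentence are hypothetical, and a real analysis would use a proper tokenizer and part-of-speech tagger.

```python
import re

# Hypothetical, deliberately tiny stop list standing in for "function words".
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "or", "but", "to", "in", "on", "is",
    "are", "was", "were", "it", "that", "which", "who", "by", "for",
    "with", "as", "at", "be", "not",
}

def profile(text):
    """Return the token count and the proportion of content words in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    ratio = len(content) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "content_ratio": ratio}

# Toy input, not taken from either translation.
sample = profile("The cat sat on the mat")
```

Running `profile` on each translation and comparing the two dictionaries would give a rough view of conciseness and lexical density under these assumptions.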


Testing a computational model of causative overgeneralizations: Child judgment and production data from English, Hebrew, Hindi, Japanese and K’iche’

Open Research Europe ◽ 10.12688/openreseurope.13008.2 ◽ 2022 ◽ Vol 1 ◽ pp. 1

Author(s): Ben Ambridge ◽ Laura Doherty ◽ Ramya Maitreyee ◽ Tomoko Tatsumi ◽ Shira Zicherman ◽ ...

Keyword(s): Computational Model ◽ Language Learners ◽ Argument Structure ◽ Type A ◽ Semantic Feature ◽ Production Data ◽ Grammaticality Judgment ◽ Verb Argument Structure ◽ Corpus Data ◽ Judgment Data

How do language learners avoid the production of verb argument structure overgeneralization errors (*The clown laughed the man, cf. The clown made the man laugh), while retaining the ability to apply such generalizations productively when appropriate? This question has long been seen as one that is both particularly central to acquisition research and particularly challenging. Focussing on causative overgeneralization errors of this type, a previous study reported a computational model that learns, on the basis of corpus data and human-derived verb-semantic-feature ratings, to predict adults’ by-verb preferences for less- versus more-transparent causative forms (e.g., *The clown laughed the man vs The clown made the man laugh) across English, Hebrew, Hindi, Japanese and K’iche’ Mayan. Here, we tested the ability of this model (and an expanded version with multiple hidden layers) to explain binary grammaticality judgment data from children aged 4;0-5;0, and elicited-production data from children aged 4;0-5;0 and 5;6-6;6 (N=48 per language). In general, the model successfully simulated both children’s judgment and production data, with correlations of r=0.5-0.6 and r=0.75-0.85, respectively, and also generalized to unseen verbs. Importantly, learners of all five languages showed some evidence of making the types of overgeneralization errors – in both judgments and production – previously observed in naturalistic studies of English (e.g., *I’m dancing it). Together with previous findings, the present study demonstrates that a simple learning model can explain (a) adults’ continuous judgment data, (b) children’s binary judgment data and (c) children’s production data (with no training on these datasets), and therefore constitutes a plausible mechanistic account of the acquisition of verbs’ argument structure restrictions.


Generating semantic maps through multidimensional scaling: linguistic applications and theory

Corpus Linguistics and Linguistic Theory ◽ 10.1515/cllt-2021-0018 ◽ 2022 ◽ Vol 0 (0)

Author(s): Martijn van der Klis ◽ Jos Tellings

Keyword(s): Multidimensional Scaling ◽ Past Research ◽ Statistical Technique ◽ Theoretical Frameworks ◽ Parallel Corpus ◽ Semantic Maps ◽ Research Questions ◽ Is Theory ◽ Corpus Data ◽ Mathematical Foundations

This paper reports on the state of the art in the application of multidimensional scaling (MDS) techniques to create semantic maps in linguistic research. MDS refers to a statistical technique that represents objects (lexical items, linguistic contexts, languages, etc.) as points in a space so that close similarity between the objects corresponds to close distances between the corresponding points in the representation. We focus on the use of MDS in combination with parallel corpus data as used in research on cross-linguistic variation. We first introduce the mathematical foundations of MDS and then give an exhaustive overview of past research that employs MDS techniques in combination with parallel corpus data. We propose a set of terminology to succinctly describe the key parameters of a particular MDS application. We then show that this computational methodology is theory-neutral, i.e. it can be employed to answer research questions in a variety of linguistic theoretical frameworks. Finally, we show how this leads to two lines of future developments for MDS research in linguistics.
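The idea of MDS described in this abstract (similar objects end up as nearby points) can be sketched as follows. This is a minimal illustration, assuming plain gradient descent on raw stress in two dimensions; it is not necessarily the variant of MDS the paper discusses, and the function names, parameters, and the toy dissimilarity matrix are hypothetical.

```python
import math
import random

def stress(points, dissim):
    """Sum of squared gaps between map distances and target dissimilarities."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            total += (d - dissim[i][j]) ** 2
    return total

def mds_2d(dissim, steps=2000, lr=0.01, seed=0):
    """Place n objects in 2D so that distances approximate dissim."""
    n = len(dissim)
    rng = random.Random(seed)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pts[i][0] - pts[j][0]
                dy = pts[i][1] - pts[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                coef = 2.0 * (dist - dissim[i][j]) / dist
                grads[i][0] += coef * dx
                grads[i][1] += coef * dy
        # Move every point a small step against its stress gradient.
        for i in range(n):
            pts[i][0] -= lr * grads[i][0]
            pts[i][1] -= lr * grads[i][1]
    return pts

# Toy dissimilarities: items 0 and 1 are alike, items 2 and 3 are alike.
toy = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
coords = mds_2d(toy)
```

In the resulting `coords`, the two similar pairs should sit close together and the two groups far apart, which is the property a semantic map relies on.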


Poetic Diction and Standard Language: Predictive Aspect

Vestnik Volgogradskogo gosudarstvennogo universiteta Serija 2 Jazykoznanije ◽ 10.15688/jvolsu2.2021.5.15 ◽ 2022 ◽ pp. 191-204

Author(s): Svetlana Shevchenko

Keyword(s): Dynamic Characteristics ◽ The Other ◽ Poetic Language ◽ Dynamic Changes ◽ Lexical Unit ◽ Evolutionary Changes ◽ Different Types ◽ Poetic Diction ◽ Corpus Data ◽ The One

The article deals with the interdiction convergence on the example of evolutionary changes in the lexical semantics of poetic language. The current study contributes to the development of the methodology for studying evolutionary processes in language. The paper describes certain trends of dynamic changes and their specifics; it gives some predictions about the further lexical convergence of different types of functional styles. The findings contribute to the development of lexicography, which is going to reflect not only static but also dynamic characteristics of lexical units, including stylistic ones. The subjectivity of labeling poetic vocabulary in dictionaries can be partially removed through the analysis of corpus data by comparing frequency indices in different subsections; however, this method is not always accurate and does not effectively trace evolutionary changes. The data from psycholinguistic experiments can help reveal the dynamics of changes. On the one hand, the results of scaling show the extent of poetry in connotative meanings; on the other hand, the open-response associative experiment allows us to calculate the archaization index of a lexeme by summing up the numerical values of certain selected parameters. The research gives clear evidence of active archaization of some specific poetic lexemes. The findings also show that the dynamic changes in stylistic connotation are not synchronous with the changes in the denotative layer of a lexical unit.


Managing Synchronic Corpus Data with the British National Corpus (BNC)

10.7551/mitpress/12200.003.0043 ◽ 2022

Keyword(s): Corpus Data ◽ British National Corpus ◽ National Corpus


Absolute Participial Construction Theory: Controversial Issues

Vestnik Volgogradskogo gosudarstvennogo universiteta Serija 2 Jazykoznanije ◽ 10.15688/jvolsu2.2021.5.14 ◽ 2022 ◽ pp. 177-190

Author(s): Yulia Bogoyavlenskaya

Keyword(s): Scientific Literature ◽ Old French ◽ French Language ◽ Controversial Issues ◽ Complex Sentence ◽ Latin Language ◽ Corpus Data ◽ The Absolute ◽ Linguistic Structures ◽ Syntactic Optionality

The study focuses on current problems associated with the evolution of the absolute participial construction and its linguistic status in the French language. It has been established that the absolute construction with an ablative, borrowed from classical Latin, was adopted into Old French, presumably in the 13th–14th centuries, thanks to translations from Latin. Widely used in literature, the construction caused disputes among grammarians, and only at the beginning of the 20th century was it recognized as normative. In the second part of the article, the Russian and foreign scientific literature is reviewed, and the most controversial issues and the author's own position based on corpus data are formulated. The properties inherent in all types of absolute participial constructions are determined: binarity, semantic duality, expression of a predominantly temporal or causal meaning or of an accompanying action, mobility, syntactic optionality in relation to a matrix sentence, and the possibility of functioning only as part of a complex sentence. It was revealed that this construction is an economical formal way of expressing a proposition based on a secondary predicative connection. The features of constructions with present participles, past participles and complex past participles are analyzed. The conclusion is drawn that a differentiated approach to the analysis of these types of absolute structures is needed. The prospects for further studies of linguistic structures are outlined.


Google Translate Performance in Translating English Passive Voice into Indonesian

PIONEER: Journal of Language and Literature ◽ 10.36841/pioneer.v13i2.1292 ◽ 2021 ◽ Vol 13 (2) ◽ pp. 271

Author(s): Nadia Khumairo Ma'shumah ◽ Isra F. Sianipar ◽ Cynthia Yanda Salsabila

Keyword(s): Machine Translation ◽ Morphological Changes ◽ Online News ◽ Comparative Methods ◽ Passive Voice ◽ Active Voice ◽ Corpus Data ◽ The Way

A scant number of Google Translate users and researchers continue to be skeptical of Google Translate's current performance as a machine translation tool. As English passive voice translation often brings problems, especially when translated into Indonesian, which is rich in affixes, this study analyzes the way Google Translate (MT) translates English passive voice into Indonesian and investigates whether Google Translate (MT) can perform modulation. The data in this research were clauses and sentences with passive voice taken from corpus data. The corpus included 497 news articles from the online news platform 'GlobalVoices', which were processed with AntConc 3.5.8 software. The data were analyzed quantitatively and qualitatively to achieve broad objectives, depth of understanding, and corroboration. Comparative methods were used to analyze both source and target texts. Through a careful process of collecting and analyzing the data, the results showed that (1) GT (via NMT) was able to translate the English passive voice by distinguishing morphological changes in Indonesian passive voice, and (2) GT was able to modulate English passive voice into Indonesian base verbs and Indonesian active voice.


3 Exploring Learner Corpus Data for Language Testing and Assessment Purposes: The Case of Verb + Noun Collocations

Perspectives on the L2 Phrasicon ◽ 10.21832/9781788924863-004 ◽ 2021 ◽ pp. 49-71

Author(s): Henrik Gyllstad ◽ Per Snoder

Keyword(s): Language Testing ◽ Learner Corpus ◽ Language Testing And Assessment ◽ Corpus Data


Since/Because Alternation: Insights from Clause Structures in Nigerian English

English Studies at NBU ◽ 10.33919/esnbu.21.2.3 ◽ 2021 ◽ Vol 7 (2) ◽ pp. 167-186

Author(s): Mayowa Akinlotan

Keyword(s): Cognitive Functions ◽ Measurement Method ◽ Number Of Factors ◽ Text Types ◽ Simple Measurement ◽ Corpus Data ◽ Local Languages ◽ Do So

The choice between since and because allows language users to provide rationality, which is part of the cognitive functions of language. Different conditions have been shown to explicate this alternation, with little attention paid to clausal weight. The present paper shows how the expression of rationality alternates between since and because, since both have the semantic capacity to express it in certain contexts. The study uses a simple measurement method to show the extent to which clausal weight relates to this alternation. Relying on corpus data from a well-known variety representing Nigerian English, the present study shows that the choice between since and because is related to a number of factors, such as the type of text producing the usages. With 1074 interchangeable usages extracted from academic and media text types in written Nigerian English, it is shown that, at least in the variety under examination, the choice of since over because as a rationality expresser is scarce, and that the overall pattern can be predicted on the basis of certain contexts, including clausal weight and ordering pattern. The scarcity of since as a rationality expresser is perhaps a reflection of interference from the local languages, which do not have semantic equivalents.


SEMANTIC PROSODY AND PREFERENCE OF “HEALTHY” AND “UNHEALTHY” COLLOCATIONS IN COVID-19 CORPUS

LANGUAGE LITERACY Journal of Linguistics Literature and Language Teaching ◽ 10.30743/ll.v5i2.4480 ◽ 2021 ◽ Vol 5 (2) ◽ pp. 356-365

Author(s): Nafilaturif'ah Nafilaturif'ah ◽ Mohamad Irham Poluwa

Keyword(s): Qualitative Research ◽ Research Method ◽ Sole Source ◽ Animal Disease ◽ Qualitative Research Method ◽ Lexical Meaning ◽ Positive Meaning ◽ Corpus Data ◽ Human Animal ◽ Semantic Prosody

This study identifies the collocations of ‘healthy’ and ‘unhealthy’ and explores the lexical meaning of those collocations. A corpus-based approach is employed, since the sole source of the data is corpus data. A qualitative research method is used to derive hypotheses from the corpus data, which is taken from Sketch Engine. The results demonstrate that the collocations of the two node words differ in their categorization. The node word ‘healthy’ is associated with three major semantic preferences: human, animal, and disease. By contrast, the semantic preferences of the node word ‘unhealthy’ are diverse. Thus, the classification is based on the meaning of the collocations. The collocations with negative meaning occur more frequently than those with positive meaning. This is because they use the prefixes in- and un-, which create the opposite meaning of the original word. Therefore, negative semantic prosody is more frequently found with the two node words ‘healthy’ and ‘unhealthy’.
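Finding the collocates of a node word, as this abstract describes, can be sketched as counting the words that occur within a fixed window around it. The snippet below is a minimal illustration, not Sketch Engine's method or the study's own procedure: the `collocates` helper, the window size, and the sample text are hypothetical, and it reports raw co-occurrence counts rather than an association score.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words appearing within `window` tokens of each `node` occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:  # exact match, so 'unhealthy' is not a hit for 'healthy'
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

# Toy text, invented for illustration only.
text = "A healthy diet keeps people healthy. An unhealthy diet makes people ill."
healthy = collocates(text, "healthy", window=1)
unhealthy = collocates(text, "unhealthy", window=1)
```

Grouping the resulting collocates by meaning (for example human, animal, disease) would then be a manual, qualitative step of the kind the abstract mentions.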

