List of West Balkan (South Slavic) Corpora and Language Resources

2013 ◽  
Author(s):  
Nikola Dobric
2010 ◽  
Author(s):  
Kartik Bhavsar ◽  
Reanna Poncheri Harman ◽  
Amber Harris ◽  
Kathryn Nelson ◽  
Eric A. Surface ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pilar López-Úbeda ◽  
Alexandra Pomares-Quimbaya ◽  
Manuel Carlos Díaz-Galiano ◽  
Stefan Schulz

Abstract
Background: Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most of these terms are unambiguous within texts limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong.
Results: This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms, classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries to generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, improving the F-measure by 6 percentage points over the baseline when using a Multilayer Perceptron, thus demonstrating the hypothesis that a specialized term set improves NLP tasks.
Conclusion: The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent, broad-scope vocabulary.
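The SCOVACLIS abstract describes collecting token n-grams from sub-domain corpora and scoring each term for relevance, discriminatory power, and broadness, but it does not give the formulas. The following is a minimal Python sketch under assumed definitions: relevance as the fraction of a sub-domain's documents containing the term, discriminatory power as that fraction divided by the sum of the same fractions across all sub-domains, and broadness as the share of sub-domains in which the term occurs. The function names (`tokenize`, `ngrams`, `score_terms`), the toy corpora, and these metric definitions are hypothetical illustrations, not the authors' method; the PubMed querying step is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches accented letters such as "í".
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n_max=3):
    """Yield every contiguous token n-gram of length 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def score_terms(corpora, n_max=3):
    """Score n-grams per sub-domain using ASSUMED (hypothetical) metric definitions.

    corpora: {sub_domain: [document, ...]}; every sub-domain is assumed non-empty.
    Returns {sub_domain: {term: (relevance, discrimination, broadness)}}.
    """
    # Document frequency: in how many documents of each sub-domain a term occurs.
    df = {}
    for sub, docs in corpora.items():
        counts = Counter()
        for doc in docs:
            counts.update(set(ngrams(tokenize(doc), n_max)))
        df[sub] = counts

    n_subs = len(corpora)
    scores = {}
    for sub, counts in df.items():
        scores[sub] = {}
        for term, freq in counts.items():
            relevance = freq / len(corpora[sub])
            # Relative document frequency of the term in every sub-domain
            # (a Counter returns 0 for terms it has not seen).
            rel_all = [df[s][term] / len(corpora[s]) for s in corpora]
            discrimination = relevance / sum(rel_all)
            broadness = sum(1 for r in rel_all if r > 0) / n_subs
            scores[sub][term] = (relevance, discrimination, broadness)
    return scores


# Usage with two tiny, made-up sub-domain corpora.
corpora = {
    "cardiologia": ["insuficiencia cardiaca aguda", "infarto agudo de miocardio"],
    "neumologia": ["insuficiencia respiratoria aguda", "asma bronquial"],
}
scores = score_terms(corpora)
print(scores["cardiologia"]["insuficiencia cardiaca"])  # (0.5, 1.0, 0.5)
print(scores["cardiologia"]["aguda"])                   # (0.5, 0.5, 1.0)
```

Under these assumptions, a term that occurs in only one sub-domain gets a discrimination of 1.0, while a term spread evenly across all sub-domains gets a broadness of 1.0 and a low discrimination, which is the kind of contrast a specialty-specific vocabulary relies on.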


ZDM ◽  
2021 ◽  
Author(s):  
Sandra Crespo ◽  
Diana Bowen ◽  
Tarik Buli ◽  
Nicole Bannister ◽  
Crystal Kalinec-Craig

2021 ◽  
Vol 8 (1) ◽  
pp. 108-131
Author(s):  
Nyi Nyi Kyaw

Abstract
This article highlights the convenient excuse of (il)legality used (1) by religious majoritarian mobs to justify attacks against places of worship and religious buildings of minorities, and (2) by police and local authorities to absolve themselves of their failure to uphold public order and the rule of law and to protect religious minorities, as well as to punish those minorities. This article traces the emergence of legal violence in the form of anti-mosque vigilante extremism in Myanmar from 2012 onwards and analyzes cases of attacks against (1) “illegal” mosques; (2) madrasas being used as, or reconstructed into, mosques; (3) buildings allegedly being constructed as mosques; and (4) private homes and public spaces being used as mosques; as well as cases of (5) closed mosques not being allowed to reopen. The author conducted the research primarily using Myanmar-language resources, together with interviews.


Author(s):  
Mark Snaith ◽  
Nicholas Conway ◽  
Tessa Beinema ◽  
Dominic De Franco ◽  
Alison Pease ◽  
...  

Abstract
Language resources for studying doctor–patient interaction are rare, primarily because of the ethical issues involved in recording real medical consultations. Rarer still are resources that involve more than one healthcare professional consulting with a patient, even though many chronic conditions require multiple areas of expertise for effective treatment. In this paper, we present the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of simulated consultations between a patient, portrayed by an actor, and at least two healthcare professionals with different areas of expertise. In addition to the transcribed text of each consultation, the corpus contains audio and video recordings. For each consultation, the audio consists of an individual track for each participant, allowing clear identification of speakers, and the video consists of two framings for each participant (upper body and face), allowing close analysis of behaviours and gestures. Having presented the design and construction of the corpus, we then briefly describe how its multimodal nature allows it to be analysed from several different perspectives.

