Nonuniform language in technical writing: Detection and correction

2020 ◽  
pp. 1-22
Author(s):  
Weibo Wang ◽  
Aminul Islam ◽  
Abidalrahman Moh’d ◽  
Axel J. Soto ◽  
Evangelos E. Milios

Abstract Technical writing in professional environments, such as user manual authoring, requires the use of uniform language. Nonuniform language refers to sentences in a technical document that are intended to have the same meaning within a similar context, but use different words or writing style. Addressing this nonuniformity problem requires the performance of two tasks. The first task, which we named nonuniform language detection (NLD), is detecting such sentences. We propose an NLD method that utilizes different similarity algorithms at lexical, syntactic, semantic and pragmatic levels. Different features are extracted and integrated by applying a machine learning classification method. The second task, which we named nonuniform language correction (NLC), is deciding which sentence among the detected ones is more appropriate for that context. To address this problem, we propose an NLC method that combines contraction removal, near-synonym choice, and text readability comparison. We tested our methods using smartphone user manuals. We finally compared our methods against state-of-the-art methods in paraphrase detection (for NLD) and against expert annotators (for both NLD and NLC). The experiments demonstrate that the proposed methods achieve performance that matches expert annotators.
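
As a rough illustration of two ingredients named in this abstract, the sketch below expands contractions and computes one lexical-level similarity score for a sentence pair. It is a minimal sketch, not the authors' method: the `CONTRACTIONS` table, the choice of Jaccard token overlap, and both function names are assumptions made for the example.

```python
import re

# Illustrative table only; a real system would need a far larger list.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "isn't": "is not",
}

def expand_contractions(sentence):
    """Replace known contractions with their full forms, keeping initial capitals."""
    def repl(match):
        word = match.group(0)
        full = CONTRACTIONS.get(word.lower())
        if full is None:
            return word  # unknown contraction: leave untouched
        return full.capitalize() if word[0].isupper() else full
    return re.sub(r"[A-Za-z]+'[a-z]+", repl, sentence)

def jaccard(a, b):
    """Token-set overlap of two sentences (a hypothetical lexical feature)."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

print(expand_contractions("Don't remove the battery."))
print(jaccard("Tap the Home key", "Press the Home key"))
```

A score like this would be only one feature among many; the abstract describes combining several similarity levels through a classifier.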

2018 ◽  
Vol 7 (4) ◽  
pp. 603-622 ◽  
Author(s):  
Leonardo Gutiérrez-Gómez ◽  
Jean-Charles Delvenne

Abstract Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when it is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints, in turn, make it possible to recover the functionalities of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity-based features are competitive, providing highly accurate results that often outperform state-of-the-art methods for the network classification task.
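
One way to make the idea of comparing nodes at the two ends of a path concrete is sketched below. It assumes the "randomly selected path" is modelled as k steps of a stationary simple random walk on an undirected graph and that the node attribute is numeric; the paper's exact definition may differ, and the function name and graph encoding are hypothetical. Under these assumptions, k = 1 reduces to the usual scalar assortativity coefficient.

```python
def multi_hop_assortativity(adj, x, k):
    """Correlation of a numeric attribute at the two ends of a k-step random walk.

    adj: dict mapping each node to a list of neighbours (undirected graph,
         every node has at least one neighbour).
    x:   dict mapping each node to a numeric attribute (not all equal).
    k:   number of hops.
    """
    nodes = list(adj)
    two_m = sum(len(adj[u]) for u in nodes)
    # Stationary distribution of the walk: proportional to degree.
    pi = {u: len(adj[u]) / two_m for u in nodes}
    mean = sum(pi[u] * x[u] for u in nodes)
    var = sum(pi[u] * (x[u] - mean) ** 2 for u in nodes)
    # h[u] = expected centred attribute seen after the walk leaves u.
    h = {u: x[u] - mean for u in nodes}
    for _ in range(k):
        h = {u: sum(h[v] for v in adj[u]) / len(adj[u]) for u in nodes}
    cov = sum(pi[u] * (x[u] - mean) * h[u] for u in nodes)
    return cov / var

# Two triangles with different attribute values, joined by one edge.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
attr = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
fingerprint = [multi_hop_assortativity(graph, attr, k) for k in (1, 2, 3)]
print(fingerprint)
```

Evaluating the function for several values of k gives a small vector per network, which is the kind of fingerprint a downstream classifier could consume.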


2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

Abstract A deeper analysis on Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs, when applied on the output from MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached through the examination of the contribution of the features in the models, whereas it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features have a good contribution, few adequacy features have some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 947
Author(s):  
Rongqun Peng ◽  
Yingxi Lou ◽  
Michel Kadoch ◽  
Mohamed Cheriet

With the continuous development of tourism, the integration of the Internet of Things (IoT) into tourism projects is considered a very promising technology. Smart tourism aims to use the IoT to maximize information communication; that is, the IoT technology will become an important element to meet the needs of a new generation of tourists. Therefore, in this study, we propose a human-guided machine learning classification method based on tourist selection behavior. This classification method can effectively help tourists make a decision in choosing a certain tourist destination. The results obtained from the cross-validation experiments and performance evaluation prove the effectiveness of this method.


2020 ◽  
Vol 6 (6) ◽  
pp. 39 ◽  
Author(s):  
Adel S. Assiri ◽  
Saima Nazir ◽  
Sergio A. Velastin

Breast cancer is the most common cause of death for women worldwide. Thus, the ability of artificial intelligence systems to detect possible breast cancer is very important. In this paper, an ensemble classification mechanism is proposed based on a majority voting mechanism. First, the performance of different state-of-the-art machine learning classification algorithms was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The three best classifiers were then selected based on their F3 score. The F3 score is used to emphasize the importance of false negatives (recall) in breast cancer classification. Then, these three classifiers (simple logistic regression learning, support vector machine learning with stochastic gradient descent optimization, and a multilayer perceptron network) are used for ensemble classification using a voting mechanism. We also evaluated the performance of hard and soft voting mechanisms. For hard voting, a majority-based voting mechanism was used, and for soft voting we used voting methods based on the average, product, maximum, and minimum of probabilities. The hard voting (majority-based voting) mechanism shows better performance, with 99.42%, than the state-of-the-art algorithm for WBCD.
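
The F3 score and the two voting schemes mentioned in this abstract can be sketched in a few lines. This is a minimal sketch using the standard F-beta definition and only the average-of-probabilities variant of soft voting; the function names and the toy numbers are assumptions, not values from the paper.

```python
from collections import Counter

def fbeta_score(tp, fp, fn, beta=3.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def hard_vote(labels):
    """Majority vote over one predicted class label per classifier."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas):
    """Average-of-probabilities vote.

    probas holds one probability list per classifier, indexed by class.
    Returns the index of the class with the highest mean probability.
    """
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

# Toy example: three classifiers, classes 0 and 1 (made-up numbers).
probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
labels = [max(range(2), key=p.__getitem__) for p in probas]  # [0, 1, 1]
print(hard_vote(labels), soft_vote(probas))
print(fbeta_score(tp=9, fp=9, fn=1, beta=3), fbeta_score(tp=9, fp=9, fn=1, beta=1))
```

In the toy example the two schemes disagree: two of three classifiers lean towards class 1, but one confident classifier pulls the averaged probability towards class 0. With beta = 3, a model with high recall and mediocre precision scores noticeably higher than it does under F1.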


2021 ◽  
pp. 27-38
Author(s):  
Alessio Bernardo ◽  
Emanuele Della Valle

Data continuously gathered while monitoring the spread of the COVID-19 pandemic form an unbounded flow of data. Accurately forecasting whether infections will increase or decrease has a high impact, but it is challenging because the pandemic spreads and contracts periodically. Technically, the flow of data is said to be imbalanced and subject to concept drifts because signs of decrements are the minority class during the spreading periods, while they become the majority class in the contraction periods, and the other way round. In this paper, we propose a case study applying the Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel meta-strategy to pipeline with Streaming Machine Learning (SML) classification algorithms, to forecast the COVID-19 pandemic trend. Benchmarking SML pipelines that use C-SMOTE against state-of-the-art methods on a COVID-19 dataset, we bring statistical evidence that models learned using C-SMOTE are better.
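
For readers unfamiliar with synthetic minority oversampling, the sketch below shows the classic batch SMOTE interpolation step on which the name C-SMOTE builds. It is not the streaming C-SMOTE procedure from the paper; the function name, parameters, and toy points are assumptions made for the example.

```python
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours.

    minority: list of equal-length numeric tuples (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority points, nearest first (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_samples(minority_points, n_new=5)
print(new_points)
```

Each synthetic point lies on a segment between two existing minority points, so the minority class grows without duplicating samples exactly. A streaming variant would additionally have to decide which recent samples to keep and when to rebalance, which this sketch does not attempt.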


2021 ◽  
Vol 23 (2) ◽  
pp. 395-404
Author(s):  
Satish Kumar ◽  
Paras Kumar ◽  
Girish Kumar

In the broad framework of bearing degradation assessment, the final objectives of bearing condition monitoring are to evaluate different degradation states and to quantify the degree of performance degradation. Machine learning classification matrices have been used to train models based on health data and real-time feedback. Diagnostic and prognostic models based on a data-driven perspective have been used in prior research to improve bearing degradation assessment. Industry 4.0 requires research into advanced diagnostic and prognostic algorithms to enhance the accuracy of models. A classification model based on a machine learning classification matrix is proposed to assess bearing degradation and to improve classification accuracy. The review compares the available state-of-the-art methods. Finally, unexplored technical research challenges and niches of opportunity for future researchers are discussed.

