Active Learning with Subsequence Sampling Strategy for Sequence Labeling Tasks

2011 ◽  
Vol 18 (2) ◽  
pp. 153-173
Author(s):  
Dittaya Wanvarie ◽  
Hiroya Takamura ◽  
Manabu Okumura
Author(s):  
Kohei Shintani ◽  
Tomotaka Sugai ◽  
Keisuke Ishizaki ◽  
Nicolas Knudde ◽  
Ivo Couckuyt ◽  
...  

Abstract: The purpose of this paper is to propose a new Set-based concurrent engineering method using Bayesian active learning and to show its application to a multi-disciplinary design optimization problem. In the early stages of the system design process, a target value must be set under uncertain design conditions. If a design condition changes because of an external factor later in development, the predefined target value can no longer be held, and critical rework may become inevitable. To avoid this issue, it is important in the early design stage to find not only a single target solution but also the feasible design solutions that satisfy all multi-disciplinary requirements. Discovering the feasible region with limited resources requires an efficient sampling strategy using CAE simulation. In this study, a sampling strategy based on Bayesian active learning is proposed to discover the feasible region of multi-disciplinary constraints concurrently. In the proposed method, Gaussian Process models of the multi-disciplinary constraints are trained. Based on the posterior distributions of the trained Gaussian Processes, a new acquisition function that combines two different types of acquisition functions, Probability of Feasibility and Entropy Search, is proposed and maximized to generate new sampling points that effectively improve the prediction accuracy of the feasible region. To show the effectiveness of the proposed Set-based concurrent engineering method on a multi-disciplinary design problem, a suspension design problem is demonstrated.
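The Probability of Feasibility term named in this abstract can be illustrated with a small sketch. It assumes each constraint is written as g(x) <= limit, that the Gaussian Process posterior at a candidate point is summarized by a mean and a standard deviation per constraint, and that the constraints are treated as independent. The function names are hypothetical, and the paper's combined acquisition function with Entropy Search is not reproduced here because the abstract does not specify it.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_feasibility(means, stds, limit=0.0):
    """Probability that every constraint g_i(x) <= limit holds at one point.

    means, stds: posterior mean and standard deviation of each constraint
    at the candidate point (assumed to come from trained GP models).
    Constraints are assumed independent, so the probabilities multiply.
    """
    pof = 1.0
    for mu, sigma in zip(means, stds):
        if sigma <= 0.0:
            # No posterior uncertainty: feasibility is deterministic.
            pof *= 1.0 if mu <= limit else 0.0
        else:
            pof *= normal_cdf((limit - mu) / sigma)
    return pof
```

A point whose predicted constraint values sit exactly on the limit gets a probability of 0.5 per constraint, which is where the boundary of the feasible region is most uncertain.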


Author(s):  
JUN LONG ◽  
JIANPING YIN ◽  
EN ZHU ◽  
WENTAO ZHAO

Active learning is an important approach to reducing data-collection costs for inductive learning problems by sampling only the most informative instances for labeling. We focus here on the sampling criterion used to select these most informative instances. Three contributions are made in this paper. First, in contrast to the leading sampling strategy of halving the volume of the version space, we present a sampling strategy that reduces the volume of the version space by more than half, under the assumption that the target function is chosen from a nonuniform distribution over the version space. Second, we propose the idea of sampling the instances that are most likely to be misclassified. Third, we develop a sampling method named CBMPMS (Committee Based Most Possible Misclassification Sampling), which samples the instances that have the largest probability of being misclassified by the current classifier. When the classifiers achieve the same accuracy, the proposed CBMPMS method samples fewer times than the existing active learning methods. The experiments show that the proposed method outperforms the traditional sampling methods on most of the selected datasets.
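The idea of querying the instance most likely to be misclassified can be sketched as follows. This is not the CBMPMS algorithm itself, whose details the abstract does not give. It is a simplified illustration that assumes the misclassification probability of an instance is estimated as the fraction of committee members that disagree with the current classifier's prediction. All function names are hypothetical.

```python
def misclassification_probability(current_label, committee_labels):
    """Fraction of committee votes that disagree with the current prediction."""
    disagree = sum(1 for label in committee_labels if label != current_label)
    return disagree / len(committee_labels)


def pick_query(current_preds, committee_preds):
    """Return the index of the pool instance with the highest estimated
    probability of being misclassified by the current classifier.

    current_preds: one predicted label per pool instance.
    committee_preds: one list of committee votes per pool instance.
    """
    scores = [
        misclassification_probability(pred, votes)
        for pred, votes in zip(current_preds, committee_preds)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With ties, `max` returns the first index with the top score, so the pool order acts as the tie-breaker in this sketch.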


2017 ◽  
Vol 26 (01) ◽  
pp. 35-46 ◽  
Author(s):  
Allan Fong ◽  
Jessica Howe ◽  
Katharine Adams ◽  
Raj Ratwani

Summary: The widespread adoption of health information technology (HIT) has led to new patient safety hazards that are often difficult to identify. Patient safety event reports, which are self-reported descriptions of safety hazards, provide one view of potential HIT-related safety events. However, identifying HIT-related reports can be challenging because they are often categorized under other, more predominant clinical categories. This challenge is exacerbated by the increasing number and complexity of reports, which burden the human annotators who must manually review them. In this paper, we apply active learning techniques to support classification of patient safety event reports as HIT-related. We evaluated different strategies and demonstrated a 30% increase in average precision of a confirmatory sampling strategy over a baseline approach without active learning after 10 learning iterations.


2021 ◽  
Vol 15 (1) ◽  
pp. 99-114
Author(s):  
Ankit Agrawal ◽  
Sarsij Tripathi ◽  
Manu Vardhan

Active learning is a well-known, cost-efficient method for labeling a huge un-annotated dataset with minimal effort. It iteratively selects the most informative instances and adds them to the training set so that the performance of the learner improves with each iteration. Named entity recognition (NER) is a key task for information extraction in which the entities present in sequences are labeled with the correct class. The traditional query sampling strategies for active learning consider only the final probability value of the model when selecting the most informative instances. In this paper, we propose a new active learning algorithm based on a hybrid query sampling strategy that considers sentence similarity along with the final probability value of the model, and we compare it with four other well-known pool-based uncertainty sampling strategies for NER: least confident sampling, margin of confidence sampling, ratio of confidence sampling, and entropy sampling. The experiments were performed on three biomedical NER datasets from different domains and a Spanish-language NER dataset. We found that all of the above approaches reach the performance of the supervised learning approach while requiring much less annotated training data. The proposed active learning algorithm performs well and, in most cases, further reduces the annotation cost compared with the active learning algorithms based on the other sampling strategies.
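The four baseline uncertainty measures named in this abstract can be sketched with their common textbook definitions. The sketch assumes each pool instance is summarized by a single class-probability distribution, whereas an NER model would produce sequence-level or token-level probabilities. The function names are hypothetical, the exact normalizations used in the paper may differ, and the hybrid strategy with sentence similarity is not shown because the abstract does not define it.

```python
import math


def least_confidence(probs):
    """Higher when the top class probability is low."""
    return 1.0 - max(probs)


def margin_of_confidence(probs):
    """Higher when the top two class probabilities are close."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_of_confidence(probs):
    """Ratio of the second-best to the best probability (closer to 1 = less certain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy(probs):
    """Shannon entropy of the predicted distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(pool, score_fn, k=1):
    """Return the indices of the k pool items with the highest uncertainty score."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:k]
```

All four scores are oriented so that a larger value means a more uncertain prediction, which lets one selection function serve every strategy.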


2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Ankur Srivastava ◽  
Andrew J. Meade

Wind tunnel tests to measure unsteady cavity flow pressures can be expensive, lengthy, and tedious. In this work, the feasibility of an active machine learning technique to design wind tunnel runs using proxy data is tested. The proposed active learning scheme uses scattered data approximation in conjunction with uncertainty sampling (US). We applied the proposed intelligent sampling strategy to characterize cavity flow classes at subsonic and transonic speeds and demonstrated that the scheme achieves better classification accuracy, using fewer training points, than a passive Latin Hypercube Sampling (LHS) strategy.


Author(s):  
Jian Wu ◽  
Anqian Guo ◽  
Victor S. Sheng ◽  
Pengpeng Zhao ◽  
Zhiming Cui

Multi-label active learning for image classification has been a popular research topic. It faces several challenges, even though related work has made great progress. Existing studies on multi-label active learning do not pay attention to the cleanness of sample data. In reality, data are easily polluted by external influences that are likely to disturb the exploration of data space and have a negative effect on model training. Previous methods of label correlation mining, which are purely based on observed label distribution, are defective. Apart from neglecting noise influence, they also cannot acquire sufficient relevant information. In fact, they neglect inner relation mapping from example space to label space, which is an implicit way of modeling label relationships. To solve these issues, we develop a novel multi-label active learning with low-rank application (ENMAL) algorithm in this paper. A low-rank model is constructed to quantize noise level, and the example-label pairs that contain less noise are emphasized when sampling. A low-rank mapping matrix is learned to signify the mapping relation of a multi-label domain to capture a more comprehensive and reasonable label correlation. Integrating label correlation with uncertainty and considering sample noise, an efficient sampling strategy is developed. We extend ENMAL with automatic labeling (denoted as AL-ENMAL) to further reduce the annotation workload of active learning. Empirical research demonstrates the efficacy of our approaches.

