unlabelled data
Recently Published Documents


TOTAL DOCUMENTS

86
(FIVE YEARS 50)

H-INDEX

9
(FIVE YEARS 2)

2022 ◽  
Author(s):  
Mingjian Wen ◽  
Samuel M. Blau ◽  
Xiaowei Xie ◽  
Shyam Dwaraknath ◽  
Kristin A. Persson

Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on large amounts of labelled data to avoid overfitting, which would otherwise result in low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models from small labelled chemical reaction datasets. We focus on an old and prominent problem—classifying reactions into distinct families—and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find that they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer-learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
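The contrastive pretraining described above is commonly implemented with an NT-Xent-style objective; the following is a minimal NumPy sketch of that loss (the function name and batch setup are illustrative, not the authors' code), where `z1` and `z2` hold the representations of two augmented views of each reaction in a batch:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss: pull the representations of the two
    augmented views of the same reaction together, push all other
    reactions in the batch away."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity space
    sim = z @ z.T / temperature                        # pairwise similarities
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # the positive for sample i is its other augmented view
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views of a reaction map to identical representations, the loss is low; when the views are unrelated, it approaches the log of the batch size.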


2021 ◽  
Vol 11 (6) ◽  
pp. 703-711
Author(s):  
Anuja Jana Naik ◽  
Gopalakrishna Madigondanahalli Thimmaiah

Detection of anomalies in crowded videos has become an eminent field of research in the computer vision community. Traditional approaches identify an anomaly as a deviation from the scene normalcy learned from labeled and unlabelled training data. However, there is no hard boundary between anomalous and non-anomalous events, which can mislead the learning process. This paper develops an efficient model for anomaly detection in crowd videos. To accomplish this, video frames are generated and feature extraction is applied, using the Histogram of Oriented Gradients (HOG) and the Local Gradient Pattern (LGP). A Self Organized Map (SOM) with meta-heuristic training is then used for detection and localization, where the training of the SOM is enhanced by the Fruit Fly Optimization Algorithm (FOA). Moreover, the flow of objects and their directions are determined to localize the anomalous objects in the detected videos. Finally, a comparison with state-of-the-art techniques shows that the proposed model outperforms most competing models on a standard video surveillance dataset.
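The core SOM training step referred to above can be sketched in NumPy as follows; this is a generic best-matching-unit update (in the paper the learning rate and neighbourhood are tuned by FOA, whereas here they are fixed constants for illustration):

```python
import numpy as np

def som_update(weights, x, lr=0.5, sigma=1.0):
    """One Self Organized Map training step: find the best matching
    unit (BMU) for feature vector x, then pull every node towards x
    with a strength that decays with grid distance from the BMU."""
    rows, cols, dim = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)        # node-to-sample distances
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    grid = np.indices((rows, cols)).transpose(1, 2, 0)  # (rows, cols, 2) coords
    grid_dist = np.linalg.norm(grid - np.array(bmu), axis=2)
    h = np.exp(-grid_dist**2 / (2 * sigma**2))          # Gaussian neighbourhood
    return weights + lr * h[..., None] * (x - weights), bmu
```

Each update is a convex combination of the old weight and the input, so every node moves towards the sample, the BMU most of all.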


Author(s):  
Pritom Bhowmik ◽  
Arabinda Saha Partha
Machine learning teaches computers to think in a way similar to how humans do. An ML model works by exploring data and identifying patterns with minimal human intervention. A supervised ML model learns by mapping an input to an output based on labeled examples of input-output (X, y) pairs, while an unsupervised ML model works by discovering previously undetected patterns and information in unlabelled data. As an ML project is an extensively iterative process, there is always a need to change the ML code/model and the datasets. However, when an ML model achieves 70-75% accuracy, the code or algorithm most probably works fine. Nevertheless, in many cases, e.g., medical or spam detection models, 75% accuracy is too low to deploy in production. A medical model used in sensitive tasks such as detecting certain diseases must have an accuracy level of 98-99%, which is a big challenge to achieve. In that scenario, we may already have a good working model, so a model-centric approach may not help much in reaching the desired accuracy threshold; improving the dataset, however, will improve the overall performance of the model. Improving the dataset does not always require bringing more and more data into it. Improving the quality of the data through a reasonable baseline level of performance, labeler consistency, error analysis, and performance auditing will thoroughly improve the model's accuracy. This review paper focuses on the data-centric approach to improving the performance of a production machine learning model.
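Labeler consistency, one of the data-centric levers mentioned above, is commonly quantified with inter-annotator agreement such as Cohen's kappa; a minimal sketch (illustrative, not taken from the paper):

```python
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelers, corrected for
    the agreement expected by chance. 1.0 means perfect consistency,
    0.0 means no better than chance."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    classes = np.union1d(a, b)
    p_observed = np.mean(a == b)
    # chance agreement: product of each labeler's marginal class frequencies
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in classes)
    return (p_observed - p_chance) / (1 - p_chance)
```

A low kappa on a sample of doubly-labeled data signals that labeling instructions, not the model, are the bottleneck.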


2021 ◽  
Author(s):  
Mohiuddin Md Abdul Qudar ◽  
Palak Bhatia ◽  
Vijay Mago

AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 497-511
Author(s):  
Theiab Alzahrani ◽  
Baidaa Al-Bander ◽  
Waleed Al-Nuaimy

Makeup can disguise facial features, which results in degraded performance of many facial-analysis systems, including face recognition, facial landmark characterisation, aesthetic quantification and automated age estimation methods. Thus, facial makeup is likely to directly affect several real-life applications such as cosmetology and virtual cosmetics recommendation systems, security and access control, and social interaction. In this work, we conduct a comparative study and design automated facial makeup detection systems leveraging multiple learning schemes from a single unconstrained photograph. We have investigated the efficacy of deep learning models for makeup detection, incorporating a transfer learning strategy with semi-supervised learning using labelled and unlabelled data. First, during supervised learning, the VGG16 convolutional neural network, pre-trained on a large dataset, is fine-tuned on makeup-labelled data. Second, two unsupervised learning methods, self-learning and a convolutional auto-encoder, are trained on unlabelled data and then combined with the supervised model in a semi-supervised scheme. Comprehensive experiments and comparative analysis have been conducted on 2479 labelled images and 446 unlabelled images collected from six challenging makeup datasets. The obtained results reveal that the convolutional auto-encoder merged with supervised learning gives the best makeup detection performance, achieving an accuracy of 88.33% and an area under the ROC curve of 95.15%. The promising results reflect the efficiency of combining different learning strategies by harnessing labelled and unlabelled data. It would also be advantageous to the beauty industry to develop such computational intelligence methods.
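The self-learning scheme mentioned above is usually a pseudo-labeling loop: predict on unlabelled images, keep only confident predictions, and add them to the training set. A minimal NumPy sketch of the selection step (the threshold and function name are illustrative, not the authors' code):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Self-learning step: keep only unlabelled samples whose maximum
    predicted class probability exceeds a confidence threshold, and
    use the argmax class as a pseudo-label for the next supervised
    fine-tuning round."""
    confidence = probs.max(axis=1)          # per-sample confidence
    keep = confidence >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]
```

Low-confidence samples are left unlabelled rather than risk training on noisy pseudo-labels.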


Author(s):  
Dabbara Praveen

Intelligent video recognition based on deep learning can create a self-learning video analytics system. CCTV cameras are used in all areas where safety is paramount, but manual monitoring is tedious and time-consuming. Security can be defined by different terms in different contexts, such as identity theft, violence, or explosions. In this project we analyse video feeds in real time and identify unusual events such as violence or theft. Deep learning simulates the functioning of the human brain in processing data for use in acquisition, speech recognition, decision making, etc., and it can learn from unstructured and unlabelled data without human guidance.


2021 ◽  
Vol 7 ◽  
pp. e650
Author(s):  
Mohammad Ali Humayun ◽  
Hayati Yassin ◽  
Pg Emeroylariffion Abas

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
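The "geometric position within the global structure" constraint described above can be illustrated with a simple construction; the sketch below assigns each unlabelled sample coordinates along the principal directions of the whole feature set and adds a position-regression term to the auto-encoder loss. This is one plausible reading of the idea, not the paper's exact construction:

```python
import numpy as np

def geometric_positions(features, k=2):
    """Assign each unlabelled sample a 'geometric position': its
    coordinates along the top-k principal directions of the feature
    set (an illustrative stand-in for the paper's global structure)."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T

def ae_loss(x, x_hat, pos, pos_pred, alpha=0.1):
    """Training objective sketch: reconstruction error plus the
    auxiliary regression-to-position constraint on the encoder."""
    return np.mean((x - x_hat) ** 2) + alpha * np.mean((pos - pos_pred) ** 2)
```

The auxiliary term pushes the learnt representation to preserve where each sample sits in the unlabelled set, which is the extra structure the paper exploits.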


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jifeng Guo ◽  
Wenbo Sun ◽  
Zhiqi Pang ◽  
Yuxiao Fei ◽  
Yu Chen

The current unsupervised domain adaptation person re-identification (re-ID) method aims to solve the domain shift problem and applies prior knowledge learned from labelled data in the source domain to unlabelled data in the target domain for person re-ID. At present, the unsupervised domain adaptation person re-ID method based on pseudolabels has obtained state-of-the-art performance. This method obtains pseudolabels via a clustering algorithm and uses these pseudolabels to optimize a CNN model. Although it achieves optimal performance, the model cannot be further optimized due to the existence of noisy labels in the clustering process. In this paper, we propose stable median centre clustering (SMCC) for the unsupervised domain adaptation person re-ID method. SMCC adaptively mines credible samples for optimization purposes and reduces the impact of label noise and outliers on training to improve the performance of the resulting model. In particular, we use the intracluster distance confidence measure of the sample and its K-reciprocal nearest neighbour cluster proportion in the clustering process to select credible samples, and we assign different weights according to the intracluster sample distance confidence of samples to measure the distances between different clusters, thereby making the clustering results more robust. The experiments show that our SMCC method can select credible and stable samples for training and improve the performance of the unsupervised domain adaptation model. Our code is available at https://github.com/sunburst792/SMCC-method/tree/master.
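The credible-sample selection idea can be sketched as a distance-based filter around each pseudo-label cluster centre; this is a simplified illustration (the paper additionally uses k-reciprocal nearest-neighbour cluster proportions, omitted here):

```python
import numpy as np

def credible_samples(features, labels, centres, quantile=0.5):
    """SMCC-style credibility filter sketch: within each pseudo-label
    cluster, keep only samples whose distance to the cluster centre
    falls below the cluster's distance quantile, discarding likely
    noisy labels and outliers."""
    keep = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        d = np.linalg.norm(features[idx] - centres[c], axis=1)
        keep[idx[d <= np.quantile(d, quantile)]] = True
    return keep
```

Only the retained samples would then be used to optimize the CNN, so a mislabelled outlier far from its assigned centre never contributes a gradient.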


Author(s):  
M. Coenen ◽  
T. Schack ◽  
D. Beyer ◽  
C. Heipke ◽  
M. Haist

Abstract. In order to leverage and profit from unlabelled data, semi-supervised frameworks for semantic segmentation based on consistency training have been proven to be powerful tools to significantly improve the performance of purely supervised segmentation learning. However, the consensus principle behind consistency training has at least one drawback, which we identify in this paper: imbalanced label distributions within the data. To overcome the limitations of standard consistency training, we propose a novel semi-supervised framework for semantic segmentation, introducing additional losses based on prior knowledge. Specifically, we propose a lightweight architecture consisting of a shared encoder and a main decoder, which is trained in a supervised manner. An auxiliary decoder is added as an additional branch in order to make use of unlabelled data based on consistency training, and we add additional constraints derived from prior information on the class distribution and on auto-encoder regularisation. Experiments performed on our concrete aggregate dataset presented in this paper demonstrate the effectiveness of the proposed approach, outperforming the segmentation results achieved by purely supervised segmentation and standard consistency training.
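Two of the loss terms described above can be sketched in NumPy; the consistency term matches the two decoders' per-pixel predictions on unlabelled images, and one plausible form of the class-distribution constraint penalises deviation of the batch's mean prediction from a known class prior (both are illustrative, not the authors' exact formulations):

```python
import numpy as np

def consistency_loss(p_main, p_aux):
    """Consistency term: on unlabelled images, the auxiliary decoder's
    per-pixel class distribution is pushed towards the main decoder's
    prediction (mean squared error over the softmax outputs)."""
    return np.mean((p_main - p_aux) ** 2)

def class_prior_loss(p_aux, prior):
    """Prior-knowledge term sketch: KL divergence between the batch's
    mean predicted class distribution and a known class prior, one
    plausible form of the class-distribution constraint."""
    batch_dist = p_aux.reshape(-1, p_aux.shape[-1]).mean(axis=0)
    return np.sum(batch_dist * np.log(batch_dist / prior))
```

The prior term is what counteracts the imbalanced-label drawback: a decoder that collapses onto the majority class incurs a large divergence from the known distribution.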

