A text-image feature mapping algorithm based on transfer learning

Open Physics ◽  
2018 ◽  
Vol 16 (1) ◽  
pp. 1139-1148
Author(s):  
Deng Pan ◽  
Hyunho Yang

Abstract The traditional uniform-distribution algorithm does not filter image data when extracting the approximate features of text-image data for an event, so the similarity between the image data and the text is low, which leads to low algorithm accuracy. This paper proposes a text-image feature mapping algorithm based on transfer learning. Existing data are filtered by clustering to obtain data similar to the target data. Salient text features are computed through a latent Dirichlet allocation (LDA) model based on Gibbs sampling together with information gain. A bag-of-visual-words (BOVW) model and a naive Bayes method are used to model the image data. With the help of text-image co-occurrence data for the same event, the text feature distribution is mapped into the image feature space, approximating the feature distribution of image data for that event. Experimental results show that the proposed algorithm can obtain the feature distribution of image data for different events, with an average cosine similarity as high as 92% and an average dispersion as low as 0.06%, indicating high accuracy.
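As a rough illustration of the mapping step, the sketch below pairs LDA topic distributions for event texts with toy BOVW histograms and projects the text distribution into the visual-word space before scoring cosine similarity. It is only an analogue under stated assumptions (scikit-learn's LDA uses variational inference rather than Gibbs sampling, and the texts and histograms are synthetic), not the authors' implementation.

```python
# Minimal sketch of text-to-image feature mapping (illustrative, not the authors' code).
# Assumes paired text documents and image BOVW histograms from the same event.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

texts = ["flood waters rise in the city", "rescue teams reach flooded homes"]
bovw_hist = np.random.rand(len(texts), 200)          # toy BOVW histograms (200 visual words)

counts = CountVectorizer().fit_transform(texts)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(counts)                    # per-document topic distributions

# Map text topics into the visual-word space via co-occurring text-image pairs:
# each visual word is weighted by the topic mass of the documents it co-occurs with.
mapping = theta.T @ bovw_hist / theta.sum(axis=0, keepdims=True).T   # topics x visual words
approx_image_features = theta @ mapping              # approximated image feature distribution

print(cosine_similarity(approx_image_features, bovw_hist).diagonal())
```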

2021 ◽  
Vol 9 (2) ◽  
pp. 157
Author(s):  
Xi Yu ◽  
Bing Ouyang ◽  
Jose C. Principe

Deep neural networks provide remarkable performance on supervised learning tasks with extensive collections of labeled data. However, creating such large, well-annotated data sets requires a considerable amount of resources, time and effort, especially for underwater image data sets such as corals and marine animals. The overreliance on labels is therefore one of the main obstacles to the widespread application of deep learning methods. To overcome this need for large annotated data sets, this paper proposes a label-efficient deep learning framework for image segmentation that uses only very sparse point supervision. Our approach employs latent Dirichlet allocation (LDA) with spatial coherence on the feature space to iteratively generate pseudo labels. The method requires, as an initial condition, a Wide Residual Network (WRN) trained with sparse labels and mutual information constraints. The proposed method is evaluated on a sparsely labeled coral image data set collected from the Pulley Ridge region in the Gulf of Mexico. Experiments show that our method improves image segmentation performance when trained on sparsely labeled samples and achieves better results than other semi-supervised approaches.
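The sparse point-supervision idea can be caricatured with a generic self-training loop on per-pixel features. The sketch below uses scikit-learn's SelfTrainingClassifier with a logistic-regression base model in place of the paper's spatially coherent LDA and Wide Residual Network, purely to show how pixels marked -1 (unlabeled) acquire pseudo labels; the data are synthetic.

```python
# Minimal sketch of point-supervised pseudo-labelling on pixel features (illustrative only).
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(5000, 64))      # toy per-pixel feature vectors
labels = np.full(5000, -1)                     # -1 marks unlabeled pixels
labels[:50] = rng.integers(0, 3, size=50)      # sparse point annotations (3 classes)

model = SelfTrainingClassifier(LogisticRegression(max_iter=200), threshold=0.9)
model.fit(pixel_feats, labels)                 # iteratively adds confident pseudo labels
pseudo = model.predict(pixel_feats)            # dense labels for every pixel
```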


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Qi Cheng ◽  
Bo He ◽  
Chengkui Zhao ◽  
Hongyuan Bi ◽  
Duojiao Chen ◽  
...  

Abstract Background Microexons are a particular kind of exon, less than 30 nucleotides in length. More than 60% of annotated human microexons were found to have high levels of sequence conservation, suggesting potential functions. There is thus a need to develop a method for predicting functional microexons. Results Given the lack of publicly available functional labels for microexons, we employed a transfer learning technique called Transfer Component Analysis (TCA) to transfer knowledge obtained through feature mapping to the prediction of functional microexons. Microindels were chosen to provide the reference knowledge because of their similarity to microexons. A Support Vector Machine (SVM) was then trained in the newly built feature space as a classification model for functional microindels, and the trained model was used to predict functional microexons. We also built a tool based on this model to predict other functional microexons and applied it to a total of 19 functional microexons reported in the literature. The approach correctly predicted 16 of the 19 samples, an accuracy greater than 80%. Conclusions In this study, we proposed a method for predicting functional microexons and applied it, with the predictive results being largely consistent with records in the literature.
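A compact, linear-kernel rendition of the TCA-plus-SVM pipeline is sketched below on synthetic feature matrices; the study's actual sequence features, kernel choice and hyperparameters are not reproduced, so treat it as a sketch of the technique rather than the authors' code.

```python
# Compact linear-kernel Transfer Component Analysis followed by an SVM (illustrative).
import numpy as np
from scipy.linalg import eigh
from sklearn.svm import SVC

def tca(X_src, X_tgt, dim=10, mu=1.0):
    """Project source and target data into a shared low-dimensional space."""
    X = np.vstack([X_src, X_tgt])
    n_s, n_t, n = len(X_src), len(X_tgt), len(X)
    K = X @ X.T                                    # linear kernel matrix
    e = np.vstack([np.full((n_s, 1), 1.0 / n_s), np.full((n_t, 1), -1.0 / n_t)])
    L = e @ e.T                                    # MMD (domain-discrepancy) matrix
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    # Keep directions that preserve variance (K H K) while shrinking discrepancy (K L K).
    vals, vecs = eigh(K @ H @ K, K @ L @ K + mu * np.eye(n))
    W = vecs[:, np.argsort(-vals)[:dim]]
    Z = K @ W
    return Z[:n_s], Z[n_s:]

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(100, 20)), rng.integers(0, 2, 100)   # labeled microindel features
Xt = rng.normal(size=(19, 20))                                  # unlabeled microexon features

Zs, Zt = tca(Xs, Xt, dim=5)
clf = SVC().fit(Zs, ys)                            # SVM trained in the shared feature space
print(clf.predict(Zt))                             # predicted functionality of the microexons
```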


2020 ◽  
Vol 17 (1) ◽  
pp. 319-328
Author(s):  
Ade Muchlis Maulana Anwar ◽  
Prihastuti Harsani ◽  
Aries Maesya

Population data are individual or aggregate data structured as a result of population registration and civil registration activities. A birth certificate is a civil registration deed recording the birth of a baby, which is reported so that the child can be registered on the Family Card and given a Population Identification Number (NIK) as the basis for obtaining other public services. Of the 570,637 birth certificate reports integrated into the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported on time. Clustering is a method for grouping data that are similar to one another within a group and dissimilar to data in other groups. k-nearest neighbor is a method for classifying objects based on the learning data closest to the test data. k-means is a method for dividing a number of objects into groups based on existing categories by looking at the group centroids. In the data mining preprocessing step, the data are cleaned by filling blank values with the most frequent value, and attributes are selected using the information gain method. Using the k-nearest neighbor method to predict reporting delays and the k-means method to cluster priority service areas, with 10,000 birth certificate records from 2019, the approach performs reasonably well, producing predictions with an accuracy of 74.00% and, with K = 2, a Davies-Bouldin index of 1.179 for k-means.
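A rough scikit-learn analogue of the described preprocessing, prediction and clustering steps is sketched below on toy data (the SIAK records are not available here, and mutual information stands in for information gain).

```python
# Toy pipeline: most-frequent imputation, information-gain-style feature selection,
# k-NN prediction of late reporting, and k-means scored with the Davies-Bouldin index.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
X[rng.random(X.shape) < 0.05] = np.nan           # simulate missing entries
y = rng.integers(0, 2, 1000)                     # 1 = birth certificate reported late

X = SimpleImputer(strategy="most_frequent").fit_transform(X)
X = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy:", knn.score(X_te, y_te))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Davies-Bouldin index:", davies_bouldin_score(X, labels))
```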


2013 ◽  
Vol 33 (1) ◽  
pp. 76-79
Author(s):  
Jiamin LIU ◽  
Huiyan WANG ◽  
Xiaoli ZHOU ◽  
Fulin LUO

2021 ◽  
Vol 29 (1) ◽  
pp. 19-36
Author(s):  
Çağín Polat ◽  
Onur Karaman ◽  
Ceren Karaman ◽  
Güney Korkmaz ◽  
Mehmet Can Balcı ◽  
...  

BACKGROUND: Chest X-ray imaging has proven to be a powerful diagnostic method for detecting and diagnosing COVID-19 cases due to its easy accessibility, low cost and rapid imaging time. OBJECTIVE: This study aims to improve the efficacy of screening COVID-19-infected patients using chest X-ray images with the help of a developed deep convolutional neural network (CNN) model entitled nCoV-NET. METHODS: To train and evaluate the performance of the developed model, three datasets were collected from the “ChestX-ray14”, “COVID-19 image data collection”, and “Chest X-ray collection from Indiana University” resources, respectively. Overall, 299 COVID-19 pneumonia cases and 1,522 non-COVID-19 cases were involved in this study. To overcome the probable bias due to the unbalanced cases in the two classes of the datasets, ResNet, DenseNet, and VGG architectures were re-trained in the fine-tuning stage of the process to distinguish COVID-19 classes using a transfer learning method. Lastly, the optimized final nCoV-NET model was applied to the testing dataset to verify the performance of the proposed model. RESULTS: Although the performance parameters of all re-trained architectures were close to each other, the final nCoV-NET model, optimized using the DenseNet-161 architecture in the transfer learning stage, exhibits the highest performance for classification of COVID-19 cases, with an accuracy of 97.1%. The Activation Mapping method was used to create activation maps that highlight the crucial areas of the radiograph to improve causality and intelligibility. CONCLUSION: This study demonstrated that the proposed CNN model, called nCoV-NET, can be utilized to reliably detect COVID-19 cases from chest X-ray images, accelerating triage, saving critical time for disease control, and assisting radiologists in validating their initial diagnoses.
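The transfer learning stage can be sketched, assuming a recent torchvision, as freezing a DenseNet-161 backbone pretrained on ImageNet and re-training a two-class head; the paper's exact nCoV-NET configuration, data loaders, augmentation and schedules are not reproduced here.

```python
# Hedged sketch of fine-tuning a pretrained DenseNet-161 for two-class chest X-ray screening.
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet161(weights=models.DenseNet161_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                       # freeze the pretrained feature extractor

model.classifier = nn.Linear(model.classifier.in_features, 2)   # COVID-19 vs. non-COVID-19
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch of chest X-ray tensors of shape (N, 3, 224, 224)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```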


2011 ◽  
Vol 2011 ◽  
pp. 1-28 ◽  
Author(s):  
Zhongqiang Chen ◽  
Zhanyan Liang ◽  
Yuan Zhang ◽  
Zhongrong Chen

Grayware encyclopedias collect known species to provide information for incident analysis; however, their lack of categorization and generalization capability renders them ineffective in the development of defense strategies against clustered strains. A grayware categorization framework is therefore proposed here not only to classify grayware according to diverse taxonomic features but also to facilitate evaluation of the risk grayware poses to cyberspace. Armed with Support Vector Machines, the framework builds learning models from training data extracted automatically from grayware encyclopedias and visualizes categorization results with Self-Organizing Maps. The features used in the learning models are selected with information gain, and the high dimensionality of the feature space is reduced by word stemming and stopword removal. The grayware categorizations on diversified features reveal that grayware typically attempts to improve its penetration rate by resorting to multiple installation mechanisms and reduced code footprints. The framework also shows that grayware evades detection by attacking victims' security applications and resists removal by enhancing its clotting capability with infected hosts. Our analysis further points out that species in the categories Spyware and Adware continue to dominate the grayware landscape and impose extremely critical threats to the Internet ecosystem.
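The text-classification core (stopword removal, information-gain-style feature scoring, linear SVM) might look roughly like the toy pipeline below; stemming and the Self-Organizing Map visualization are omitted, the two example descriptions are invented, and mutual information stands in for information gain.

```python
# Illustrative grayware-description classifier on invented encyclopedia snippets.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC

docs = ["installs browser toolbar and tracks keystrokes",
        "displays pop-up advertisements and hijacks the start page"]
labels = ["Spyware", "Adware"]                    # taxonomic feature being predicted

clf = Pipeline([
    ("vec", CountVectorizer(stop_words="english")),   # stopword removal
    ("sel", SelectKBest(mutual_info_classif, k=8)),   # information-gain-style selection
    ("svm", LinearSVC()),
]).fit(docs, labels)

print(clf.predict(["records keystrokes and reports browsing habits"]))
```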


Atmosphere ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 828
Author(s):  
Wai Lun Lo ◽  
Henry Shu Hung Chung ◽  
Hong Fu

Estimation of meteorological visibility from image characteristics is a challenging problem in the estimation of meteorological parameters. Meteorological visibility indicates atmospheric transparency, an indicator important for transport safety. This paper summarizes the outcomes of an experimental evaluation of a Particle Swarm Optimization (PSO) based transfer learning method for meteorological visibility estimation, proposing a modified transfer learning approach that uses PSO feature selection. Image data are collected at a fixed location with a fixed viewing angle. The database images go through a gray-averaging pre-processing step to provide information on static landmark objects for the automatic extraction of effective regions from the images. Effective regions are extracted from the image database, and image features are then extracted by the neural network. Subsets of image features are selected with PSO to obtain the image feature vector for each effective sub-region. The image feature vectors are then used to estimate the visibility of the images with multiple Support Vector Regression (SVR) models. Experimental results show that the proposed method achieves an accuracy of more than 90% for visibility estimation and is effective and robust.
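A very small sketch of PSO-style binary feature selection wrapped around an SVR is given below on synthetic data; it follows the general shape of the pipeline (feature masks evolved by a particle swarm, fitness scored by regression quality), not the paper's parameters or actual image features.

```python
# Binary PSO feature selection around an SVR (toy data, illustrative hyperparameters).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                         # features of effective image regions
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)  # toy visibility values

def fitness(mask):
    """Cross-validated R^2 of an SVR trained on the selected feature subset."""
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(SVR(), X[:, mask.astype(bool)], y, cv=3, scoring="r2").mean()

n_particles, n_feats, iters = 10, X.shape[1], 20
pos = (rng.random((n_particles, n_feats)) > 0.5).astype(float)  # binary feature masks
vel = rng.normal(scale=0.1, size=(n_particles, n_feats))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_feats))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, n_feats)) < 1 / (1 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest))
```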


Author(s):  
Sharmad Joshi ◽  
Jessie Ann Owens ◽  
Shlok Shah ◽  
Thilanka Munasinghe

2019 ◽  
Author(s):  
Derek Howard ◽  
Marta M Maslej ◽  
Justin Lee ◽  
Jacob Ritchie ◽  
Geoffrey Woollard ◽  
...  

BACKGROUND Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods. OBJECTIVE This study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response. METHODS We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used the Tree-based Pipeline Optimization Tool (TPOT) and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts. RESULTS The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macro-averaged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we have presented visualizations that aid in the understanding of the learned classifiers. CONCLUSIONS In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
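The AutoML triage step could be approximated with the classic TPOT interface as sketched below, assuming post embeddings have already been extracted by a pretrained language model; the feature dimension and the four placeholder triage labels are assumptions, not the study's data.

```python
# Sketch of AutoML triage with TPOT on precomputed post embeddings (placeholder data).
import numpy as np
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1588, 768))                 # e.g. one embedding vector per forum post
y = rng.integers(0, 4, 1588)                     # placeholder triage labels (4 classes)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
automl = TPOTClassifier(generations=5, population_size=20,
                        scoring="f1_macro", random_state=0, verbosity=2)
automl.fit(X_tr, y_tr)                           # evolves scikit-learn pipelines
print("macro-F1 on held-out posts:", automl.score(X_te, y_te))
automl.export("triage_pipeline.py")              # exports the best found pipeline
```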


2020 ◽  
Vol 49 (3) ◽  
pp. 421-437
Author(s):  
Genggeng Liu ◽  
Lin Xie ◽  
Chi-Hua Chen

Dimensionality reduction plays an important role in data processing for machine learning and data mining, making the processing of high-dimensional data more efficient. Dimensionality reduction extracts a low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method not only extracts most of the useful information in the original data but also removes useless noise. Dimensionality reduction methods can be applied to all types of data, especially image data. Although supervised learning methods have achieved good results in dimensionality reduction, their performance depends on the number of labeled training samples. With the growth of information on the internet, labeling data requires more resources and becomes more difficult. Therefore, using unsupervised learning to learn data features has great research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied on text data, so that the mapping from high-dimensional to low-dimensional features is efficient and the low-dimensional features retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the results of the variational auto-encoder (VAE), and the proposed method shows significant improvement over the comparison methods.
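A bare-bones PyTorch VAE illustrates the dimensionality-reduction idea below; it is a generic sketch with assumed layer sizes, not the paper's multilayered text model, and the latent mean vector serves as the low-dimensional representation.

```python
# Minimal variational auto-encoder for unsupervised dimensionality reduction (sketch).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=2000, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, latent_dim), nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def loss_fn(recon, x, mu, logvar):
    # reconstruction term plus KL divergence to the standard normal prior
    bce = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

model = VAE()
x = torch.rand(64, 2000)                         # toy bag-of-words-style document vectors
recon, mu, logvar = model(x)
print(loss_fn(recon, x, mu, logvar).item())      # mu is the low-dimensional representation
```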

