Semi-supervised learning of Hidden Markov Models for biological sequence analysis

Ioannis A Tamposis; Konstantinos D Tsirigos; Margarita C Theodoropoulou; Panagiota I Kontou; Pantelis G Bagos

doi:10.1093/bioinformatics/bty910

Bioinformatics, Vol 35 (13), pp. 2208-2215 (2018)

Keyword(s): Sequence Analysis, Supervised Learning, Hidden Markov Models, Markov Models, Hidden Markov, Transmembrane Protein, Training Data, Supplementary Information, Training Procedure, Partially Labeled Data

Abstract

Motivation: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are, in principle, unsupervised models; however, in the most important applications they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A major problem with this approach is that, in most cases, labels are hard to obtain, so the amount of training data is limited. On the other hand, public databases contain plenty of unclassified (unlabeled) sequences that could potentially contribute to the training procedure. Training on such sequences together with labeled examples is called semi-supervised learning and could be very helpful in many applications.

Results: Here we propose a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, in which the missing labels of the unlabeled or partially labeled data are treated as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers.

Supplementary information: Supplementary data are available at Bioinformatics online.
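
The Results paragraph treats the missing labels as EM's missing data. A minimal sketch of that idea follows, assuming a discrete-alphabet class HMM in which every state is tied to one label (the hypothetical `label_states` mapping): a known label restricts forward-backward to that label's states, an unknown position allows every state, so labeled, partially labeled and unlabeled sequences share one E-step and one M-step. All function names, the toy model and the pseudocount are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def forward_backward(seq, allowed, start, trans, emit):
    """Scaled forward-backward restricted to label-consistent states.

    allowed[t] lists the states permitted at position t: the states of the
    observed label where one is known, every state where it is missing.
    Returns state posteriors (gamma), expected transition counts (xi) and
    log P(sequence, known labels) under the current parameters.
    """
    n, T = len(start), len(seq)
    alpha, scale = [], []
    for t, x in enumerate(seq):
        a = [0.0] * n
        for j in allowed[t]:
            prior = start[j] if t == 0 else sum(
                alpha[t - 1][i] * trans[i][j] for i in range(n))
            a[j] = prior * emit[j][x]
        c = sum(a)  # scaling factor; the log-likelihood is the sum of log(c)
        alpha.append([v / c for v in a])
        scale.append(c)
    beta = [[0.0] * n for _ in range(T)]
    for j in allowed[T - 1]:
        beta[T - 1][j] = 1.0
    for t in range(T - 2, -1, -1):
        for i in allowed[t]:
            beta[t][i] = sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
                             for j in allowed[t + 1]) / scale[t + 1]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in allowed[t]:
            for j in allowed[t + 1]:
                xi[i][j] += (alpha[t][i] * trans[i][j] * emit[j][seq[t + 1]]
                             * beta[t + 1][j] / scale[t + 1])
    return gamma, xi, sum(math.log(c) for c in scale)


def allowed_states(labels, label_states, n_states):
    """Per-position labels (None = unknown) -> allowed state indices."""
    return [range(n_states) if lab is None else label_states[lab]
            for lab in labels]


def normalize(row):
    total = sum(row)
    return [v / total for v in row]


def em_step(data, start, trans, emit, pseudo=1e-3):
    """One EM iteration. Labeled, partially labeled and unlabeled sequences
    differ only in their allowed-state sets. `pseudo` is a small pseudocount
    (an assumption, not from the paper) that keeps every probability > 0."""
    n, m = len(emit), len(emit[0])
    s_cnt = [pseudo] * n
    t_cnt = [[pseudo] * n for _ in range(n)]
    e_cnt = [[pseudo] * m for _ in range(n)]
    total_ll = 0.0
    for seq, allowed in data:
        gamma, xi, ll = forward_backward(seq, allowed, start, trans, emit)
        total_ll += ll
        for i in range(n):
            s_cnt[i] += gamma[0][i]
            for t, x in enumerate(seq):
                e_cnt[i][x] += gamma[t][i]
            for j in range(n):
                t_cnt[i][j] += xi[i][j]
    return (normalize(s_cnt), [normalize(r) for r in t_cnt],
            [normalize(r) for r in e_cnt], total_ll)


# Toy usage (hypothetical model): 2 labels, one state each, 3 symbols.
rng = random.Random(0)


def random_dist(k):
    return normalize([rng.random() + 0.1 for _ in range(k)])


label_states = {"a": [0], "b": [1]}
raw = [([0, 0, 1, 2, 2], list("aabbb")),        # fully labeled
       ([0, 1, 2, 2], ["a", None, None, "b"]),   # partially labeled
       ([2, 2, 0, 0, 1], [None] * 5)]            # unlabeled
data = [(s, allowed_states(lab, label_states, 2)) for s, lab in raw]
start = random_dist(2)
trans = [random_dist(2) for _ in range(2)]
emit = [random_dist(3) for _ in range(2)]
history = []  # log-likelihood under the parameters before each update
for _ in range(20):
    start, trans, emit, ll = em_step(data, start, trans, emit)
    history.append(ll)
print(history[0], history[-1])
```

Because a known label only shrinks the set of states that forward-backward may visit, a fully labeled sequence with one state per label contributes plain counts, a fully unlabeled one contributes ordinary Baum-Welch statistics, and a partially labeled one falls in between. Class HMMs used for topology prediction usually tie several states to one label, which `label_states` accommodates by listing several indices per label.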

Hidden Markov models in biological sequence analysis

IBM Journal of Research and Development, Vol 45 (3.4), pp. 449-454 (2001)

doi:10.1147/rd.453.0449

Author(s): E. Birney

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis

High Speed Biological Sequence Analysis With Hidden Markov Models on Reconfigurable Platforms

IEEE Transactions on Information Technology in Biomedicine, Vol 13 (5), pp. 740-746 (2009)

doi:10.1109/titb.2007.904632

Author(s): T.F. Oliver, B. Schmidt, Y. Jakop, D.L. Maskell

Keyword(s): Sequence Analysis, Hidden Markov Models, High Speed, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis, Reconfigurable Platforms

Best α-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information

Protein Science, Vol 13 (7), pp. 1908-1917 (2004)

doi:10.1110/ps.04625404

Author(s): Håkan Viklund, Arne Elofsson

Keyword(s): Hidden Markov Models, Markov Models, Hidden Markov, Transmembrane Protein, Evolutionary Information, Protein Topology, Transmembrane Protein Topology

Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis

AI 2008: Advances in Artificial Intelligence, Lecture Notes in Computer Science, pp. 278-288 (2008)

doi:10.1007/978-3-540-89378-3_27

Author(s): Stefan Mutter, Bernhard Pfahringer, Geoffrey Holmes

Keyword(s): Sequence Analysis, Hidden Markov Models, Markov Models, Hidden Markov, Biological Sequence, Biological Sequence Analysis, Profile Hidden Markov Models

Supervised learning of hidden Markov models for sequence discrimination

Proceedings of the First Annual International Conference on Computational Molecular Biology, RECOMB '97 (1997)

doi:10.1145/267521.267551

Author(s): Hiroshi Mamitsuka

Keyword(s): Supervised Learning, Hidden Markov Models, Markov Models, Hidden Markov

Joint semi-supervised learning of Hidden Conditional Random Fields and Hidden Markov Models

Pattern Recognition Letters, Vol 37, pp. 161-171 (2014)

doi:10.1016/j.patrec.2013.03.028

Author(s): Yann Soullard, Martin Saveski, Thierry Artières

Keyword(s): Supervised Learning, Hidden Markov Models, Random Fields, Conditional Random Fields, Markov Models, Hidden Markov

Observable Operator Models for Discrete Stochastic Time Series

Neural Computation, Vol 12 (6), pp. 1371-1398 (2000)

doi:10.1162/089976600300015411

Author(s): Herbert Jaeger

Keyword(s): Hidden Markov Models, Stochastic Systems, Markov Models, Learning Algorithm, Hidden Markov, Training Data, Constructive Learning, Proper Subclass, Stochastic Time Series, Dependent Processes

Hidden Markov models are a widely used class of models for stochastic systems. Systems that can be modeled by hidden Markov models are a proper subclass of linearly dependent processes, a class of stochastic systems known from mathematical investigations carried out over the past four decades. This article provides a novel, simple characterization of linearly dependent processes, called observable operator models. The mathematical properties of observable operator models lead to a constructive learning algorithm for the identification of linearly dependent processes. The core of the algorithm has a time complexity of $O(N + nm^3)$, where $N$ is the size of the training data, $n$ is the number of distinguishable outcomes of observations, and $m$ is the dimension of the model's state space.
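
The claim that every HMM-describable process is also an observable operator model can be checked numerically. A minimal sketch, assuming a discrete HMM that emits from its current state and then transitions: each symbol $a$ gets an operator $\tau_a = M^\top \mathrm{diag}(O_{:,a})$, with $M$ the row-stochastic transition matrix and $O$ the emission matrix, and the probability of a sequence is $\mathbf{1}^\top \tau_{a_k} \cdots \tau_{a_1} w_0$. The toy parameters and function names are hypothetical, and the code illustrates only this HMM-to-OOM correspondence, not the constructive learning algorithm described above.

```python
from itertools import product


def hmm_to_oom(trans, emit):
    """Observable operators tau_a = M^T diag(O[:, a]), one per symbol a.

    trans[i][j] = P(next state j | state i); emit[i][a] = P(a | state i).
    Assumes the HMM emits from its current state, then transitions.
    """
    n, m = len(trans), len(emit[0])
    return [[[trans[i][j] * emit[i][a] for i in range(n)] for j in range(n)]
            for a in range(m)]


def oom_prob(ops, w0, seq):
    """P(a_1 ... a_k) = 1^T tau_{a_k} ... tau_{a_1} w0."""
    w = list(w0)
    for a in seq:
        op = ops[a]
        w = [sum(op[j][i] * w[i] for i in range(len(w))) for j in range(len(op))]
    return sum(w)


def forward_prob(start, trans, emit, seq):
    """The same probability via the standard HMM forward algorithm."""
    n = len(start)
    alpha = [start[i] * emit[i][seq[0]] for i in range(n)]
    for x in seq[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][x]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical 2-state, 3-symbol HMM.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
ops = hmm_to_oom(trans, emit)
seq = [0, 2, 1, 1, 0]
print(oom_prob(ops, start, seq), forward_prob(start, trans, emit, seq))
# The probabilities of all length-4 sequences should sum to 1.
total = sum(oom_prob(ops, start, s) for s in product(range(3), repeat=4))
print(total)
```

Here the vector `w` after $t$ symbols equals the joint probability of those symbols and the next hidden state, which is why both functions agree. A general observable operator model need not arise from an HMM in this way (its operators may contain negative entries), which is what makes linearly dependent processes the larger class.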

Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification

2016

doi:10.21437/interspeech.2016-1360

Author(s): Kantapon Kaewtip, Charles Taylor, Abeer Alwan

Keyword(s): Hidden Markov Models, Markov Models, Hidden Markov, Training Data, Noise Robust

Fuzzy Profile Hidden Markov Models for Protein Sequence Analysis

2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (2005)

doi:10.1109/cibcb.2005.1594950

Author(s): N.P. Bidargaddi, M. Chetty, J. Kamruzzaman

Keyword(s): Sequence Analysis, Hidden Markov Models, Protein Sequence, Markov Models, Hidden Markov, Protein Sequence Analysis, Profile Hidden Markov Models

Bayesian Basecalling for DNA Sequence Analysis using Hidden Markov Models

2006 40th Annual Conference on Information Sciences and Systems (2006)

doi:10.1109/ciss.2006.286391

Author(s): Kuo-ching Liang, Xiaodong Wang, Dimitris Anastassiou

Keyword(s): Sequence Analysis, Hidden Markov Models, DNA Sequence, Markov Models, Hidden Markov, DNA Sequence Analysis
