A hierarchical machine learning framework for the analysis of large scale animal movement data

Abstract

Background: In recent years, the field of movement ecology has been revolutionized by our ability to collect high-accuracy, fine-scale telemetry data from individual animals and groups. This growth in our data collection capacity has led to the development of statistical techniques that integrate telemetry data with random walk models to infer key parameters of the movement dynamics. While much progress has been made in the use of these models, several challenges remain. Notably, robust and scalable methods are required for quantifying parameter uncertainty, coping with intermittent location fixes, and analysing the very large volumes of data being generated.

Methods: In this work we implement a novel approach to movement modelling through the use of multilevel Gaussian processes. The hierarchical structure of the method enables the inference of continuous latent behavioural states underlying movement processes. For efficient inference on large data sets, we approximate the full likelihood using trajectory segmentation and sample from posterior distributions using gradient-based Markov chain Monte Carlo methods.

Results: While formally equivalent to many continuous-time movement models, our Gaussian process approach provides flexible, powerful models that can detect multiscale patterns and trends in movement trajectory data. We illustrate a further advantage to our approach in that inference can be performed using highly efficient, GPU-accelerated machine learning libraries.

Conclusions: Multilevel Gaussian process models offer efficient inference for large-volume movement data sets, along with the fitting of complex flexible models. Applications of this approach include inferring the mean location of a migration route and quantifying significant changes, detecting diurnal activity patterns, or identifying the onset of directed persistent movements.
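The likelihood approximation by trajectory segmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential (Ornstein-Uhlenbeck) kernel, the single-level model, the synthetic track, and the names `ou_kernel`, `gp_log_likelihood`, and `segmented_log_likelihood` are all assumptions made for illustration. The idea shown is only that the exact Gaussian process log-likelihood of a long track is replaced by a sum of exact log-likelihoods over shorter segments treated as independent.

```python
import numpy as np

def ou_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Exponential (Ornstein-Uhlenbeck) covariance between time stamps.
    The kernel choice is an assumption for this sketch."""
    lag = np.abs(t1[:, None] - t2[None, :])
    return variance * np.exp(-lag / lengthscale)

def gp_log_likelihood(t, y, variance, lengthscale, noise):
    """Exact zero-mean GP log marginal likelihood of one track segment."""
    K = ou_kernel(t, t, variance, lengthscale) + noise * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def segmented_log_likelihood(t, y, segment_length, **params):
    """Approximate the full log-likelihood by summing over segments
    that are treated as mutually independent."""
    total = 0.0
    for start in range(0, len(t), segment_length):
        stop = start + segment_length
        total += gp_log_likelihood(t[start:stop], y[start:stop], **params)
    return total

# Synthetic one-dimensional track with irregular fix times (hypothetical data).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=400))
y = np.sin(t / 5.0) + 0.1 * rng.normal(size=t.size)

params = dict(variance=1.0, lengthscale=5.0, noise=0.01)
full = gp_log_likelihood(t, y, **params)
approx = segmented_log_likelihood(t, y, segment_length=50, **params)
print(f"full: {full:.2f}  segmented: {approx:.2f}")
```

Each segment costs a Cholesky factorization of a 50 by 50 matrix instead of one 400 by 400 factorization, which is where the saving comes from. The price is that correlations across segment boundaries are ignored.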

SAMGEP: A Novel Method for Prediction of Phenotype Event Times Using the Electronic Health Record

10.1101/2021.03.07.21253096 ◽

2021 ◽

Author(s):

Yuri Ahuja ◽

Chuan Hong ◽

Zongqi Xia ◽

Tianxi Cai

Keyword(s):

Machine Learning ◽

Electronic Health Record ◽

Gaussian Process ◽

Process Models ◽

Supervised Machine Learning ◽

Cumulative Probability ◽

Model Parameters ◽

Health Record ◽

Electronic Health ◽

Event Times

Abstract

Objective: While there exist numerous methods to predict binary phenotypes using electronic health record (EHR) data, few exist for prediction of phenotype event times, or equivalently phenotype state progression. Estimating such quantities could enable more powerful use of EHR data for temporal analyses such as survival and disease progression. We propose Semi-supervised Adaptive Markov Gaussian Embedding Process (SAMGEP), a semi-supervised machine learning algorithm to predict phenotype event times using EHR data.

Methods: SAMGEP broadly consists of four steps: (i) assemble time-evolving EHR features predictive of the target phenotype event, (ii) optimize weights for combining raw features and feature embeddings into dense patient-timepoint embeddings, (iii) fit supervised and semi-supervised Markov Gaussian Process models to this embedding progression to predict marginal phenotype probabilities at each timepoint, and (iv) take a weighted average of these supervised and semi-supervised predictions. SAMGEP models latent phenotype states as a binary Markov process, conditional on which patient-timepoint embeddings are assumed to follow a Gaussian Process.

Results: SAMGEP achieves significantly improved AUCs and F1 scores relative to common machine learning approaches in both simulations and a real-world task using EHR data to predict multiple sclerosis relapse. It is particularly adept at predicting a patient’s longitudinal phenotype course, which can be used to estimate population-level cumulative probability and count process estimators. Reassuringly, it is robust to a variety of generative model parameters.

Discussion: SAMGEP’s event time predictions can be used to estimate accurate phenotype progression curves for use in downstream temporal analyses, such as a survival study for comparative effectiveness research.
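Step (iii), predicting marginal phenotype probabilities at each timepoint from a latent binary Markov state, can be sketched with the standard forward-backward recursion. This is a deliberately simplified stand-in and not SAMGEP itself: the embedding is reduced to one scalar per timepoint, the observations are assumed conditionally independent Gaussians given the state (rather than a Gaussian process), and all parameter values and names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def forward_backward(x, init, trans, means, stds):
    """Marginal posterior P(state_t = k | x_1..x_T) for a two-state chain
    with Gaussian emissions (a simplifying assumption of this sketch)."""
    T = len(x)
    emis = np.stack([gaussian_pdf(x, means[k], stds[k]) for k in range(2)], axis=1)
    alpha = np.zeros((T, 2))
    beta = np.ones((T, 2))
    # Forward pass, normalised at every step for numerical stability.
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for step in range(1, T):
        alpha[step] = (alpha[step - 1] @ trans) * emis[step]
        alpha[step] /= alpha[step].sum()
    # Backward pass, also normalised per step.
    for step in range(T - 2, -1, -1):
        beta[step] = trans @ (emis[step + 1] * beta[step + 1])
        beta[step] /= beta[step].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical scalar embedding over six visits; state 1 = phenotype present.
x = np.array([0.1, -0.2, 0.0, 2.1, 1.9, 2.2])
init = np.array([0.9, 0.1])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
post = forward_backward(x, init, trans, means=[0.0, 2.0], stds=[0.5, 0.5])

# One simple event-time estimate: first visit where P(state = 1) exceeds 0.5.
event_index = int(np.argmax(post[:, 1] > 0.5))
print(post[:, 1].round(3), event_index)
```

The column `post[:, 1]` plays the role of the per-timepoint marginal phenotype probability. Thresholding it is only one way to turn those probabilities into an event time.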

Machine Learning-Enabled Prediction of Transient Injection Map In Automotive Injectors With Uncertainty Quantification

10.1115/icef2021-67888 ◽

2021 ◽

Author(s):

Sudeepta Mondal ◽

Gina M. Magnotti ◽

Bethany Lusch ◽

Romit Maulik ◽

Roberto Torelli

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Void Fraction ◽

Internal Flow ◽

Operating Conditions ◽

Process Models ◽

Data Driven ◽

Design Parameters ◽

Uncertainty Estimates ◽

Gaussian Process Models

Abstract

Accurate prediction of injection profiles is a critical aspect of linking injector operation with engine performance and emissions. However, highly resolved injector simulations can take one to two weeks of wall-clock time, which is incompatible with engine design cycles with desired turnaround times of less than a day. Hence, it is important to reduce the time-to-solution of the internal flow simulations by several orders of magnitude to make it compatible with engine simulations. This work demonstrates a data-driven approach for tackling the computational overhead of injector simulations, whereby the transient injection profiles are emulated for a side-oriented, single-hole diesel injector using a Bayesian machine-learning framework. First, an interpretable Bayesian learning strategy is employed to understand the effect of design parameters on the total void fraction field. Then, autoencoders are used for efficient dimensionality reduction of the flowfields. Finally, Gaussian process models are used to predict the spatio-temporal void fraction field at the injector exit for unknown operating conditions. The Gaussian process models produce principled uncertainty estimates associated with the emulated flowfields, which tell the engine designer where in the design space the data-driven predictions can be trusted. The Bayesian flowfield predictions are compared with the corresponding predictions from a deep neural network, which has been transfer-learned from static needle simulations from a previous work by the authors. The emulation framework can predict the void fraction field at the exit of the orifice within a few seconds, thus achieving a speed-up factor of up to 38 million over the traditional simulation-based approach of generating transient injection maps.
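How a Gaussian process emulator signals where its predictions can be trusted can be shown with a minimal sketch of standard GP regression. It assumes a squared-exponential kernel, a single scalar input standing in for an operating condition, and toy training data; the autoencoder stage and the actual injector quantities are not modelled, and the function names are hypothetical.

```python
import numpy as np

def rbf(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance (kernel choice is an assumption)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-4, **kern):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(x_test, x_test, **kern)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy "simulations" at four operating conditions (hypothetical values).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)

# One query inside the sampled range, one far outside it.
x_test = np.array([1.5, 10.0])
mean, std = gp_predict(x_train, y_train, x_test, lengthscale=1.0)
print(mean, std)
```

The predictive standard deviation stays small between training points and returns to the prior level far from them, which is the behaviour a designer can use to decide whether an emulated value is reliable or a new simulation is needed.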

Data from fitting Gaussian process models to various data sets using eight Gaussian process software packages

Data in Brief ◽

10.1016/j.dib.2017.12.012 ◽

2018 ◽

Vol 18 ◽

pp. 684-687 ◽

Cited By ~ 1

Author(s):

Collin B. Erickson ◽

Bruce E. Ankenman ◽

Susan M. Sanchez

Keyword(s):

Gaussian Process ◽

Process Models ◽

Data Sets ◽

Software Packages ◽

Gaussian Process Models

Learning and Animal Movement

Frontiers in Ecology and Evolution ◽

10.3389/fevo.2021.681704 ◽

2021 ◽

Vol 9 ◽

Author(s):

Mark A. Lewis ◽

William F. Fagan ◽

Marie Auger-Méthé ◽

Jacqueline Frair ◽

John M. Fryxell ◽

...

Keyword(s):

Machine Learning ◽

Animal Movement ◽

Movement Ecology ◽

Field Studies ◽

Movement Data ◽

Emerging Trends ◽

Recent Developments ◽

Ecology Of Learning ◽

Psychological Origin ◽

Task Oriented

Integrating diverse concepts from animal behavior, movement ecology, and machine learning, we develop an overview of the ecology of learning and animal movement. Learning-based movement is clearly relevant to ecological problems, but the subject is rooted firmly in psychology, including a distinct terminology. We contrast this psychological origin of learning with the task-oriented perspective on learning that has emerged from the field of machine learning. We review conceptual frameworks that characterize the role of learning in movement, discuss emerging trends, and summarize recent developments in the analysis of movement data. We also discuss the relative advantages of different modeling approaches for exploring the learning-movement interface. We explore in depth how individual and social modalities of learning can matter to the ecology of animal movement, and highlight how diverse kinds of field studies, ranging from translocation efforts to manipulative experiments, can provide critical insight into the learning process in animal movement.

A prediction and imputation method for marine animal movement data

PeerJ Computer Science ◽

10.7717/peerj-cs.656 ◽

2021 ◽

Vol 7 ◽

pp. e656

Author(s):

Xinqing Li ◽

Tanguy Tresor Sindihebura ◽

Lei Zhou ◽

Carlos M. Duarte ◽

Daniel P. Costa ◽

...

Keyword(s):

Deep Learning ◽

Trajectory Analysis ◽

Animal Movement ◽

Imputation Method ◽

Marine Animal ◽

Movement Trajectory ◽

Trajectory Prediction ◽

Movement Data ◽

Data Prediction ◽

Deep Learning Model

Data prediction and imputation are important parts of marine animal movement trajectory analysis, as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine data analysis are still limited. In this research, we propose a composite deep learning model to improve the accuracy of marine animal trajectory prediction and imputation. The model extracts patterns from the trajectories with an encoder network and reconstructs the trajectories from these patterns with a decoder network. We also use attention mechanisms to highlight certain extracted patterns for the decoder, and we feed these patterns into a second decoder for prediction and imputation. Our approach therefore couples unsupervised learning (the encoder and the first decoder) with supervised learning (the encoder and the second decoder). Experimental results demonstrate that our approach can reduce errors by at least 10% on average compared with other methods.

Towards Large-scale Gaussian Process Models for Efficient Bayesian Machine Learning

Proceedings of the 9th International Conference on Data Science, Technology and Applications ◽

10.5220/0009874702750282 ◽

2020 ◽

Cited By ~ 1

Author(s):

Fabian Berns ◽

Christian Beecks

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Large Scale ◽

Process Models ◽

Gaussian Process Models ◽

Bayesian Machine Learning

Machine Learning Approaches for the Analysis of Non-Metallic Inclusion Data Sets

AISTech2019 Proceedings of the Iron and Steel Technology Conference ◽

10.33313/377/275 ◽

2019 ◽

Author(s):

M. Webler ◽

B. Abdulsalam

Keyword(s):

Machine Learning ◽

Data Sets ◽

Learning Approaches ◽

Metallic Inclusion

Identification of Continuous-time Nonlinear Systems via Local Gaussian Process Models

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.134.1708 ◽

2014 ◽

Vol 134 (11) ◽

pp. 1708-1715

Author(s):

Tomohiro Hachino ◽

Kazuhiro Matsushita ◽

Hitoshi Takata ◽

Seiji Fukushima ◽

Yasutaka Igarashi

Keyword(s):

Nonlinear Systems ◽

Gaussian Process ◽

Continuous Time ◽

Process Models ◽

Gaussian Process Models

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
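The linear ridge regression baseline mentioned in the abstract has a closed-form solution, sketched below on a two-feature descriptor of the kind described (one angle and one distance). Everything numeric here is synthetic and hypothetical: the feature ranges, the coefficients, and the target values are invented for illustration and are not taken from the dicopper data set.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic descriptor: a bridge angle (degrees) and a metal-metal distance
# (angstrom). Ranges and coefficients are made up for this sketch.
rng = np.random.default_rng(1)
angles = rng.uniform(90.0, 110.0, size=60)
distances = rng.uniform(2.8, 3.4, size=60)
X = np.column_stack([angles, distances])

true_w, true_b = np.array([-40.0, 150.0]), 3500.0
y = X @ true_w + true_b          # exactly linear, noiseless toy target

w, b = ridge_fit(X, y, lam=1e-8)
print(w, b)
```

When the descriptor makes the structure-property relationship close to linear, as the abstract reports for the more sophisticated descriptors, a model of this form has little left for a kernel method to improve on.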

Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion, based on the massive data generated from MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics simulations of excited-state energy transfer in a Frenkel-exciton model. The PCA method clearly shows that two types of bath modes, those that display strong vibronic couplings and those whose frequencies are close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. This conclusion is obtained purely from the PCA of the trajectory data, without heavy reliance on pre-defined physical knowledge. The results show that the PCA approach, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
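The way PCA singles out a few important modes from trajectory data can be sketched with a singular value decomposition. The data below are synthetic and hypothetical: twenty "bath mode" coordinates over 500 time steps, two of which are given large oscillations by construction. Nothing here reproduces the MM-SQC simulations; the function name `pca` and all amplitudes are assumptions.

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD of the mean-centred data matrix (rows = time steps)."""
    centered = data - data.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = S ** 2 / (len(data) - 1)
    ratio = explained / explained.sum()
    components = Vt[:n_components]
    scores = centered @ components.T
    return components, ratio[:n_components], scores

# Synthetic trajectory: small noise on all modes, large motion on modes 0 and 1.
rng = np.random.default_rng(2)
steps = np.arange(500)
data = 0.05 * rng.normal(size=(500, 20))
data[:, 0] += 3.0 * np.sin(0.10 * steps)
data[:, 1] += 2.0 * np.cos(0.37 * steps)

components, ratio, scores = pca(data, n_components=2)

# The mode with the largest loading on each principal component.
dominant_modes = {int(np.abs(c).argmax()) for c in components}
print(ratio.round(3), dominant_modes)
```

Inspecting the loadings of the leading components, as done with `dominant_modes`, is the step that identifies which coordinates carry most of the collective motion without supplying that information in advance.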
