A hierarchical machine learning framework for the analysis of large scale animal movement data

2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Colin J. Torney ◽  
Juan M. Morales ◽  
Dirk Husmeier

Abstract Background: In recent years the field of movement ecology has been revolutionized by our ability to collect high-accuracy, fine-scale telemetry data from individual animals and groups. This growth in our data collection capacity has led to the development of statistical techniques that integrate telemetry data with random walk models to infer key parameters of the movement dynamics. While much progress has been made in the use of these models, several challenges remain. Notably, robust and scalable methods are required for quantifying parameter uncertainty, coping with intermittent location fixes, and analysing the very large volumes of data being generated. Methods: In this work we implement a novel approach to movement modelling through the use of multilevel Gaussian processes. The hierarchical structure of the method enables the inference of continuous latent behavioural states underlying movement processes. For efficient inference on large data sets, we approximate the full likelihood using trajectory segmentation and sample from posterior distributions using gradient-based Markov chain Monte Carlo methods. Results: While formally equivalent to many continuous-time movement models, our Gaussian process approach provides flexible, powerful models that can detect multiscale patterns and trends in movement trajectory data. We illustrate a further advantage of our approach in that inference can be performed using highly efficient, GPU-accelerated machine learning libraries. Conclusions: Multilevel Gaussian process models offer efficient inference for large-volume movement data sets, along with the fitting of complex flexible models. Applications of this approach include inferring the mean location of a migration route and quantifying significant changes, detecting diurnal activity patterns, or identifying the onset of directed persistent movements.

2021 ◽  
Author(s):  
Yuri Ahuja ◽  
Chuan Hong ◽  
Zongqi Xia ◽  
Tianxi Cai

Abstract Objective: While there exist numerous methods to predict binary phenotypes using electronic health record (EHR) data, few exist for prediction of phenotype event times, or equivalently phenotype state progression. Estimating such quantities could enable more powerful use of EHR data for temporal analyses such as survival and disease progression. We propose Semi-supervised Adaptive Markov Gaussian Embedding Process (SAMGEP), a semi-supervised machine learning algorithm to predict phenotype event times using EHR data. Methods: SAMGEP broadly consists of four steps: (i) assemble time-evolving EHR features predictive of the target phenotype event, (ii) optimize weights for combining raw features and feature embeddings into dense patient-timepoint embeddings, (iii) fit supervised and semi-supervised Markov Gaussian Process models to this embedding progression to predict marginal phenotype probabilities at each timepoint, and (iv) take a weighted average of these supervised and semi-supervised predictions. SAMGEP models latent phenotype states as a binary Markov process, conditional on which patient-timepoint embeddings are assumed to follow a Gaussian Process. Results: SAMGEP achieves significantly improved AUCs and F1 scores relative to common machine learning approaches in both simulations and a real-world task using EHR data to predict multiple sclerosis relapse. It is particularly adept at predicting a patient’s longitudinal phenotype course, which can be used to estimate population-level cumulative probability and count process estimators. Reassuringly, it is robust to a variety of generative model parameters. Discussion: SAMGEP’s event time predictions can be used to estimate accurate phenotype progression curves for use in downstream temporal analyses, such as a survival study for comparative effectiveness research.
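
The idea of a latent binary Markov state with per-timepoint marginal probabilities, followed by a weighted average of two sets of predictions, can be illustrated with a much-simplified stand-in. The sketch below is not SAMGEP: it assumes a two-state hidden Markov model with scalar Gaussian emissions (rather than a Gaussian process over embeddings) and computes marginals with the standard forward-backward recursion. All names, parameter values, and the toy observations are hypothetical.

```python
import math

def gauss_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def state_marginals(obs, p_stay=(0.9, 0.9), means=(0.0, 2.0), sd=1.0,
                    p_init=(0.5, 0.5)):
    """P(state = 1 | all observations) at each timepoint (forward-backward)."""
    trans = [[p_stay[0], 1 - p_stay[0]], [1 - p_stay[1], p_stay[1]]]
    emit = [[gauss_pdf(x, means[s], sd) for s in (0, 1)] for x in obs]
    n = len(obs)
    # Forward pass, normalised at each step for numerical stability.
    fwd = []
    for t in range(n):
        if t == 0:
            a = [p_init[s] * emit[0][s] for s in (0, 1)]
        else:
            a = [emit[t][s] * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        z = sum(a)
        fwd.append([v / z for v in a])
    # Backward pass, also normalised.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in (0, 1))
             for s in (0, 1)]
        z = sum(b)
        bwd[t] = [v / z for v in b]
    # Combine and renormalise to get the posterior of state 1.
    post = []
    for t in range(n):
        g = [fwd[t][s] * bwd[t][s] for s in (0, 1)]
        post.append(g[1] / sum(g))
    return post

def blend(p_a, p_b, weight):
    """Weighted average of two sets of per-timepoint probabilities."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_a, p_b)]

# Hypothetical one-dimensional "embedding" sequence for one patient.
obs = [0.1, -0.2, 0.0, 2.1, 1.9, 2.2]
post_a = state_marginals(obs)                      # stand-in for one model
post_b = state_marginals(obs, p_stay=(0.7, 0.7))   # stand-in for a second model
blended = blend(post_a, post_b, weight=0.5)
print([round(p, 3) for p in blended])
```

In this toy setting the blended probability of state 1 is low for the early timepoints and high for the late ones; the weight of 0.5 is an arbitrary choice, whereas the abstract describes the weights as part of the method.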


2021 ◽  
Author(s):  
Sudeepta Mondal ◽  
Gina M. Magnotti ◽  
Bethany Lusch ◽  
Romit Maulik ◽  
Roberto Torelli

Abstract Accurate prediction of injection profiles is a critical aspect of linking injector operation with engine performance and emissions. However, highly resolved injector simulations can take one to two weeks of wall-clock time, which is incompatible with engine design cycles with desired turnaround times of less than a day. Hence, it is important to reduce the time-to-solution of the internal flow simulations by several orders of magnitude to make it compatible with engine simulations. This work demonstrates a data-driven approach for tackling the computational overhead of injector simulations, whereby the transient injection profiles are emulated for a side-oriented, single-hole diesel injector using a Bayesian machine-learning framework. First, an interpretable Bayesian learning strategy is employed to understand the effect of design parameters on the total void fraction field. Then, autoencoders are utilized for efficient dimensionality reduction of the flowfields. Finally, Gaussian process models are used to predict the spatio-temporal void fraction field at the injector exit for unknown operating conditions. The Gaussian process models produce principled uncertainty estimates associated with the emulated flowfields, which give the engine designer valuable information about where in the design space the data-driven predictions can be trusted. The Bayesian flowfield predictions are compared with the corresponding predictions from a deep neural network, which has been transfer-learned from static needle simulations from a previous work by the authors. The emulation framework can predict the void fraction field at the exit of the orifice within a few seconds, thus achieving a speed-up factor of up to 38 million over the traditional simulation-based approach of generating transient injection maps.


2021 ◽  
Vol 9 ◽  
Author(s):  
Mark A. Lewis ◽  
William F. Fagan ◽  
Marie Auger-Méthé ◽  
Jacqueline Frair ◽  
John M. Fryxell ◽  
...  

Integrating diverse concepts from animal behavior, movement ecology, and machine learning, we develop an overview of the ecology of learning and animal movement. Learning-based movement is clearly relevant to ecological problems, but the subject is rooted firmly in psychology, including a distinct terminology. We contrast this psychological origin of learning with the task-oriented perspective on learning that has emerged from the field of machine learning. We review conceptual frameworks that characterize the role of learning in movement, discuss emerging trends, and summarize recent developments in the analysis of movement data. We also discuss the relative advantages of different modeling approaches for exploring the learning-movement interface. We explore in depth how individual and social modalities of learning can matter to the ecology of animal movement, and highlight how diverse kinds of field studies, ranging from translocation efforts to manipulative experiments, can provide critical insight into the learning process in animal movement.


2021 ◽  
Vol 7 ◽  
pp. e656
Author(s):  
Xinqing Li ◽  
Tanguy Tresor Sindihebura ◽  
Lei Zhou ◽  
Carlos M. Duarte ◽  
Daniel P. Costa ◽  
...  

Data prediction and imputation are important parts of marine animal movement trajectory analysis as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine data analysis are still limited. In this research, we propose a composite deep learning model to improve the accuracy of marine animal trajectory prediction and imputation. The model extracts patterns from the trajectories with an encoder network and reconstructs the trajectories from these patterns with a decoder network. We also use attention mechanisms to highlight certain extracted patterns for the decoder. We then feed these patterns into a second decoder for prediction and imputation. Our approach therefore couples unsupervised learning (the encoder and the first decoder) with supervised learning (the encoder and the second decoder). Experimental results demonstrate that our approach can reduce errors by at least 10% on average compared with other methods.


2014 ◽  
Vol 134 (11) ◽  
pp. 1708-1715
Author(s):  
Tomohiro Hachino ◽  
Kazuhiro Matsushita ◽  
Hitoshi Takata ◽  
Seiji Fukushima ◽  
Yasutaka Igarashi

2020 ◽  
Author(s):  
Marc Philipp Bahlke ◽  
Natnael Mogos ◽  
Jonny Proppe ◽  
Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger experimentally relevant complexes. Exchange spin coupling is about as easy to learn as polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
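
To make the "error bars" of Gaussian process regression concrete, here is a minimal sketch in plain Python. It is not the authors' setup: it assumes a single hypothetical one-dimensional descriptor, synthetic training targets, a squared-exponential kernel, and fixed (not optimized) hyperparameters. The point is only that the predictive standard deviation grows when the query moves away from the training data.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-4, length=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_train))   # (K + noise I)^-1 y
    k_star = [rbf(x, x_new, length) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf(x_new, x_new, length) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))

# Synthetic data: a hypothetical descriptor x and target sin(x).
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [math.sin(x) for x in x_train]

mean_in, std_in = gp_predict(x_train, y_train, 2.5)    # inside training range
mean_out, std_out = gp_predict(x_train, y_train, 8.0)  # far outside it
print("interpolation:", round(mean_in, 3), "+/-", round(std_in, 3))
print("extrapolation:", round(mean_out, 3), "+/-", round(std_out, 3))
```

Far from the training points the prediction falls back to the prior (mean near zero, standard deviation near one), which is how the uncertainty estimate flags an extrapolation.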


2020 ◽  
Author(s):  
Jiawei Peng ◽  
Yu Xie ◽  
Deping Hu ◽  
Zhenggang Lan

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to revealing their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion, based on the massive data generated from MM-SQC (the symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic simulations of excited-state energy transfer in a Frenkel-exciton model. The PCA clearly shows that two types of bath modes, those that display strong vibronic couplings and those whose frequencies are close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. The conclusion is obtained purely from the PCA of the trajectory data, without heavy reliance on pre-defined physical knowledge. The results show that PCA, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
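
As a rough illustration of how PCA picks out the dominant modes from trajectory data, the sketch below centers a small synthetic trajectory, builds its covariance matrix, and finds the leading principal component by power iteration. The data are invented for illustration (three hypothetical "bath mode" coordinates, one of which oscillates with large amplitude), and power iteration is just one simple way to obtain the top eigenvector; it is not taken from the paper.

```python
import math
import random

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iters=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                 for i in range(d))
    return eigval, v

# Synthetic trajectory: 200 time steps x 3 hypothetical mode coordinates.
# Mode 0 oscillates strongly; modes 1 and 2 carry only small noise.
rng = random.Random(0)
traj = [[3.0 * math.cos(0.1 * t) + rng.gauss(0, 0.1),
         rng.gauss(0, 0.1),
         rng.gauss(0, 0.1)] for t in range(200)]

cov = covariance(center(traj))
eigval, vec = leading_component(cov)
explained = eigval / sum(cov[i][i] for i in range(3))
print("leading direction:", [round(x, 3) for x in vec])
print("fraction of variance explained:", round(explained, 3))
```

Here the leading direction lines up almost entirely with mode 0, so the analysis identifies the important mode from the data alone, without being told which coordinate was driven.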

