Machine Learning-Based Model Selection and Parameter Estimation from Kinetic Data of Complex First-Order Reaction Systems

Author(s):  
László Zimányi ◽  
Áron Sipos ◽  
Ferenc Sarlós ◽  
Rita Nagypál ◽  
Géza Groma

Dealing with a system of first-order reactions is a recurrent problem in chemometrics, especially in the analysis of data obtained by spectroscopic methods. Here we argue that global multiexponential fitting, still the common way to solve this kind of problem, has serious weaknesses in contrast to the available contemporary methods of sparse modeling. Combining the advantages of group lasso and elastic net, statistical methods proven to be very powerful in other areas, we obtained an optimization problem tunable to yield anywhere from a very sparse to a very dense distribution over a large pre-defined grid of time constants, fitting both simulated and experimental multiwavelength spectroscopic data with very high performance. Moreover, we found that the optimal values of the tuning hyperparameters can be selected by a machine-learning algorithm based on a Bayesian optimization procedure, utilizing a widely used and a novel version of cross-validation. The applied algorithm very accurately recovered the true sparse kinetic parameters of an extremely complex simulated model of the bacteriorhodopsin photocycle, as well as the wide peak of hypothetical distributed kinetics in the presence of different levels of noise. It also performed well in the analysis of ultrafast experimental fluorescence kinetics data detected on the coenzyme FAD in a very wide logarithmic time window.

2020 ◽  


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255675
Author(s):  
László Zimányi ◽  
Áron Sipos ◽  
Ferenc Sarlós ◽  
Rita Nagypál ◽  
Géza I. Groma

Dealing with a system of first-order reactions is a recurrent issue in chemometrics, especially in the analysis of data obtained by spectroscopic methods applied on complex biological systems. We argue that global multiexponential fitting, the still common way to solve such problems, has serious weaknesses compared to contemporary methods of sparse modeling. Combining the advantages of group lasso and elastic net—the statistical methods proven to be very powerful in other areas—we created an optimization problem tunable from very sparse to very dense distribution over a large pre-defined grid of time constants, fitting both simulated and experimental multiwavelength spectroscopic data with high computational efficiency. We found that the optimal values of the tuning hyperparameters can be selected by a machine-learning algorithm based on a Bayesian optimization procedure, utilizing widely used or novel versions of cross-validation. The derived algorithm accurately recovered the true sparse kinetic parameters of an extremely complex simulated model of the bacteriorhodopsin photocycle, as well as the wide peak of hypothetical distributed kinetics in the presence of different noise levels. It also performed well in the analysis of the ultrafast experimental fluorescence kinetics data detected on the coenzyme FAD in a very wide logarithmic time window. We conclude that the primary application of the presented algorithms—implemented in available software—covers a wide area of studies on light-induced physical, chemical, and biological processes carried out with different spectroscopic methods. The demand for this kind of analysis is expected to soar due to the emerging ultrafast multidimensional infrared and electronic spectroscopic techniques that provide very large and complex datasets. In addition, simulations based on our methods could help in designing the technical parameters of future experiments for the verification of particular hypothetical models.
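The core idea of the paper, representing the kinetics as a sparse combination of exponential decays drawn from a dense grid of time constants and selecting them with an elastic-net-type penalty, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid sizes, regularization weights, the synthetic signal, and the plain (non-group) elastic net solved by ISTA are all simplifying assumptions.

```python
import numpy as np

# Time axis and a dense log-spaced grid of candidate time constants.
t = np.linspace(0.01, 5.0, 200)            # time points
taus = np.logspace(-2, 1, 60)              # candidate time constants
D = np.exp(-t[:, None] / taus[None, :])    # dictionary: one decay per column

# Synthetic signal: two true exponentials plus noise.
rng = np.random.default_rng(0)
y = 1.0 * np.exp(-t / 0.1) + 0.5 * np.exp(-t / 2.0) + 0.01 * rng.standard_normal(t.size)

def elastic_net_ista(D, y, lam1=0.02, lam2=0.01, n_iter=2000):
    """Minimize 0.5*||y - D a||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + lam2   # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y) + lam2 * a
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft threshold
    return a

amp = elastic_net_ista(D, y)
n_active = int(np.sum(np.abs(amp) > 1e-3))  # grid points surviving the penalty
```

Tuning `lam1` up pushes the recovered distribution toward a few isolated time constants; tuning it down yields the dense, distributed-kinetics regime the abstract describes.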


Foods ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 763
Author(s):  
Ran Yang ◽  
Zhenbo Wang ◽  
Jiajia Chen

Mechanistic modeling has been a useful tool for helping food scientists understand complicated microwave-food interactions, but it cannot be used directly by food developers for food design because it is resource-intensive. This study developed and validated an integrated approach that coupled mechanistic modeling and machine learning to achieve efficient food product design (thickness optimization) with better heating uniformity. The mechanistic model, which incorporated electromagnetics and heat transfer, had been developed and validated extensively in previous work and was used directly in this study. A Bayesian optimization machine-learning algorithm was developed and integrated with the mechanistic model. The integrated approach was validated by comparing its optimization performance with that of a parametric sweep approach based solely on mechanistic modeling. The results showed that the integrated approach had the capability and robustness to optimize the thickness of different-shaped products from different initial training datasets with higher efficiency (45.9% to 62.1% improvement) than the parametric sweep approach. Three rectangular trays with one optimized thickness (1.56 cm) and two non-optimized thicknesses (1.20 and 2.00 cm) were 3-D printed and used in microwave heating experiments, which confirmed the feasibility of the integrated approach for thickness optimization. The integrated approach can be further developed and extended as a platform to efficiently design complicated microwavable foods with multiple-parameter optimization.
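The efficiency gain over a parametric sweep comes from letting a surrogate model decide which expensive mechanistic simulation to run next. The sketch below illustrates that loop under stated assumptions: the heating-uniformity objective is a synthetic stand-in for the mechanistic model, and a quadratic surrogate replaces the Gaussian process normally used in Bayesian optimization.

```python
import numpy as np

rng = np.random.default_rng(1)

def heating_nonuniformity(thickness_cm):
    """Hypothetical stand-in for one mechanistic-model run: a heating
    nonuniformity score minimized near some optimal thickness."""
    return (thickness_cm - 1.56) ** 2 + 0.01 * np.sin(8 * thickness_cm)

# Sequential model-based optimization: evaluate a few thicknesses, fit a
# cheap surrogate, and evaluate the model only at the surrogate's minimum.
X = list(rng.uniform(1.0, 2.0, size=4))      # initial training thicknesses
Y = [heating_nonuniformity(x) for x in X]

grid = np.linspace(1.0, 2.0, 201)            # candidate thicknesses
for _ in range(10):
    coeffs = np.polyfit(X, Y, deg=2)         # quadratic surrogate of the model
    surrogate = np.polyval(coeffs, grid)
    x_next = grid[int(np.argmin(surrogate))] # most promising candidate
    X.append(x_next)
    Y.append(heating_nonuniformity(x_next))  # one "expensive" model run

best_thickness = X[int(np.argmin(Y))]
```

A parametric sweep would evaluate the model at every grid point; here only 14 model runs are spent, which is the kind of saving the abstract quantifies.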


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory due to the large amount of data these systems handle. They are vulnerable to many cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed, and they must be as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, as well as one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated using scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics allows for greater accuracy, enabling the machine learning algorithm to correctly classify the parameters related to possible attacks.
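The idea of applying a different preprocessing function per feature group can be sketched as follows. The group names mirror the paper's categorization, but the column assignments, toy data, and the choice of which group gets which function are illustrative assumptions, not the actual dataset schema or the paper's best configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy flow records: 8 columns, grouped following the paper's categorization
# (column indices here are hypothetical placeholders).
feature_groups = {
    "basic_connection": [0, 1],   # e.g. duration, protocol code
    "content": [2, 3],            # e.g. payload statistics
    "statistical": [4, 5],        # e.g. per-window counts
    "traffic_based": [6, 7],      # e.g. direction-based rates
}
X = rng.uniform(0, 1000, size=(100, 8))

def standard_scale(col):
    return (col - col.mean()) / col.std()               # zero mean, unit variance

def min_max_normalize(col):
    return (col - col.min()) / (col.max() - col.min())  # map to [0, 1]

# Apply a per-group preprocessing function before feeding the neural network.
Xp = X.copy()
for group, cols in feature_groups.items():
    fn = standard_scale if group in ("statistical", "traffic_based") else min_max_normalize
    for c in cols:
        Xp[:, c] = fn(X[:, c])
```

Evaluating each (group, function) combination on the three datasets is then a straightforward grid over this mapping.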


2021 ◽  
Author(s):  
Inger Persson ◽  
Andreas Östling ◽  
Martin Arlbrandt ◽  
Joakim Söderberg ◽  
David Becedas

BACKGROUND Despite decades of research, sepsis remains a leading cause of mortality and morbidity in ICUs worldwide. The key to effective management and good patient outcomes is early detection, yet no prospectively validated machine learning prediction algorithm is available for clinical use in Europe today. OBJECTIVE To develop a high-performance machine learning sepsis prediction algorithm based on routinely collected ICU data, designed to be implemented in Europe. METHODS The machine learning algorithm is developed using a convolutional neural network, based on the Massachusetts Institute of Technology Lab for Computational Physiology MIMIC-III Clinical Database, focusing on ICU patients aged 18 years or older. Twenty variables are used for prediction on an hourly basis. Onset of sepsis is defined in accordance with the international Sepsis-3 criteria. RESULTS The developed algorithm, NAVOY Sepsis, uses 4 hours of input and can predict, with high accuracy, patients at high risk of developing sepsis in the coming hours. The prediction performance is superior to that of existing sepsis early warning scoring systems and competes well with previously published prediction algorithms designed to predict sepsis onset in accordance with the Sepsis-3 criteria, as measured by the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). NAVOY Sepsis yields AUROC = 0.90 and AUPRC = 0.62 for predictions up to 3 hours before sepsis onset. The predictive performance is externally validated on hold-out test data, where NAVOY Sepsis is confirmed to predict sepsis with high accuracy. CONCLUSIONS An algorithm with excellent predictive properties has been developed, based on variables routinely collected in ICUs. This algorithm will be further validated in an ongoing prospective randomized clinical trial and will be CE marked as Software as a Medical Device, designed for commercial use in European ICUs.
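The headline metric, AUROC, has a simple rank interpretation: the probability that a randomly chosen positive (pre-sepsis) window is scored above a randomly chosen negative one. A minimal sketch of that computation, on synthetic scores rather than the NAVOY Sepsis model's outputs:

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative; ties count half (equivalent to the Mann-Whitney U statistic)."""
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic hourly risk scores: positives (pre-sepsis windows) score higher
# on average, mimicking a reasonably discriminative predictor.
rng = np.random.default_rng(3)
y = np.concatenate([np.ones(50), np.zeros(200)]).astype(bool)
s = np.concatenate([rng.normal(0.7, 0.15, 50), rng.normal(0.4, 0.15, 200)])
score = auroc(y, s)
```

The pairwise construction is O(n_pos * n_neg); for large validation sets one would sort once and use ranks instead.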


Author(s):  
Olfa Hamdi-Larbi ◽  
Ichrak Mehrez ◽  
Thomas Dufaud

Many applications in scientific computing process very large sparse matrices on parallel architectures. The work presented in this paper is part of a project whose general aim is to develop an auto-tuner system for selecting the best matrix compression format in the context of high-performance computing. The target smart system can automatically select the best compression format for a given sparse matrix, a numerical method processing this matrix, a parallel programming model, and a target architecture. This paper describes the design and implementation of the proposed concept. We consider a case study consisting of a numerical method reduced to the sparse matrix-vector product (SpMV), several compression formats, data parallelism as the programming model, and a distributed multi-core platform as the target architecture. This study allows us to extract a set of important novel metrics and parameters relative to the considered programming model. Our metrics are used as input to a machine-learning algorithm to predict the best matrix compression format. An experimental study targeting a distributed multi-core platform and processing random and real-world matrices shows that our system can improve the accuracy of the machine learning prediction by up to 7% on average.
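To make the case study concrete, here is SpMV in one common compression format, CSR (compressed sparse row), whose per-row layout is what makes format choice interact with the parallel programming model. This is a generic textbook sketch, not the paper's code; which formats the auto-tuner actually compares is not restated here.

```python
import numpy as np

def dense_to_csr(A):
    """Convert a dense matrix to CSR: nonzero values, their column indices,
    and row pointers delimiting each row's slice of the value array."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x in CSR format; each row is an
    independent dot product, the natural unit of data parallelism."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

A = np.array([[4.0, 0.0, 9.0],
              [0.0, 7.0, 0.0],
              [5.0, 0.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
y = csr_spmv(*dense_to_csr(A), x)   # equals A @ x
```

Formats such as ELLPACK or COO trade row-pointer indirection for padding or explicit row indices, which is why the best choice depends on the matrix's nonzero structure, the metrics the auto-tuner feeds to its predictor.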


2021 ◽  
Author(s):  
Ekaterina Gurina ◽  
Ksenia Antipova ◽  
Nikita Klyuchnikov ◽  
Dmitry Koroteev

Predicting drilling accidents is an important task in well construction. Drilling support software allows observing the drilling parameters of multiple wells at the same time, and artificial intelligence helps detect precursors of drilling accidents ahead of the emergency. We present a machine learning (ML) algorithm for predicting accidents such as stuck pipe, mud loss, fluid show, washout, drill string break, and shale collar. The model for forecasting drilling accidents is based on the bag-of-features approach, which uses distributions of the directly recorded data as the main features: small segments of data are labeled with a particular symbol, called a codeword, and the histogram of codewords over a data segment serves as input for the machine learning algorithm. Fragments of real-time mud log data were used to create the model. We defined more than 1000 drilling accident precursors for more than 60 real accidents and about 2500 normal drilling cases as a training set for the ML model. The developed model analyzes real-time mud log data and calculates the probability of an accident. The result is presented as a probability curve for each type of accident; if the critical probability value is exceeded, the user is notified of the risk of an accident. The bag-of-features model shows high performance in validation both on historical data and in real time. The prediction quality does not vary from field to field, so the model could be used in different fields without additional training. The software utilizing the ML model has a microservice architecture and is integrated with the WITSML data server. It is capable of real-time accident forecasting without human intervention. As a result, the system notifies the user whenever the situation in the well becomes similar to a pre-accident one, leaving the engineer enough time to take the necessary actions to prevent an accident.
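The bag-of-features step described above can be sketched as follows: assign each short window of a signal to its nearest codeword and summarize a segment by its codeword histogram. The random codebook, window length, and single-channel signal are illustrative assumptions; in practice the codebook would be learned from mud-log data (e.g. by clustering windows).

```python
import numpy as np

rng = np.random.default_rng(4)

# Codebook of typical short-window patterns (random stand-ins here;
# normally learned from historical drilling data).
codebook = rng.normal(size=(8, 16))           # 8 codewords, window length 16

def bag_of_features(signal, codebook, window=16):
    """Label each non-overlapping window with its nearest codeword and
    return the normalized histogram of codeword counts."""
    counts = np.zeros(codebook.shape[0])
    for start in range(0, len(signal) - window + 1, window):
        seg = signal[start:start + window]
        dists = np.linalg.norm(codebook - seg, axis=1)   # distance to each codeword
        counts[int(np.argmin(dists))] += 1
    return counts / counts.sum()

signal = rng.normal(size=640)                 # one channel of mud-log data
hist = bag_of_features(signal, codebook)      # fixed-length classifier input
```

The histogram is a fixed-length vector regardless of segment length, which is what lets a standard classifier compare pre-accident and normal drilling segments.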


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
T. Bagni ◽  
G. Bovone ◽  
A. Rack ◽  
D. Mauro ◽  
C. Barth ◽  
...  

The electro-mechanical and electro-thermal properties of high-performance Restacked-Rod-Process (RRP) Nb3Sn wires are key factors in the realization of compact magnets above 15 T for future particle physics experiments. Combining X-ray micro-tomography with an unsupervised machine learning algorithm, we provide a new tool capable of studying the internal features of RRP wires and unlocking different approaches to enhance their performance. Such a tool is ideal for characterizing the distribution and morphology of the voids that are generated during the heat treatment necessary to form the Nb3Sn superconducting phase. Two different types of voids can be detected in this type of wire: one inside the copper matrix and the other inside the Nb3Sn sub-elements. The former can be related to Sn leaking from the sub-elements into the copper matrix, which leads to poor electro-thermal stability of the whole wire. The latter is detrimental to the electro-mechanical performance of the wires, as superconducting wires experience large electromagnetic stresses under high-field, high-current conditions. We analyze these aspects thoroughly and discuss the potential of the X-ray tomography analysis tool to help model and predict the electro-mechanical and electro-thermal behavior of RRP wires and optimize their design.
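One simple way unsupervised learning can separate voids from metal in tomography data is clustering voxel gray values, since voids image darker than the surrounding phases. The sketch below uses plain 1-D k-means on synthetic intensities; the three-population model and the clustering choice are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans_1d(x, k=3, n_iter=50):
    """Plain k-means on scalar voxel intensities: alternate assigning each
    voxel to its nearest center and recomputing centers as cluster means."""
    centers = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial centers
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

# Synthetic tomography voxels: three intensity populations standing in for
# voids (dark), Nb3Sn sub-elements, and the copper matrix.
voxels = np.concatenate([rng.normal(0.05, 0.02, 500),
                         rng.normal(0.50, 0.05, 3000),
                         rng.normal(0.85, 0.05, 1500)])
labels, centers = kmeans_1d(voxels, k=3)
void_fraction = np.mean(labels == np.argmin(centers))  # darkest cluster = voids
```

On real 3-D reconstructions the same idea extends to per-voxel feature vectors, after which void morphology can be measured per connected component.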

