Predicting Variant-Driven Multi-Wave Pattern of COVID-19 via a Machine Learning Analysis of Spike Protein Mutations

Author(s):  
Adele de Hoffer ◽  
Shahram Vatani ◽  
Corentin Cot ◽  
Giacomo Cacciapaglia ◽  
Maria Luisa Chiusano ◽  
...  

Abstract Never before has such a vast amount of data, including genome sequencing, been collected for a viral pandemic as for the current case of COVID-19. This offers the possibility to trace the evolution of the virus and to assess, in real time, the role mutations play in its spread within the population. To this end, we focused on the Spike protein because of its central role in mediating viral outbreak and replication in host cells. Employing the Levenshtein distance on the Spike protein sequences, we designed a machine learning algorithm yielding a temporal clustering of the available dataset. From this, we were able to identify and define emerging persistent variants that are in agreement with known evidence. Our novel algorithm allowed us to define persistent variants as chains that remain stable over time and to highlight emerging variants of epidemiological interest as branching events that occur over time. Hence, we determined the relationship and temporal connection between variants of interest and the ensuing passage to dominance of the current variants of concern. Remarkably, the analysis and the relevant tools introduced in our work serve as an early warning for the emergence of new persistent variants once the associated cluster reaches 1% of the time-binned sequence data. We validated our approach and its effectiveness on the onset of the Alpha variant of concern. We further predict that the recently identified lineage AY.4.2 (‘Delta plus’) is giving rise to a new emerging variant. Comparing our findings with the epidemiological data, we demonstrated that each new wave is dominated by a new emerging variant, thus confirming the hypothesis of a strong correlation between the birth of variants and the multi-wave temporal pattern of the pandemic. This allows us to introduce an epidemiology of variants, which we describe via the Mutation epidemiological Renormalisation Group (MeRG) framework.
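The two quantitative ingredients named above, a Levenshtein distance between Spike sequences and a 1% early-warning threshold per time bin, can be illustrated with a minimal sketch. It assumes sequences have already been assigned to cluster labels within each time bin; the temporal clustering algorithm itself is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if residues match)
            ))
        previous = current
    return previous[-1]


def emerging_clusters(labels_in_bin: list[str], threshold: float = 0.01) -> set[str]:
    """Hypothetical helper: cluster labels whose share of one time bin reaches the threshold."""
    counts = Counter(labels_in_bin)
    total = len(labels_in_bin)
    return {label for label, n in counts.items() if n / total >= threshold}
```

For example, two peptide fragments that differ at a single residue, such as `"NYNYLYRLF"` and `"NYNYRYRLF"`, are at distance 1, and a cluster holding 1 of 100 sequences in a bin is flagged while 1 of 200 is not.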

2021 ◽  
Author(s):  
Adele de Hoffer ◽  
Shahram Vatani ◽  
Corentin Cot ◽  
Giacomo Cacciapaglia ◽  
Francesco Conventi ◽  
...  

Never before has such a vast amount of data been collected for a viral pandemic as for the current case of COVID-19. This offers the possibility to answer a number of highly relevant questions regarding the evolution of the virus and the role mutations play in its spread among the population. We focus on spike proteins, as they bear the main responsibility for the effectiveness of viral diffusion by controlling the interactions with host cells. Using the available temporal structure of the sequencing data for the SARS-CoV-2 spike protein in the UK, we demonstrate that every wave of the pandemic is dominated by a different variant. Consequently, the time evolution of each variant follows a temporal structure encoded in the epidemiological Renormalisation Group approach to compartmental models. Machine learning is the tool of choice to determine the variants at play, independently of (but complementary to) the virological classification. Our machine learning algorithm on spike protein sequencing provides a simple and unbiased way to identify, classify, and track relevant virus variants without any prior knowledge of their characteristics. Hence, we propose a new tool that can help prevent and forecast the emergence of new waves, and that decision makers can use to define short- and long-term strategies to curb the current COVID-19 pandemic or future ones.
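The claim that every wave is dominated by a different variant amounts to asking which cluster holds the most sequences in each time bin and when that dominant cluster changes. A minimal sketch, assuming cluster labels per time bin are already available and that bin keys sort chronologically (e.g. zero-padded ISO weeks); the function names and labels are hypothetical.

```python
from collections import Counter


def dominant_per_bin(labels_by_bin: dict[str, list[str]]) -> dict[str, str]:
    """Map each time bin to the cluster label with the most sequences in that bin."""
    dominant = {}
    for time_bin, labels in labels_by_bin.items():
        label, _count = Counter(labels).most_common(1)[0]
        dominant[time_bin] = label
    return dominant


def dominance_changes(dominant: dict[str, str]) -> list[tuple[str, str]]:
    """List the (bin, label) pairs at which a new cluster takes over, in chronological order."""
    changes = []
    previous = None
    for time_bin in sorted(dominant):
        if dominant[time_bin] != previous:
            changes.append((time_bin, dominant[time_bin]))
            previous = dominant[time_bin]
    return changes
```

Each entry returned by `dominance_changes` marks a candidate hand-over between variants, which could then be compared against the timing of epidemic waves.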


2021 ◽  
Author(s):  
Mustapha Abba ◽  
Chidozie Nduka ◽  
Seun Anjorin ◽  
Shukri Mohamed ◽  
Emmanuel Agogo ◽  
...  

BACKGROUND Owing to scientific and technical advances in the field, published hypertension research has evolved during the last decade. Given the huge amount of scientific material published in this field, identifying the relevant information is difficult. We employed topic modelling, a powerful approach for extracting useful information from large amounts of unstructured text. OBJECTIVE To use a machine learning algorithm to uncover hidden topics and subtopics in 100 years of peer-reviewed hypertension publications and to identify temporal trends. METHODS The titles and abstracts of hypertension papers indexed in PubMed were examined. We used the Latent Dirichlet Allocation (LDA) model to select 20 primary topics and then ran a trend analysis to see how their popularity changed over time. RESULTS We gathered 581,750 hypertension-related research articles published from 1900 to 2018 and divided them into 20 topics. The publications were grouped into preclinical, risk factor, complication, and therapy studies. We identified topics that were becoming increasingly ‘hot’, topics that were becoming less ‘cold’, and topics that were rarely published. Risk factor and major cardiovascular event topics displayed very dynamic patterns over time. The majority of the articles (71.2%) had a negative valence, followed by positive (20.6%) and neutral (8.2%) valences. Between 1980 and 2000, negative-sentiment articles fell somewhat, while positive- and neutral-sentiment articles increased significantly. CONCLUSIONS This unique machine learning methodology provided fascinating insights into current hypertension research trends. It allows researchers to discover study subjects and shifts in study focus, and ultimately it captures the broader picture of the primary concepts in current hypertension research articles. CLINICALTRIAL Not applicable
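The abstract does not detail its trend analysis. One simple convention, assumed here, is to average each topic's LDA weight over the documents published in a given year and call the topic ‘hot’ or ‘cold’ according to the sign of a least-squares slope. The sketch takes per-document topic weights as given (fitting LDA itself is not shown); the function names and the `tolerance` value are hypothetical.

```python
from statistics import fmean


def yearly_topic_share(doc_years, doc_topic_weights, topic):
    """Mean weight of `topic` over the documents published in each year, ordered by year."""
    by_year = {}
    for year, weights in zip(doc_years, doc_topic_weights):
        by_year.setdefault(year, []).append(weights[topic])
    return {year: fmean(values) for year, values in sorted(by_year.items())}


def trend_slope(share_by_year):
    """Ordinary least-squares slope of topic share against publication year."""
    years = list(share_by_year)
    shares = list(share_by_year.values())
    if len(years) < 2:
        raise ValueError("need at least two distinct years to fit a trend")
    mean_x, mean_y = fmean(years), fmean(shares)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx


def label_trend(slope, tolerance=1e-3):
    """Hypothetical labelling rule: positive trend is 'hot', negative is 'cold'."""
    if slope > tolerance:
        return "hot"
    if slope < -tolerance:
        return "cold"
    return "stable"
```

A linear fit is only the simplest choice; with a century of data, a per-decade or piecewise fit may describe the shifts better.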


2021 ◽  
Vol 30 (1) ◽  
pp. 93-110
Author(s):  
Tianyi Wang ◽  

Differential equations are widely used to model systems that change over time, some of which exhibit chaotic behavior. This paper proposes two new methods for classifying these behaviors, which are then used by a supervised machine learning algorithm. Dissipative chaotic systems, in contrast to conservative chaotic systems, appear to follow a certain visual pattern. A machine learning program written in the Wolfram Language classifies chaotic behavior with an accuracy of around 99.1 ± 1.1%.
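The dissipative/conservative distinction itself has a standard analytic criterion: a flow whose vector field has negative divergence contracts phase-space volume, whereas a Hamiltonian flow preserves it. The sketch below estimates the divergence by central differences; it is not one of the paper's two methods, and the classification thresholds are hypothetical. The Lorenz system uses its textbook parameters, for which the divergence is the constant −(σ + 1 + β).

```python
def divergence(field, point, h=1e-6):
    """Central-difference estimate of the divergence of `field` at `point`."""
    total = 0.0
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # d(f_i)/d(x_i): rate of change of the i-th component along the i-th axis
        total += (field(forward)[i] - field(backward)[i]) / (2 * h)
    return total


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz system: a classic dissipative chaotic flow."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]


def harmonic_oscillator(state):
    """Hamiltonian flow dq/dt = p, dp/dt = -q: phase-space volume is conserved."""
    q, p = state
    return [p, -q]


def classify(field, point, tol=1e-6):
    """Hypothetical rule: the sign of the local divergence decides the label."""
    d = divergence(field, point)
    if d < -tol:
        return "dissipative"
    if d > tol:
        return "expanding"
    return "volume-preserving"
```

A local estimate at one point is enough for these two examples because their divergence is constant; for general flows it would have to be averaged along a trajectory.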


2019 ◽  
Vol 28 (07) ◽  
pp. 1950022 ◽  
Author(s):  
Haiou Qin ◽  
Du Zhang ◽  
Xibin Sun ◽  
Jiahua Tang ◽  
Jun Peng

One of the emerging research opportunities in machine learning is to develop computing systems that learn many tasks continuously and improve the performance of learned tasks incrementally over time. In the real world, learners have to adapt to labeled and unlabeled samples from various tasks that arrive randomly. In this paper, we propose an efficient algorithm, the Efficient Perpetual Learning Algorithm (EPLA), which is suitable for learning multiple tasks in both offline and online settings. The algorithm, an extension of ELLA (Efficient Lifelong Learning Algorithm), is part of what we call perpetual learning, which can learn new tasks or refine knowledge of learned tasks for improved performance with newly arrived labeled samples in an incremental fashion. EPLA has several salient features. Learning episodes are triggered by either extrinsic or intrinsic stimuli. Agent systems based on the proposed algorithm can engage in an open-ended, alternating sequence of learning episodes and working episodes. Unlabeled samples can be used to self-train the learner in small-data settings. Compared with ELLA, EPLA shows almost equivalent performance without memorizing any previously learned labeled samples.


Author(s):  
Du Zhang ◽  
Meiliu Lu

One of the long-term research goals in machine learning is to build never-ending learners. The state of the practice in machine learning is still dominated by the one-time learner paradigm: a learning algorithm is applied to data sets to produce a model or target function, after which the learner is put away and the model or function is put to work. Such a learn-once-apply-next (LOAN) approach may not be adequate for many real-world problems, and it contrasts sharply with the lifelong learning process of humans. At the same time, learning is often brought about by overcoming inconsistent circumstances. This paper proposes a framework for perpetual learning agents that continuously refine or augment their knowledge by overcoming inconsistencies encountered during their problem-solving episodes. The never-ending nature of a perpetual learning agent is embodied in the framework as the agent's continuous inconsistency-induced belief revision process. The framework hinges on the agent recognizing inconsistency in data, information, knowledge, or meta-knowledge, identifying its cause, and revising or augmenting beliefs to explain, resolve, or accommodate it. The authors believe that inconsistency can serve as an important learning stimulus toward building perpetual learning agents that incrementally improve their performance over time.
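The core loop of inconsistency-induced belief revision (detect a conflict, then revise) can be illustrated with a toy belief store. This is a hypothetical sketch, not the authors' framework: beliefs are plain propositional atoms, and the revision policy "newer evidence wins" is an assumption; identifying the cause of an inconsistency is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefBase:
    """Toy propositional belief store (hypothetical): each atom maps to True or False."""
    beliefs: dict[str, bool] = field(default_factory=dict)
    revisions: list[str] = field(default_factory=list)

    def observe(self, atom: str, value: bool) -> bool:
        """Record an observation; return True if it contradicted, and so revised, a belief."""
        if atom in self.beliefs and self.beliefs[atom] != value:
            # Inconsistency detected: log it as a learning event, then let new evidence win.
            self.revisions.append(atom)
            self.beliefs[atom] = value
            return True
        self.beliefs[atom] = value
        return False
```

The `revisions` log is where a perpetual learner would hook its learning episode, for example by re-examining the rules that produced the contradicted belief.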


Author(s):  
Lizhou Zhang ◽  
Cody B Jackson ◽  
Huihui Mou ◽  
Amrita Ojha ◽  
Erumbi S Rangarajan ◽  
...  

ABSTRACT SARS coronavirus 2 (SARS-CoV-2) isolates encoding a D614G mutation in the viral spike (S) protein predominate over time in locales where it is found, implying that this change enhances viral transmission. We therefore compared the functional properties of the S proteins with aspartic acid (SD614) and glycine (SG614) at residue 614. We observed that retroviruses pseudotyped with SG614 infected ACE2-expressing cells markedly more efficiently than those with SD614. This greater infectivity was correlated with less S1 shedding and greater incorporation of the S protein into the pseudovirion. Similar results were obtained using the virus-like particles produced with SARS-CoV-2 M, N, E, and S proteins. However, SG614 did not bind ACE2 more efficiently than SD614, and the pseudoviruses containing these S proteins were neutralized with comparable efficiencies by convalescent plasma. These results show that SG614 is more stable than SD614, consistent with epidemiological data suggesting that viruses with SG614 transmit more efficiently.


2022 ◽  
Vol 119 (4) ◽  
pp. e2113118119
Author(s):  
Juan Rodriguez-Rivas ◽  
Giancarlo Croce ◽  
Maureen Muscat ◽  
Martin Weigt

The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses, preexisting to SARS-CoV-2, we build statistical models that not only capture amino acid conservation but also more complex patterns resulting from epistasis. We show that these models are notably superior to conservation profiles in estimating the already observable SARS-CoV-2 variability. In the receptor binding domain of the spike protein, we observe that the predicted mutability correlates well with experimental measures of protein stability and that both are reliable mutability predictors (receiver operating characteristic areas under the curve ∼0.8). Most interestingly, we observe an increasing agreement between our model and the observed variability as more data become available over time, proving the anticipatory capacity of our model. When combined with data concerning the immune response, our approach identifies positions where current variants of concern are highly overrepresented. These results could assist studies on viral evolution and future viral outbreaks and, in particular, guide the exploration and anticipation of potentially harmful future SARS-CoV-2 variants.
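The reported areas under the ROC curve (∼0.8) measure how well predicted mutability ranks positions that actually mutated above positions that did not. A minimal sketch, assuming per-position scores and binary "observed to vary" labels (both hypothetical inputs), computes the AUC directly as that pairwise ranking probability, which equals the Mann-Whitney formulation.

```python
def roc_auc(scores, labels):
    """AUROC: probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; the pairwise loop is quadratic, so a sort-based rank-sum version is preferable for all positions of a proteome.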


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4288 ◽  
Author(s):  
Ahmad Rezaei ◽  
Tyler J. Cuthbert ◽  
Mohsen Gholami ◽  
Carlo Menon

Wearable electronics are recognized as a vital tool for gathering in situ kinematic information on human body movements. In this paper, we describe the production of a core–sheath fiber strain sensor from readily available materials in a one-step dip-coating process, and demonstrate the development of a smart sleeveless shirt for measuring the kinematic angles of the trunk relative to the pelvis in complicated three-dimensional movements. The sensor’s piezoresistive properties and characteristics were studied with respect to the type of core material used. Sensor performance was optimized by straining above the intended working region to increase the consistency and accuracy of the piezoresistive sensor. The accuracy of the sensor when tracking random movements was tested using a rigorous 4-h random wave pattern to mimic what would be required for satisfactory use in prototype devices. By processing the raw signal with a machine learning algorithm, we were able to track a strain of random wave patterns to a normalized root mean square error of 1.6%, highlighting the consistency and reproducible behavior of the relatively simple sensor. Then, we evaluated the performance of these sensors in a prototype motion capture shirt, in a study with 12 participants performing a set of eight different types of uniaxial and multiaxial movements. A machine learning random forest regressor model estimated the trunk flexion, lateral bending, and rotation angles with errors of 4.26°, 3.53°, and 3.44°, respectively. These results demonstrate the feasibility of using smart textiles for capturing complicated movements and provide a solution for the real-time monitoring of daily activities.
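The abstract reports a normalized root mean square error of 1.6% but does not state the normalization. A minimal sketch, assuming normalization by the range of the measured strain (normalization by the mean is another common choice), shows how such a figure is computed; the function name is hypothetical.

```python
from math import sqrt


def nrmse(predicted, measured):
    """RMSE divided by the range of the measured signal (range normalization is an assumption)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal has zero range")
    return rmse / span
```

On this definition, an NRMSE of 1.6% corresponds to a return value of 0.016.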


2021 ◽  
Author(s):  
Eric W. Bell ◽  
Jacob H. Schwartz ◽  
Peter L. Freddolino ◽  
Yang Zhang

Abstract Proteome-wide identification of protein-protein interactions is a formidable task which has yet to be sufficiently addressed by experimental methodologies. Many computational methods have been developed to predict proteome-wide interaction networks, but few leverage both the sensitivity of structural information and the wide availability of sequence data. We present PEPPI, a pipeline which integrates structural similarity, sequence similarity, functional association data, and machine learning-based classification through a naïve Bayesian classifier model to accurately predict protein-protein interactions at a proteomic scale. Through benchmarking against a set of 798 ground truth interactions and an equal number of noninteractions, we have found that PEPPI attains 4.5% higher AUROC than the best other state-of-the-art method. As a proteomic-scale application, PEPPI was applied to model the interactions which occur between SARS-CoV-2 and human host cells during coronavirus infection, where 403 high-confidence interactions were identified with predictions covering 73% of a gold standard dataset from PSICQUIC and demonstrating significant complementarity with the most recent high-throughput experiments. PEPPI is available both as a webserver and as a standalone version and should be a powerful and generally applicable tool for computational screening of protein-protein interactions.
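A naïve Bayesian classifier combines independent lines of evidence multiplicatively: under the assumption of conditional independence, the posterior odds of an interaction equal the prior odds times the likelihood ratio contributed by each evidence source. The sketch shows only this combination step; how PEPPI derives a likelihood ratio from structural, sequence, or functional-association scores is not reproduced, and the inputs are hypothetical.

```python
from math import prod


def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior P(interaction) from a prior and per-source likelihood ratios (independence assumed)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)  # prod([]) == 1: no evidence
    return posterior_odds / (1.0 + posterior_odds)
```

For instance, with a prior of 0.5 and two evidence sources with likelihood ratios 2 and 3, the posterior odds are 6 and the posterior probability is 6/7.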


2021 ◽  
Author(s):  
Regina Reis da Costa Alves ◽  
Frederico Caetano Jandre de Assis Tavares ◽  
José Manoel Seixas ◽  
Anete Trajman

Tuberculosis (TB) is a contagious disease that is among the top 10 causes of death in the world. To eliminate the disease by 2050, treating TB infection (TBI) is essential, and this requires radiological reports to exclude active tuberculosis. The automatic X-ray classifiers used today are based on models that do not guarantee the retention of knowledge when they need to learn new tasks over time. This work proposes introducing the lifelong machine learning (LML) paradigm into automatic X-ray classifiers aimed at helping to diagnose active TB (ATB). Two LML algorithms, Efficient Lifelong Learning Algorithm (ELLA) and Learning without Forgetting (LwF), are applied to the TB and pneumonia classification tasks. The results show that the LML paradigm makes it possible to maintain performance on both tasks.
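Learning without Forgetting retains an old task without storing its training images by adding a distillation term to the new task's loss: the old head's softened outputs on new-task data are kept close to the outputs recorded before training began. A minimal single-example sketch of that objective follows; the temperature `T = 2` and the weight `lambda_old = 1` are assumed defaults, and the function names are hypothetical rather than taken from the paper.

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """-sum t * log(p), skipping zero-probability targets."""
    return -sum(t * log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def lwf_loss(new_logits, new_label, old_logits_now, old_logits_recorded,
             temperature=2.0, lambda_old=1.0):
    """Learning-without-Forgetting style loss for one example (hypothetical helper)."""
    # Ordinary cross-entropy on the new task's ground-truth label.
    new_term = -log(softmax(new_logits)[new_label])
    # Distillation: match the old head's current softened outputs to the recorded ones.
    soft_targets = softmax(old_logits_recorded, temperature)
    soft_preds = softmax(old_logits_now, temperature)
    return new_term + lambda_old * cross_entropy(soft_targets, soft_preds)
```

The distillation term is smallest when the old head's outputs have not drifted, which is what penalizes forgetting; `lambda_old` trades old-task retention against new-task accuracy.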

