A machine learning approach for the factorization of psychometric data with application to the Delis Kaplan Executive Function System

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
J. A. Camilleri ◽  
S. B. Eickhoff ◽  
S. Weis ◽  
J. Chen ◽  
J. Amunts ◽  
...  

Abstract While a replicability crisis has shaken psychological sciences, the replicability of multivariate approaches for psychometric data factorization has received little attention. In particular, Exploratory Factor Analysis (EFA) is frequently promoted as the gold standard in psychological sciences. However, the application of EFA to executive functioning, a core concept in psychology and cognitive neuroscience, has led to divergent conceptual models. This heterogeneity severely limits the generalizability and replicability of findings. To tackle this issue, in this study, we propose to capitalize on a machine learning approach, OPNMF (Orthonormal Projective Non-Negative Factorization), and leverage internal cross-validation to promote generalizability to an independent dataset. We examined its application to the scores of 334 adults on the Delis–Kaplan Executive Function System (D-KEFS), comparing it with standard EFA and Principal Component Analysis (PCA). We further evaluated the replicability of the derived factorization across specific gender and age subsamples. Overall, OPNMF and PCA both converge towards a two-factor model as the best data-fit model. The derived factorization suggests a division between low-level and high-level executive functioning measures, a model further supported in subsamples. In contrast, EFA highlighted a five-factor model which reflects the segregation of the D-KEFS battery into its main tasks while still clustering higher-level tasks together. However, this model was poorly supported in the subsamples. Thus, the parsimonious two-factor model revealed by OPNMF encompasses the more complex factorization yielded by EFA while enjoying higher generalizability. Hence, OPNMF provides a conceptually meaningful, technically robust, and generalizable factorization for psychometric tools.
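The abstract names OPNMF without stating its update rule. The sketch below is a minimal illustration, not the authors' implementation. It assumes the commonly cited multiplicative update for orthonormal projective NMF, W ← W ⊙ (XXᵀW) ⊘ (WWᵀXXᵀW), followed by rescaling W by its spectral norm. The function names, the iteration count, and the toy data (six hypothetical measures driven by two latent sources) are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def spectral_norm(w, n_iter=100):
    """Estimate the largest singular value of w by power iteration on w^T w."""
    gram = matmul(transpose(w), w)
    v = [1.0] * len(gram)
    lam = 0.0
    for _ in range(n_iter):
        u = [sum(g * x for g, x in zip(row, v)) for row in gram]
        lam = sum(x * x for x in u) ** 0.5
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in u]
    return lam ** 0.5

def opnmf(x, k, n_iter=300, seed=0, eps=1e-12):
    """Sketch of OPNMF for a non-negative matrix x (features x samples).

    Returns a non-negative loading matrix W (features x k). The update
    rule and the spectral-norm rescaling are assumptions (see lead-in).
    """
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in x]
    xxt = matmul(x, transpose(x))  # features x features
    for _ in range(n_iter):
        num = matmul(xxt, w)                         # X X^T W
        den = matmul(w, matmul(transpose(w), num))   # W W^T X X^T W
        w = [[wij * nij / (dij + eps) for wij, nij, dij in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
        scale = spectral_norm(w)
        if scale > 0.0:
            w = [[wij / scale for wij in wr] for wr in w]
    return w

# Hypothetical toy data: measures 0-2 follow one latent source, 3-5 another.
rng = random.Random(1)
latent = [[rng.random() for _ in range(8)] for _ in range(2)]
x_toy = [[latent[g][j] + 0.05 * rng.random() for j in range(8)]
         for g in (0, 1) for _ in range(3)]

w_toy = opnmf(x_toy, k=2)
```

Printing the rows of `w_toy` shows which component each toy measure loads on. Because W stays non-negative, each measure contributes additively to a component, which is what makes the resulting factors readable as groups of tasks.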

Catalysts ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 291 ◽  
Author(s):  
Anamya Ajjolli Nagaraja ◽  
Philippe Charton ◽  
Xavier F. Cadet ◽  
Nicolas Fontaine ◽  
Mathieu Delsaut ◽  
...  

The metabolic engineering of pathways has been used extensively to produce molecules of interest on an industrial scale. Methods like gene regulation or substrate channeling have helped to improve the desired product yield. Cell-free systems are used to overcome the weaknesses of engineered strains. One of the challenges in a cell-free system is selecting the enzyme concentrations that give optimal yield. Here, a machine learning approach is used to select the enzyme concentrations for the upper part of glycolysis. The artificial neural network (ANN) approach is known to be inefficient in extrapolating predictions outside the box: high predicted values will bump into a sort of “glass ceiling”. In order to explore this “glass ceiling” space, we developed a new methodology named glass ceiling ANN (GC-ANN). Principal component analysis (PCA) and data classification methods are used to derive a rule for a high flux, and an ANN is used to predict the flux through the pathway using the input data of 121 balances of four enzymes in the upper part of glycolysis. The outcomes of this study are i. in silico selection of optimum enzyme concentrations for a maximum flux through the pathway and ii. experimental in vitro validation of the “out-of-the-box” fluxes predicted using this new approach. Surprisingly, flux improvements of up to 63% were obtained. Gratifyingly, these improvements are coupled with a cost decrease of up to 25% for the assay.


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S331-S332
Author(s):  
Nusrat J Epsi ◽  
John H Powers ◽  
David A Lindholm ◽  
Alison Helfrich ◽  
...  

Abstract Background: The novel coronavirus disease 2019 (COVID-19) pandemic remains a global challenge. Accurate COVID-19 prognosis remains an important aspect of clinical management. While many prognostic systems have been proposed, most are derived from analyses of individual symptoms or biomarkers. Here, we take a machine learning approach to first identify discrete clusters of early-stage symptoms which may delineate groups with distinct symptom phenotypes. We then sought to identify whether these groups correlate with subsequent disease severity. Methods: The Epidemiology, Immunology, and Clinical Characteristics of Emerging Infectious Diseases with Pandemic Potential (EPICC) study is a longitudinal cohort study with data and biospecimens collected from nine military treatment facilities over 1 year of follow-up. Demographic and clinical characteristics were measured with interviews and electronic medical record review. Early symptoms by organ-domain were measured by FLU-PRO-plus surveys collected for 14 days post-enrollment, with surveys completed a median of 14.5 (interquartile range, IQR = 13) days post-symptom onset. Using these FLU-PRO-plus responses, we applied principal component analysis followed by the unsupervised machine learning algorithm k-means to identify groups with distinct clusters of symptoms. We then fit multivariate logistic regression models to determine how these early-symptom clusters correlated with hospitalization risk after controlling for age, sex, race, and obesity. Results: Using SARS-CoV-2 positive participants (n = 1137) from the EPICC cohort (Figure 1), we transformed reported symptoms into domains and identified three groups of participants with distinct clusters of symptoms. Logistic regression demonstrated that cluster-2 was associated with approximately three-fold increased odds [3.01 (95% CI: 2-4.52); P < 0.001] of hospitalization, which remained significant after controlling for other factors [2.97 (95% CI: 1.88-4.69); P < 0.001]. Conclusion: Our findings have identified three distinct groups with early-symptom phenotypes. With further validation of the clusters’ significance, this tool could be used to improve COVID-19 prognosis in a precision medicine framework and may assist in patient triaging and clinical decision-making.
Figure 1. (A) Baseline characteristics of SARS-CoV-2 positive participants. (B) Heatmap comparing FLU-PRO response in each participant. (C) Principal component analysis followed by k-means clustering identified three groups of participants. (D) Crude and adjusted association of identified cluster with hospitalization.
Disclosures: David A. Lindholm, MD, American Board of Internal Medicine (Individual(s) Involved: Self): Member of Auxiliary R&D Infectious Disease Item-Writer Task Force. No financial support received. No exam questions will be disclosed; Other Financial or Material Support. Ryan C. Maves, MD, EMD Serono (Advisor or Review Panel member), Heron Therapeutics (Advisor or Review Panel member). Simon Pollett, MBBS, AstraZeneca (Other Financial or Material Support, HJF, in support of USU IDCRP, funded under a CRADA to augment the conduct of an unrelated Phase III COVID-19 vaccine trial sponsored by AstraZeneca as part of USG response (unrelated work)).
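The crude and adjusted odds ratios in the Results can be checked for internal consistency. The sketch below assumes the reported intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state. Under that assumption, the geometric mean of the interval bounds should recover the point estimate, and the interval width gives the standard error of the logistic regression coefficient. The function name `wald_summary` is hypothetical.

```python
import math

def wald_summary(or_point, ci_low, ci_high, z=1.959964):
    """Back out log-odds quantities from a reported OR and 95% CI.

    Assumes a Wald interval: exp(beta +/- z * se). This is an assumption,
    not something the abstract confirms.
    """
    beta = math.log(or_point)                                  # log-odds coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)      # implied standard error
    implied_or = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return beta, se, implied_or

# Values as reported in the abstract.
crude = wald_summary(3.01, 2.0, 4.52)
adjusted = wald_summary(2.97, 1.88, 4.69)

print(f"crude:    beta={crude[0]:.3f}, se={crude[1]:.3f}, implied OR={crude[2]:.2f}")
print(f"adjusted: beta={adjusted[0]:.3f}, se={adjusted[1]:.3f}, implied OR={adjusted[2]:.2f}")
```

In both cases the implied odds ratio agrees with the reported point estimate to within rounding, so the reported intervals are consistent with a symmetric interval on the log-odds scale.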


2021 ◽  
Author(s):  
Arjun Singh

Abstract Drug discovery is incredibly time-consuming and expensive, averaging over 10 years and $985 million per drug. Calculating the binding affinity between a target protein and a ligand is critical for discovering viable drugs. Although supervised machine learning (ML) can predict binding affinity accurately, models experience severe overfitting due to an inability to identify informative properties of protein-ligand complexes. This study used unsupervised ML to reveal underlying protein-ligand characteristics that strongly influence binding affinity. Protein-ligand 3D models were collected from the PDBBind database and vectorized into 2422 features per complex. Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), K-Means Clustering, and heatmaps were used to identify groups of complexes and the features responsible for the separation. ML benchmarking was used to determine the features’ effect on ML performance. The PCA heatmap revealed groups of complexes with binding affinity of pKd < 6 and pKd > 8, and identified the number of CCCH and CCCCCH fragments in the ligand as the most responsible features. A high correlation of 0.8337, their ability to explain 18% of the binding affinity’s variance, and an error increase of 0.09 in Decision Trees when trained without the two features suggest that the fragments exist within a larger ligand substructure that significantly influences binding affinity. This discovery is a baseline for informative ligand representations to be generated so that ML models overfit less and can more reliably identify novel drug candidates. Future work will focus on validating the ligand substructure’s presence and discovering more informative intra-ligand relationships.
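The grouping step described above relies on K-Means clustering of low-dimensional projections. The sketch below is a minimal illustration of Lloyd's algorithm on hypothetical 2D points standing in for projected complexes; it is not the study's pipeline. The farthest-first initialization is a choice made here for determinism, and all names and data are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical 2D projections forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On real feature vectors the number of clusters and the initialization both affect the result, so cluster assignments are usually checked across several values of `k` and several starts.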

