Characterisation, identification, clustering, and classification of disease

A. J. Webster; K. Gaitskell; I. Turnbull; B. J. Cairns; R. Clarke

doi:10.1038/s41598-021-84860-z

Scientific Reports, 2021, Vol 11 (1)

Keyword(s): Risk Factors; Statistical Power; Proportional Hazards; Disease Risk; Neurological Diseases; Disease Incidence; Disease Onset; Laboratory Findings; Clustering And Classification; New Perspective

The importance of quantifying the distribution and determinants of multimorbidity has prompted novel data-driven classifications of disease. Applications have included improved statistical power and refined prognoses for a range of respiratory, infectious, autoimmune, and neurological diseases, with studies using molecular information, age of disease incidence, and sequences of disease onset (“disease trajectories”) to classify disease clusters. Here we consider whether easily measured risk factors such as height and BMI can effectively characterise diseases in UK Biobank data, combining established statistical methods in new but rigorous ways to provide clinically relevant comparisons and clusters of disease. Over 400 common diseases were selected for analysis using clinical and epidemiological criteria, and conventional proportional hazards models were used to estimate associations with 12 established risk factors. Several diseases had strongly sex-dependent associations of disease risk with BMI. Importantly, a large proportion of diseases affecting both sexes could be identified by their risk factors, and equivalent diseases tended to cluster adjacently. These included 10 diseases presently classified as “Symptoms, signs, and abnormal clinical and laboratory findings, not elsewhere classified”. Many clusters are associated with a shared, known pathogenesis; others suggest likely but presently unconfirmed causes. The specificity of associations and the shared pathogenesis of many clustered diseases provide a new perspective on the interactions between biological pathways, risk factors, and patterns of disease such as multimorbidity.
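The abstract does not say exactly how diseases are compared or clustered. As a minimal sketch, assume each disease is summarised by a vector of log hazard ratios (one per risk factor) with standard errors, that two diseases are compared with a chi-squared-style statistic summing squared standardised differences (treating the two sets of estimates as independent, a simplification), and that average-linkage agglomerative clustering stands in for the authors' clustering step. The disease names and numbers in `DISEASES` (three risk factors rather than 12) are hypothetical.

```python
import itertools

# Hypothetical summaries: disease -> (log hazard ratios, standard errors),
# one entry per risk factor. Illustrative numbers only, not UK Biobank results.
DISEASES = {
    "disease_A": ([0.40, 0.10, -0.05], [0.05, 0.04, 0.06]),
    "disease_B": ([0.38, 0.12, -0.02], [0.06, 0.05, 0.05]),
    "disease_C": ([-0.20, 0.30, 0.25], [0.05, 0.04, 0.06]),
    "disease_D": ([-0.18, 0.28, 0.20], [0.07, 0.05, 0.05]),
}


def chi2_distance(a, b):
    """Sum of squared standardised differences between two log-HR vectors.

    Treats the two diseases' estimates as independent, so that
    Var(beta_a - beta_b) = se_a**2 + se_b**2 for each risk factor.
    """
    (beta_a, se_a), (beta_b, se_b) = a, b
    return sum(
        (x - y) ** 2 / (sx ** 2 + sy ** 2)
        for x, y, sx, sy in zip(beta_a, beta_b, se_a, se_b)
    )


def average_linkage(items, dist):
    """Agglomerative (average-linkage) clustering; returns merges in order."""
    names = list(items)
    pair_dist = {
        frozenset(pair): dist(items[pair[0]], items[pair[1]])
        for pair in itertools.combinations(names, 2)
    }

    def linkage(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(pair_dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        c1, c2 = min(itertools.combinations(clusters, 2), key=lambda cs: linkage(*cs))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
    return merges


for left, right, d in average_linkage(DISEASES, chi2_distance):
    print(left, "+", right, f"(distance {d:.2f})")
```

Weighting each difference by its standard errors means imprecisely estimated associations contribute less to the distance than they would under a plain Euclidean metric, so diseases with similar and well-estimated risk-factor profiles (here A with B, C with D) merge first.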

Characterisation, identification, clustering, and classification of disease

2020

doi:10.1101/2020.11.26.20227629

Author(s): A.J. Webster; K. Gaitskell; I. Turnbull; B.J. Cairns; R. Clarke

Keyword(s): Risk Factors; Statistical Power; Proportional Hazards; Proportional Hazards Model; Disease Risk; Neurological Diseases; Disease Incidence; Disease Onset; Laboratory Findings; Clustering And Classification

Data-driven classifications are improving statistical power and refining prognoses for a range of respiratory, infectious, autoimmune, and neurological diseases. Studies have used molecular information, age of disease incidence, and sequences of disease onset (“disease trajectories”). Here we consider whether easily measured risk factors such as height and BMI can usefully characterise diseases in UK Biobank data, combining established statistical methods in new but rigorous ways to provide clinically relevant comparisons and clusters of disease. Over 400 common diseases were selected for study on the basis of clinical and epidemiological criteria, and a conventional proportional hazards model was used to estimate associations with 12 established risk factors. Comparing men and women, several diseases had strongly sex-dependent associations of disease risk with BMI. Despite this, a large proportion of diseases affecting both sexes could be identified by their risk factors, and equivalent diseases tended to cluster adjacently. These included 10 diseases presently classified as “Symptoms, signs, and abnormal clinical and laboratory findings, not elsewhere classified”. Many clusters are associated with a shared, known pathogenesis; others suggest likely but presently unconfirmed causes. The specificity of associations and the shared pathogenesis of many clustered diseases provide a new perspective on the interactions between biological pathways, risk factors, and patterns of disease such as multimorbidity.

761 Characterisation and clustering of diseases by their association with well-known risk factors

International Journal of Epidemiology, 2021, Vol 50 (Supplement_1)

doi:10.1093/ije/dyab168.705

Author(s): Anthony Webster; Kezia Gaitskell; Iain Turnbull; Ben Cairns; Robert Clarke

Keyword(s): Risk Factors; Multiple Testing; Statistical Power; Proportional Hazards; Proportional Hazards Model; Neurological Diseases; Disease Onset; Significant Risk; Multivariate Statistical; Age Related

Background: Data-driven classifications are improving statistical power, refining prognoses, and improving our understanding of autoimmune, respiratory, infectious, and neurological diseases. Classifications have used molecular information, age of incidence, and sequences of disease onset (“disease trajectories”). Here we consider whether associations with easily measured, established risk factors such as height and BMI can usefully characterise disease.

Methods: UK Biobank data and their linked hospital episode statistics were used to study 172 common age-related diseases. A proportional hazards model was used to estimate associations with potential risk factors and to adjust for well-known confounders. Diseases were compared and hierarchically clustered using novel but rigorous multivariate statistical methods.

Results: For diseases affecting both sexes, over 38% can be uniquely identified by their associations with risk factors. Equivalent diseases often clustered adjacently. After an FDR multiple-testing adjustment, roughly 5% have statistically significant differences. Similar remarks applied to several symptoms of unknown cause. Many clustered diseases are associated with a shared, known pathogenesis; others suggest likely but unconfirmed causes.

Conclusions: Risk factors for disease can be surprisingly precise and can be used to cluster diseases in a meaningful way. Risk factors for men and women may differ for some diseases. Several symptoms of unknown cause have disease-specific, statistically significant risk factors.

Key messages: Big datasets and modern statistics are providing new insights into the relationships between diseases and their associations with risk factors. Diseases can be identified and clustered by their associations with well-known risk factors.
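The abstract reports significance after an FDR multiple-testing adjustment without naming the procedure. The sketch below assumes the Benjamini-Hochberg step-up rule, the most widely used FDR method; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up rule: True where a hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank k with p_(k) <= k * fdr / m.
        if p_values[idx] <= rank * fdr / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bh_adjusted(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p = [0.012, 0.045, 0.028, 0.004, 0.5]  # hypothetical p-values
print(benjamini_hochberg(p))  # [True, False, True, True, False]
print([round(q, 4) for q in bh_adjusted(p)])
```

Because the rule is step-up, a p-value can be rejected even if a smaller-ranked threshold was missed; rejecting wherever the adjusted value is at most 0.05 gives the same decisions as `benjamini_hochberg(p, 0.05)`.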

Causal attribution fractions for epidemiological studies, applied to a UK Biobank study of smoking and BMI

2021

doi:10.1101/2021.12.24.21268368

Author(s): Anthony Webster

Keyword(s): Risk Factors; Proportional Hazards; Disease Risk; Causal Attribution; Disease Incidence; World Health Organisation; Epidemiological Studies; World Health; Proportional Hazard; UK Biobank

Epidemiological studies often use proportional hazard models to estimate associations between potential risk factors and disease risk. It is emphasised that when the "backdoor criterion" from causal inference applies, and diseases are sufficiently rare, the proportional hazard model can be used to estimate causal associations. When the "frontdoor criterion" applies (allowing causal estimates despite unmeasured confounders), the estimates are found to be similar to those from mediation analyses with measured confounders. Reasons for this are discussed. An attribution fraction is constructed using the average causal effects (ACE) of exposures on the population, and simple methods for its evaluation are suggested. It differs from the attribution fraction used by the World Health Organisation (WHO), except in specific circumstances where the WHO fraction can agree with it or provide a bound. A counterfactual argument determines an individual's attribution fraction Af in terms of proportional hazard estimates, as Af = 1 − 1/R, where R is the individual's relative risk. Causally meaningful attribution fractions cannot be constructed for all known risk factors or confounders, but there are important cases where they can. As an example, systematic proportional hazards studies with UK Biobank data are used to estimate the attribution fractions of smoking and BMI for 226 diseases. The attribution of risk is characterised in terms of disease chapters from the International Classification of Diseases (ICD-10), and the diseases most strongly attributed to smoking and BMI are identified. The result is a quantitative characterisation of the causal influence of smoking and BMI on the landscape of disease incidence in the UK Biobank population.
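The individual attribution fraction Af = 1 − 1/R can be evaluated directly once R is known. A minimal sketch, assuming (as the abstract allows for sufficiently rare diseases) that a proportional hazards estimate stands in for R. The helper `relative_risk_log_linear` and its log-linear dose-response are hypothetical additions for illustration; this is not the paper's population-level, ACE-based attribution fraction.

```python
import math


def attribution_fraction(relative_risk):
    """Individual attribution fraction Af = 1 - 1/R, for relative risk R > 0."""
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    return 1.0 - 1.0 / relative_risk


def relative_risk_log_linear(log_hr_per_unit, units):
    """Hypothetical helper: R = exp(beta * x), assuming a log-linear exposure effect."""
    return math.exp(log_hr_per_unit * units)


# A hazard ratio of 2 (treated as R for a rare disease) attributes half the risk.
print(attribution_fraction(2.0))  # 0.5
# Hypothetical: log HR of 0.1 per unit of exposure, individual 5 units above reference.
r = relative_risk_log_linear(0.1, 5)
print(round(r, 3), round(attribution_fraction(r), 3))
```

For R below 1 the formula returns a negative value; how such protective exposures should be handled is not addressed in the abstract, which only notes that causally meaningful attribution fractions cannot be constructed for every risk factor.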

Defining heart disease risk for death in COVID-19 infection

QJM, 2020, Vol 113 (12), pp. 876-882

doi:10.1093/qjmed/hcaa246

Cited By ~ 1

Author(s): J Li; T Guo; D Dong; X Zhang; X Chen; ...

Keyword(s): Risk Factors; Cox Regression; Disease Risk; Demographic Data; Kidney Injury; Laboratory Findings; Course Of Disease; Clinical Observations; Independent Risk Factors; Heart Disease Risk

Background: Cardiovascular disease (CVD) was common in coronavirus disease 2019 (COVID-19) patients and associated with unfavorable outcomes. We aimed to compare the clinical observations and outcomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-infected patients with and without CVD.

Methods: Patients with laboratory-confirmed SARS-CoV-2 infection were clinically evaluated at Wuhan Seventh People’s Hospital, Wuhan, China, from 23 January to 14 March 2020. Demographic data, laboratory findings, comorbidities, treatments and outcomes were collected and analyzed in COVID-19 patients with and without CVD.

Results: Of 596 patients with COVID-19, 215 (36.1%) had CVD. Compared with patients without CVD, these patients were significantly older (66 vs. 52 years) and included a higher proportion of men (52.5% vs. 43.8%). Complications during the course of disease were more common in patients with CVD, including acute respiratory distress syndrome (22.8% vs. 8.1%), malignant arrhythmias including ventricular tachycardia/ventricular fibrillation (3.7% vs. 1.0%), acute coagulopathy (7.9% vs. 1.8%) and acute kidney injury (11.6% vs. 3.4%). Rates of glucocorticoid therapy (36.7% vs. 25.5%), vitamin C (23.3% vs. 11.8%), mechanical ventilation (21.9% vs. 7.6%), intensive care unit admission (12.6% vs. 3.7%) and mortality (16.7% vs. 4.7%) were higher in patients with CVD (all P < 0.05). Multivariable Cox regression models showed that older age (≥65 years) (HR 3.165, 95% CI 1.722–5.817) and CVD (HR 2.166, 95% CI 1.189–3.948) were independent risk factors for death.

Conclusions: CVD is an independent risk factor for death in COVID-19 patients. COVID-19 patients with CVD had more severe disease and a higher mortality rate, so early intervention and vigilance are warranted.
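The reported hazard ratios and 95% CIs can be sanity-checked without the raw data. This sketch assumes Wald intervals computed on the log scale, so the standard error is recoverable from the interval width and the HR should sit at the interval's geometric midpoint. It uses only the numbers quoted in the abstract and does not refit the Cox model; the function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_from_ci(hr, lower, upper):
    """Back-calculate log-scale SE, Wald z and two-sided p from an HR and its 95% CI.

    Assumes the interval is exp(log(HR) +/- 1.96 * SE), the usual Wald interval
    reported by Cox regression software.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


def ci_is_consistent(hr, lower, upper, rel_tol=0.01):
    """Check that the HR sits at the geometric midpoint of its CI (up to rounding)."""
    return math.isclose(hr, math.sqrt(lower * upper), rel_tol=rel_tol)


for label, hr, lo, hi in [("age >= 65", 3.165, 1.722, 5.817), ("CVD", 2.166, 1.189, 3.948)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.4f}, consistent={ci_is_consistent(hr, lo, hi)}")
```

Both reported HRs lie at the geometric midpoints of their intervals to within rounding, which is what log-scale Wald intervals would produce.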

Variable course of Unverricht-Lundborg disease

Neurology, 2017, Vol 89 (16), pp. 1691-1697

doi:10.1212/wnl.0000000000004518

Cited By ~ 8

Author(s): Laura Canafoglia; Edoardo Ferlazzo; Roberto Michelucci; Pasquale Striano; Adriana Magaudda; ...

Keyword(s): Risk Factors; Cognitive Impairment; Cognitive Decline; Time Course; Proportional Hazards; Cell Damage; Age At Onset; Disease Onset; Rank Test; Genetically Determined

Objective: To explore the course of Unverricht-Lundborg disease (EPM1) and identify risk factors for severity, we investigated the time course of symptoms and prognostic factors already detectable near disease onset.

Methods: We retrospectively evaluated the features of 59 Italian patients carrying the CSTB expansion mutation, coding the information every 5 years after disease onset to describe the cumulative time-dependent probability of reaching disabling myoclonus, relevant cognitive impairment, and inability to work. We evaluated the influence of early factors using the log-rank test and included the risk factors in a multivariate Cox proportional hazards regression model.

Results: Disabling myoclonus occurred an average of 32 years after disease onset, and cognitive impairment occurred slightly later. An age at onset of less than 12 years, the severity of myoclonus at first assessment, and seizure persistence more than 10 years after onset affected the timing of disabling myoclonus and cognitive decline. Most patients became unable to work years before the appearance of disabling myoclonus or cognitive decline.

Conclusions: A younger age at onset, early severe myoclonus, and seizure persistence are predictors of a more severe outcome. All of these factors may be genetically determined, but the greater hyperexcitability underlying more severe seizures and myoclonus at onset may also play a role by increasing cell damage due to reduced cystatin B activity.
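The log-rank test used to screen early factors compares event-free curves between groups. A minimal two-group sketch in the standard Mantel-Haenszel form follows; the follow-up times, event indicators, and the split by age at onset are hypothetical, not data from the 59 patients.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (Mantel-Haenszel form).

    times_*: follow-up time per subject; events_*: 1 if the outcome occurred
    at that time, 0 if the subject was censored then.
    Returns (chi-squared statistic on 1 df, p-value).
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group a
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)  # total at risk
        n_a = sum(1 for ti, _, g in data if ti >= t and g == "a")
        d = sum(1 for ti, e, _ in data if ti == t and e)  # events at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == "a")
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


# Hypothetical years from onset to disabling myoclonus (event 1) or censoring (0).
early_t, early_e = [18, 22, 25, 30, 35], [1, 1, 1, 1, 0]
late_t, late_e = [28, 33, 38, 40, 45], [1, 1, 0, 0, 0]
chi2, p = log_rank_test(early_t, early_e, late_t, late_e)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

The test only compares groups one factor at a time, which is why factors found this way were then entered jointly into a Cox model.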

Persistence of biologic treatments in psoriatic arthritis: a population-based study in Sweden

Rheumatology Advances in Practice, 2020, Vol 4 (2)

doi:10.1093/rap/rkaa070

Author(s): Kirk Geale; Ingrid Lindberg; Emma C Paulsson; E Christina M Wennerström; Anna Tjärnlund; ...

Keyword(s): Risk Factors; Real World; Lower Risk; Proportional Hazards; Reference Group; Treatment Plan; Disease Onset; Population Based; Cox Proportional Hazards; Biologic Treatment

Objectives: TNF inhibitors (TNFis) and IL inhibitors are effective treatments for PsA. Treatment non-persistence (drug survival, discontinuation) is a measure of effectiveness, tolerability and patient satisfaction or preferences in real-world clinical practice. Persistence on these treatments is not well understood in European PsA populations. The aim of this study was to compare time to non-persistence for ustekinumab (IL-12/23 inhibitor) and secukinumab (IL-17 inhibitor) with a reference group of adalimumab (TNFi) treatment exposures in PsA patients, and to identify risk factors for non-persistence.

Methods: A total of 4649 exposures of adalimumab, ustekinumab, and secukinumab in 3918 PsA patients were identified in Swedish longitudinal population-based registry data. Kaplan–Meier curves were constructed to measure treatment-specific real-world risk of non-persistence, and adjusted Cox proportional hazards models were estimated to identify risk factors associated with non-persistence.

Results: Ustekinumab was associated with a lower risk of non-persistence relative to adalimumab in biologic-naïve [hazard ratio (HR) 0.48 (95% CI 0.33, 0.69)] and biologic-experienced patients [HR 0.65 (95% CI 0.56, 0.76)], while secukinumab was associated with a lower risk in biologic-naïve patients [HR 0.65 (95% CI 0.49, 0.86)] but a higher risk of non-persistence in biologic-experienced patients [HR 1.20 (95% CI 1.03, 1.40)]. Biologic non-persistence was also associated with female sex, axial involvement, recent disease onset, biologic treatment experience and absence of psoriasis.

Conclusion: Ustekinumab exhibits a favourable treatment persistence profile relative to adalimumab, both overall and across lines of treatment. The performance of secukinumab depends on biologic experience. Persistence and risk factors for non-persistence should be accounted for when determining an optimal treatment plan for patients.
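A Kaplan-Meier curve of time to non-persistence multiplies, at each discontinuation time, the fraction of exposures still on treatment among those at risk. A minimal sketch follows, with hypothetical times (assumed to be months) and censoring indicators; it is not the study's registry analysis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of still being on treatment.

    times: time on treatment for each exposure.
    events: 1 if treatment was discontinued at that time, 0 if censored.
    Returns (time, survival) pairs at each discontinuation time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Exposures sharing time t are contiguous because pairs is sorted.
        tied = [e for tt, e in pairs[i:] if tt == t]
        stops = sum(tied)
        if stops:
            survival *= 1 - stops / at_risk
            curve.append((t, survival))
        # Both discontinued and censored exposures leave the risk set after t.
        at_risk -= len(tied)
        i += len(tied)
    return curve


# Hypothetical exposures: months on drug, and whether it was stopped (1) or censored (0).
months = [2, 3, 3, 5, 8, 12]
stopped = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, stopped):
    print(f"month {t}: {s:.3f} still persistent")
```

Censored exposures (for example, those still on drug at the end of follow-up) reduce the risk set without counting as discontinuations, which is what distinguishes this estimate from a simple proportion.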

Renal function, cardiovascular disease risk factors' prevalence and 5-year disease incidence; the role of diet, exercise, lipids and inflammation markers: the ATTICA study

QJM, 2010, Vol 103 (6), pp. 413-422

doi:10.1093/qjmed/hcq045

Cited By ~ 18

Author(s): C. Chrysohoou; D. B. Panagiotakos; C. Pitsavos; J. Skoumas; M. Toutouza; ...

Keyword(s): Risk Factors; Cardiovascular Disease; Renal Function; Disease Risk; Disease Incidence; Cardiovascular Disease Risk; Cardiovascular Disease Risk Factors; Inflammation Markers

Vascular Hypothesis of Alzheimer Disease

Arteriosclerosis, Thrombosis, and Vascular Biology, 2021

doi:10.1161/atvbaha.120.311911

Author(s): Sanny Scheffer; Dorien M.A. Hermkens; Louise van der Weerd; Helga E. de Vries; Mat J.A.P. Daemen

Keyword(s): Risk Factors; Cardiovascular Disease; Alzheimer Disease; Disease Risk; Cardiovascular Disease Risk; Disease Onset; Vascular Risk Factors; Chronic Cerebral Hypoperfusion; Cerebral Hypoperfusion; Vascular Risk

Alzheimer disease (AD) is marked by profound neurodegeneration, neuroinflammation, and cognitive decline. Pathologically, AD is characterized by the accumulation of extracellular amyloid and intraneuronal tangles consisting of hyperphosphorylated tau. The factors leading to disease onset and progression remain an important topic of investigation. Various epidemiological studies have identified cardiovascular disease as an important contributor to the development and progression of AD, leading to the so-called vascular hypothesis. Vascular risk factors, such as hypertension, diabetes, and hyperhomocysteinemia, are associated with a significantly increased chance of developing AD, suggesting an additive or even synergistic effect. These vascular risk factors are often linked to a reduction in cerebral blood flow, and the resulting chronic cerebral hypoperfusion is suggested to play a key role in the onset of AD. However, the causal effects of such vascular risk factors on AD onset remain largely unknown. Evidence from animal studies supports the view that inducing chronic cerebral hypoperfusion strongly aggravates AD-related pathology, but a comprehensive overview of how the various cardiovascular disease risk factors contribute to disease is lacking. We therefore critically review the current literature to unravel the existing evidence derived from in vivo mouse studies and to define the role of cardiovascular disease and chronic cerebral hypoperfusion in AD development. We conclude that, although many aspects of the vascular hypothesis are well supported by observational studies, in-depth mechanistic studies and well-designed randomized controlled trials are urgently needed to establish temporal and causal relationships. The new insights described here may have major potential for therapeutic interventions.

A catalogue of omics biological ageing clocks reveals substantial commonality and associations with disease risk

2021

doi:10.1101/2021.02.01.429117

Author(s): Erin Macdonald-Dunlop; Nele Taba; Lucija Klaric; Azra Frkatovic; Rosie Walker; ...

Keyword(s): Risk Factors; Health Outcomes; Functional Capacity; Disease Risk; Disease Incidence; Chronological Age; Biological Ageing; Health Measures; The Difference; Future Work

Biological age (BA), a measure of functional capacity that is prognostic of health outcomes and discriminates between individuals of the same chronological age (chronAge), has been estimated using a variety of biomarkers. Previous comparative studies have mainly used epigenetic models (clocks); here we use ~1000 participants to create eleven omics ageing clocks, with correlations of 0.45-0.97 with chronAge, even with substantial sub-setting of biomarkers. These clocks track common aspects of ageing, with 94% of the variance in chronAge being shared among clocks. The difference between BA and chronAge (omics clock age acceleration, OCAA) often associates with health measures. One year of OCAA typically has the same effect on risk factors as 0.46 years of chronAge, and the same effect on 10-year disease incidence as 0.45 years of chronAge. Epigenetic and IgG glycomics clocks appeared to track generalised ageing, while others capture specific risks. We conclude that BA is measurable and prognostic, and that future work should prioritise health outcomes over chronAge.
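The abstract defines OCAA as the simple difference between BA and chronAge. A minimal sketch on synthetic data computes OCAA, the clock's correlation with chronAge, and one plausible (assumed) reading of the "years of chronAge" equivalence: the ratio of the OCAA and chronAge coefficients in a joint linear model of a health measure. The helper names, the synthetic ages, and the linear model are assumptions, not the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def age_acceleration(biological_age, chron_age):
    """OCAA taken as the simple difference BA - chronAge, as in the abstract."""
    return [b - c for b, c in zip(biological_age, chron_age)]


def two_predictor_ols(y, x1, x2):
    """Least-squares slopes (b1, b2) for y ~ intercept + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # solve the centred 2x2 normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


chron = [40, 45, 50, 55, 60, 65]
ba = [42, 44, 53, 54, 63, 64]  # synthetic clock estimates
ocaa = age_acceleration(ba, chron)  # [2, -1, 3, -1, 3, -1]
outcome = [0.5 * c + 0.2 * a for c, a in zip(chron, ocaa)]  # synthetic health measure
b_chron, b_ocaa = two_predictor_ols(outcome, chron, ocaa)
print(round(pearson(ba, chron), 3), round(b_ocaa / b_chron, 3))
```

Under this reading, the ratio `b_ocaa / b_chron` is the number of years of chronAge that one year of OCAA is "worth" for that outcome; many clock studies instead define acceleration as a regression residual, which would change the numbers.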

Clinical Course and Features of Seizures Associated With LGI1-Antibody Encephalitis

Neurology, 2021

doi:10.1212/wnl.0000000000012465

Author(s): Kelsey M. Smith; Divyanshu Dubey; Greta B. Liebo; Eoin P. Flanagan; Jeffrey W. Britton

Keyword(s): Risk Factors; Proportional Hazards; Age Of Onset; Disease Onset; Cox Proportional Hazards; Binary Logistic Regression Analysis; Chronic Epilepsy; Female Sex; Steroid Sparing

Objective: To determine risk factors associated with clinical relapses and development of chronic epilepsy in patients with anti-leucine-rich glioma-inactivated 1 (LGI1) IgG encephalitis.

Methods: Patients with seizures related to LGI1-antibody encephalitis and ≥24 months of follow-up from disease onset were identified in the Mayo Clinic electronic medical record and Neuroimmunology lab records. Charts were reviewed to determine clinical factors, seizure types, imaging, treatment, occurrence of relapse, and outcome. Binary logistic regression analysis was performed to identify predictors of the development of chronic epilepsy. Univariate Cox proportional hazards regression was used to examine the influence of baseline characteristics on relapse risk.

Results: Forty-nine patients with LGI1-antibody encephalitis and acute symptomatic seizures were identified. Almost all patients (n=48, 98%) were treated with immunotherapy. Eight had definite and two had possible chronic epilepsy at last follow-up (10/49, 20.4%). Female sex (P=0.048) and younger age at disease onset (P=0.02) were associated with development of chronic epilepsy. Relapses occurred in 20 patients (40.8%), with a median time to first relapse of 7.5 months (range 3-94 months). Initial treatment with chronic steroid-sparing immunotherapy was associated with reduced risk of relapse (hazard ratio=0.28, 95% CI 0.11-0.73, P=0.009).

Conclusions: Chronic epilepsy occurred in 20.4% of our patients with LGI1-antibody encephalitis despite aggressive immunotherapy. Risk factors for chronic epilepsy were female sex and earlier age of onset. Relapses occurred in 40.8% of patients with prolonged follow-up, and chronic steroid-sparing immunotherapy was associated with a lower relapse rate.
