Classification of Amyloidosis by Model-Assisted Mass Spectrometry-Based Proteomics
Amyloidosis is a rare disease caused by the misfolding and extracellular aggregation of proteins as insoluble fibrillary deposits, localized either in specific organs or systemically throughout the body. The organ targeted, the disease progression, and the outcome are highly dependent on the specific fibril-forming protein, and its accurate identification is essential to the choice of treatment. Mass spectrometry-based proteomics has become the method of choice for the identification of the amyloidogenic protein. Regrettably, this identification relies on manual and subjective interpretation of mass spectrometry data by an expert, which is undesirable and may bias diagnosis. To circumvent this, we developed a statistical model-assisted method, based on data from mass spectrometric analysis of amyloid-containing biopsies and corresponding controls, for the unbiased identification of amyloid-containing biopsies and for amyloidosis subtyping. The Boruta feature selection method, built on a random forest classifier, was applied to proteomics data obtained from the mass spectrometric analysis of 75 laser-dissected Congo Red-positive amyloid-containing biopsies and 78 Congo Red-negative biopsies. It identified novel “amyloid signature” proteins that included clusterin, fibulin-1, vitronectin, complement component C9, and three collagen proteins, as well as the well-known amyloid signature proteins apolipoprotein E, apolipoprotein A4, and serum amyloid P. A support vector machine (SVM) learning algorithm was trained on the mass spectrometry data from the same 75 amyloid-containing biopsies and 78 amyloid-negative control biopsies. The trained algorithm discriminated amyloid-containing biopsies from controls with an accuracy of 1.0 when applied to a blinded mass spectrometry validation data set of 103 prospectively collected amyloid-containing biopsies. Moreover, our method correctly classified amyloidosis patients according to subtype in 102 out of 103 blinded cases.
Collectively, our model-assisted approach identified novel amyloid-associated proteins and demonstrated that mass spectrometry-based data can be used in clinical diagnostics for the unbiased and reliable classification of amyloid deposits and of the specific amyloid subtype.
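
The two-step idea described above (random forest-based feature selection followed by SVM classification) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn and NumPy, runs on purely synthetic data, and replaces the full Boruta procedure with a simplified shadow-feature comparison decided by majority vote rather than a statistical test. The helper names (`make_synthetic`, `shadow_feature_selection`), the linear kernel, the feature scaling, and the choice to train the SVM only on the selected features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic(n_per_class, n_features=30, n_informative=3, seed=0):
    """Hypothetical stand-in for a protein abundance matrix (samples x proteins)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_features))
    # Only the first n_informative columns differ between the two classes.
    X[y == 1, :n_informative] += 3.0
    return X, y


def shadow_feature_selection(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style selection (assumption: majority vote, no
    binomial test): keep features whose random forest importance exceeds
    that of the best shuffled "shadow" feature in most rounds."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Shuffle every column independently to break any link to y.
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + r)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        hits += importances[:n_features] > importances[n_features:].max()
    return hits > n_rounds / 2


# Synthetic training and held-out sets (not real biopsy data).
X_train, y_train = make_synthetic(75, seed=1)
X_test, y_test = make_synthetic(50, seed=2)

selected = shadow_feature_selection(X_train, y_train)

# Linear SVM on the selected features; kernel and scaling are assumptions.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_train[:, selected], y_train)
accuracy = clf.score(X_test[:, selected], y_test)

print("selected feature indices:", np.flatnonzero(selected))
print("held-out accuracy:", accuracy)
```

In this sketch the shuffled shadow columns provide a per-round noise baseline, so a feature is retained only if the forest consistently ranks it above the best feature that carries no class information by construction.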