importance analysis
Recently Published Documents


TOTAL DOCUMENTS: 413 (FIVE YEARS: 195)
H-INDEX: 20 (FIVE YEARS: 6)

Author(s):  
Sławomir K. Zieliński ◽  
Paweł Antoniuk ◽  
Hyunkook Lee ◽  
Dale Johnson

Abstract: One of the greatest challenges in the development of binaural machine audition systems is the disambiguation between front and back audio sources, particularly in complex spatial audio scenes. The goal of this work was to develop a method for discriminating between front and back located ensembles in binaural recordings of music. To this end, 22,496 binaural excerpts, representing either front or back located ensembles, were synthesized by convolving multi-track music recordings with 74 sets of head-related transfer functions (HRTF). The discrimination method was developed based on the traditional approach, involving hand-engineering of features, as well as using a deep learning technique incorporating the convolutional neural network (CNN). According to the results obtained under HRTF-dependent test conditions, CNN showed a very high discrimination accuracy (99.4%), slightly outperforming the traditional method. However, under the HRTF-independent test scenario, CNN performed worse than the traditional algorithm, highlighting the importance of testing the algorithms under HRTF-independent conditions and indicating that the traditional method might be more generalizable than CNN. A minimum of 20 HRTFs are required to achieve a satisfactory generalization performance for the traditional algorithm and 30 HRTFs for CNN. The minimum duration of audio excerpts required by both the traditional and CNN-based methods was assessed as 3 s. Feature importance analysis, based on a gradient attribution mapping technique, revealed that for both the traditional and the deep learning methods, a frequency band between 5 and 6 kHz is particularly important in terms of the discrimination between front and back ensemble locations. Linear-frequency cepstral coefficients, interaural level differences, and audio bandwidth were identified as the key descriptors facilitating the discrimination process using the traditional approach.
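Interaural level difference, one of the descriptors named in the abstract above, can be illustrated with a minimal sketch. It assumes a single broadband ILD computed from RMS levels, whereas the study may well compute it per frequency band or per time frame. The function names and toy signals are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def broadband_ild_db(left, right):
    """Broadband interaural level difference in dB.

    Positive values mean the left-ear signal is louder.
    Assumes neither channel is all zeros (log of zero is undefined).
    """
    return 20.0 * math.log10(rms(left) / rms(right))


# Toy signals invented for illustration: the left channel has twice
# the amplitude of the right channel.
toy_left = [0.5, -0.5, 0.5, -0.5]
toy_right = [0.25, -0.25, 0.25, -0.25]
print(round(broadband_ild_db(toy_left, toy_right), 2))
```

A factor of two in amplitude corresponds to roughly 6 dB, which is what the toy example prints.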


Author(s):  
Maria Elena Laino ◽  
Elena Generali ◽  
Tobia Tommasini ◽  
Giovanni Angelotti ◽  
Alessio Aghemo ◽  
...  

Introduction: Identifying SARS-CoV-2 patients at higher risk of mortality is crucial in the management of a pandemic. Artificial intelligence techniques make it possible to analyze large amounts of data to find hidden patterns. We aimed to develop and validate a mortality score at admission for COVID-19 based on high-level machine learning. Material and methods: We conducted a retrospective cohort study on adult COVID-19 patients hospitalized between March and December 2020. The primary outcome was in-hospital mortality. A machine learning approach on vital parameters, laboratory values, and demographic features was applied to develop different models. Then, a feature importance analysis was performed to reduce the number of variables included in the model and to develop a risk score with good overall performance, which was finally evaluated in terms of discrimination and calibration capabilities. All results underwent cross-validation. Results: A total of 1,135 consecutive patients (median age 70 years, 64% males) were enrolled; 48 patients were excluded, and the remaining cohort was randomly divided into training (760) and test (327) sets. During hospitalization, 251 (22%) patients died. After feature selection, the best performing classifier was random forest (AUC 0.88 ± 0.03). Based on the relative importance of each variable, a pragmatic score was developed, showing good performance (AUC 0.85 ± 0.025), and three levels were defined that correlated well with in-hospital mortality. Conclusions: Machine learning techniques were applied in order to develop an accurate in-hospital mortality risk score for COVID-19 based on ten variables. The application of the proposed score has utility in clinical settings to guide the management and prognostication of COVID-19 patients.
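The discrimination of a risk score such as the one above is commonly summarized by the AUC. A minimal sketch of that metric follows, using the pairwise (Mann-Whitney) formulation. The function name and the toy labels and scores are invented for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 outcomes (1 = event, e.g. in-hospital death).
    scores: risk scores; higher means higher predicted risk.
    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, hypothetical: six patients with an integer risk score.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [1, 2, 2, 3, 4, 5]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation would be preferable for large cohorts.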


Energies ◽  
2022 ◽  
Vol 15 (2) ◽  
pp. 549
Author(s):  
Giuliano Armano ◽  
Paolo Attilio Pegoraro

The design of new monitoring systems for intelligent distribution networks often requires both real-time measurements and pseudomeasurements to be processed. The former are obtained from smart meters, phasor measurement units and smart electronic devices, whereas the latter are predicted using appropriate algorithms, with the typical objective of forecasting the behaviour of power loads and generators. However, depending on the technique used for data encoding, the attempt at making predictions over a period of several days may trigger problems related to the high number of features. To counter this issue, feature importance analysis becomes a tool of primary importance. This article is aimed at illustrating a technique devised to investigate the importance of features on data deemed relevant for predicting the next-hour demand of aggregated, medium-voltage electrical loads. The same technique allows us to inspect the hidden layers of multilayer perceptrons entrusted with making the predictions, since, ultimately, the content of any hidden layer can be seen as an alternative encoding of the input data. The possibility of inspecting hidden layers can give wide support to researchers in a number of relevant tasks, including the appraisal of the generalisation capability reached by a multilayer perceptron and the identification of neurons not relevant for the prediction task.


2022 ◽  
Vol 14 (2) ◽  
pp. 798
Author(s):  
Snezhana Gocheva-Ilieva ◽  
Atanas Ivanov ◽  
Maya Stoimenova-Minova

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag estimation or 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
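The two goodness-of-fit measures quoted above can be sketched as follows. This assumes Willmott's original index of agreement and the 1 - SS_res/SS_tot form of the coefficient of determination; the abstract does not state which definitions the authors used. The function names and toy values are hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition).

    IA = 1 - sum((o - p)^2) / sum((|p - mean_o| + |o - mean_o|)^2)
    """
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - num / den


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy daily concentrations, invented for illustration only.
toy_observed = [10.0, 20.0, 30.0, 40.0]
toy_predicted = [12.0, 18.0, 33.0, 39.0]
print(index_of_agreement(toy_observed, toy_predicted))
print(r_squared(toy_observed, toy_predicted))
```

Both measures equal 1 for a perfect forecast; IA is bounded below by 0, whereas this form of the coefficient of determination can go negative for very poor models.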


2022 ◽  
Author(s):  
Anthony Onoja ◽  
Nicola Picchiotti ◽  
Chiara Fallerini ◽  
Margherita Baldassarri ◽  
Francesca Fava ◽  
...  

Abstract: We employed a multifaceted computational strategy to identify the genetic factors contributing to increased risk of severe COVID-19 infection from a Whole Exome Sequencing (WES) dataset of a cohort of 2000 Italian patients. We coupled a stratified k-fold screening, to rank variants more associated with severity, with training of multiple supervised classifiers, to predict severity on the basis of screened features. Feature importance analysis from tree-based models allowed us to identify 16 variants with highest support which, together with age and gender covariates, were found to be most predictive of COVID-19 severity. When tested on a follow-up cohort, our ensemble of models predicted severity with good accuracy (ACC=81.88%; ROC_AUC=96%; MCC=61.55%). Principal Component Analysis (PCA) and clustering of patients on important variants orthogonally identified two groups of individuals with a higher fraction of severe cases. Our model recapitulated a vast literature of emerging molecular mechanisms and genetic factors linked to COVID-19 response and extends previous landmark Genome Wide Association Studies (GWAS). It revealed a network of interplaying genetic signatures converging on established immune system and inflammatory processes linked to viral infection response, such as JAK-STAT, Cytokine, Interleukin, and C-type lectin receptor signaling. It also identified additional processes cross-talking with immune pathways, such as GPCR signalling, which might offer additional opportunities for therapeutic intervention and patient stratification. Publicly available PheWAS datasets revealed that several variants were significantly associated with phenotypic traits such as “Respiratory or thoracic disease”, confirming their link with COVID-19 severity outcome.
Taken together, our analysis suggests that curated genetic information can be effectively integrated along with other patient clinical covariates to forecast COVID-19 disease severity and dissect the underlying host genetic mechanisms for personalized medicine treatments.


2022 ◽  
Vol 14 (1) ◽  
pp. 211
Author(s):  
Pengxiang Zhao ◽  
Zohreh Masoumi ◽  
Maryam Kalantari ◽  
Mahtab Aflaki ◽  
Ali Mansourian

Landslides often cause significant casualties and economic losses, and therefore landslide susceptibility mapping (LSM) has become increasingly urgent and important. The potential of deep learning (DL) methods such as convolutional neural networks (CNN) based on landslide causative factors has not yet been fully explored. The main aims of this study are to investigate GIS-based LSM in Zanjan, Iran, and to identify the most important causative factors of landslides in the case study area. The CNN is compared with four machine learning (ML) algorithms: random forest (RF), artificial neural network (ANN), support vector machine (SVM), and logistic regression (LR). To do so, sixteen landslide causative factors were extracted and their related spatial layers were prepared. Then, the algorithms were trained with related landslide and non-landslide points. The results illustrate that the five algorithms performed suitably (precision = 82.43–85.6%, AUC = 0.934–0.967). The RF algorithm achieved the best result, followed in order by the CNN, SVM, ANN, and LR. Moreover, variable importance analysis results indicate that slope and topographic curvature contribute more to the prediction. The results would be beneficial to planning strategies for landslide risk management.


2021 ◽  
Vol 12 (1) ◽  
pp. 331
Author(s):  
Szymon Bobek ◽  
Sławomir K. Tadeja ◽  
Łukasz Struski ◽  
Przemysław Stachura ◽  
Timoleon Kipouros ◽  
...  

We present a refinement of the Immersive Parallel Coordinates Plots (IPCP) system for Virtual Reality (VR). The evolved system provides data-science analytics built around a well-known method for visualization of multidimensional datasets in VR. The data-science analytics enhancements consist of importance analysis and a number of clustering algorithms including a novel SuMC (Subspace Memory Clustering) solution. These analytical methods were applied to both the main visualizations and supporting cross-dimensional scatter plots. They automate part of the analytical work that in the previous version of IPCP had to be done by an expert. We test the refined system with two sample datasets that represent the optimum solutions of two different multi-objective optimization studies in turbomachinery. The first one describes 54 data items with 29 dimensions (DS1), and the second 166 data items with 39 dimensions (DS2). We include the details of these methods as well as the reasoning behind selecting some methods over others.


2021 ◽  
Vol 1 (1) ◽  
pp. 1-21
Author(s):  
Athanasios Arvanitis ◽  
Irini Furxhi ◽  
Thomas Tasioulis ◽  
Konstantinos Karatzas ◽  
...  

This paper demonstrates how a short-term prediction of the effective reproduction number (Rt) of COVID-19 in regions of Greece is achieved based on online mobility data. Various machine learning methods are applied to predict Rt, and attribute importance analysis is performed to reveal the most important variables that affect the accurate prediction of Rt. Work and Park categories are identified as the most important mobility features when compared to the other attributes, with values of 0.25 and 0.24, respectively. Our results are based on an ensemble of diverse Rt methodologies to provide non-precautious and non-indulgent predictions. The Random Forest algorithm achieved the highest R² (approximately 0.8) and Pearson’s and Spearman’s correlation values close to 0.9, outperforming the other models on all metrics. The model demonstrates robust results and the methodology overall represents a promising approach towards COVID-19 outbreak prediction. This paper can help health-related authorities when deciding on non-nosocomial interventions to prevent the spread of COVID-19.
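The abstract above does not say which attribute importance measure was used. As one common model-agnostic option, permutation importance is sketched below: each feature column is shuffled in turn and the resulting increase in prediction error is recorded. The toy model, data, and all names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: function mapping one feature row (a list) to a prediction.
    rows: list of feature rows, all of the same length.
    Returns one importance value per feature; larger means more important.
    """
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mse(targets, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a trained regressor: depends on feature 0 only."""
    return 2.0 * row[0]


# Toy data: feature 0 drives the target, feature 1 is irrelevant.
toy_rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
toy_targets = [2.0 * i for i in range(10)]
print(permutation_importance(toy_model, toy_rows, toy_targets))
```

Because the toy model ignores feature 1, shuffling that column leaves the error unchanged and its importance is exactly zero, while feature 0 receives a clearly positive value.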

