Classification and photometric redshift estimation of quasars in photometric surveys

2020 ◽  
Vol 15 (S359) ◽  
pp. 40-41
Author(s):  
L. M. Izuti Nakazono ◽  
C. Mendes de Oliveira ◽  
N. S. T. Hirata ◽  
S. Jeram ◽  
A. Gonzalez ◽  
...  

Abstract: We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. In terms of quasar classification, we achieved 95.49% for precision and 95.26% for recall using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
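For readers less familiar with the two classification metrics quoted above, the following minimal sketch shows how precision and recall are computed for a single class from true and predicted labels. The function name, the label strings, and the toy data are illustrative assumptions and are not taken from the S-PLUS study.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels: "qso" = quasar.
y_true = ["qso", "qso", "qso", "star", "galaxy", "star"]
y_pred = ["qso", "qso", "star", "qso", "galaxy", "star"]
precision, recall = precision_recall(y_true, y_pred, positive="qso")
```

Precision measures how many objects labelled as quasars really are quasars, while recall measures how many of the true quasars were recovered.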

2021 ◽  
Author(s):  
Adrián García Bruzón ◽  
Patricia Arrogante Funes ◽  
Laura Muñoz Moral

<p>Climate change has turned out to be a determining factor in the development of forests in Spain. Production systems have emitted polluting gases and other particles into the atmosphere, for which some plants have not yet developed adaptation systems. Among the most harmful pollutants for the environment are nitrogen oxides, ozone, and particulate matter.</p><p>However, conditions are not uniform across Peninsular Spain and the Balearic Islands, since plant composition differs across the territory, as do the bioclimatic, topographic, and anthropic characteristics. Monitoring the vegetation with sufficient spatial and temporal resolution, and studying the variables that condition plant health, is a challenge because of the nature of the variables and the amount of data to be handled.</p><p>The Mediterranean forest is one of the ecosystems most affected by climate change because it usually experiences long periods of drought that, in combination with increased temperatures, can drastically reduce the photosynthetic activity of trees and therefore the biomass of forests.</p><p>This motivates the application of environmental technologies based on Remote Sensing (which provides plant health indices from passive sensors on satellite platforms and other variables of interest), Geographic Information Systems (to integrate, process, and analyze spatial and temporal data) and machine learning models (which facilitate the extraction of relationships between variables and conditioning factors and the prediction of patterns).</p><p>In this regard, the objective of this work is to evaluate the possible effect that different pollutants have on the health of the vegetation, measured from the annual values of the Normalized Difference Vegetation Index (NDVI), in the Mediterranean forests of Peninsular Spain. To achieve this, we used machine learning techniques based on the Random Forest algorithm.
The study also included various climatic, topographic, and anthropic variables that characterize the forest.</p><p>The results showed that certain variables, such as the aridity index, drive the NDVI values and therefore plant development, while others act as limiting factors, such as the concentration of certain pollutants and the direct relationship between them (particulates and NOx). This study shows that the Random Forest algorithm offers reliable results, even when working with heterogeneous variables.</p>


2020 ◽  
Vol 32 (1) ◽  
pp. 39-53
Author(s):  
Dalia Shanshal ◽  
Ceni Babaoglu ◽  
Ayşe Başar

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model using data analytics and machine learning techniques that learn from past patterns, providing additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationship between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict the fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. Exploratory data analysis reveals both spatio-temporal and behavioural patterns, such as the prevalence of collisions at intersections and in the spring and summer, and aggressive driving and inattentive behaviour among drivers. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest with an accuracy of 0.80, 0.89, and 0.80, respectively. The proposed methods demonstrate the effectiveness of applying machine learning to traffic and collision data, both for exploratory and predictive analytics.


2019 ◽  
Vol 20 (S2) ◽  
Author(s):  
Varun Khanna ◽  
Lei Li ◽  
Johnson Fung ◽  
Shoba Ranganathan ◽  
Nikolai Petrovsky

Abstract Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening approaches is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed the other algorithms on our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples were 0.61 and 80.0%, respectively, with a maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
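The two ingredients named in this abstract, repeated random down-sampling and evaluation by Matthews correlation coefficient (MCC) and balanced accuracy, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the toy labels, and the assumption that each balanced subset would train one ensemble member are all hypothetical, and the model training step itself is omitted.

```python
import math
import random


def downsample_indices(labels, rng):
    """Return indices of a balanced subset: every minority-class sample plus
    an equally sized random draw (without replacement) from the majority."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient (0.0 if the denominator vanishes)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical imbalanced labels: 3 actives, 9 inactives.
labels = [1] * 3 + [0] * 9
rng = random.Random(0)
# 20 balanced subsets, one per (assumed) ensemble member.
subsets = [downsample_indices(labels, rng) for _ in range(20)]

# Hypothetical predictions for scoring.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score_mcc = mcc(y_true, y_pred)
score_bacc = balanced_accuracy(y_true, y_pred)
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is presumably why they are the ones reported here.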


The paper addresses forest fire prediction using machine learning models based on features such as DC, wind, and RH. Among the several machine learning classifier algorithms considered, the random forest algorithm achieves the highest accuracy (99.61%).


2021 ◽  
Author(s):  
Yiqi Jack Gao ◽  
Yu Sun

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper takes a two-pronged approach to analyzing COVID-19. First, the model uses the feature importance of a random forest regressor to select the eight most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas for efficient pandemic responses. Then it uses machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases from this diverse range of predictors, and these models proved capable of generating predictions with reasonable accuracy.


2021 ◽  
Vol 8 ◽  
Author(s):  
Guan Wang ◽  
Yanbo Zhang ◽  
Sijin Li ◽  
Jun Zhang ◽  
Dongkui Jiang ◽  
...  

Objective: Preeclampsia affects 2–8% of women and doubles the risk of cardiovascular disease in women after preeclampsia. This study aimed to develop a model based on machine learning to predict postpartum cardiovascular risk in preeclamptic women. Methods: Retrospectively collecting demographic characteristics and clinical serum markers associated with preeclampsia during the pregnancy of 907 preeclamptic women, we predicted the cardiovascular risk (ischemic heart disease, ischemic cerebrovascular disease, peripheral vascular disease, chronic kidney disease, metabolic system disease or arterial hypertension). The study samples were randomly divided into training and test sets in a ratio of 8:2. The prediction model was developed with 5 different machine learning algorithms, including Random Forest. 10-fold cross-validation was performed on the training set, and the performance of the model was evaluated on the test set. Results: Cardiovascular disease risk occurred in 186 (20.5%) of these women. By weighing area under the curve (AUC), the Random Forest algorithm presented the best performance (AUC = 0.711 [95% CI: 0.697–0.726]) and was adopted in the feature selection and the establishment of the prediction model. The most important variables in the Random Forest algorithm included the systolic blood pressure, urea nitrogen, neutrophil count, glucose, and D-Dimer. The Random Forest algorithm was well calibrated (Brier score = 0.133) in the test group, and obtained the highest net benefit in the decision curve analysis. Conclusion: Based on the general situation of patients and clinical variables, a new machine learning algorithm was developed and verified for the individualized prediction of cardiovascular risk in post-preeclamptic women.
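Two steps mentioned in this abstract, the random 8:2 split and the Brier score used to judge calibration, can be illustrated with a short standard-library sketch. The function names, the seed, and the toy probabilities are assumptions made for illustration. The abstract does not say whether the split was stratified, so a plain shuffle is used here.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome
    (0 or 1); lower is better, 0 is a perfect probabilistic forecast."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# 907 samples, as in the abstract, split roughly 8:2.
train_idx, test_idx = train_test_split_indices(907)

# Hypothetical outcomes and predicted risk probabilities.
score = brier_score([1, 0, 0, 1], [0.8, 0.1, 0.3, 0.6])
```

A model that always predicts the 20.5% base rate reported above would score about 0.205 × 0.795 ≈ 0.163, which gives a rough reference point for the reported Brier score of 0.133.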


2020 ◽  
Vol 8 (6) ◽  
pp. 1623-1630

As huge amounts of data accumulate, extracting the required information from the available data becomes a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases. This in turn has created the need for machine learning models that use patients' datasets. Analysis of datasets from different sources indicates that cancer is the most hazardous disease and may cause the death of the patient. Surveys indicate that cancer can nearly be cured in its initial stages, whereas it may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer. Its detection at an early stage depends heavily on past data. The proposed work uses a machine learning algorithm to group individuals' details into categories in order to predict, at an early stage, whether they are at risk of developing cancer. The Random Forest algorithm is implemented and achieves a higher efficiency, 97%, than KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but uses it directly for classification. Naive Bayes yields inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels: low, medium, and high. Thus, mortality rates can be reduced significantly.


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260195
Author(s):  
Marcelo Dantas Tavares de Melo ◽  
Jose de Arimatéia Batista Araujo-Filho ◽  
José Raimundo Barbosa ◽  
Camila Rocon ◽  
Carlos Danilo Miranda Regis ◽  
...  

Aims: Noncompaction cardiomyopathy (NCC) is considered a genetic cardiomyopathy with unknown pathophysiological mechanisms. We propose to evaluate echocardiographic predictors for rigid body rotation (RBR) in NCC using a machine learning (ML) based model. Methods and results: Forty-nine outpatients with NCC diagnosis by echocardiography and magnetic resonance imaging (21 men, 42.8±14.8 years) were included. A comprehensive echocardiogram was performed. The layer-specific strain was analyzed from the apical two-, three-, and four-chamber views, short axis, and focused right ventricle views using 2D echocardiography (2DE) software. RBR was present in 44.9% of patients, and this group presented increased indexed LV mass (118±43.4 vs. 94.1±27.1 g/m2, P = 0.034), LV end-diastolic and end-systolic volumes (P < 0.001), E/e’ (12.2±8.68 vs. 7.69±3.13, P = 0.034), and decreased LV ejection fraction (40.7±8.71 vs. 58.9±8.76%, P < 0.001) when compared to patients without RBR. Also, patients with RBR presented a significant decrease in global longitudinal, radial, and circumferential strain. When an ML model based on a random forest algorithm and a neural network model was applied, twist, NC/C, torsion, LV ejection fraction, and diastolic dysfunction were found to be the strongest predictors of RBR, with accuracy, sensitivity, specificity, and area under the curve of 0.93, 0.99, 0.80, and 0.88, respectively. Conclusion: In this study, a random forest algorithm was capable of selecting the best echocardiographic predictors of the RBR pattern in NCC patients, which was consistent with worse systolic, diastolic, and myocardium deformation indices. Prospective studies are warranted to evaluate the role of this tool for NCC risk stratification.


Author(s):  
Laura Fragoso-Campón ◽  
Pablo Durán-Barroso ◽  
Elia Rosado

Water resource management in ungauged catchments is complex due to the uncertainties around the hydrological parameters that dominate the streamflow behaviour. These parameters are usually defined by regionalization approaches in which hydrological response patterns are transferred from gauged to ungauged basins. Regression-based methods using physical properties derived from cartographic data sources are widely used. The current remote sensing techniques offer us new standpoints in regionalization processing since the hydrological response depends on the physical attributes related to the spectral responses of the territory. Moreover, machine learning approaches have not been specifically applied to the regionalization of hydrologic parameters. This work studies the capability of a catchment’s spectral response based on Sentinel-1 and Sentinel-2 data to address a regression-based regionalization of hydrological parameters using a machine learning approach. Hydrological modelling was conducted by the HBV-light model. We tested the random forest algorithm in several regionalization scenarios: the new approach using the catchments’ spectral signature, the traditional method using physical properties and a fusion of them. The calibration results were excellent (median KGE = 0.83), and the regionalized parameters obtained with the random forest algorithm achieved good performance in which the three scenarios showed almost the same goodness of fit (median KGE = 0.45 to 0.50). We found that the effectiveness depends on the climatic environment and that predictions in humid catchments exhibited better performance than those in the driest catchments. The physical approach (median KGE = 0.71) exhibited better performance than the spectral approach (median KGE = 0.64) in humid catchments, whereas spectral regionalization (median KGE = 0.33) outperformed the physical scenario in the driest catchments (median KGE = 0.25).
Herein, our results confirm that regionalization is still challenging in Mediterranean climate variants where the new spectral approach showed promising results and time series of satellite data could improve seasonal regionalization methodologies.
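The goodness-of-fit values above are reported as KGE (Kling-Gupta efficiency). As a hedged illustration, the sketch below computes KGE in its original formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation between simulated and observed flows, alpha is the ratio of their standard deviations, and beta is the ratio of their means. The abstract does not state which KGE variant was used, so this choice, the function name, and the toy series are assumptions.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency of simulated vs. observed series.

    Undefined if either series has zero variance or `obs` has zero mean.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (std_s * std_o)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical streamflow series.
obs = [1.0, 2.0, 3.0, 4.0]
kge_perfect = kge(obs, obs)                   # identical series
kge_double = kge([2 * o for o in obs], obs)   # correct timing, doubled volume
```

A perfect simulation gives KGE = 1. A simulation with perfect correlation but doubled mean and variability gives 1 - sqrt(2), showing how bias and variability errors are penalized even when the timing is right.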

