Cloud Detection Using an Ensemble of Pixel-Based Machine Learning Models Incorporating Unsupervised Classification
Remote sensing imagery, such as that provided by the United States Geological Survey (USGS) Landsat satellites, has been widely used in environmental protection, hazard analysis, and urban planning studies for decades. Clouds are a constant challenge for such imagery and, if not handled correctly, can cause a variety of issues for a wide range of remote sensing analyses. Typically, cloud mask algorithms operate on the entire image; in this study we present an ensemble of different pixel-based approaches to cloud pixel modeling. Based on four training subsets with a selection of different input features, 12 machine learning models were created. We evaluated these models using the cropped LC8-Biome cloud validation dataset; as a comparison, Fmask was also applied to the same cropped Biome scenes. One goal of this research is to explore a machine learning modeling approach that uses as small a training sample as possible while still producing an accurate model. Overall, the model trained on the sample subset (1.3% of the total training samples) that includes unsupervised Self-Organizing Map classification results as an input feature has the best performance. The approach achieves 98.57% overall accuracy, 1.18% cloud omission error, and 0.93% cloud commission error on the 88 cropped test images. Compared to Fmask 4.0, this model improves the overall accuracy by 10.12% and reduces the cloud omission error by 6.39%. Furthermore, on an additional eight independent validation images that were not sampled in model training, the model trained on the second largest subset with an additional five features has the highest overall accuracy, at 86.35%, with 12.48% cloud omission error and 7.96% cloud commission error. Compared to Fmask 4.0, this model's overall accuracy increased by 3.26% and its cloud omission error decreased by 1.28%.
The machine learning cloud classification models discussed in this paper achieve very good performance while utilizing only a small portion of the available training pixels. We showed that a pixel-based cloud classification model benefits from the fact that each scene has unique spectral characteristics: including even a small portion of example pixels from each sub-region of a scene can improve model accuracy significantly.
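The core idea of appending unsupervised Self-Organizing Map (SOM) cluster assignments as an extra input feature for a pixel-based classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 4×4 SOM grid, the two synthetic "spectral bands", the decay schedules, and the random-forest classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic two-band "pixel" spectra standing in for clear (0) and cloud (1) pixels.
n = 400
clear = rng.normal(loc=[0.2, 0.3], scale=0.05, size=(n, 2))
cloud = rng.normal(loc=[0.7, 0.8], scale=0.05, size=(n, 2))
X = np.vstack([clear, cloud])
y = np.array([0] * n + [1] * n)

def train_som(data, grid=(4, 4), iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM: returns a (rows, cols, n_features) weight grid."""
    r = np.random.default_rng(seed)
    w = r.random((grid[0], grid[1], data.shape[1]))
    gi, gj = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
    for t in range(iters):
        x = data[r.integers(len(data))]
        # Best-matching unit (BMU): node whose weight vector is closest to x.
        d = np.linalg.norm(w - x, axis=2)
        bi, bj = np.unravel_index(d.argmin(), d.shape)
        lr = lr0 * (1 - t / iters)              # learning rate decays linearly
        sigma = sigma0 * (1 - t / iters) + 1e-3  # neighborhood radius shrinks
        # Gaussian neighborhood pulls the BMU and its neighbors toward x.
        h = np.exp(-((gi - bi) ** 2 + (gj - bj) ** 2) / (2 * sigma ** 2))
        w += lr * h[..., None] * (x - w)
    return w

def som_labels(w, data):
    """Assign each pixel the flat index of its best-matching SOM node."""
    d = np.linalg.norm(data[:, None, None, :] - w[None, :, :, :], axis=3)
    return d.reshape(len(data), -1).argmin(axis=1)

# Unsupervised step: cluster pixels with the SOM, ignoring labels.
w = train_som(X)
cluster = som_labels(w, X)

# Supervised step: feed spectral bands plus the SOM cluster id to the classifier.
X_aug = np.column_stack([X, cluster])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_aug, y)
acc = clf.score(X_aug, y)
```

The SOM step requires no labels, so it can exploit all pixels in a scene, while the supervised classifier can then be trained on a much smaller labeled subset, which mirrors the small-training-sample goal described above.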