Meta-Tree Random Forest: Probabilistic Data-Generative Model and Bayes Optimal Prediction

Nao Dobashi; Shota Saito; Yuta Nakahara; Toshiyasu Matsushima

doi:10.3390/e23060768

Entropy ◽

10.3390/e23060768 ◽

2021 ◽

Vol 23 (6) ◽

pp. 768

Author(s):

Nao Dobashi ◽

Shota Saito ◽

Yuta Nakahara ◽

Toshiyasu Matsushima

Keyword(s):

Random Forest ◽

Decision Theory ◽

Explanatory Variable ◽

Maximum Depth ◽

Training Dataset ◽

Optimal Prediction ◽

Model Trees ◽

Model Tree ◽

Bayes Decision Theory ◽

Candidate Set

This paper deals with a prediction problem of a new targeting variable corresponding to a new explanatory variable given a training dataset. To predict the targeting variable, we consider a model tree, which is used to represent a conditional probabilistic structure of a targeting variable given an explanatory variable, and discuss statistical optimality for prediction based on the Bayes decision theory. The optimal prediction based on the Bayes decision theory is given by weighting all the model trees in the model tree candidate set, where the model tree candidate set is a set of model trees in which the true model tree is assumed to be included. Because the number of all the model trees in the model tree candidate set increases exponentially according to the maximum depth of model trees, the computational complexity of weighting them increases exponentially according to the maximum depth of model trees. To solve this issue, we introduce a notion of meta-tree and propose an algorithm called MTRF (Meta-Tree Random Forest) by using multiple meta-trees. Theoretical and experimental analyses of the MTRF show the superiority of the MTRF to previous decision tree-based algorithms.
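
The growth in the number of candidate model trees can be illustrated with a minimal sketch. It assumes binary model trees whose shapes are the pruned subtrees of a complete tree of a given maximum depth, and it ignores the choice of explanatory variable at each inner node; both assumptions, and the names `count_tree_shapes` and `bayes_mixture_prediction`, are illustrative and not taken from the paper. The second function shows the generic form of a posterior-weighted prediction over a small, explicitly listed candidate set; it is not the MTRF algorithm.

```python
from typing import Sequence


def count_tree_shapes(max_depth: int) -> int:
    """Count pruned subtrees of a complete binary tree (illustrative assumption).

    A tree of depth <= d is either a single leaf or a root whose two
    children are each trees of depth <= d - 1, so N(d) = 1 + N(d - 1) ** 2.
    """
    count = 1  # depth 0: the only tree is a single leaf
    for _ in range(max_depth):
        count = 1 + count * count
    return count


def bayes_mixture_prediction(
    posteriors: Sequence[float], predictions: Sequence[float]
) -> float:
    """Weight each candidate model's predictive probability by its posterior."""
    if len(posteriors) != len(predictions):
        raise ValueError("one posterior weight is needed per model")
    return sum(w * p for w, p in zip(posteriors, predictions))


if __name__ == "__main__":
    for depth in range(6):
        print(depth, count_tree_shapes(depth))
    # Two hypothetical models with equal posterior weight.
    print(bayes_mixture_prediction([0.5, 0.5], [0.2, 0.8]))
```

Under these assumptions the count already exceeds 450,000 at depth 5, which is why enumerating and weighting every tree directly is impractical.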

Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine

Remote Sensing ◽

10.3390/rs13010010 ◽

2020 ◽

Vol 13 (1) ◽

pp. 10

Author(s):

Andrea Sulova ◽

Jamal Jokar Arsanjani

Keyword(s):

Climate Change ◽

Machine Learning ◽

Random Forest ◽

Google Earth ◽

Summer Season ◽

Driving Factors ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Training Dataset ◽

Google Earth Engine

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and continues to grow. The recent massive wildfires that hit Australia during the 2019–2020 summer season raised questions about the extent to which the risk of wildfires can be linked to various climate, environmental, topographical, and social factors, and about how to predict fire occurrences in order to take preventive measures. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level, using freely available remote sensing data at a reasonable computational expense, for input into machine learning models. As a result, a data-driven model was set up on the Google Earth Engine platform, which is publicly accessible and open to further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, results, and conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.

Data-Driven Classifier for Extreme Outage Prediction Based on Bayes Decision Theory

IEEE Transactions on Power Systems ◽

10.1109/tpwrs.2021.3086031 ◽

2021 ◽

pp. 1-1

Author(s):

Mohammad Shahidehpour ◽

Mostafa Mohammadian ◽

Farrokh Aminifar ◽

Nima Amjady

Keyword(s):

Decision Theory ◽

Data Driven ◽

Bayes Decision Theory

Model selection and fault detection approach based on Bayes decision theory: Application to changes detection problem in a distillation column

Process Safety and Environmental Protection ◽

10.1016/j.psep.2013.02.004 ◽

2014 ◽

Vol 92 (3) ◽

pp. 215-223 ◽

Cited By ~ 21

Author(s):

Yahya Chetouani

Keyword(s):

Model Selection ◽

Fault Detection ◽

Decision Theory ◽

Distillation Column ◽

Detection Problem ◽

Detection Approach ◽

Theory Application ◽

Changes Detection ◽

Bayes Decision Theory

Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

Biogeosciences Discussions ◽

10.5194/bgd-6-5271-2009 ◽

2009 ◽

Vol 6 (3) ◽

pp. 5271-5304 ◽

Cited By ~ 22

Author(s):

M. Jung ◽

M. Reichstein ◽

A. Bondeau

Keyword(s):

Eddy Covariance ◽

Learning Algorithm ◽

Gross Primary Production ◽

Global Network ◽

Data Set ◽

Model Trees ◽

Ensemble Approach ◽

Model Tree ◽

Biosphere Model ◽

Variance Explained

Abstract. Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical upscaling of eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production (GPP) as simulated by LPJmL from 1998–2005 using only locations and months where high-quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations, we can effectively study the performance of the model tree ensemble upscaling and associated problems of extrapolation capacity. We show that the model tree ensemble is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP, which constitute the largest sources of variance, are very well reproduced (96% and 94% of variance explained, respectively), while the monthly interannual anomalies, which account for much less variance, are less well matched (41% of variance explained). We demonstrate the substantially improved accuracy of the model tree ensemble over individual model trees, in particular for the monthly anomalies and for situations of extrapolation. We estimate that roughly one fifth of the domain is subject to extrapolation, while the model tree ensemble is still able to reproduce 73% of the LPJmL GPP variability there. This paper presents for the first time a benchmark for a global FLUXNET upscaling approach that will be employed in future studies. Although real-world FLUXNET upscaling is more complicated than for the noise-free, reduced-complexity biosphere model presented here, our results show that an empirical upscaling from the current FLUXNET network with a model tree ensemble is feasible and able to extract global patterns of carbon flux variability.
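
The "variance explained" figures quoted above can be read against a common definition, sketched here as the coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares. The abstract does not state which definition the authors used, so this choice and the name `variance_explained` are assumptions made for illustration.

```python
from typing import Sequence


def variance_explained(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical reference values and two hypothetical predictions.
    reference = [1.0, 2.0, 3.0, 4.0]
    print(variance_explained(reference, reference))   # perfect match
    print(variance_explained(reference, [2.5] * 4))   # predicting the mean only
```

A perfect reproduction scores 1 and a constant prediction equal to the mean scores 0, so a value of 0.92 would correspond to the 92% quoted for the global GPP simulations under this definition.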

A class of distortionless codes designed by Bayes decision theory

IEEE Transactions on Information Theory ◽

10.1109/18.133247 ◽

1991 ◽

Vol 37 (5) ◽

pp. 1288-1293 ◽

Cited By ~ 24

Author(s):

T. Matsushima ◽

H. Inazumi ◽

S. Hirasawa

Keyword(s):

Decision Theory ◽

Bayes Decision Theory

Classifiers Based on Bayes Decision Theory

Pattern Recognition ◽

10.1016/b978-1-59749-272-0.50004-9 ◽

2009 ◽

pp. 13-89 ◽

Cited By ~ 8

Author(s):

Sergios Theodoridis ◽

Konstantinos Koutroumbas

Keyword(s):

Decision Theory ◽

Bayes Decision Theory

Geomorphometric Methods for Burial Mound Recognition and Extraction from High-Resolution LiDAR DEMs

Sensors ◽

10.3390/s20041192 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1192 ◽

Cited By ~ 2

Author(s):

Mihai Niculiță

Keyword(s):

High Resolution ◽

Random Forest ◽

Latin Hypercube Sampling ◽

Current Method ◽

Training Dataset ◽

Great Promise ◽

Burial Mound ◽

Local Convexity ◽

Full Dataset ◽

Burial Mounds

Archaeological topography identification from high-resolution DEMs (Digital Elevation Models) is a current method that is used with high success in archaeological prospecting of wide areas. I present a methodology through which burial mounds (tumuli) can be identified from LiDAR (Light Detection And Ranging) DEMs. This methodology uses geomorphometric and statistical methods to identify burial mound candidates with high accuracy. As a first step, peaks, defined as local elevation maxima, are found. In the second step, local convexity watershed segments and their seeds are compared with the positions of local peaks, and the peaks that coincide with, or lie in the vicinity of, local convexity segment seeds are selected. The local convexity segments that correspond to these selected peaks are then fed to a Random Forest algorithm, together with shape descriptors and descriptive statistics of geomorphometric variables, in order to build a classification model. Multiple approaches to tune and select the proper training dataset, settings, and variables were tested. The model was validated on the full dataset used for training and on an external dataset, in order to test the usability of the method for other areas in a similar geomorphological and archaeological setting. The validation was performed against manually mapped and field-checked burial mounds from two neighboring study areas of 100 km² each. The results show that by training the Random Forest on a dataset composed of between 75% and 100% of the segments corresponding to burial mounds and ten times more non-burial-mound segments selected using Latin hypercube sampling, 93% of the burial mound segments from the external dataset are identified. There are 42 false positive cases that need to be checked, and two burial mound segments are missed. The method shows great promise for burial mound detection over wider areas by delineating a certain number of tumuli for model training.

Comparison of Random Forest Model and Frequency Ratio Model for Landslide Susceptibility Mapping (LSM) in Yunyang County (Chongqing, China)

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17124206 ◽

2020 ◽

Vol 17 (12) ◽

pp. 4206 ◽

Cited By ~ 3

Author(s):

Yue Wang ◽

Deliang Sun ◽

Haijia Wen ◽

Hong Zhang ◽

Fengtai Zhang

Keyword(s):

Random Forest ◽

Landslide Susceptibility ◽

Frequency Ratio ◽

Coincidence Degree ◽

Susceptibility Mapping ◽

Machine Learning Techniques ◽

Landslide Susceptibility Mapping ◽

Training Dataset ◽

Entire Area ◽

Conditioning Factors

To compare the random forest (RF) model and the frequency ratio (FR) model for landslide susceptibility mapping (LSM), this research selected Yunyang County as the study area for its frequent natural disasters, especially landslides. A landslide inventory was built from historical records, satellite images, and extensive field surveys. Subsequently, a geospatial database was established based on 987 historical landslides in the study area. Then, all the landslides were randomly divided into two datasets: 70% of them were used as the training dataset and 30% as the test dataset. Furthermore, under five primary conditioning factors (i.e., topography factors, geological factors, environmental factors, human engineering activities, and triggering factors), 22 secondary conditioning factors were selected to form an evaluation factor library for analyzing the landslide susceptibility. On this basis, the RF model training and the FR model mathematical analysis were performed, and the established models were used for the landslide susceptibility simulation in the entire area of Yunyang County. Next, based on the analysis results, the susceptibility maps were divided into five classes: very low, low, medium, high, and very high. In addition, the importance of conditioning factors was ranked and the influence of landslides was explored by using the RF model. The area under the curve (AUC) value of the receiver operating characteristic (ROC) curve, precision, accuracy, and recall ratio were used to analyze the predictive ability of the above two LSM models. The results indicated a difference in the performances between the two models. The RF model (AUC = 0.988) performed better than the FR model (AUC = 0.716). Moreover, compared with the FR model, the RF model showed a higher coincidence degree between the areas in the high and the very low susceptibility classes, on the one hand, and the geographical spatial distribution of historical landslides, on the other hand. Therefore, it was concluded that the RF model was more suitable for landslide susceptibility evaluation in Yunyang County, because of its significant model performance, reliability, and stability. The outcome also provided a theoretical basis for application of machine learning techniques (e.g., RF) in landslide prevention, mitigation, and urban planning, so as to deliver an adequate response to the increasing demand for effective and low-cost tools in landslide susceptibility assessments.
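
The evaluation protocol described in this abstract, a random 70%/30% split followed by an AUC comparison, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the fixed seed, and the toy labels and scores are assumptions, and the AUC is computed with the standard pairwise-ranking (Mann-Whitney) formulation rather than whatever software the authors used.

```python
import random
from typing import List, Sequence, Tuple


def split_70_30(items: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Randomly split items into 70% training and 30% test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 987 landslide records, as in the abstract; the IDs are placeholders.
    train, test = split_70_30(range(987))
    print(len(train), len(test))
    # Toy labels (1 = landslide) and hypothetical susceptibility scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With 987 records this split yields 691 training and 296 test samples; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.988 and 0.716.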
