Probabilistic Contextual and Structural Dependencies Learning in Grammar-Based Genetic Programming

2020 ◽  
pp. 1-28
Author(s):  
Pak-Kan Wong ◽  
Man-Leung Wong ◽  
Kwong-Sak Leung

Genetic Programming is a method for automatically creating computer programs based on the principles of evolution. Deceptiveness caused by complex dependencies among the components of programs is a challenging problem because it can misguide Genetic Programming into creating sub-optimal programs. Moreover, a minor modification to a program may lead to a notable change in its behaviour and affect the final output. This paper presents Grammar-based Genetic Programming with Bayesian Classifiers (GBGPBC), in which the probabilistic dependencies among components of programs are captured using a set of Bayesian network classifiers. Our system was evaluated on a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was often more robust and more efficient than other related Genetic Programming approaches in searching for the best programs, in terms of the total number of fitness evaluations. We studied which factors affect the performance of GBGPBC and discovered that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach was applied to learn a ranking program over a set of customers in direct marketing. Our suggested solutions help companies earn significantly more than solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.
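To make the idea of contextual dependencies concrete, the following is a minimal sketch (not the authors' implementation) of grammar-based program generation in which the choice of production rule at each nonterminal is biased by probabilities conditioned on the parent rule. The toy grammar, the conditioning scheme, and the uniform starting weights are all illustrative assumptions; in GBGPBC these probabilities would instead come from Bayesian network classifiers learned from promising programs.

```python
# Minimal sketch, assuming a toy grammar: sample derivation trees while biasing
# each production choice with weights conditioned on the parent rule. A learned
# model (e.g. Bayesian network classifiers over structural context) would
# replace the uniform weights after each generation.
import random
from collections import defaultdict

GRAMMAR = {  # hypothetical toy grammar
    "expr": [["expr", "+", "expr"], ["expr", "*", "expr"], ["x"], ["1"]],
}

def sample(symbol, cond_probs, parent_rule=None, depth=0, max_depth=4):
    if symbol not in GRAMMAR:            # terminal symbol: return as-is
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:               # force terminal-only rules near the depth limit
        rules = [r for r in rules if all(s not in GRAMMAR for s in r)]
    # weight each candidate rule by its (context-conditional) probability
    weights = [cond_probs[(parent_rule, symbol)].get(tuple(r), 1.0) for r in rules]
    rule = random.choices(rules, weights=weights)[0]
    return [sample(s, cond_probs, tuple(rule), depth + 1, max_depth) for s in rule]

cond_probs = defaultdict(dict)           # uniform prior over rules
print(sample("expr", cond_probs))        # one randomly derived program tree
```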

2019 ◽  
Vol 21 (3) ◽  
pp. 471-501 ◽  
Author(s):  
Michael Kommenda ◽  
Bogdan Burlacu ◽  
Gabriel Kronberger ◽  
Michael Affenzeller

Abstract In this paper we analyze the effects of using nonlinear least squares for parameter identification of symbolic regression models and integrate it as a local search mechanism in tree-based genetic programming. We employ the Levenberg–Marquardt algorithm for parameter optimization and calculate gradients via automatic differentiation. We provide examples where the parameter identification succeeds and fails, and highlight its computational overhead. Using an extensive suite of symbolic regression benchmark problems, we demonstrate the increased performance when incorporating nonlinear least squares within genetic programming. Our results are compared with recently published results obtained by several genetic programming variants and state-of-the-art machine learning algorithms. Genetic programming with nonlinear least squares performs among the best on the defined benchmark suite, and the local search can be easily integrated into different genetic programming algorithms as long as only differentiable functions are used within the models.
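The following is an illustrative sketch of the local-search step on a single, fixed model structure: the numeric constants of a symbolic expression are tuned with Levenberg–Marquardt nonlinear least squares. The model form and data are invented for the example, and SciPy's finite-difference Jacobian is used for brevity, whereas the paper computes gradients via automatic differentiation.

```python
# Hedged sketch: Levenberg-Marquardt tuning of the constants of one symbolic model.
import numpy as np
from scipy.optimize import least_squares

def model(theta, x):
    # hypothetical structure found by GP: theta0 * sin(theta1 * x) + theta2
    return theta[0] * np.sin(theta[1] * x) + theta[2]

def residuals(theta, x, y):
    return model(theta, x) - y

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = 2.0 * np.sin(1.5 * x) + 0.5 + rng.normal(scale=0.05, size=x.size)

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y), method="lm")
print(fit.x)  # constants after local search, close to [2.0, 1.5, 0.5]
```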


2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Rafael V. Veiga ◽  
Helio J. C. Barbosa ◽  
Heder S. Bernardino ◽  
João M. Freitas ◽  
Caroline A. Feitosa ◽  
...  

2021 ◽  
pp. 1-26
Author(s):  
Wenbin Pei ◽  
Bing Xue ◽  
Lin Shang ◽  
Mengjie Zhang

Abstract High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, when the data are not evenly distributed, GP tends to develop biased classifiers that achieve high accuracy on the majority class but low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class, so it is important to investigate how GP can be used effectively for high-dimensional unbalanced classification. In this paper, to address the performance bias of GP, a new two-criterion fitness function is developed that considers two criteria, namely an approximation of the area under the curve (AUC) and the classification clarity (i.e., how well a program separates the two classes). The values obtained on the two criteria are combined in pairs, instead of being summed together. Furthermore, this paper designs a three-criterion tournament selection to effectively identify and select good programs to be used by the genetic operators for generating better offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the compared methods.
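As a rough illustration only, the sketch below scores a program's outputs on two criteria (an AUC estimate and an assumed "clarity" margin between class outputs) and picks a tournament winner by comparing the criteria pairwise rather than summing them. The exact definitions of clarity, the pairwise combination, and the three-criterion selection in the paper are not reproduced here; this is a simplified interpretation.

```python
# Simplified, assumed formulation: two-criterion fitness plus pairwise tournament comparison.
import numpy as np
from sklearn.metrics import roc_auc_score

def two_criterion_fitness(outputs, labels):
    auc = roc_auc_score(labels, outputs)                                  # AUC approximation
    clarity = outputs[labels == 1].mean() - outputs[labels == 0].mean()   # assumed separation measure
    return auc, clarity

def tournament_winner(fit_a, fit_b):
    # Prefer the program that is no worse on both criteria and better on at
    # least one; otherwise break the tie on AUC.
    if all(a >= b for a, b in zip(fit_a, fit_b)) and fit_a != fit_b:
        return "a"
    if all(b >= a for a, b in zip(fit_a, fit_b)) and fit_a != fit_b:
        return "b"
    return "a" if fit_a[0] >= fit_b[0] else "b"

labels = np.array([0, 0, 0, 0, 1, 1])
prog_a = np.array([0.1, 0.2, 0.4, 0.3, 0.9, 0.8])
prog_b = np.array([0.2, 0.1, 0.6, 0.5, 0.7, 0.9])
print(tournament_winner(two_criterion_fitness(prog_a, labels),
                        two_criterion_fitness(prog_b, labels)))
```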


2019 ◽  
Vol 8 (5) ◽  
pp. 668 ◽  
Author(s):  
Yang Cao ◽  
Xin Fang ◽  
Johan Ottosson ◽  
Erik Näslund ◽  
Erik Stenberg

Background: Severe obesity is a global public health threat of growing proportions. Accurate models to predict severe postoperative complications could be of value in the preoperative assessment of potential candidates for bariatric surgery. So far, traditional statistical methods have failed to produce high accuracy. We aimed to find a useful machine learning (ML) algorithm to predict the risk of severe complications after bariatric surgery. Methods: We trained and compared 29 supervised ML algorithms using information from 37,811 patients who underwent a bariatric surgical procedure between 2010 and 2014 in Sweden. The algorithms were then tested on 6250 patients operated on in 2015. We applied the synthetic minority oversampling technique to tackle the issue that only 3% of patients experienced severe complications. Results: Most of the ML algorithms showed high accuracy (>90%) and specificity (>90%) in both the training and test data. However, none of the algorithms achieved an acceptable sensitivity in the test data. We also tried to tune the hyperparameters of the algorithms to maximize sensitivity, but have not yet identified one with a sensitivity high enough for use in clinical practice in bariatric surgery. However, a minor but perceptible improvement was found for the deep neural network (NN). Conclusion: In predicting severe postoperative complications among bariatric surgery patients, ensemble algorithms outperform base algorithms. Compared with other ML algorithms, the deep NN has the potential to improve accuracy and deserves further investigation. The oversampling technique should be considered in the context of imbalanced data, where the number of cases of the outcome of interest is relatively small.
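The general recipe described here (oversample the rare severe-complication class, train a classifier, then check sensitivity and specificity on held-out data) can be sketched as below. The synthetic data, classifier choice, and class proportions are stand-ins, not the Swedish registry data or the 29 algorithms compared in the study.

```python
# Minimal sketch of the imbalanced-classification workflow with synthetic data.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)   # balance the classes
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)

tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn))   # the metric the abstract reports as hardest to raise
print("specificity", tn / (tn + fp))
```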


Author(s):  
Fitsum Meshesha Kifetew ◽  
Denisse Muñante ◽  
Jesús Gorroñogoitia ◽  
Alberto Siena ◽  
Angelo Susi ◽  
...  

2020 ◽  
Vol 12 (2) ◽  
pp. 147-159
Author(s):  
Madhulata Kumari ◽  
Neeraj Tiwari ◽  
Naidu Subbarao

Aim: We applied genetic programming approaches to understand the impact of descriptors on the inhibitory effects of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and to discover new inhibitors as drug candidates. Materials & methods: The experimental dataset of descriptors of serine protease inhibitors of Mtb was optimized by a genetic algorithm (GA) together with correlation-based feature selection (CFS) in order to develop predictive models using machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen potential serine protease inhibitors of Mtb. The quality and performance of the predictive models were evaluated using various standard statistical parameters. Results: The best random forest model with CFS-GA screened 126 anti-tubercular agents out of the 918 phytochemical compounds. In addition, the genetic programming symbolic classification method optimized the descriptors and produced an equation for the mathematical model. Conclusion: The use of CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.
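The screening step can be illustrated with the hypothetical sketch below: a random forest is trained on a descriptor matrix of known actives and inactives, then used to rank a compound library by predicted probability of activity. The descriptor data are random placeholders, and the GA/CFS feature-selection stage described in the abstract is assumed to have already produced the reduced descriptor set.

```python
# Hypothetical sketch of the virtual-screening workflow (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))      # descriptors remaining after CFS-GA selection (assumed)
y_train = rng.integers(0, 2, size=200)    # 1 = active inhibitor, 0 = inactive

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

library = rng.normal(size=(918, 12))      # descriptors of the phytochemical library
scores = clf.predict_proba(library)[:, 1] # predicted probability of activity
hits = np.argsort(scores)[::-1][:126]     # top-ranked candidates for follow-up
print(hits[:10])
```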

