Probabilistic Contextual and Structural Dependencies Learning in Grammar-Based Genetic Programming

2020 ◽  
pp. 1-28
Author(s):  
Pak-Kan Wong ◽  
Man-Leung Wong ◽  
Kwong-Sak Leung

Genetic Programming is a method for automatically creating computer programs based on the principles of evolution. Deceptiveness caused by complex dependencies among the components of programs is a challenging problem because it can misguide Genetic Programming into creating sub-optimal programs. Moreover, a minor modification to a program may lead to a notable change in its behaviour and affect the final output. This paper presents Grammar-based Genetic Programming with Bayesian Classifiers (GBGPBC), in which the probabilistic dependencies among components of programs are captured using a set of Bayesian network classifiers. Our system was evaluated on a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was often more robust and more efficient than other related Genetic Programming approaches in searching for the best programs, in terms of the total number of fitness evaluations. We studied which factors affect the performance of GBGPBC and discovered that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach was applied to learn a ranking program over a set of customers in direct marketing. Our suggested solutions help companies earn significantly more than solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.
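To make the idea of contextual dependencies concrete, the following is a minimal sketch (not the authors' implementation) of grammar-based program generation in which the choice of production rule at each nonterminal is biased by probabilities conditioned on the parent rule. The toy grammar, the conditioning scheme, and the uniform starting weights are all illustrative assumptions; in GBGPBC these probabilities would instead come from Bayesian network classifiers learned from promising programs.

```python
# Minimal sketch, assuming a toy grammar: sample derivation trees while biasing
# each production choice with weights conditioned on the parent rule. A learned
# model (e.g. Bayesian network classifiers over structural context) would
# replace the uniform weights after each generation.
import random
from collections import defaultdict

GRAMMAR = {  # hypothetical toy grammar
    "expr": [["expr", "+", "expr"], ["expr", "*", "expr"], ["x"], ["1"]],
}

def sample(symbol, cond_probs, parent_rule=None, depth=0, max_depth=4):
    if symbol not in GRAMMAR:            # terminal symbol: return as-is
        return symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:               # force terminal-only rules near the depth limit
        rules = [r for r in rules if all(s not in GRAMMAR for s in r)]
    # weight each candidate rule by its (context-conditional) probability
    weights = [cond_probs[(parent_rule, symbol)].get(tuple(r), 1.0) for r in rules]
    rule = random.choices(rules, weights=weights)[0]
    return [sample(s, cond_probs, tuple(rule), depth + 1, max_depth) for s in rule]

cond_probs = defaultdict(dict)           # uniform prior over rules
print(sample("expr", cond_probs))        # one randomly derived program tree
```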

2019 ◽  
Vol 21 (3) ◽  
pp. 471-501 ◽  
Author(s):  
Michael Kommenda ◽  
Bogdan Burlacu ◽  
Gabriel Kronberger ◽  
Michael Affenzeller

Abstract In this paper we analyze the effects of using nonlinear least squares for parameter identification of symbolic regression models and integrate it as a local search mechanism in tree-based genetic programming. We employ the Levenberg–Marquardt algorithm for parameter optimization and calculate gradients via automatic differentiation. We provide examples where the parameter identification succeeds and fails, and highlight its computational overhead. Using an extensive suite of symbolic regression benchmark problems, we demonstrate the increased performance when incorporating nonlinear least squares within genetic programming. Our results are compared with recently published results obtained by several genetic programming variants and state-of-the-art machine learning algorithms. Genetic programming with nonlinear least squares performs among the best on the defined benchmark suite, and the local search can be easily integrated into different genetic programming algorithms as long as only differentiable functions are used within the models.
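The following is an illustrative sketch of the local-search step on a single, fixed model structure: the numeric constants of a symbolic expression are tuned with Levenberg–Marquardt nonlinear least squares. The model form and data are invented for the example, and SciPy's finite-difference Jacobian is used for brevity, whereas the paper computes gradients via automatic differentiation.

```python
# Hedged sketch: Levenberg-Marquardt tuning of the constants of one symbolic model.
import numpy as np
from scipy.optimize import least_squares

def model(theta, x):
    # hypothetical structure found by GP: theta0 * sin(theta1 * x) + theta2
    return theta[0] * np.sin(theta[1] * x) + theta[2]

def residuals(theta, x, y):
    return model(theta, x) - y

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = 2.0 * np.sin(1.5 * x) + 0.5 + rng.normal(scale=0.05, size=x.size)

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y), method="lm")
print(fit.x)  # constants after local search, close to [2.0, 1.5, 0.5]
```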


2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Rafael V. Veiga ◽  
Helio J. C. Barbosa ◽  
Heder S. Bernardino ◽  
João M. Freitas ◽  
Caroline A. Feitosa ◽  
...  

2021 ◽  
pp. 1-26
Author(s):  
Wenbin Pei ◽  
Bing Xue ◽  
Lin Shang ◽  
Mengjie Zhang

Abstract High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, when the data are not evenly distributed, GP tends to develop biased classifiers that achieve high accuracy on the majority class but low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class, so it is important to investigate how GP can be used effectively for high-dimensional unbalanced classification. In this paper, to address the performance bias of GP, a new two-criterion fitness function is developed that considers two criteria, namely an approximation of the area under the curve (AUC) and the classification clarity (i.e., how well a program separates the two classes). The values obtained on the two criteria are combined in pairs, instead of being summed together. Furthermore, this paper designs a three-criterion tournament selection to effectively identify and select good programs to be used by the genetic operators for generating better offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the compared methods.
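As a rough illustration only, the sketch below scores a program's outputs on two criteria (an AUC estimate and an assumed "clarity" margin between class outputs) and picks a tournament winner by comparing the criteria pairwise rather than summing them. The exact definitions of clarity, the pairwise combination, and the three-criterion selection in the paper are not reproduced here; this is a simplified interpretation.

```python
# Simplified, assumed formulation: two-criterion fitness plus pairwise tournament comparison.
import numpy as np
from sklearn.metrics import roc_auc_score

def two_criterion_fitness(outputs, labels):
    auc = roc_auc_score(labels, outputs)                                  # AUC approximation
    clarity = outputs[labels == 1].mean() - outputs[labels == 0].mean()   # assumed separation measure
    return auc, clarity

def tournament_winner(fit_a, fit_b):
    # Prefer the program that is no worse on both criteria and better on at
    # least one; otherwise break the tie on AUC.
    if all(a >= b for a, b in zip(fit_a, fit_b)) and fit_a != fit_b:
        return "a"
    if all(b >= a for a, b in zip(fit_a, fit_b)) and fit_a != fit_b:
        return "b"
    return "a" if fit_a[0] >= fit_b[0] else "b"

labels = np.array([0, 0, 0, 0, 1, 1])
prog_a = np.array([0.1, 0.2, 0.4, 0.3, 0.9, 0.8])
prog_b = np.array([0.2, 0.1, 0.6, 0.5, 0.7, 0.9])
print(tournament_winner(two_criterion_fitness(prog_a, labels),
                        two_criterion_fitness(prog_b, labels)))
```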


2019 ◽  
Vol 8 (5) ◽  
pp. 668 ◽  
Author(s):  
Yang Cao ◽  
Xin Fang ◽  
Johan Ottosson ◽  
Erik Näslund ◽  
Erik Stenberg

Background: Severe obesity is a global public health threat of growing proportions. Accurate models to predict severe postoperative complications could be of value in the preoperative assessment of potential candidates for bariatric surgery. So far, traditional statistical methods have failed to produce high accuracy. We aimed to find a useful machine learning (ML) algorithm to predict the risk of severe complications after bariatric surgery. Methods: We trained and compared 29 supervised ML algorithms using information from 37,811 patients who underwent a bariatric surgical procedure between 2010 and 2014 in Sweden. The algorithms were then tested on 6250 patients operated on in 2015. We applied the synthetic minority oversampling technique to tackle the issue that only 3% of patients experienced severe complications. Results: Most of the ML algorithms showed high accuracy (>90%) and specificity (>90%) in both the training and test data. However, none of the algorithms achieved an acceptable sensitivity in the test data. We also tried to tune the hyperparameters of the algorithms to maximize sensitivity, but have not yet identified one with a sensitivity high enough for use in clinical practice in bariatric surgery. However, a minor but perceptible improvement was found for the deep neural network (NN). Conclusion: In predicting severe postoperative complications among bariatric surgery patients, ensemble algorithms outperform base algorithms. Compared with other ML algorithms, the deep NN has the potential to improve accuracy and deserves further investigation. The oversampling technique should be considered in the context of imbalanced data, where the number of cases of the outcome of interest is relatively small.
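The general recipe described here (oversample the rare severe-complication class, train a classifier, then check sensitivity and specificity on held-out data) can be sketched as below. The synthetic data, classifier choice, and class proportions are stand-ins, not the Swedish registry data or the 29 algorithms compared in the study.

```python
# Minimal sketch of the imbalanced-classification workflow with synthetic data.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)   # balance the classes
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)

tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn))   # the metric the abstract reports as hardest to raise
print("specificity", tn / (tn + fp))
```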


Author(s):  
Fitsum Meshesha Kifetew ◽  
Denisse Muñante ◽  
Jesús Gorroñogoitia ◽  
Alberto Siena ◽  
Angelo Susi ◽  
...  

2020 ◽  
Vol 12 (2) ◽  
pp. 147-159
Author(s):  
Madhulata Kumari ◽  
Neeraj Tiwari ◽  
Naidu Subbarao

Aim: We applied genetic programming approaches to understand the impact of descriptors on the inhibitory effects of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and to discover new inhibitors as drug candidates. Materials & methods: The experimental dataset of descriptors of serine protease inhibitors of Mtb was optimized by a genetic algorithm (GA) together with correlation-based feature selection (CFS) in order to develop predictive models using machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen potential serine protease inhibitors of Mtb. The quality and performance of the predictive models were evaluated using various standard statistical parameters. Results: The best random forest model with CFS-GA screened 126 anti-tubercular agents out of the 918 phytochemical compounds. In addition, the genetic programming symbolic classification method optimized the descriptors and produced an equation for the mathematical model. Conclusion: The use of CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.
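The screening step can be illustrated with the hypothetical sketch below: a random forest is trained on a descriptor matrix of known actives and inactives, then used to rank a compound library by predicted probability of activity. The descriptor data are random placeholders, and the GA/CFS feature-selection stage described in the abstract is assumed to have already produced the reduced descriptor set.

```python
# Hypothetical sketch of the virtual-screening workflow (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))      # descriptors remaining after CFS-GA selection (assumed)
y_train = rng.integers(0, 2, size=200)    # 1 = active inhibitor, 0 = inactive

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

library = rng.normal(size=(918, 12))      # descriptors of the phytochemical library
scores = clf.predict_proba(library)[:, 1] # predicted probability of activity
hits = np.argsort(scores)[::-1][:126]     # top-ranked candidates for follow-up
print(hits[:10])
```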

