Learning in design: From characterizing dimensions to working systems

Author(s):  
YORAM REICH

The application of machine learning (ML) to solve practical problems is complex. Only recently, due to the increased promise of ML in solving real problems and the experienced difficulty of their use, has this issue started to attract attention. This difficulty arises from the complexity of learning problems and the large variety of available techniques. In order to understand this complexity and begin to overcome it, it is important to construct a characterization of learning situations. Building on previous work that dealt with the practical use of ML, a set of dimensions is developed, contrasted with another recent proposal, and illustrated with a project on the development of a decision-support system for marine propeller design. The general research opportunities that emerge from the development of the dimensions are discussed. Leading toward working systems, a simple model is presented for setting priorities in research and in selecting learning tasks within large projects. Central to the development of the concepts discussed in this paper is their use in future projects and the recording of their successes, limitations, and failures.

2021 ◽  
Vol 4 ◽  
Author(s):  
Henry Adams ◽  
Michael Moy

Through the use of examples, we explain one way in which applied topology has evolved since the birth of persistent homology in the early 2000s. The first applications of topology to data emphasized the global shape of a dataset, such as the three-circle model for 3 × 3 pixel patches from natural images, or the configuration space of the cyclo-octane molecule, which is a sphere with a Klein bottle attached via two circles of singularity. In these studies of global shape, short persistent homology bars are disregarded as sampling noise. More recently, however, persistent homology has been used to address questions about the local geometry of data. For instance, how can local geometry be vectorized for use in machine learning problems? Persistent homology and its vectorization methods, including persistence landscapes and persistence images, provide popular techniques for incorporating both local geometry and global topology into machine learning. Our meta-hypothesis is that the short bars are as important as the long bars for many machine learning tasks. In defense of this claim, we survey applications of persistent homology to shape recognition, agent-based modeling, materials science, archaeology, and biology. Additionally, we survey work connecting persistent homology to geometric features of spaces, including curvature and fractal dimension, and various methods that have been used to incorporate persistent homology into machine learning.


2021 ◽  
pp. 1-18
Author(s):  
Ilenna Simone Jones ◽  
Konrad Paul Kording

Abstract Physiological experiments have highlighted how the dendrites of biological neurons can nonlinearly process distributed synaptic inputs. However, it is unclear how aspects of a dendritic tree, such as its branched morphology or its repetition of presynaptic inputs, determine neural computation beyond this apparent nonlinearity. Here we use a simple model where the dendrite is implemented as a sequence of thresholded linear units. We manipulate the architecture of this model to investigate the impacts of binary branching constraints and repetition of synaptic inputs on neural computation. We find that models with such manipulations can perform well on machine learning tasks, such as Fashion MNIST or Extended MNIST. We find that model performance on these tasks is limited by binary tree branching and dendritic asymmetry and is improved by the repetition of synaptic inputs to different dendritic branches. These computational experiments further neuroscience theory on how different dendritic properties might determine neural computation of clearly defined tasks.


Author(s):  
Carlo Ciliberto ◽  
Mark Herbster ◽  
Alessandro Davide Ialongo ◽  
Massimiliano Pontil ◽  
Andrea Rocchetto ◽  
...  

Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.


2021 ◽  
Author(s):  
◽  
Andrew Lensen

<p>Unsupervised learning is a fundamental category of machine learning that works on data for which no pre-existing labels are available. Unlike in supervised learning, which has such labels, methods that perform unsupervised learning must discover intrinsic patterns within data.  The size and complexity of data has increased substantially in recent years, which has necessitated the creation of new techniques for reducing the complexity and dimensionality of data in order to allow humans to understand the knowledge contained within data. This is particularly problematic in unsupervised learning, as the number of possible patterns in a dataset grows exponentially with regard to the number of dimensions. Feature manipulation techniques such as feature selection (FS) and feature construction (FC) are often used in these situations. FS automatically selects the most valuable features (attributes) in a dataset, whereas FC constructs new, more powerful and meaningful features that provide a lower-dimensional space.  Evolutionary computation (EC) approaches have become increasingly recognised for their potential to provide high-quality solutions to data mining problems in a reasonable amount of computational time. Unlike other popular techniques such as neural networks, EC methods have global search ability without needing gradient information, which makes them much more flexible and applicable to a wider range of problems. EC approaches have shown significant potential in feature manipulation tasks with methods such as Particle Swarm Optimisation (PSO) commonly used for FS, and Genetic Programming (GP) for FC.  The use of EC for feature manipulation has, until now, been predominantly restricted to supervised learning problems. This is a notable gap in the research: if unsupervised learning is even more sensitive to high-dimensionality, then why is EC-based feature manipulation not used for unsupervised learning problems?  This thesis provides the first comprehensive investigation into the use of evolutionary feature manipulation for unsupervised learning tasks. It clearly shows the ability of evolutionary feature manipulation to improve both the performance of algorithms and interpretability of solutions in unsupervised learning tasks. A variety of tasks are investigated, including the well-established task of clustering, as well as more recent unsupervised learning problems, such as benchmark dataset creation and manifold learning.  This thesis proposes a new PSO-based approach to performing simultaneous FS and clustering. A number of improvements to the state-of-the-art are made, including the introduction of a new medoid-based representation and an improved fitness function. A sophisticated three-stage algorithm, which takes advantage of heuristic techniques to determine the number of clusters and to fine-tune clustering performance is also developed. Empirical evaluation on a range of clustering problems demonstrates a decrease in the number of features used, while also improving the clustering performance.  This thesis also introduces two innovative approaches to performing wrapper-based FC in clustering tasks using GP. An initial approach where constructed features are directly provided to the k-means clustering algorithm demonstrates the clear strength of GP-based FC for improving clustering results. A more advanced method is proposed that utilises the functional nature of GP-based FC to evolve more specific, concise, and understandable similarity functions for use in clustering algorithms. These similarity functions provide clear improvements in performance and can be easily interpreted by machine learning practitioners.  This thesis demonstrates the ability of evolutionary feature manipulation to solve unsupervised learning tasks that traditional methods have struggled with. The synthesis of benchmark datasets has long been a technique used for evaluating machine learning techniques, but this research is the first to present an approach that automatically creates diverse and challenging redundant features for a given dataset. This thesis introduces a GP-based FC approach that creates difficult benchmark datasets for evaluating FS algorithms. It also makes the intriguing discovery that using a mutual information-based fitness function with GP has the potential to be used to improve supervised learning tasks even when the labels are not utilised.  Manifold learning is an approach to dimensionality reduction that aims to reduce dimensionality by discovering the inherent lower-dimensional structure of a dataset. While state-of-the-art manifold learning approaches show impressive performance in reducing data dimensionality, they do so at the cost of removing the ability for humans to understand the data in terms of the original features. By utilising a GP-based approach, this thesis proposes new methods that can perform interpretable manifold learning, which provides deep insight into patterns in the data.  These four contributions clearly support the hypothesis that evolutionary feature manipulation has untapped potential in unsupervised learning. This thesis demonstrates that EC-based feature manipulation can be successfully applied to a variety of unsupervised learning tasks with clear improvements in both performance and interpretability. A plethora of future research directions in this area are also discovered, which we hope will lead to further valuable findings in this area.</p>


2021 ◽  
Author(s):  
◽  
Andrew Lensen

<p>Unsupervised learning is a fundamental category of machine learning that works on data for which no pre-existing labels are available. Unlike in supervised learning, which has such labels, methods that perform unsupervised learning must discover intrinsic patterns within data.  The size and complexity of data has increased substantially in recent years, which has necessitated the creation of new techniques for reducing the complexity and dimensionality of data in order to allow humans to understand the knowledge contained within data. This is particularly problematic in unsupervised learning, as the number of possible patterns in a dataset grows exponentially with regard to the number of dimensions. Feature manipulation techniques such as feature selection (FS) and feature construction (FC) are often used in these situations. FS automatically selects the most valuable features (attributes) in a dataset, whereas FC constructs new, more powerful and meaningful features that provide a lower-dimensional space.  Evolutionary computation (EC) approaches have become increasingly recognised for their potential to provide high-quality solutions to data mining problems in a reasonable amount of computational time. Unlike other popular techniques such as neural networks, EC methods have global search ability without needing gradient information, which makes them much more flexible and applicable to a wider range of problems. EC approaches have shown significant potential in feature manipulation tasks with methods such as Particle Swarm Optimisation (PSO) commonly used for FS, and Genetic Programming (GP) for FC.  The use of EC for feature manipulation has, until now, been predominantly restricted to supervised learning problems. This is a notable gap in the research: if unsupervised learning is even more sensitive to high-dimensionality, then why is EC-based feature manipulation not used for unsupervised learning problems?  This thesis provides the first comprehensive investigation into the use of evolutionary feature manipulation for unsupervised learning tasks. It clearly shows the ability of evolutionary feature manipulation to improve both the performance of algorithms and interpretability of solutions in unsupervised learning tasks. A variety of tasks are investigated, including the well-established task of clustering, as well as more recent unsupervised learning problems, such as benchmark dataset creation and manifold learning.  This thesis proposes a new PSO-based approach to performing simultaneous FS and clustering. A number of improvements to the state-of-the-art are made, including the introduction of a new medoid-based representation and an improved fitness function. A sophisticated three-stage algorithm, which takes advantage of heuristic techniques to determine the number of clusters and to fine-tune clustering performance is also developed. Empirical evaluation on a range of clustering problems demonstrates a decrease in the number of features used, while also improving the clustering performance.  This thesis also introduces two innovative approaches to performing wrapper-based FC in clustering tasks using GP. An initial approach where constructed features are directly provided to the k-means clustering algorithm demonstrates the clear strength of GP-based FC for improving clustering results. A more advanced method is proposed that utilises the functional nature of GP-based FC to evolve more specific, concise, and understandable similarity functions for use in clustering algorithms. These similarity functions provide clear improvements in performance and can be easily interpreted by machine learning practitioners.  This thesis demonstrates the ability of evolutionary feature manipulation to solve unsupervised learning tasks that traditional methods have struggled with. The synthesis of benchmark datasets has long been a technique used for evaluating machine learning techniques, but this research is the first to present an approach that automatically creates diverse and challenging redundant features for a given dataset. This thesis introduces a GP-based FC approach that creates difficult benchmark datasets for evaluating FS algorithms. It also makes the intriguing discovery that using a mutual information-based fitness function with GP has the potential to be used to improve supervised learning tasks even when the labels are not utilised.  Manifold learning is an approach to dimensionality reduction that aims to reduce dimensionality by discovering the inherent lower-dimensional structure of a dataset. While state-of-the-art manifold learning approaches show impressive performance in reducing data dimensionality, they do so at the cost of removing the ability for humans to understand the data in terms of the original features. By utilising a GP-based approach, this thesis proposes new methods that can perform interpretable manifold learning, which provides deep insight into patterns in the data.  These four contributions clearly support the hypothesis that evolutionary feature manipulation has untapped potential in unsupervised learning. This thesis demonstrates that EC-based feature manipulation can be successfully applied to a variety of unsupervised learning tasks with clear improvements in both performance and interpretability. A plethora of future research directions in this area are also discovered, which we hope will lead to further valuable findings in this area.</p>


2020 ◽  
Vol 89 ◽  
pp. 20-29
Author(s):  
Sh. K. Kadiev ◽  
◽  
R. Sh. Khabibulin ◽  
P. P. Godlevskiy ◽  
V. L. Semikov ◽  
...  

Introduction. An overview of research in the field of classification as a method of machine learning is given. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on the classification as a machine learning method. To achieve the objective, it is necessary to solve the following tasks: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent systems to support management decisions to solve issues of forecasting, prevention and elimination of emergencies. Methods. To obtain the results, general scientific and special methods of scientific knowledge were used - analysis, synthesis, generalization, as well as the classification method. Results and discussion thereof. According to the results of the analysis, studies with a mathematical formulation and the availability of software developments were identified. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that enough algorithms were used to perform the classification while sorting the acquired knowledge within the subject area. The implementation of an accurate classification is one of the fundamental problems in the development of management decision support systems, including for fire and emergency prevention and response. Timely and effective decision by officials of operational shifts for the disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.


Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.


Author(s):  
Raghothama Chaerkady ◽  
Yebin Zhou ◽  
Jared A. Delmar ◽  
Shao Huan Samuel Weng ◽  
Junmin Wang ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
José Castela Forte ◽  
Galiya Yeshmagambetova ◽  
Maureen L. van der Grinten ◽  
Bart Hiemstra ◽  
Thomas Kaufmann ◽  
...  

AbstractCritically ill patients constitute a highly heterogeneous population, with seemingly distinct patients having similar outcomes, and patients with the same admission diagnosis having opposite clinical trajectories. We aimed to develop a machine learning methodology that identifies and provides better characterization of patient clusters at high risk of mortality and kidney injury. We analysed prospectively collected data including co-morbidities, clinical examination, and laboratory parameters from a minimally-selected population of 743 patients admitted to the ICU of a Dutch hospital between 2015 and 2017. We compared four clustering methodologies and trained a classifier to predict and validate cluster membership. The contribution of different variables to the predicted cluster membership was assessed using SHapley Additive exPlanations values. We found that deep embedded clustering yielded better results compared to the traditional clustering algorithms. The best cluster configuration was achieved for 6 clusters. All clusters were clinically recognizable, and differed in in-ICU, 30-day, and 90-day mortality, as well as incidence of acute kidney injury. We identified two high mortality risk clusters with at least 60%, 40%, and 30% increased. ICU, 30-day and 90-day mortality, and a low risk cluster with 25–56% lower mortality risk. This machine learning methodology combining deep embedded clustering and variable importance analysis, which we made publicly available, is a possible solution to challenges previously encountered by clustering analyses in heterogeneous patient populations and may help improve the characterization of risk groups in critical care.


Sign in / Sign up

Export Citation Format

Share Document