Combination of supervised and unsupervised learning for training the activation functions of neural networks

2014 ◽  
Vol 37 ◽  
pp. 178-191 ◽  
Author(s):  
Ilaria Castelli ◽  
Edmondo Trentin
Author(s):  
Sindhu P. Menon

In the last couple of years, artificial neural networks have gained considerable momentum. Their results can improve when the networks are made deeper, that is, given more layers. Of late, large volumes of data have been generated, giving rise to big data. Big data brings many challenges, among which data quality is one of the most important. Deep learning models can improve the quality of data. In this chapter, an attempt has been made to review deep supervised and deep unsupervised learning algorithms and the various activation functions used. Challenges in deep learning have also been discussed.


Materials ◽  
2020 ◽  
Vol 13 (4) ◽  
pp. 938 ◽  
Author(s):  
Enrique Miranda ◽  
Jordi Suñé

Artificial Intelligence has found many applications in the last decade due to increased computing power. Artificial Neural Networks (ANNs) are inspired by the structure of the brain and consist of artificial neurons interconnected through artificial synapses in the so-called Deep Neural Networks (DNNs). Training these systems requires huge amounts of data and, after the network is trained, it can recognize previously unseen data and provide useful information. As far as the training is concerned, we can distinguish between supervised and unsupervised learning. The former requires labelled data and is based on the iterative minimization of the output error using the stochastic gradient descent method followed by the recalculation of the strength of the synaptic connections (weights) with the backpropagation algorithm. On the other hand, unsupervised learning does not require data labeling and is not based on explicit output error minimization. Conventional ANNs can function with supervised learning algorithms (perceptrons, multi-layer perceptrons, convolutional networks, etc.) but also with unsupervised learning rules (Kohonen networks, self-organizing maps, etc.). Another type of neural network is the so-called Spiking Neural Network (SNN), in which learning takes place through the superposition of voltage spikes launched by the neurons. Their behavior is much closer to the functioning mechanisms of the brain, and they can be used with supervised and unsupervised learning rules. Since learning and inference are based on short voltage spikes, energy efficiency improves substantially. Up to this moment, all these ANNs (spiking and conventional) have been implemented as software tools running on conventional computing units based on the von Neumann architecture. However, this approach reaches important limits due to the required computing power, physical size and energy consumption. This is particularly true for applications at the edge of the internet.
Thus, there is an increasing interest in developing AI tools directly implemented in hardware for this type of application. The first hardware demonstrations have been based on Complementary Metal-Oxide-Semiconductor (CMOS) circuits and specific communication protocols. However, to further increase training speed and energy efficiency while reducing the system size, the combination of CMOS neuron circuits with memristor synapses is now being explored. It has also been pointed out that the short-time non-volatility of some memristors may even allow fabricating purely memristive ANNs. The memristor is a new device (first demonstrated in solid-state in 2008) which behaves as a resistor with memory and which has been shown to have potentiation and depression properties similar to those of biological synapses. In this Special Issue, we explore the state of the art of neuromorphic circuits implementing neural networks with memristors for AI applications.
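
As a rough illustration of the supervised case described in this abstract, the sketch below trains a single sigmoid neuron by stochastic gradient descent on a squared output error. The toy task (logical OR), the learning rate, the epoch count, and every function name are assumptions made for this example; none of them come from the Special Issue.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, epochs=2000, lr=0.5, seed=0):
    """Fit one sigmoid neuron with per-sample (stochastic) gradient descent."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    b = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the labelled samples in random order
        for x, y in samples:
            out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # Chain rule for the error 0.5 * (out - y)**2 w.r.t. the pre-activation
            delta = (out - y) * out * (1.0 - out)
            w[0] -= lr * delta * x[0]
            w[1] -= lr * delta * x[1]
            b -= lr * delta
    return w, b


def predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


# Labelled data for logical OR (an assumed toy task)
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

Calling `train_neuron(OR_DATA)` returns weights and a bias that `predict` can apply to new inputs. In a multi-layer network the same chain-rule step is repeated layer by layer, which is what backpropagation does.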


2019 ◽  
Vol 8 (4) ◽  
pp. 9746-9750

Finding the best article for a given set of requirements is difficult. Ranking is one of the best ways to surface the top-rated article, conference, or research paper from the vast amount of material on the Internet. As technology advances, Artificial Intelligence is often the first approach tried for a problem, and Machine Learning is an important part of Artificial Intelligence. Machine Learning is best known for classifying, categorizing, and predicting. Rank prediction can be implemented with many different machine learning algorithms, but choosing the best one is important for accurate results. This paper reports which algorithms give the most accurate results for article rank prediction. Many solutions to this problem have been proposed, but accuracy is essential, and previous models relied on supervised learning only. We carried out this work using both supervised and unsupervised learning. For supervised learning we used Neural Networks, which are well suited to classification and prediction; for unsupervised learning we used K-means clustering to group the data. This work helps users search for the best article and encourages authors to compete for the top ranks. The whole approach is implemented with Neural Networks and the K-means algorithm, a mixture of supervised and unsupervised learning, for predicting ranks.


2019 ◽  
Vol 12 (3) ◽  
pp. 156-161 ◽  
Author(s):  
Aman Dureja ◽  
Payal Pahwa

Background: Activation functions play an important role in building deep neural networks. The choice of activation function affects both how well the network optimizes and the quality of its results. Several activation functions have been introduced in machine learning for many practical applications, but which activation function should be used in the hidden layers of deep neural networks had not been identified. Objective: The primary objective of this analysis was to determine which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured on a two-class dataset (Cat/Dog). The network used three convolutional layers, each followed by a pooling layer. The dataset was divided into two parts: the first 8000 images were used for training the network and the next 2000 images for testing. Results: The experimental comparison was carried out by evaluating the network with different activation functions in each layer of the CNN. The validation error and accuracy on the Cat/Dog dataset were analyzed for the activation functions ReLU, Tanh, SELU, PReLU, and ELU in the hidden layers. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU hidden layers (three hidden layers here) gave the best results and improved overall performance in terms of accuracy and speed. These advantages of ReLU in the hidden layers of a CNN help retrieve images from databases effectively and quickly.
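
For reference, the five activation functions compared in this abstract can be written as scalar functions, as in the sketch below. These are the standard textbook definitions, not code from the paper; the PReLU slope of 0.25 is an assumed starting value (in practice it is a learned parameter), and the ELU `alpha` of 1.0 is a common default.

```python
import math

# Fixed constants from the standard SELU definition
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """Pass positive inputs through, clamp negative inputs to zero."""
    return max(0.0, x)


def tanh(x):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)


def elu(x, alpha=1.0):
    """Identity for x > 0, smooth exponential saturation toward -alpha below."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """Scaled ELU with fixed alpha and scale constants."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def prelu(x, a=0.25):
    """Leaky slope `a` for negative inputs; `a` is assumed here, learned in practice."""
    return x if x > 0 else a * x
```

All five agree up to scale for positive inputs except Tanh, which saturates; they differ mainly in how they treat negative inputs.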


Author(s):  
Volodymyr Shymkovych ◽  
Sergii Telenyk ◽  
Petro Kravets

This article introduces a method for realizing the Gaussian activation function of radial-basis function (RBF) neural networks in hardware on field-programmable gate arrays (FPGAs). The results of modeling the Gaussian function on FPGA chips of different families are presented. RBF neural networks of various topologies have been synthesized and investigated. The hardware component implemented by this algorithm is an RBF neural network with four hidden-layer neurons and one neuron with a sigmoid activation function, built on an FPGA using 16-bit fixed-point numbers, which occupied 1193 look-up tables (LUTs). Each hidden-layer neuron of the RBF network is designed on the FPGA as a separate computing unit. The speed of the RBF network block, measured as the total delay of the combinational circuit, was 101.579 ns. The implementation of the Gaussian activation functions of the hidden layer occupies 106 LUTs, and the delay of the Gaussian activation functions is 29.33 ns. The absolute error is ±0.005. These results were obtained by modeling on the Spartan 3 family of chips. Modeling on chips of other series is also presented in the article. Hardware implementation of RBF neural networks at this speed allows them to be used in real-time control systems for high-speed objects.
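
The abstract does not describe the circuit itself, so the following is only a software sketch of the general idea of evaluating a Gaussian in 16-bit fixed point. It assumes a Q4.12 number format, a unit-width Gaussian centered at zero, and a 128-entry table indexed by the top bits of the input; all of these choices and names are hypothetical, and the error of this sketch is not the ±0.005 reported in the article.

```python
import math

FRAC_BITS = 12            # assumed Q4.12 format: 4 integer bits, 12 fractional bits
SCALE = 1 << FRAC_BITS
STEP_BITS = 5             # assumed table resolution: 1/32 per entry


def to_fixed(x):
    """Quantize a float to a signed 16-bit fixed-point integer (saturating)."""
    v = int(round(x * SCALE))
    return max(-32768, min(32767, v))


def from_fixed(q):
    return q / SCALE


def gaussian(x):
    """Reference Gaussian exp(-x^2 / 2) in floating point."""
    return math.exp(-(x * x) / 2.0)


# Precomputed table covering |x| in [0, 4) with step 1/32
TABLE = [to_fixed(gaussian(i / (1 << STEP_BITS))) for i in range(4 << STEP_BITS)]


def gaussian_fixed(qx):
    """Look up the Gaussian of a Q4.12 input; inputs with |x| >= 4 return 0."""
    idx = abs(qx) >> (FRAC_BITS - STEP_BITS)  # the Gaussian is even, so use |x|
    return TABLE[idx] if idx < len(TABLE) else 0
```

A finer table or interpolation between entries would reduce the error at the cost of more hardware resources, which is the kind of trade-off the reported LUT counts and delays reflect.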


2021 ◽  
Vol 11 (15) ◽  
pp. 6704
Author(s):  
Jingyong Cai ◽  
Masashi Takemoto ◽  
Yuming Qiu ◽  
Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and in short supply in many scenarios. Previous work has shown the advantage of computing activation functions, such as the sigmoid, with shift-and-add operations, although it fails to remove multiplications from training altogether. In this paper, we propose an innovative approach that can convert all multiplications in the forward and backward inferences of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are transferred to multiplications of their sine values, which are replaceable with simpler operations with the help of the product-to-sum formula. In addition, a rectified sine activation function is utilized for further converting layer inputs into sine values. In this way, the original multiplication-intensive operations can be computed through simple add-and-shift operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method is able to obtain a performance close to that of classical training algorithms. The approach we propose sheds new light on future hardware customization research for machine learning.
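
The product-to-sum step can be checked numerically. The sketch below uses the identity sin(a)·sin(b) = ½[cos(a − b) − cos(a + b)] together with sin(x) ≈ x for small x; it only verifies the arithmetic in floating point and does not model how the paper evaluates the cosines in hardware. The function name is an assumption for this example.

```python
import math


def approx_product(w, e):
    """Approximate w * e for small w and e via the product-to-sum identity.

    sin(w) * sin(e) == 0.5 * (cos(w - e) - cos(w + e)), and sin(x) ~ x near zero.
    The factor 0.5 corresponds to a one-bit right shift in fixed-point hardware.
    """
    return 0.5 * (math.cos(w - e) - math.cos(w + e))
```

For values near zero, such as a weight of 0.05 and an error signal of -0.02, the result stays within about one part in a thousand of the exact product; the approximation degrades as the operands grow, which is why it relies on the values being clustered around zero.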


2021 ◽  
Vol 7 (15) ◽  
pp. eabe4166
Author(s):  
Philippe Schwaller ◽  
Benjamin Hoover ◽  
Jean-Louis Reymond ◽  
Hendrik Strobelt ◽  
Teodoro Laino

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.

