Different types of Deep Learning models explained


Deep learning is a class of machine learning techniques that exploit many layers of non-linear information processing for supervised or unsupervised feature extraction and transformation, and for pattern analysis and classification. It uses many hierarchical layers to process information in a non-linear manner, where lower-level concepts help to define higher-level concepts.

Shallow artificial neural networks are not capable of handling large amounts of complex data, such as the data found in many routine applications: natural speech, images, information retrieval, and other human-like information processing tasks. Deep learning is well suited to such applications. With deep learning, a machine can recognize, classify, and categorize patterns in data with comparatively little effort. Google was a pioneer in experimenting with deep learning, in work initiated by Andrew Ng.

Deep learning offers human-like multilayered processing, in contrast to shallow architectures. Its basic idea is to employ hierarchical processing using many layers arranged in a hierarchy. After pre-training, the output of each layer is passed as input to the adjacent layer. Most often, each selected layer is pre-trained in an unsupervised way.

Deep learning follows a distributed approach to managing big data. The approach assumes that the data is generated by numerous factors, at different times, and at various levels. Deep learning arranges and processes the data in different layers according to its time of occurrence, its scale, or its nature.

Deep learning is often associated with artificial neural networks. There are three categories of deep learning architectures:

  • Generative
  • Discriminative
  • Hybrid deep learning architectures

Generative architectures focus on pre-training each layer in an unsupervised way. This approach reduces the difficulty of training deep architectures, in which each layer relies on the layers before it. Each layer can be pre-trained and later included in the model for further fine-tuning and learning. Doing this helps resolve the problem of training neural networks with multiple layers, and so enables deep learning.

A neural network architecture can acquire discriminative processing ability by stacking the output of each layer with the original data, or through various other combinations of information, thus forming a deep learning architecture. A discriminative model often treats the neural network outputs as a conditional distribution over all possible label sequences for the given input sequence, which is then optimized through an objective function. A hybrid architecture combines the properties of the generative and discriminative architectures. Typically, a deep network can be built as follows:

  • Construct a network consisting of an input layer and a hidden layer with necessary nodes
  • Train the network
  • Add another hidden layer on top of the previously learned network to generate a new network
  • Retrain the network
  • Repeat adding more layers and after every addition, retrain the network
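The construct, train, add a layer, and retrain loop above can be sketched in Python with NumPy. This is a minimal illustration built on our own assumptions:

- The data is a toy XOR dataset.
- The hidden layers are sigmoid layers.
- The output is a sigmoid unit trained with cross-entropy.
- Training is plain full-batch gradient descent.
- The helper names (`init_layer`, `train`, `add_hidden_layer`) are hypothetical.

It is not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_out):
    # One fully connected layer stored as [weights, biases].
    return [rng.normal(0.0, 0.5, (n_in, n_out)), np.zeros(n_out)]

def forward(layers, x):
    # Activations of every layer, starting with the input itself.
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def train(layers, X, y, epochs=3000, lr=1.0):
    # Backpropagation through all layers (used for both "train" and "retrain").
    for _ in range(epochs):
        acts = forward(layers, X)
        # Sigmoid output + cross-entropy: gradient w.r.t. the pre-activation is pred - target.
        delta = (acts[-1] - y) / len(X)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            # Pass the error down using the weights from before this update.
            delta = (delta @ W.T) * acts[i] * (1 - acts[i])
            layers[i] = [W - lr * grad_W, b - lr * grad_b]
    return layers

def add_hidden_layer(layers, n_hidden, n_out):
    # Keep the learned hidden layers, discard the old output layer, stack a new
    # hidden layer on top, and attach a freshly initialized output layer.
    hidden = layers[:-1]
    n_top = hidden[-1][0].shape[1]
    return hidden + [init_layer(n_top, n_hidden), init_layer(n_hidden, n_out)]

# Toy XOR data (an assumption for illustration only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Construct an input layer plus one hidden layer, then train.
net = train([init_layer(2, 4), init_layer(4, 1)], X, y)
# Add a hidden layer on top and retrain the whole network, twice.
for _ in range(2):
    net = train(add_hidden_layer(net, 4, 1), X, y)

print(len(net) - 1, "hidden layers")  # 3 hidden layers
print(forward(net, X)[-1].ravel().round(2))
```

The old output layer is discarded at each step because its input now comes from the new hidden layer, so its weights no longer fit.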

Different types of deep learning models


Autoencoders

An autoencoder is an artificial neural network that is capable of learning efficient codings of its input. The simplest form of autoencoder resembles a multilayer perceptron, with an input layer, one or more hidden layers, and an output layer. The main difference between an autoencoder and a typical multilayer perceptron (feedforward neural network) is the number of nodes in the output layer: in an autoencoder, the output layer contains the same number of nodes as the input layer. Instead of predicting separate target values, the autoencoder is trained to predict (reconstruct) its own inputs. The broad outline of the learning mechanism is as follows.

For each input x,

  • Do a feedforward pass to compute the activations at all hidden layers and the output layer
  • Measure the deviation between the computed outputs and the inputs using an appropriate error function
  • Backpropagate the error to update the weights
  • Repeat until the output is satisfactory

If the hidden layers have fewer nodes than the input/output layers, the activations of the smallest (bottleneck) hidden layer can be regarded as a compressed representation of the inputs. When the hidden layers have more nodes than the input layer, an unconstrained autoencoder can simply learn the identity function, which makes it useless in most cases.
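The learning loop above can be sketched in NumPy. This is a minimal sketch with these assumptions:

- There is a single sigmoid hidden layer that is narrower than the input (a bottleneck).
- The error function is squared error.
- Training is full-batch gradient descent with a fixed epoch budget rather than "until satisfactory".
- The toy data is eight one-hot vectors.
- `train_autoencoder` and `encode` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=5000, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Encoder (input -> hidden) and decoder (hidden -> output) parameters.
    W1, b1 = rng.normal(0.0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.5, (n_hidden, n_in)), np.zeros(n_in)
    losses = []
    for _ in range(epochs):  # fixed budget instead of "until satisfactory"
        # Feedforward pass: hidden and output activations for every input x.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Deviation between the output and the input itself (squared error).
        err = out - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backpropagate the error through both sigmoid layers.
        d_out = err * out * (1 - out) / len(X)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)

    def encode(data):
        # Hidden-layer activations: the compressed code for each input.
        return sigmoid(data @ W1 + b1)

    return encode, losses

X = np.eye(8)  # eight one-hot input vectors (toy data)
encode, losses = train_autoencoder(X, n_hidden=3)
codes = encode(X)
print(codes.shape)  # (8, 3): each 8-dim input squeezed into 3 numbers
print(round(losses[0], 3), round(losses[-1], 3))
```

With 3 hidden nodes for 8 inputs, the network cannot copy its input through unchanged, so it must find a compact code. This is the bottleneck case described above.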

Deep Belief Networks

A deep belief network addresses the problem of non-convex objective functions and local minima that arises when training a typical multilayer perceptron. It is an alternative type of deep model consisting of multiple layers of latent variables, with connections between the layers but not between units within a layer. A deep belief network can be viewed as a stack of restricted Boltzmann machines (RBMs), where each subnetwork's hidden layer acts as the visible input layer for the next subnetwork. The lowest visible layer receives the training set, and the hidden activations of each trained layer serve as training data for the layer above. In this way, each layer of the network is trained independently and greedily, with the hidden variables of one layer used as the observed variables for training the next. The training algorithm for such a deep belief network is as follows:

  • Start with a vector of inputs
  • Train a restricted Boltzmann machine using the input vector and obtain the weight matrix
  • Use this weight matrix for the lower two layers of the network
  • Generate a new input vector by passing the data through the RBM, using either sampled or mean activations of the hidden units
  • Repeat the procedure until the top two layers of the network are reached

Fine-tuning a deep belief network is very similar to training a multilayer perceptron. Deep belief networks have proven useful in acoustic modeling.
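The greedy, layer-by-layer procedure can be sketched in NumPy. It rests on these assumptions:

- The units are binary (Bernoulli) RBM units.
- Each RBM is trained with one step of contrastive divergence (CD-1). This is a common choice that the text does not specify.
- Mean hidden activations serve as the "new input vector" for the next RBM.
- The data is a random toy binary dataset.
- `train_rbm` and `train_dbn` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, epochs=200, lr=0.1):
    n_visible = V.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    a = np.zeros(n_visible)  # visible biases
    b = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data, then a binary sample.
        ph = sigmoid(V @ W + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and back up.
        pv = sigmoid(h @ W.T + a)
        ph_recon = sigmoid(pv @ W + b)
        # CD-1 update: data correlations minus reconstruction correlations.
        W += lr * (V.T @ ph - pv.T @ ph_recon) / len(V)
        a += lr * (V - pv).mean(axis=0)
        b += lr * (ph - ph_recon).mean(axis=0)
    return W, a, b

def train_dbn(data, layer_sizes):
    weights = []
    inputs = data
    for n_hidden in layer_sizes:
        # Train one RBM on the current inputs and keep its weight matrix.
        W, _, b = train_rbm(inputs, n_hidden)
        weights.append((W, b))
        # Mean hidden activations become the new input vector for the next RBM.
        inputs = sigmoid(inputs @ W + b)
    return weights

data = (rng.random((100, 16)) < 0.3).astype(float)  # toy binary data
dbn = train_dbn(data, [12, 8, 4])
print([W.shape for W, _ in dbn])  # [(16, 12), (12, 8), (8, 4)]
```

The learned weights could then initialize a multilayer perceptron of the same shape for the supervised fine-tuning described above.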

Convolutional Neural Networks

A convolutional neural network (CNN) is another variant of the feedforward multilayer perceptron, in which individual neurons are arranged so that they respond to overlapping regions of the visual field.

A deep CNN works by modeling small pieces of information and combining them progressively deeper in the network. One way to understand this is that the first layer tries to identify edges and form templates for edge detection. Subsequent layers then try to combine these into simple shapes and, eventually, into templates covering different object positions, illumination, scales, and so on. The final layers match an input image against all the templates, and the final prediction is like a weighted sum of all of them. In this way, deep CNNs can model complex variations and behavior, often giving highly accurate predictions.

Such a network is inspired by the visual mechanism of living organisms. Cells in the visual cortex are sensitive to small subregions of the visual field, each called a receptive field. The subregions are arranged to cover the entire visual field, and the cells act as local filters over the input space. The backpropagation algorithm is used to train the parameters of each convolution kernel, and each kernel is replicated over the entire image with the same parameters. The convolutional operators extract distinctive features of the input. Besides convolutional layers, the network contains rectified linear unit (ReLU) layers, pooling layers that compute the max or average value of a feature over a region of the image, and a loss layer with an application-specific loss function. Image recognition, video analysis, and natural language processing are major applications of such networks.
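To make the convolution, ReLU, and pooling stages concrete, here is a NumPy sketch of a single forward pass. It assumes:

- The input is one 6×6 grayscale image.
- The kernel is a single hand-set 3×3 vertical-edge kernel. In a real CNN the kernel would be learned by backpropagation.
- The convolution uses stride 1 and no padding.
- Pooling is 2×2 max pooling.

As in most deep learning libraries, the "convolution" is computed as cross-correlation, so the kernel is not flipped. `conv2d`, `relu`, and `max_pool` are illustrative helper names.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the same kernel (shared weights) over every valid position.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output unit sees only a small receptive field of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Maximum over non-overlapping size x size regions.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# Responds positively where intensity increases from left to right.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = relu(conv2d(image, kernel))
pooled = max_pool(features)
print(features)  # every row is [0, 3, 3, 0]: strongest response at the edge
print(pooled)    # every entry is 3
```

Because the same kernel is applied at every position, the edge is detected wherever it appears in the image. This is the effect of replicating each kernel with the same parameters.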

Computer vision has progressed rapidly in recent years, and CNNs are among its most frequently cited advances. Deep CNNs now form the core of many sophisticated computer vision applications, such as self-driving cars, gesture recognition, automatic tagging of friends in Facebook photos, facial security features, and automatic number plate recognition.

Recurrent Neural Networks

A convolutional model works on a fixed-size input and produces a fixed-size output vector in a predefined number of steps. Recurrent networks, by contrast, can operate over sequences of vectors in the input, the output, or both. In a recurrent neural network, the connections between units form a directed cycle. Unlike in a traditional feedforward network, the inputs and outputs of a recurrent network are not independent: each output depends on earlier elements of the sequence. A recurrent network also shares the same parameters across every time step. It can be trained much like a traditional neural network, using backpropagation applied to the network unrolled over time.
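The recurrence can be sketched as a forward pass in NumPy. It assumes:

- The model is a simple (Elman-style) RNN with a tanh hidden state.
- The weights are random.
- `rnn_forward`, `W_xh`, `W_hh`, and `W_hy` are hypothetical names.

The point to notice is that the same weight matrices are reused at every time step, so one network handles sequences of any length. Training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2
# One set of parameters, shared by every time step.
W_xh = rng.normal(0.0, 0.3, (n_in, n_hidden))
W_hh = rng.normal(0.0, 0.3, (n_hidden, n_hidden))
W_hy = rng.normal(0.0, 0.3, (n_hidden, n_out))
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

def rnn_forward(sequence):
    h = np.zeros(n_hidden)  # initial hidden state
    outputs = []
    for x_t in sequence:
        # The new state depends on the current input AND the previous state:
        # this feedback loop is the directed cycle in the network.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        outputs.append(h @ W_hy + b_y)
    return np.array(outputs), h

short_seq = rng.normal(size=(4, n_in))
long_seq = rng.normal(size=(10, n_in))
ys_short, _ = rnn_forward(short_seq)
ys_long, _ = rnn_forward(long_seq)
print(ys_short.shape, ys_long.shape)  # (4, 2) (10, 2)
```

Changing only the first input of a sequence also changes the last output, because information flows forward through the hidden state. This is why the gradient at each step depends on earlier steps.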

Here, the gradient at each step depends not only on the current step but also on previous steps. A variant called the bidirectional recurrent neural network is also used in many applications; it considers not only past but also future elements of the sequence. In both bidirectional and simple (unidirectional) recurrent networks, deep learning can be achieved by stacking multiple hidden layers. Such deep networks offer higher learning capacity, provided there is plenty of training data. Speech, image processing, and natural language processing are some of the areas where recurrent neural networks can be applied.
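A bidirectional RNN can be sketched in three steps:

1. Run one recurrent pass forward in time.
2. Run a second, separately parameterized pass backward in time.
3. Concatenate the two hidden states at each step.

The sketch below assumes tanh units and random weights, and uses hypothetical helper names (`make_params`, `run_rnn`, `birnn_forward`). It shows only the forward computation, not training.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 3, 4

def make_params():
    # Input weights, recurrent weights, and bias for one direction.
    return (rng.normal(0.0, 0.3, (n_in, n_hidden)),
            rng.normal(0.0, 0.3, (n_hidden, n_hidden)),
            np.zeros(n_hidden))

def run_rnn(sequence, params):
    W_xh, W_hh, b = params
    h = np.zeros(n_hidden)
    states = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b)
        states.append(h)
    return np.array(states)

def birnn_forward(sequence, fwd_params, bwd_params):
    forward_states = run_rnn(sequence, fwd_params)  # step t has seen x_1..x_t
    # Process the reversed sequence, then flip back so time steps line up.
    backward_states = run_rnn(sequence[::-1], bwd_params)[::-1]  # step t has seen x_t..x_T
    return np.concatenate([forward_states, backward_states], axis=1)

seq = rng.normal(size=(6, n_in))
fwd, bwd = make_params(), make_params()
states = birnn_forward(seq, fwd, bwd)
print(states.shape)  # (6, 8): forward and backward halves for each time step
```

A deep variant, as the text describes, would feed `states` into another layer of the same kind.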

Reinforcement Learning with Neural Networks

Reinforcement learning can be viewed as a hybrid of dynamic programming and supervised learning. Typical components of the approach are the environment, agent, actions, policy, and cost (reward) functions. The agent acts as the controller of the system, the policy determines which actions to take, and the reward function specifies the overall objective of the reinforcement learning problem. An agent that receives the maximum possible reward can be regarded as taking the best action for a given state.
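As a deliberately tiny illustration of environment, agent, actions, policy, and reward, here is tabular Q-learning in plain Python. Q-learning is one standard reinforcement learning algorithm; the text does not prescribe a specific one. The sketch assumes:

- The environment is a hypothetical five-position corridor.
- The agent earns a reward of 1 for reaching the right-hand end.
- An epsilon-greedy policy balances exploration and exploitation.

Deep reinforcement learning would replace the Q table with a neural network, which is not shown here.

```python
import random

N_STATES = 5         # corridor positions 0..4 (hypothetical toy environment)
GOAL = N_STATES - 1  # reaching position 4 ends the episode
ACTIONS = (-1, +1)   # move left or move right

def step(state, action):
    # Environment: next state, reward, and whether the episode has ended.
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def greedy(Q, state, rng):
    # Best-valued action for this state, breaking ties at random.
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: usually exploit Q, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q, state, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: move right in every non-goal state
```

The learned policy picks, in each state, the action with the highest estimated long-term reward. This matches the idea that the agent receiving the maximum reward is performing the best action for that state.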

Here, an agent is an abstract entity that performs actions, either an object or a subject (autonomous cars, robots, humans, customer support chatbots, etc.). The state of an agent refers to its position and condition within its environment: for example, a specific position in a virtual reality world, a building, or a chessboard, or the position and speed on a racetrack. Deep reinforcement learning, which uses deep neural networks to represent policies or value functions, holds the promise of a very general learning procedure that can learn useful behavior with very little feedback. It is an exciting and challenging area that is likely to be an essential part of the future AI landscape.