Types of Neural Networks And Their Classification

Nitesh S
10 min read · Jan 11, 2022

Published by Nitesh Sonawane, Aditya Wanjari, Mohit Lalwani, Anushka Wankhade, Shresthi Yadav


Mohit Lalwani: Introduction and Feed Forward Neural Network

Shresthi Yadav: Radial Basis Functions (RBF) Neural Network

Nitesh Sonawane: Kohonen Self-organizing Neural Network, RNN Introduction

Aditya Wanjari: LSTM, GRU, LSTM vs GRU

Anushka Wankhade: Advantages & Disadvantages of RNN, CNN, MNN

Introduction to Neural Networks

Neural networks, also known as Artificial Neural Networks (ANNs), are networks that use mathematical models to process information. They are modeled on the way neurons and synapses function in the human brain. Like the brain, a neural network connects simple nodes, also known as neurons or units, and a collection of such connected nodes forms a network, hence the name “neural network.”

As in the human brain, a neural network uses an array of algorithms to identify and recognize relationships in data sets. Neural networks are designed to adapt to changing inputs, so the network can produce the best possible result without the output criteria having to be redesigned.

In practice, neural networks are used in a wide variety of technologies and applications, such as video games, computer vision, speech recognition, social network filtering, board games, machine translation, and medical diagnosis. Perhaps surprisingly, neural networks are also used for traditionally creative activities, such as painting and art.

Classification of Neural Networks in AI

  1. Feed-forward Neural Network
    This is the simplest type of neural network. Feed-forward neural networks are fast at inference; training, however, is comparatively slow. Many vision and speech recognition applications use some form of feed-forward neural network.
    Feed-forward networks are non-linear models. They are called feed-forward because data flows in one direction only, from input to output, with no feedback connections. Such a network can be described as a composition of functions, and each model can be depicted as a graph showing how the functions are chained. For example, with three functions, f(1) is the first layer, f(2) is the second layer, and f(3) is the output layer, so the network computes f(3)(f(2)(f(1)(x))). The information is passed from the input to the first layer, where computation takes place, then on to the next layer, and finally to the output layer.
  2. Radial Basis Functions (RBF) Neural Network
    In this type of neural network, data is grouped based on its distance from a center point. When there is no training data, the data is grouped and a center point is created for each group. The network is designed to find data points that are similar to each other and group them together. An example application of this type of neural network is power restoration systems.
    In more detail, a Radial Basis Function (RBF) neural network has three layers: an input layer, a hidden layer, and an output layer. The hidden layer is non-linear, and the output layer is linear. Applications of RBF networks include image processing, speech recognition, and medical diagnosis.
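
The feed-forward description in item 1 can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: the helper `dense`, the layer sizes, and the hand-picked weights are hypothetical, and a real network would learn its weights during training. It shows three layer functions chained so that data moves strictly from input to output.

```python
def dense(weights, biases, activation):
    """Build one layer function: x -> activation(W x + b)."""
    def layer(x):
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


# Hypothetical hand-picked weights; a trained network would learn these.
f1 = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)   # 2 inputs -> 2 units
f2 = dense([[1.0, 1.0], [-1.0, 2.0]], [0.0, 1.0], relu)   # 2 units -> 2 units
f3 = dense([[0.5, -0.5]], [0.1], identity)                # 2 units -> 1 output


def feed_forward(x):
    # Data flows strictly forward: f3(f2(f1(x))), with no feedback loops.
    return f3(f2(f1(x)))


print(feed_forward([2.0, 1.0]))
```

Each layer only ever sees the output of the layer before it, which is what distinguishes this design from the recurrent networks discussed later.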

RBF Networks: The Three Layers in Detail

Input Layer

There is one neuron in the input layer for each predictor variable. For a categorical variable, N-1 neurons are used, where N is the number of categories. The input neurons standardize the range of the values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden layer.
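
As a rough sketch of the input layer's preprocessing, the following code standardizes a numeric variable with the median and interquartile range and encodes a categorical variable with N-1 inputs. The function names `robust_scale` and `encode_category` and the sample data are made up for illustration, and the quartiles come from Python's `statistics.quantiles`, which is one of several possible quartile conventions.

```python
import statistics


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1  # assumed non-zero here; constant data would need special handling
    return [(v - median) / iqr for v in values]


def encode_category(value, categories):
    """Encode a categorical value with N-1 inputs; the last category is all zeros."""
    return [1.0 if value == c else 0.0 for c in categories[:-1]]


ages = [22.0, 25.0, 31.0, 38.0, 47.0, 52.0, 60.0]  # hypothetical predictor values
print(robust_scale(ages))
print(encode_category("green", ["red", "green", "blue"]))
```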

Hidden Layer

This layer has a variable number of neurons; the exact number is determined by the training process. Each neuron contains a radial basis function centered on a point that has as many dimensions as there are predictor variables. The spread (radius) of the RBF function may be different for each dimension. The centers and spreads are determined by the training process. When presented with a test case, a hidden neuron computes the Euclidean distance of the case from the neuron’s center point and then applies the RBF kernel function to this distance using the spread values. The resulting value is passed to the summation layer.

Summation Layer

The output value of a neuron in the hidden layer is multiplied by a weight associated with that neuron and passed to the summation function, which adds up the weighted values and presents the sum as the output of the network. For classification problems, there is one output, with its own set of weights and its own summation unit, for each target category. The value output for a category is the probability that the case being evaluated belongs to that category.
RBF networks are similar to K-Means clustering and to PNN and GRNN networks.
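
The hidden and summation layers described above can be sketched in a few lines. This is a simplified illustration, not a full RBF implementation: it assumes a Gaussian kernel and a single spread per neuron (rather than one per dimension), and the centers, spreads, and weights are hypothetical values that training would normally determine.

```python
import math


def rbf_forward(x, centers, spreads, weights, bias=0.0):
    """Forward pass of a tiny RBF network with Gaussian hidden units."""
    hidden = []
    for center, spread in zip(centers, spreads):
        # Euclidean distance from the input to this neuron's center.
        dist = math.dist(x, center)
        # Gaussian RBF kernel: the activation decays as the distance grows.
        hidden.append(math.exp(-(dist ** 2) / (2 * spread ** 2)))
    # Summation layer: weighted sum of the hidden activations.
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical parameters; training would normally choose these.
centers = [[0.0, 0.0], [10.0, 10.0]]
spreads = [1.0, 1.0]
weights = [2.0, -1.0]

print(rbf_forward([0.0, 0.0], centers, spreads, weights))    # dominated by neuron 0
print(rbf_forward([10.0, 10.0], centers, spreads, weights))  # dominated by neuron 1
```

An input that sits on a neuron's center activates that neuron fully and the distant neuron almost not at all, so the output is close to that neuron's weight.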

Key facts

  • In PNN/GRNN networks, there is one neuron for each point in the training file. RBF networks have a variable number of neurons, which is generally smaller than the number of training points.
  • For small to medium-sized training sets, PNN or GRNN networks are generally more accurate than RBF networks. The downside is that PNN or GRNN networks are impractical for large training sets.

Kohonen Self-organizing Neural Network

According to Scholarpedia, “Kohonen Network, which is also called Self-Organizing Map (SOM), is used for the visualization and analysis of high-dimensional data, specifically experimentally acquired information. It is a computational method where it defines an ordered mapping and projects onto a regular two-dimensional grid from a set of given data points.
The SOM was primarily developed for the visualization of distributions of metric vectors, like ordered sets of measurement values and statistical attributes. It can practically be shown that the mutual pairwise distances of data can be defined by utilizing a SOM-type mapping for any data set or items. SOM computational methods can be applied to non-vectorial data sets as well, such as strings of symbols and sequences of segments in organic molecules.”
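
To make the idea of projecting data onto a two-dimensional grid more concrete, here is a minimal training sketch. It is an assumption-laden illustration rather than the method described in the quoted source: the function names, the Gaussian neighborhood, the linearly decaying learning rate, and all parameter values are choices made for this example.

```python
import math
import random


def best_matching_unit(grid, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(grid, key=lambda node: math.dist(grid[node], x))


def train_som(data, rows, cols, epochs=20, lr=0.5, radius=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per node of the two-dimensional grid.
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays over time
        for x in data:
            bmu = best_matching_unit(grid, x)
            for node, w in grid.items():
                # Nodes near the winner on the grid are pulled harder toward x.
                grid_dist = math.dist(node, bmu)
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += rate * influence * (x[i] - w[i])
    return grid


# Hypothetical three-dimensional data points with values in [0, 1].
data = [[0.1, 0.1, 0.2], [0.2, 0.1, 0.1], [0.9, 0.8, 0.9], [0.8, 0.9, 0.9]]
som = train_som(data, rows=2, cols=2)
for point in data:
    print(point, "->", best_matching_unit(som, point))
```

Each data point ends up associated with one grid cell, and that association is the low-dimensional view of the data that a SOM provides.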

Recurrent Neural Network

In a Recurrent Neural Network (RNN), the output of the previous step is fed as input to the current step. In conventional neural networks, all inputs and outputs are independent of each other. However, when the task is to predict the next word of a sentence, the previous words are needed, so the network must remember them. RNNs were developed to solve this problem of remembering previous inputs, with the help of a hidden layer.
The most important feature of an RNN is its hidden state, which remembers information about a sequence. It acts as a “memory” that retains information about what has been calculated in previous steps. An RNN uses the same parameters for every input, since it performs the same task on each input and hidden state to produce the output. This reduces the number of parameters compared to other neural networks. RNNs have a wide variety of applications, one of which is TTS (text-to-speech) synthesis.
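
The role of the hidden state and the shared parameters can be sketched as follows. This is a minimal illustration that assumes the common update rule h_t = tanh(W_x x_t + W_h h_{t-1} + b); the function names and the weight values are hypothetical.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))
            + sum(w * hi for w, hi in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def run_rnn(sequence, W_x, W_h, b):
    h = [0.0] * len(b)       # initial hidden state: nothing remembered yet
    for x in sequence:       # the same W_x, W_h, and b are reused at every step
        h = rnn_step(x, h, W_x, W_h, b)
    return h


# Hypothetical weights for 1 input feature and 2 hidden units.
W_x = [[1.0], [0.5]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

print(run_rnn([[1.0], [0.0]], W_x, W_h, b))
print(run_rnn([[0.0], [1.0]], W_x, W_h, b))
```

The two calls receive the same inputs in a different order and produce different hidden states, which shows that the hidden state carries information about the sequence so far.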

Long Short Term Memory (LSTM)

LSTM networks are an extension of recurrent neural networks (RNNs), introduced mainly to handle situations where RNNs fail. An RNN works on the present input while taking into consideration the previous output (feedback), which it stores in its memory for a short period of time (short-term memory). Among its various applications, the most popular are in speech processing, non-Markovian control, and music composition.

RNNs nevertheless have drawbacks. First, they fail to store information for long periods. Sometimes information seen long ago is needed to predict the current output, and standard RNNs handle such “long-term dependencies” poorly. Second, there is no fine control over which part of the context is carried forward and how much of the past is ‘forgotten’. Third, RNNs suffer from exploding and vanishing gradients, which occur while the network is trained through backpropagation.

Long Short-Term Memory (LSTM) was introduced to address these problems. It is designed so that the vanishing gradient problem is almost completely removed, while the training model is left unaltered. LSTMs bridge long time lags in certain problems and can also handle noise, distributed representations, and continuous values. Unlike a hidden Markov model (HMM), an LSTM does not require a finite number of states to be fixed beforehand. LSTMs work well over a large range of parameters, such as learning rates and input and output gate biases, so little fine-tuning is needed. The complexity of updating each weight is O(1) with LSTMs, similar to that of Back Propagation Through Time (BPTT), which is an advantage.
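
The following sketch shows one step of an LSTM cell in the commonly used formulation with forget, input, and output gates. It is an illustration under assumptions, not a reference implementation: the helper names and the parameter layout (a tuple of input weights, recurrent weights, and biases per gate) are invented for this example, and the gate biases are set by hand to make the memory effect visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]       # what to keep
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]        # what to write
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]       # what to expose
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]  # new content
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Hypothetical one-unit cell with all weights zero and hand-set gate biases.
zero = [[0.0]]
params = {
    "forget": (zero, zero, [20.0]),    # forget gate ~1: keep the old cell state
    "input": (zero, zero, [-20.0]),    # input gate ~0: write almost nothing new
    "output": (zero, zero, [0.0]),
    "candidate": (zero, zero, [0.0]),
}
h, c = [0.0], [0.7]
for _ in range(100):
    h, c = lstm_step([1.0], h, c, params)
print(c)  # stays very close to 0.7 after 100 steps
```

With the forget gate held open, the cell state survives many steps almost unchanged, which is the mechanism that lets an LSTM bridge long time lags.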

Gated Recurrent Unit (GRU)

GRUs are an improved version of the standard recurrent neural network. But what makes them so special and effective?
To solve the vanishing gradient problem of a standard RNN, a GRU uses a so-called update gate and reset gate. These are two vectors that decide what information should be passed to the output. What makes them special is that they can be trained to keep information from long ago without it being washed out over time, and to remove information that is irrelevant to the prediction.


Although GRUs are quite similar to LSTMs, they have never been as popular. GRU stands for Gated Recurrent Unit. As the name suggests, these recurrent units, proposed by Cho et al., have a gating mechanism to effectively and adaptively capture dependencies at different time scales. They have an update gate and a reset gate. The former selects what knowledge is carried forward, whereas the latter sits between two successive recurrent units and decides how much information is forgotten.
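
A single GRU step with its update and reset gates can be sketched as below. This is a hedged illustration: it assumes the convention in which an update gate close to 1 keeps the previous state (some texts use the opposite convention), and the helper names, parameter layout, and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]


def gru_step(x, h_prev, params):
    z = [sigmoid(v) for v in affine(*params["update"], x, h_prev)]  # update gate
    r = [sigmoid(v) for v in affine(*params["reset"], x, h_prev)]   # reset gate
    # The reset gate scales how much of the old state feeds the candidate.
    gated_h = [rj * hj for rj, hj in zip(r, h_prev)]
    h_tilde = [math.tanh(v) for v in affine(*params["candidate"], x, gated_h)]
    # Convention assumed here: z near 1 keeps the old state, z near 0 replaces it.
    return [zj * hj + (1.0 - zj) * cj for zj, hj, cj in zip(z, h_prev, h_tilde)]


# Hypothetical one-unit parameters with hand-set gate biases.
zero, one = [[0.0]], [[1.0]]
keep = {"update": (zero, zero, [20.0]), "reset": (zero, zero, [0.0]),
        "candidate": (one, one, [0.0])}
overwrite = {"update": (zero, zero, [-20.0]), "reset": (zero, zero, [-20.0]),
             "candidate": (one, one, [0.0])}

print(gru_step([0.5], [0.9], keep))       # close to the old state 0.9
print(gru_step([0.5], [0.9], overwrite))  # close to tanh(0.5), old state forgotten
```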

Advantages of Recurrent Neural Network

  • An RNN retains information from earlier steps of a sequence, which makes it very useful for time series prediction, where previous inputs matter as well.
  • Inputs of any length can be processed in this model.

Disadvantages of Recurrent Neural Network

  • Exploding and vanishing gradients are common in this model.
  • Training an RNN is quite a challenging task.
  • It cannot process very long sequences when ‘tanh’ or ‘relu’ is used as the activation function.
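
The exploding and vanishing gradient problem listed above comes down to repeated multiplication. As a simplified sketch, assume a scalar RNN in which each step multiplies the gradient by the recurrent weight times the slope of the activation function, and assume that slope is constant (in a real network it varies from step to step). The function name `gradient_scale` is hypothetical.

```python
def gradient_scale(recurrent_weight, activation_slope, steps):
    """Size of the gradient factor after backpropagating through `steps` steps."""
    factor = abs(recurrent_weight * activation_slope)
    return factor ** steps


for steps in (1, 10, 50):
    shrinking = gradient_scale(0.9, 0.5, steps)  # per-step factor 0.45 < 1
    growing = gradient_scale(1.5, 1.0, steps)    # per-step factor 1.5 > 1
    print(steps, shrinking, growing)
```

When the per-step factor is below 1 the gradient shrinks toward zero over long sequences, and when it is above 1 the gradient grows without bound, which is why long sequences are hard for a plain RNN.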

Convolutional Neural Network

A Convolutional Neural Network (CNN or ConvNet) is one of the best-known algorithms in machine learning and, more specifically, deep learning. A CNN learns to perform tasks directly from images, video, text, or sound. CNNs find patterns in images to recognize objects, faces, and scenes. They learn directly from image data, using patterns to classify images and eliminating the need for manual feature extraction.
CNNs are particularly powerful in applications that require object recognition and computer vision, such as self-driving vehicles and face recognition. Depending on the application, one can build a CNN from scratch or use a pre-trained model with the data set. The architecture of CNNs is well suited to image recognition and pattern detection. Advances in GPUs and parallel computing have made CNNs robust and capable of delivering high quality in automated driving and facial recognition. For example, a CNN can learn to tell a traffic signal from a pedestrian.
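
How a CNN finds patterns in an image can be illustrated with the core operation of a convolutional layer. The sketch below slides a small filter over a tiny image without padding and with a stride of 1. The image and the vertical-edge filter are hand-written for illustration; a real CNN would learn its filters from data.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 image with a vertical edge between a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written vertical-edge filter; a CNN would learn such filters itself.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge lies, so the filter has located the pattern without any hand-crafted feature extraction on the image itself.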
Why are CNNs useful?

  • CNNs eliminate the need for manual feature extraction.
  • CNNs deliver high-quality recognition results.
  • CNNs can be retrained for new recognition tasks, which makes it possible to build on pre-existing networks.
  • Input features can be managed at a fine-grained level. The input features are handled in batches, which lets the network remember an image in several parts.

Modular Neural Network (MNN)

In a modular neural network, several independent networks jointly contribute to the result. Each of these networks performs its own sub-task and works on its own set of inputs, distinct from those of the other networks.
Modularity also reduces the complexity of the problem to be solved, since the computation is broken down into small components. Computation is faster as well, because the networks do not need to interact with each other, which reduces the number of connections. The total processing time, however, depends on the number of neurons involved in the computation.
Modular neural networks are often described as one of the fastest-growing areas in Artificial Intelligence.
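
The structure of a modular neural network can be sketched as a set of independent modules whose outputs are combined at the end. Everything in this example is an assumption made for illustration: each module is reduced to a single linear unit standing in for a separately trained network, the module names and weights are made up, and the combining step is a plain average.

```python
def make_module(weights, bias):
    """Stand-in for an independently trained network (here a single linear unit)."""
    def module(inputs):
        return sum(w * x for w, x in zip(weights, inputs)) + bias
    return module


# Hypothetical modules; each one handles its own sub-task and its own inputs.
modules = {
    "shape": make_module([0.6, 0.4], 0.0),
    "color": make_module([1.0], -0.2),
}


def modular_predict(inputs_by_module):
    # Run every module on its own inputs; the modules never talk to each other.
    outputs = {name: modules[name](x) for name, x in inputs_by_module.items()}
    # Combine the intermediate results (a simple average in this sketch).
    combined = sum(outputs.values()) / len(outputs)
    return combined, outputs


combined, outputs = modular_predict({"shape": [1.0, 0.5], "color": [0.7]})
print(outputs, combined)
```

Because the modules share no connections, each one could be trained, replaced, or run in parallel without touching the others.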


As we have seen, there are several types of neural networks, each applied to different requirements to achieve the desired outcomes. What neural networks have in common is that they are designed with the way neurons in the brain work in mind. As a result, these networks can be expected to learn and improve with more data and use. The difference between traditional Machine Learning (ML) algorithms and neural networks is that the performance of traditional ML tends to plateau after a point, whereas neural networks can continue to improve with more data and usage.

That is one of the reasons why many industry experts believe that neural networks will be the basic framework on which next-generation Artificial Intelligence is built. By now, you should have a good understanding of the concept of neural networks and their types.