This article describes the highlights of the AlexNet architecture.
Introduction
AlexNet is a convolutional neural network created for the ImageNet challenge in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. The challenge is to classify images, basically an object recognition problem.
Dataset
The ImageNet dataset is used; it has 1.2 million high-resolution images that must be classified into 1000 classes.
Architecture
The network is built from convolutional and max-pooling layers: 5 Conv2D layers, 3 max-pool layers, and 3 dense (fully connected) layers, counting the final 1000-way classifier.
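To make the layer layout concrete, here is a minimal PyTorch-style sketch of the stack. The filter counts and kernel sizes follow the original paper, but this is an illustrative approximation: it omits the LRN layers and the two-GPU split discussed below, and assumes a 3x227x227 input.

```python
import torch.nn as nn

# Minimal sketch of the AlexNet layer layout (5 conv, 3 max-pool, FC head).
# Assumes 3x227x227 input so the final conv feature map is 256x6x6.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),   # 1000 ImageNet classes
)
```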
Ideas
ReLU is used for the hidden units in all layers because of its non-saturating nonlinearity. A saturating function maps any input into a certain bounded range, whereas a non-saturating function has no such limit: as the input increases, the output keeps increasing, tending to infinity. ReLU is zero for negative inputs and a 45-degree line for positive inputs, so it does not saturate. It was chosen to speed up the convergence of training.
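A small numpy sketch of the contrast (tanh is used here just as an example of a saturating activation, it is not part of AlexNet's hidden layers):

```python
import numpy as np

def relu(x):
    # Non-saturating: for positive inputs the output grows without bound
    return np.maximum(0.0, x)

def tanh_act(x):
    # Saturating: the output is squashed into (-1, 1) no matter how large x is
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0, 20.0, 200.0])
print(relu(x))      # [  0.   0.   2.  20. 200.]
print(tanh_act(x))  # [-0.964  0.     0.964  1.     1.   ] (approximately)
```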
A multi-GPU strategy is used to train the model, in this case 2 GPUs, to speed up training; the GPUs can read from and write to each other's memory directly, bypassing the host machine. It also makes use of parallelization by placing one set of kernels on one GPU and another set on the other, with the GPUs communicating only at certain layers, i.e. the kernels of layer 3 take input from the kernel maps of layer 2 on both GPUs. This setup is referred to as a "columnar CNN". A toy sketch of the idea follows below.
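The sketch below splits a convolutional layer's kernels across two devices. This is only an illustration of the model-parallel idea, not the paper's exact implementation, and it assumes two CUDA devices (cuda:0 and cuda:1) are available.

```python
import torch
import torch.nn as nn

class SplitConv(nn.Module):
    """Toy model parallelism: half of the output kernels live on each GPU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_a = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1).to("cuda:0")
        self.conv_b = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1).to("cuda:1")

    def forward(self, x):
        # Each GPU computes its own subset of the feature maps in parallel.
        ya = self.conv_a(x.to("cuda:0"))
        yb = self.conv_b(x.to("cuda:1"))
        # Communication happens only where the next layer needs all the maps.
        return torch.cat([ya, yb.to("cuda:0")], dim=1)

layer = SplitConv(96, 256)
out = layer(torch.randn(8, 96, 27, 27))   # -> (8, 256, 27, 27) on cuda:0
```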
Local Response Normalization (LRN): since ReLU is used as the activation, the values of a neuron can grow without any limit, so some normalization is needed. The idea is to normalize each activation by the responses of its neighbours: similar to batch normalization, where we normalize over the entire batch, here we normalize over the 5 or 6 neighbouring feature maps (a hyperparameter) at the same spatial position. This is also called Brightness Normalization. It is inspired by a real-neuron concept called "Lateral Inhibition", the ability of a neuron to suppress its surrounding neurons.
LRN is calculated as follows; the idea behind the formula is:
new_value = old_value / (constant + alpha * sum_of_squared_neighbouring_activations)
The constant avoids division by zero, and alpha is a normalizing factor that controls how strongly we normalize.
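A small numpy sketch of the full formula from the paper, where the denominator is additionally raised to a power beta; the default hyperparameters here (n = 5, k = 2, alpha = 1e-4, beta = 0.75) are the values reported in the paper:

```python
import numpy as np

def lrn(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local Response Normalization across channels.

    a: activations of shape (channels, height, width). Each channel i is
    divided by (k + alpha * sum of squared activations of the n neighbouring
    channels at the same pixel position) ** beta.
    """
    channels = a.shape[0]
    b = np.empty_like(a)
    for i in range(channels):
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b
```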
Data augmentation is used to artificially enlarge the dataset and reduce overfitting, specifically through horizontal reflections and random cropping of the images.
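A minimal sketch of these two augmentations using torchvision transforms; the 256 → 224 sizes mirror the paper's setup, but this is an approximation rather than the original training pipeline:

```python
from torchvision import transforms

# Random 224x224 crops from a 256-pixel resize, plus random horizontal flips.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
```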
Dropout is used to avoid overfitting by reducing the effective complexity of the network. For further reading, another article that describes regularization is here: https://solomon-ai.medium.com/l1-regularization-and-sparsity-c1d077bdc07c
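A quick PyTorch illustration of the behaviour (in the paper, dropout with p = 0.5 is applied to the first two fully connected layers):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each unit is zeroed with probability 0.5 during training
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the entries are 0; survivors are scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))   # at test time dropout is a no-op and the input passes through unchanged
```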
References
2. Header Image from https://www.mdpi.com/2072-4292/9/8/848