fbpx
  • VGG16 – Convolutional Network for Classification and Detection

    vgg16

    VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It was one of the famous model submitted to ILSVRC-2014. It makes the improvement over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layer, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks and was using NVIDIA Titan Black GPU’s.

    vgg16 architecture
    VGG16

    DataSet

    ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. At all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images. ImageNet consists of variable-resolution images. Therefore, the images have been down-sampled to a fixed resolution of 256×256. Given a rectangular image, the image is rescaled and cropped out the central 256×256 patch from the resulting image.

    The Architecture

    The architecture depicted below is VGG16.

    VGG16 Artitecture
    Figure 1. VGG16 Architecture

    The input to cov1 layer is of fixed size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters were used with a very small receptive field: 3×3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations, it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1-pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv.  layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.

    Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.

    All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalisation (LRN), such normalization does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.

    Configurations

    The ConvNet configurations are outlined in figure 2. The nets are referred to their names (A-E). All configurations follow the generic design present in architecture and differ only in the depth: from 11 weight layers in the network A (8 conv. and 3 FC layers) to 19 weight layers in the network E (16 conv. and 3 FC layers). The width of conv. layers (the number of channels) is rather small, starting from 64 in the first layer and then increasing by a factor of 2 after each max-pooling layer, until it reaches 512.

    Figure 2. ConvNet Configuration

    Use-Cases and Implementation

    Unfortunately, there are two major drawbacks with VGGNet:

    1. It is painfully slow to train.
    2. The network architecture weights themselves are quite large (concerning disk/bandwidth).

    Due to its depth and number of fully-connected nodes, VGG16 is over 533MB. This makes deploying VGG a tiresome task.VGG16 is used in many deep learning image classification problems; however, smaller network architectures are often more desirable (such as SqueezeNet, GoogLeNet, etc.). But it is a great building block for learning purpose as it is easy to implement.

    [Pytorch]

    [Tensorflow]

    [Keras]

    Result

    VGG16 significantly outperforms the previous generation of models in the ILSVRC-2012 and ILSVRC-2013 competitions. The VGG16 result is also competing for the classification task winner (GoogLeNet with 6.7% error) and substantially outperforms the ILSVRC-2013 winning submission Clarifai, which achieved 11.2% with external training data and 11.7% without it. Concerning the single-net performance, VGG16 architecture achieves the best result (7.0% test error), outperforming a single GoogLeNet by 0.9%.

    VGG16 comparison
    VGG16 comparison

    It was demonstrated that the representation depth is beneficial for the classification accuracy, and that state-of-the-art performance on the ImageNet challenge dataset can be achieved using a conventional ConvNet architecture with substantially increased depth.

    Subscribe
    Notify of
    guest
    38 Comments
    Oldest
    Newest Most Voted
    Inline Feedbacks
    View all comments

    […] will use VGG16 as our deep learning model. We could have trained our own model from ground up, but this takes time […]

    […] Taken from: https://neurohive.io/en/popular-networks/vgg16/ […]

    […] map) at shallow layer(conv-1) of each conv block. A typical pre-trained classification CNN like VGG16 is consist of a few conv blocks, which has 2 or 3 convolution(Conv2D) layers(conv1,conv2 etc.) […]

    […] map) at shallow layer(conv-1) of each conv block. A typical pre-trained classification CNN like VGG16 is consist of a few conv blocks, which has 2 or 3 convolution(Conv2D) layers(conv1,conv2 etc.) […]

    […] המאומנות כמו VGG16, VGG19, Resnet101 ורבות אחרות נמצאות ברשת לשימוש הכל, עם […]

    […] a predição em diferentes elevações. Para agilizar o treinamento, eles usaram a estrutura VGG-16 inicializada com pesos oriundos de um pré-treinamento no dataset ImageNet. A tarefa específica do […]

    […] here’s some visualization of what the actual filters that pass over the images look like in a VGG16 network. (more specifically, an image on which the filter activates the most, code to do […]

    […] here’s some visualization of what the actual filters that pass over the images look like in a VGG16 network. (more specifically, an image on which the filter activates the most, code to do […]

    […] Nuerohive: Convolution Network for Classification and Detection  […]

    […] Referencehttps://neurohive.io/en/popular-networks/vgg16/https://www.quora.com/What-is-the-VGG-neural-network […]

    […] this topic have focused on learning from famous, high-performance deep learning networks, such as VGGNet-16, ResNet-50, or Inception-V3/V4, etc. These networks were trained on the massive ImageNet database, […]

    […] have targeted on studying from well-known, high-performance deep studying networks, similar to VGGNet-16, ResNet-50, or Inception-V3/V4, and many others. These networks had been educated on the […]

    […] the advancement of deep learning such as convolutional neural network (i.e., ConvNet) [1], computer vision becomes a hot scientific research topic again. One of the main goals of […]

    […] of every picture within the coaching set are calculated utilizing a hidden layer of the pretrained VGG-16 convolutional neural community. A single absolutely related feed-forward layer is skilled to take […]

    […] 20 novembre 2018 VGG16 è un modello di rete neurale convoluzionale proposto da K. Simonyan e A. Zisserman dell’Università di Oxford nel documento “Reti convoluzionali molto profonde per il riconoscimento di immagini su larga scala”. Il modello raggiunge la precisione dei primi 5 test del 92,7% in ImageNet, che è un set di dati di oltre 14 milioni di immagini appartenenti a 1000 classi. Era uno dei famosi modelli presentati a ILSVRC-2014. Apporta miglioramenti su AlexNet sostituendo i filtri di grandi dimensioni del kernel (rispettivamente 11 e 5 nel primo e secondo strato convoluzionale) con più filtri 3 × 3 di dimensioni del kernel uno dopo l’altro. VGG16 è stato addestrato per settimane e utilizzava GPU NVIDIA Titan Black. DataSet ImageNet è un set di dati di oltre 15 milioni di immagini etichettate ad alta risoluzione appartenenti a circa 22.000 categorie. Le immagini sono state raccolte dal Web ed etichettate da etichettatrici umane utilizzando lo strumento di crowdsourcing Mechanical Turk di Amazon. A partire dal 2010, nell’ambito del Pascal Visual Object Challenge, si è tenuto un concorso annuale chiamato ImageNet Large Scale Scale Recognition Challenge (ILSVRC). ILSVRC utilizza un sottoinsieme di ImageNet con circa 1000 immagini in ciascuna delle 1000 categorie. Complessivamente, ci sono circa 1,2 milioni di immagini di addestramento, 50.000 immagini di validazione e 150.000 immagini di prova. ImageNet è costituito da immagini a risoluzione variabile. Pertanto, le immagini sono state sottocampionate ad una risoluzione fissa di 256 × 256. Data un’immagine rettangolare, l’immagine viene ridimensionata e ritagliata la patch centrale 256 × 256 dall’immagine risultante. L’architettura L’architettura raffigurata di seguito è VGG16. Architettura VGG16 L’ingresso al livello cov1 ha un’immagine RGB di dimensioni fisse 224 x 224. L’immagine viene fatta passare attraverso una pila di strati convoluzionali (conv.), In cui i filtri sono stati utilizzati con un campo ricettivo molto piccolo: 3 × 3 (che è la dimensione più piccola per catturare la nozione di sinistra / destra, su / giù, centro ). In una delle configurazioni, utilizza anche filtri di convoluzione 1 × 1, che possono essere visti come una trasformazione lineare dei canali di input (seguita dalla non linearità). Il passo di convoluzione è fisso su 1 pixel; l’imbottitura spaziale di conv. l’input del livello è tale che la risoluzione spaziale viene preservata dopo la convoluzione, ovvero il riempimento è di 1 pixel per 3 × 3 conv. strati. Il pooling spaziale viene eseguito da cinque livelli di pool massimo, che seguono alcuni dei conv. layer (non tutti i layer conv. sono seguiti da max pooling). Il max pooling viene eseguito su una finestra di 2 × 2 pixel, con il passo 2. Tre livelli completamente collegati (FC) seguono una pila di livelli convoluzionali (che ha una profondità diversa in architetture diverse): i primi due hanno 4096 canali ciascuno, il terzo esegue la classificazione ILSVRC a 1000 vie e quindi contiene 1000 canali (uno per ogni classe). Il livello finale è il livello soft-max. La configurazione dei layer completamente connessi è la stessa in tutte le reti. Tutti i livelli nascosti sono dotati della non linearità di rettifica (ReLU). Si noti inoltre che nessuna delle reti (tranne una) contiene Local Response Normalization (LRN), tale normalizzazione non migliora le prestazioni sul set di dati ILSVRC, ma porta ad un aumento del consumo di memoria e dei tempi di calcolo. Configurazioni Le configurazioni di ConvNet sono descritte nella figura 02. Le reti si riferiscono ai loro nomi (A-E). Tutte le configurazioni seguono il disegno generico presente nell’architettura e differiscono solo per la profondità: da 11 strati di peso nella rete A (8 strati e 3 strati FC) a 19 strati di peso nella rete E (16 strati e 3 strati FC) . La larghezza di conv. i livelli (il numero di canali) è piuttosto piccolo, a partire da 64 nel primo livello e quindi aumentando di un fattore 2 dopo ogni livello di pool massimo, fino a raggiungere 512. Figura: 2 Casi di utilizzo e implementazione Sfortunatamente, ci sono due principali inconvenienti con VGGNet: è dolorosamente lento allenarsi. I pesi dell’architettura di rete sono piuttosto grandi (per quanto riguarda il disco / la larghezza di banda). Grazie alla profondità e al numero di nodi completamente connessi, VGG16 supera i 533 MB. Questo rende la distribuzione di VGG un compito noioso.VGG16 viene utilizzato in molti problemi di classificazione delle immagini di apprendimento profondo; tuttavia, le architetture di rete più piccole sono spesso più desiderabili (come SqueezeNet, GoogLeNet, ecc.). Ma è un ottimo componente per scopi di apprendimento in quanto è facile da implementare. [Pytorch] [Tensorflow] [Keras] Risultato VGG16 supera significativamente la precedente generazione di modelli nelle competizioni ILSVRC-2012 e ILSVRC-2013. Il risultato VGG16 compete anche per il vincitore del compito di classificazione (GoogLeNet con errore del 6,7%) e supera notevolmente … Fonti […]

    […] features of each image in the training set are calculated using a hidden layer of the pretrained VGG-16 convolutional neural network. A single fully connected feed-forward layer is trained to take those […]

    […] features of each image in the training set are calculated using a hidden layer of the pretrained VGG-16 convolutional neural network. A single fully connected feed-forward layer is trained to take those […]

    […] ciascuno aspetto nel set nel corso di sono calcolate usando unico mano introvabile del pretrattatoVGG – 16 neurale convoluzionale. Un specifico calibro feed-forward del tutto collegato viene preparato verso […]

    […] [3] “VGG16 – Convolutional Network for Classification and Detection”. [Online]. Available: https://neurohive.io/en/popular-networks/vgg16/.  […]

    […] Layered architecture of VGG16 (Source) […]

    […] will use the pre-trained AlexNet and VGG16. They both receive 224*224 (3 channel) as input, so we need to resize our image to 224*224. Because […]

    […] An occasion of options at totally different ranges within the VGG-16 architecture […]

    […] An instance of features at different levels in the VGG-16 architecture […]

    […] https://neurohive.io/en/popular-networks/vgg16/ — VGG16 – Convolutional Network for Classification and Detection (emphasis mine) […]

    […] example, the neural nets, which can include VGG-16, RESNET-50, and others, have the following size when used as a tf.keras application (for example, […]

    […] VGG16 – Convolutional network for classification and detection. (2018, November 21). Neurohive – Neural Networks. https://neurohive.io/en/popular-networks/vgg16/ […]

    […] For image Detecting, we are using a pre-trained model which is VGG16. VGG16 is already installed in the Keras library.VGG 16 was proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group Lab of Oxford University in 2014 in the paper VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION. […]

    […] This isn’t an easy question because this is a full separate paper on its own, but I will give you an idea of what it is. Encoder decoder frameworks are the current SOTA models for image segmentation. A partial decoder discards features of shallower layers of the image, this speeds up the feature extraction process. This decoder is based on the VGG16 model. […]

    […] This isn’t an easy question because this is a full separate paper on its own, but I will give you an idea of what it is. Encoder decoder frameworks are the current SOTA models for image segmentation. A partial decoder discards features of shallower layers of the image, this speeds up the feature extraction process. This decoder is based on the VGG16 model. […]

    […] We will first run the images through the VGG16 base model. VGG16 is a pre-trained model that takes in (224,224) RGB images and converts them into features. It comes out-of-the-box from the keras library and has been trained on millions of images from ImageNet. To read more about VGG16 model check out this site: https://neurohive.io/en/popular-networks/vgg16/ […]

    […] We will first run the images through the VGG16 base model. VGG16 is a pre-trained model that takes in (224,224) RGB images and converts them into features. It comes out-of-the-box from the keras library and has been trained on millions of images from ImageNet. To read more about VGG16 model check out this site: https://neurohive.io/en/popular-networks/vgg16/ […]

    […] a single image prediction faster, I could always resort to using a network with fewer layers like VGG16, […]

    […] a single image prediction faster, I could always resort to using a network with fewer layers like VGG16, […]

    […] of this algorithm is that it takes a large scale of enterprise logos from a database and, utilizing VGG16 for feature extraction, forms K clusters of visually similar […]

    […] of this algorithm is that it takes a large scale of enterprise logos from a database and, utilizing VGG16 for feature extraction, forms K clusters of visually similar […]

    […] I was able to find the VGG16 CNN model (For more information of VGG16 model: VGG16 Paper and Blog post) and implemented the VGG16 model provided by the Python Tensorflow and Keras. I also had to change […]