Convolutional neural networks and the bag-of-visual-words approach : a comparative analysis based on variation of training set size

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/29525

Title:	Convolutional neural networks and the bag-of-visual-words approach : a comparative analysis based on variation of training set size
Authors:	Falzon, Michelle
Keywords:	Neural networks (Computer science); Computer vision; Image processing
Issue Date:	2017
Abstract:	Image classification has a large variety of practical applications in computing, most notably medical (e.g. detecting diseases from tests), industrial (e.g. locating faulty parts) and security (e.g. facial recognition). Despite the growing popularity of both, little research has compared two of the most common image classification algorithms, the Bag-of-Visual-Words approach (BoVW) and Convolutional Neural Networks (CNN), in terms of the size of the training set required. Both methods of image classification have proven to be highly reliable. However, they have different memory and time requirements as well as different optimal training set characteristics. The tendency to use the current top-performing algorithm, without regard for its best uses, has led many to use CNNs even where other methods may be more suitable for the task at hand. This work aimed to implement and study the bag-of-visual-words approach and convolutional neural networks in order to determine their performance when presented with training sets of varying size. Subsets of the CIFAR-100 data set, containing tiny images and multiple categories, and the FERET data set, consisting of larger images and only two categories, were chosen for this project. When the entire CNN was trained on the CIFAR-100 data set, the error rate did not decrease below 0.9. In contrast, training using transfer learning obtained an accuracy of 77.55% on 20 classes using 200 images/class. The BoVW approach was less successful, reaching a top accuracy of 54.05%. Re-initialising more layers of the CNN led to lower performance, probably because more training images were required. Adding more levels to the BoVW spatial pyramid resulted in a higher accuracy. The opposite was seen on the FERET data set: the BoVW approach drastically outperformed the CNN, peaking at 92.82% with 200 images/category.
The CNN required more than 50 images/category in order to provide reliable results and, even then, its accuracy was slightly lower than that of the BoVW approach. From the results, one can conclude that using transfer learning on pre-trained CNNs is more suitable for large and complex data sets, while the BoVW approach performs best on simpler and smaller data sets.
Description:	B.SC.IT(HONS)
URI:	https://www.um.edu.mt/library/oar//handle/123456789/29525
Appears in Collections:	Dissertations - FacICT - 2017; Dissertations - FacICTAI - 2017

Files in This Item:

File	Description	Size	Format
17BITAI012.pdf (Restricted Access)		3.93 MB	Adobe PDF
