Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/64169
Title: A citizen science approach for the collection of data to train deep learning models
Authors: Saliba, Chantelle
Keywords: Plants -- Malta
Machine learning
Neural networks (Computer science)
Research -- Malta -- Citizen participation
Issue Date: 2020
Citation: Saliba, C. (2020). A citizen science approach for the collection of data to train deep learning models (Bachelor's dissertation).
Abstract: Machine learning continues to advance the technological progress of nature studies. Machine learning techniques that give good predictions require a considerable amount of data, which can be a challenge to collect. Owing to the small size of the island, the study of Maltese flora is one such field that lacks available data, which has led to little technological advancement. Training a deep learning network on too little data easily results in overfitting, so auxiliary techniques have to be used to overcome this challenge and provide more data for better training. In the first part of this study, we investigate the training of a deep learning model on a limited training dataset using techniques such as data augmentation, data scraping and transfer learning. The model considered covers 50 categories, incorporating species that are endemic to the Maltese islands and excluding cultivated exotic species that are not usually found in the Maltese countryside. Data scraping did not generate sufficient training data. Data augmentation was then used to enhance the dataset, and it was concluded that augmentation performed on both the training data and the testing data produced the most accurate model. Different transfer learning methods were also evaluated, and it was concluded that the VGG-16 model outperformed the other models. With these techniques and this dataset, a model with an accuracy of 47.87% was generated. The low accuracy of this improved off-the-shelf model showed the relevance of the initial hypothesis that citizen science is needed for the improvement of deep learning models. In the second phase, citizen science was used as a data augmentation technique. Citizen science depends on the structure of society and culture; therefore a questionnaire study was conducted to determine the opinion of the general public. Of 243 respondents, 13.2% said that they were not interested in a mobile communication system to crowdsource data. The application was to be used during nature walks in the peak months of COVID-19. Consequently, the application was distributed as an APK file to interested individuals, gathering 257 valid images, which were used to enhance the dataset. The deep learning model was re-trained on this dataset, achieving an accuracy of 62.44%, an increase of 14.57 percentage points. The data collected was used to generate visualisations of the Maltese flora distribution. This study demonstrated that citizen science is essential for the improvement of deep learning models so that they can be employed in more widespread applications.
Description: B.SC. ICT (HONS) ARTIFICIAL INTELLIGENCE
URI: https://www.um.edu.mt/library/oar/handle/123456789/64169
Appears in Collections: Dissertations - FacICT - 2020
Dissertations - FacICTAI - 2020
Scholarly Works - FacSciBio

Files in This Item:
File: 20BITAI009 - Saliba Chantelle.pdf (Restricted Access)
Size: 7.44 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.