Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/76927
Title: Automatic crime information gathering and data analytics from online news reports
Authors: Spiteri, Janica (2020)
Keywords: Online journalism
Criminal statistics
Crime
Natural language processing (Computer science)
Machine learning
Issue Date: 2020
Citation: Spiteri, J. (2020). Automatic crime information gathering and data analytics from online news reports (Bachelor's dissertation).
Abstract: One of the major challenges faced by law enforcement is that of the prioritisation and rostering of resources, maximising chances of having the right resources at the right place and at the right time. This research proposes a hybrid machine learning technology which uses a set of customised crawlers to gather data on a daily basis from newspaper articles. Articles that deal with criminal offences are identified and analysed, and their inherent details are extracted using Natural Language Processing (NLP) technology. Articles coming from different sources are converged using a standardised format that allows the details of the criminal act (such as crime, location, time, criminal, etc.) to be easily accessed. Related data such as population, literacy, etc. are also extracted from other sources using dedicated web crawlers and cross-referenced with the criminal events themselves. Web crawling is automated using a special bot designed to initiate the crawling processes regularly. A visualisation engine is being proposed to allow users to quickly and effectively browse the criminal event database using a feature-rich search engine enabling specific parameters to be easily identified and depicted. Representations include geographical/calendar heat maps, graphs, etc. Previous research in similar areas has utilised various machine learning techniques with different success rates. This research aims to study the effectiveness of K-Means and DBSCAN [87] based technologies when applied to crime prediction. K-Means uses a purely statistical past-data based model to attempt to predict the incidence of crime, while DBSCAN uses clustering techniques which could include other datasets in addition to past criminal event data. Various datasets have been used to evaluate the performance of the proposed technology, with encouraging results. The Precision/Recall/F-Measure technique used in previous studies [85], [96] has been utilised to compute the F-Measure of both techniques.
Moreover, geographically different regions (Malta and Boston) were used to evaluate different crime patterns. While the large number of possible prediction configurations makes it very difficult to cover all the possible scenarios, both techniques performed quite well, with the K-Means based one being slightly more accurate when predicting recurring crimes. Predictions of monthly instances of specific crimes were achieved with a combined (NLP + Prediction) F-Measure of 0.78, which compares very favourably with other studies, even those that only covered prediction on a ready-made dataset without any NLP-related inaccuracies.
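For readers unfamiliar with the Precision/Recall/F-Measure evaluation mentioned in the abstract, the following is a minimal Python sketch of how such a score is commonly computed from counts of correct and incorrect predictions. The function name `f_measure` and the counts in the usage line are hypothetical illustrations; they are not taken from the dissertation and do not reproduce its reported 0.78 figure.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-measure) from prediction counts.

    Hypothetical helper for illustration only; the dissertation's own
    evaluation code is not available in this record.
    """
    predicted = true_pos + false_pos   # everything the system flagged
    actual = true_pos + false_neg      # everything that really occurred
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts (assumed, not from the study):
# 8 correct predictions, 2 spurious ones, 2 missed events.
precision, recall, f1 = f_measure(8, 2, 2)
print(precision, recall, f1)
```

The harmonic mean penalises a large gap between precision and recall, which is why it is often preferred over a simple average when both spurious and missed predictions matter.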
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/76927
Appears in Collections: Dissertations - FacICT - 2020
Dissertations - FacICTCIS - 2020

Files in This Item:
File: 20BITSD019.pdf (Restricted Access)
Size: 3.16 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.