Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/107789
Title: Automated news aggregator
Authors: Dimech, David (2022)
Keywords: News Web sites; Natural language processing (Computer science); Sentiment analysis
Issue Date: 2022
Citation: Dimech, D. (2022). Automated news aggregator (Bachelor's dissertation).
Abstract: The way readers consume news has evolved with the rise of the internet and social media. Over the last two decades, newsrooms have expanded their operations online, and their stories are now published on social media, online web portals and/or mobile applications. The internet has democratised and facilitated journalism, while social media has made it easier to share and spread news. Although this is generally positive, it can sometimes have a detrimental impact: if the news is biased or inaccurate, it may distort the public’s perception of critical issues. The Automated News Aggregator (ANA) attempts to solve this problem by providing an online platform on which articles about the same subject, published by multiple newsrooms, are aggregated into a single article with minimal bias. Existing systems merely group similar articles and stories together. This project goes a step further by aggregating the articles’ content and attempting to reduce bias, while working in a transparent and responsible manner.
The original articles are scraped from their respective websites, pre-processed and translated. Each article is then converted into a TF-IDF vector so that it can be queried and grouped with similar articles. The similar articles are split into sentences, which are pooled into a single list. These sentences are embedded as sentence vectors and clustered by semantic meaning. The sentences in each cluster are then scored according to specific criteria, such as the sentence’s sentiment, the number of entities it contains, its position within the article, and its use of pronouns. The highest-scoring sentence in each cluster is added to the list of sentences that forms the newly aggregated article. ANA sets online news portals on a new trajectory, encouraging easily consumable and unbiased media.
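The aggregation pipeline summarised above can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the similarity thresholds, the greedy grouping strategy, the word lists and the scoring weights are all hypothetical. TF-IDF vectors stand in for the sentence embeddings (the abstract does not name the embedding model), capitalised words stand in for an entity recogniser, and the score assumes that neutral sentiment, more entities, earlier position and fewer pronouns are preferred.

```python
import math
import re
from collections import Counter

# Hypothetical word lists: the abstract does not describe the actual
# sentiment model, entity recogniser or pronoun handling.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}
POSITIVE = {"good", "great", "excellent", "success", "welcome"}
NEGATIVE = {"bad", "terrible", "failure", "disaster", "criticised"}


def tokenize(text):
    """Lower-case word tokens (stand-in for the real pre-processing step)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF so that a term present in every document keeps a weight > 0.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_similar(vectors, threshold):
    """Greedy grouping: join the first group whose seed is similar enough."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def score_sentence(sentence, position):
    """Hypothetical score: reward entities and early position,
    penalise pronouns and non-neutral sentiment."""
    words = sentence.split()
    tokens = tokenize(sentence)
    entities = sum(1 for w in words[1:] if w[:1].isupper())  # crude entity proxy
    pronouns = sum(1 for t in tokens if t in PRONOUNS)
    sentiment = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return entities + 1 / (1 + position) - pronouns - abs(sentiment)


def aggregate(articles, article_threshold=0.2, sentence_threshold=0.5):
    """Return one aggregated text per group of similar articles."""
    groups = group_similar(tfidf_vectors(articles), article_threshold)
    aggregated = []
    for group in groups:
        # Pool every sentence of the grouped articles, keeping its position.
        pool = [(s, pos) for i in group
                for pos, s in enumerate(split_sentences(articles[i]))]
        # TF-IDF stands in here for the dissertation's sentence embeddings.
        clusters = group_similar(tfidf_vectors([s for s, _ in pool]), sentence_threshold)
        # Keep the highest-scoring sentence from each cluster.
        best = [max((pool[j] for j in c), key=lambda sp: score_sentence(*sp))[0]
                for c in clusters]
        aggregated.append(" ".join(best))
    return aggregated
```

For example, `aggregate(["Alpha beta gamma. Delta epsilon.", "Alpha beta gamma. Delta epsilon.", "Zeta eta theta."])` groups the two identical articles together and returns `["Alpha beta gamma. Delta epsilon.", "Zeta eta theta."]`. Greedy seed-based grouping keeps the sketch short; a real system would more likely use a proper clustering algorithm over dense embeddings.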
A questionnaire was conducted, gathering 73 responses that evaluated 54 unique articles. Several criteria were assessed, each scored out of 5 points, including Accuracy of the Article Title, Structure of the Article, Overall Correctness of the Article, Flow of the Article, Quality of English Used and Usefulness of the Aggregated Article. The highest-scoring criterion was Quality of English, with an average of 4.0, while the lowest was Flow of the Article, with 3.7. The average score across all criteria and articles was 3.8.
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/107789
Appears in Collections: Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022

Files in This Item:
2208ICTICT390900013946_1.PDF (Restricted Access), 3.5 MB, Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.