Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/107789
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2023-03-28T05:55:36Z | -
dc.date.available | 2023-03-28T05:55:36Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Dimech, D. (2022). Automated news aggregator (Bachelor's dissertation). | en_GB
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/107789 | -
dc.description | B.Sc. IT (Hons)(Melit.) | en_GB
dc.description.abstract | The way readers consume news has evolved as a result of the rise of the internet and social media. Over the last two decades, newsrooms have expanded their operations online, and their stories are now published on social media, web portals, and/or mobile applications. The internet has democratised and facilitated journalism, while social media has made it easier to exchange and spread news. Although this is generally positive, it can also have a detrimental impact: if the news is biased or inaccurate, it may distort the public's perception of critical issues. The Automated News Aggregator (ANA) attempts to solve this problem by providing an online platform on which articles about the same subject, published by multiple newsrooms, are aggregated into one article with minimal bias. Existing systems only group similar articles and stories together. This project takes it a step further by aggregating the articles' content and trying to reduce bias, all the while working in a transparent and responsible manner. The original articles are scraped from their respective websites, pre-processed and translated. Using TF-IDF, each article is converted into a vector so that it can be queried and grouped with similar articles. The similar articles are split into sentences, which are collected in a single list. The sentences are then embedded into sentence vectors and clustered by semantic meaning. Clusters of similar sentences are then processed and scored according to specific criteria such as the sentiment of a sentence, the number of entities, the position of a sentence relative to the article, and the use of pronouns. The best-scoring sentence is then chosen from each cluster and added to a list of sentences for the newly aggregated article. ANA takes online news portals on a new trajectory, encouraging consumable and unbiased media.
A questionnaire was conducted in which 73 responses were gathered and 54 unique articles were evaluated. Different criteria were assessed, including Accuracy of the Article Title, Structure of the Article, Overall Correctness of the Article, Flow of the Article, Quality of English used and Usefulness of the aggregated article. The highest-performing criterion was Quality of English, with an average score of 4.0, while the lowest was Flow of the Article, at 3.7 (both out of 5). Each criterion had a maximum of 5 points, and the average of all the scores for the articles was 3.8. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | News Web sites | en_GB
dc.subject | Natural language processing (Computer science) | en_GB
dc.subject | Sentiment analysis | en_GB
dc.title | Automated news aggregator | en_GB
dc.type | bachelorThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Dimech, David (2022) | -
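
The abstract says that articles are vectorised with TF-IDF and then grouped with similar articles. The dissertation itself is under restricted access, so the following is only a minimal sketch of that general idea, not the author's implementation. The function names, the smoothed IDF formula, the greedy grouping strategy, the 0.3 similarity threshold and the sample headlines are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similar(docs, threshold=0.3):
    """Greedy grouping (hypothetical strategy): an article joins the first
    group whose first member is at least `threshold` similar to it."""
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Invented sample headlines, used only to exercise the sketch.
articles = [
    "The government announced a new budget for public transport",
    "New budget for public transport announced by the government",
    "Local football team wins the national championship final",
]
print(group_similar(articles))
```

With these sample headlines the two budget stories end up in one group and the football story in another. A production system would more likely use an established library for vectorisation and a proper clustering algorithm; the abstract does not state which were used.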
Appears in Collections: Dissertations - FacICT - 2022
Dissertations - FacICTAI - 2022

Files in This Item:
File: 2208ICTICT390900013946_1.PDF (Restricted Access)
Size: 3.5 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.