Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/107789
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2023-03-28T05:55:36Z | - |
dc.date.available | 2023-03-28T05:55:36Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Dimech, D. (2022). Automated news aggregator (Bachelor's dissertation). | en_GB |
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/107789 | - |
dc.description | B.Sc. IT (Hons)(Melit.) | en_GB |
dc.description.abstract | The way readers consume news has evolved as a result of the rise of the internet and social media. Over the last two decades, newsrooms have expanded their operations online, and their stories are now published on social media, online web portals, and/or mobile applications. The internet has democratised and facilitated journalism, while social media has made it easier to exchange and spread news. Although this is generally positive, it can also have a detrimental impact: if the news is biased or inaccurate, it may distort the public’s perception of critical issues. The Automated News Aggregator (ANA) attempts to address this problem by providing an online platform on which articles on the same subject, published by multiple newsrooms, are aggregated into one article with minimal bias. Existing systems only group similar articles and stories together. This project goes a step further by aggregating the articles’ content and trying to reduce bias, while working in a transparent and responsible manner. The original articles are scraped from their respective websites, pre-processed and translated. Using TF-IDF, each article is converted into a vector so that it can be queried and grouped with similar articles. The similar articles are then split into sentences, which are collected into a single list. The sentences are embedded into sentence vectors and clustered by semantic meaning. Clusters of similar sentences are then processed and scored according to specific criteria, such as the sentiment of a sentence, the number of entities, the position of a sentence relative to the article, and the use of pronouns. The best-scoring sentence is chosen from each cluster and added to a list of sentences for the newly aggregated article. ANA takes online news portals on a new trajectory, encouraging consumable and unbiased media.
A questionnaire was conducted; 73 responses were gathered and 54 unique articles were evaluated. Different criteria were assessed, including Accuracy of the Article Title, Structure of the Article, Overall Correctness of the Article, Flow of the Article, Quality of English Used and Usefulness of the Aggregated Article. The highest-performing criterion was Quality of English, with an average score of 4.0, while the lowest was Flow of the Article, with 3.7. Each criterion had a maximum of 5 points, and the average of all the scores for the articles was 3.8. | en_GB |
dc.language.iso | en | en_GB |
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB |
dc.subject | News Web sites | en_GB |
dc.subject | Natural language processing (Computer science) | en_GB |
dc.subject | Sentiment analysis | en_GB |
dc.title | Automated news aggregator | en_GB |
dc.type | bachelorThesis | en_GB |
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB |
dc.publisher.institution | University of Malta | en_GB |
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB |
dc.description.reviewed | N/A | en_GB |
dc.contributor.creator | Dimech, David (2022) | - |
Appears in Collections: | Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2208ICTICT390900013946_1.PDF (Restricted Access) | | 3.5 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.