Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/94047
Title: | A web browser history indexing and retrieval system based on named entity co-reference resolution within and across documents |
Authors: | Galea, Kristian (2012) |
Keywords: | Information retrieval; Semantic computing; Internet; Data mining |
Issue Date: | 2012 |
Citation: | Galea, K. (2012). A web browser history indexing and retrieval system based on named entity co-reference resolution within and across documents (Bachelor’s dissertation). |
Abstract: | The more common approaches to information retrieval are naive: they simply index information using raw statistical data extracted from the text. The relationships between terms in a document, however, offer a useful layer of indexable data that could help return more relevant results for a query. We tackle the problem of building a retrieval system that is aware of cross-document named entity co-reference, using a user's web history as the document corpus. We approach the problem using an intra-document co-reference system and a custom, simplistic cross-document co-reference system, which takes into consideration the context of terms in a document and the similarity of those contexts between possibly co-referring terms. We aim to achieve higher recall with co-reference aware retrieval than with naive information retrieval systems, particularly when a document would not have been returned for a given query because the query term is not explicitly present in the document. We aim to use cross-document co-reference resolution in IR so that, when a query is issued, we can still return documents, adequately ranked, that take into account both the query term and its co-referring entities. Our retrieval system uses a modification of the BM25 scoring algorithm that takes into consideration the presence and relevance of the co-referring terms to determine the rank of a document in a result set. We build an application to evaluate our approach. The results we collect indicate that, given an optimal combination of parameters, we can indeed tune the system to provide a set of results with marginally better recall on co-referring documents (documents which do not contain the query term explicitly but contain any number of co-referring terms) than a naive retrieval system.
However, in the majority of cases we note that co-reference resolution can clutter the results to the extent that irrelevant documents with vague or generic relations to the query are returned with high ranking, consequently lowering recall. |
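The abstract does not spell out how BM25 was modified, so the following is only a minimal sketch of one way co-referring terms could feed into a BM25 score. It uses the standard Okapi BM25 term formula and adds the scores of co-referring terms at a discount. The function names, the `corefs` mapping, and the `coref_weight` parameter are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Standard Okapi BM25 contribution of one term to one document."""
    if tf == 0:
        return 0.0
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


def rank(query, corefs, docs, coref_weight=0.5):
    """Rank documents for a single-term query.

    `corefs` maps a term to the terms assumed to co-refer with it
    (hypothetical input; a real system would derive it from
    co-reference resolution). Co-referring terms contribute their
    BM25 score scaled by `coref_weight` (an assumed tuning knob).
    Returns (score, doc_index) pairs, best first.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    counts = [Counter(t) for t in tokenized]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)

    weighted_terms = [(query, 1.0)]
    weighted_terms += [(t, coref_weight) for t in sorted(corefs.get(query, ()))]

    scores = []
    for i, c in enumerate(counts):
        score = sum(
            w * bm25_term(c[t], df[t], n_docs, len(tokenized[i]), avg_len)
            for t, w in weighted_terms
        )
        scores.append((score, i))
    return sorted(scores, key=lambda p: (-p[0], p[1]))
```

With `coref_weight=0` this reduces to plain BM25, so a document that mentions only a co-referring term scores zero. Raising the weight lets such documents surface, which illustrates the trade-off the abstract reports: higher recall on co-referring documents, at the risk of loosely related documents ranking highly.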
Description: | B.Sc. IT (Hons)(Melit.) |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/94047 |
Appears in Collections: | Dissertations - FacICT - 2012 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
B.SC.(HONS)ICT_Galea_Kristian_2012.PDF (Restricted Access) | | 6.1 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.