A web browser history indexing and retrieval system based on named entity co-reference resolution within and across documents

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/94047

Title:	A web browser history indexing and retrieval system based on named entity co-reference resolution within and across documents
Authors:	Galea, Kristian (2012)
Keywords:	Information retrieval; Semantic computing; Internet; Data mining
Issue Date:	2012
Citation:	Galea, K. (2012). A web browser history indexing and retrieval system based on named entity co-reference resolution within and across documents (Bachelor’s dissertation).
Abstract:	The more common approaches to information retrieval are naive: they simply index information using raw statistical data extracted from the text. The relationship between terms in a document, however, offers a useful layer of indexable data that could aid in returning more relevant results to a query. We tackle the problem of building a retrieval system that is aware of named-entity co-reference across documents, using a user's web history as the document corpus. We approach the problem using an intra-document co-reference system and a custom, simplistic cross-document co-reference system which takes into consideration the context of terms in a document and the similarity of those contexts between possibly co-referring terms. We aim to achieve higher recall using co-reference aware retrieval than naive information retrieval systems achieve, particularly where a document would not have been returned for a given query because the query term is not explicitly present in the document. We aim to use cross-document co-reference resolution in IR so that, for a given query, we can still return adequately ranked documents that account for both the query term and its co-referring entities. Our retrieval system uses a modification of the BM25 scoring algorithm that takes into consideration the presence and relevance of the co-referring terms to determine the rank of a document in a result set. We build an application to evaluate our approach. The results we collect indicate that, given an optimal combination of parameters, we can indeed tune the system to provide a set of results with marginally better recall on co-referring documents (documents which do not contain the query term explicitly but contain any number of co-referring terms) than that of a naive retrieval system.
However, in the majority of cases we note that co-reference resolution can clutter the results to the extent that irrelevant documents with vague/generic relations to the query are returned with high ranking, consequently lowering recall.
Description:	B.Sc. IT (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/94047
Appears in Collections:	Dissertations - FacICT - 2012
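
The abstract mentions a BM25 variant that also credits documents for terms co-referring with the query term. The full text is restricted, so the dissertation's actual modification is not known here; what follows is only a minimal sketch of one plausible reading, in which standard Okapi BM25 is summed over the query term and its aliases, with alias matches discounted. The function name `bm25_coref`, the `alias_weight` parameter, and the toy corpus are all hypothetical, and the corpus is assumed to be non-empty and already tokenised.

```python
import math
from collections import Counter


def bm25_coref(query, aliases, docs, k1=1.2, b=0.75, alias_weight=0.5):
    """Return one BM25-style score per document (hypothetical sketch).

    query        -- a single query term
    aliases      -- terms assumed to co-refer with the query term
    docs         -- non-empty list of token lists
    alias_weight -- assumed discount applied to matches on aliases
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # The query term counts fully; co-referring terms count at a discount.
    weights = {a: alias_weight for a in aliases}
    weights[query] = 1.0

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, w in weights.items():
            df = sum(1 for other in docs if term in other)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            f = tf[term]
            norm = f + k1 * (1 - b + b * len(d) / avgdl)
            score += w * idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus: the second page never mentions "obama" explicitly.
docs = [
    ["obama", "spoke", "today"],
    ["the", "president", "spoke", "today"],
    ["weather", "is", "fine", "today"],
]
plain = bm25_coref("obama", [], docs)
coref = bm25_coref("obama", ["president"], docs)
```

With no aliases the second page scores zero and would not be returned; with `"president"` supplied as a co-referring term it receives a positive score that is still below the page containing the query term itself. Raising `alias_weight` makes such pages rank higher, which is also how loosely related pages could come to clutter the result set, as the abstract cautions.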

Files in This Item:

File	Description	Size	Format
B.SC.(HONS)ICT_Galea_Kristian_2012.PDF Restricted Access		6.1 MB	Adobe PDF
