Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/91449
Title: Clustering search engine results using LSA (Latent Semantic Analysis)
Authors: Mercieca, Daniel (2014)
Keywords: Latent structure analysis; Algorithms; Search engines
Issue Date: 2014
Citation: Mercieca, D. (2014). Clustering search engine results using LSA (Latent Semantic Analysis) (Bachelor's dissertation).
Abstract: Search engine result pages contain a list of search results, typically consisting of the titles of web documents and short texts, called 'snippets', that represent them. These results are conventionally presented as a ranked list, ordered by their relevance to the search term. When a search query is ambiguous, the results relate to different topics, yet related links are not grouped together. We have built a search result retrieval system that can perform on-demand clustering of search results. The system can also perform automatic query expansion by allowing users to 'drill down' into their search, retrieving a new page of results relevant to a cluster of results. The clustering is performed using combinations of a modified K-Means clustering algorithm and Latent Semantic Analysis (LSA), a mathematical technique used to extract and represent the meanings of terms and passages. The final application presents search results as clusters, divided according to meaning and ranked according to their relevance to the query. The system also generates a 'more like this' link above each cluster, allowing the user to retrieve results related to that particular cluster via query expansion. The effectiveness of the system was investigated using time-based comparison tests and questionnaires. Although strong claims cannot be made due to the limited scope of the research, our approach was found to be more effective than the ranked list approach 67% of the time. Test users also indicated that they preferred the clustered approach, and that they preferred the clusters generated when LSA was enabled. The results of these tests show that our approach improves on the conventional ranked list by making a page of search results easier to read and by facilitating the refinement of search queries.
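The abstract names the pipeline (LSA, K-Means clustering of snippets, 'more like this' query expansion) but not its details. The following is a minimal pure-Python sketch of that kind of pipeline under stated assumptions: raw term counts with no weighting, LSA computed as a truncated SVD by power iteration on the document Gram matrix, standard spherical (cosine) k-means with farthest-first seeding in place of the dissertation's modified K-Means, and a simple frequency-based expansion heuristic. The function names (term_document_matrix, lsa_document_vectors, spherical_kmeans, more_like_this) and the toy 'jaguar' snippets are hypothetical illustrations, not taken from the dissertation.

```python
import math
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def term_document_matrix(snippets):
    """Rows are terms, columns are snippets; raw counts (no tf-idf, an assumption)."""
    docs = [s.lower().split() for s in snippets]
    vocab = sorted({w for d in docs for w in d})
    row_of = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(d).items():
            A[row_of[w]][j] = float(c)
    return vocab, A


def lsa_document_vectors(A, k, iters=200, seed=0):
    """Project each snippet onto the top-k LSA dimensions.

    The right singular vectors of A are the eigenvectors of A^T A, found here by
    power iteration with Gram-Schmidt deflation; snippet j's coordinates are
    sigma_i * v_i[j] for i = 1..k.
    """
    n = len(A[0])
    cols = [[A[t][j] for t in range(len(A))] for j in range(n)]
    M = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]  # A^T A
    rng = random.Random(seed)
    vecs, sigmas = [], []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(n)])
        for _ in range(iters):
            w = [dot(row, v) for row in M]
            for u in vecs:  # remove directions already found (deflation)
                d = dot(w, u)
                w = [wi - d * ui for wi, ui in zip(w, u)]
            v = normalize(w)
        eigenvalue = dot(v, [dot(row, v) for row in M])
        vecs.append(v)
        sigmas.append(math.sqrt(max(eigenvalue, 0.0)))
    return [[s * v[j] for s, v in zip(sigmas, vecs)] for j in range(n)]


def spherical_kmeans(points, k, iters=50):
    """Standard k-means with cosine similarity (not the dissertation's modified variant)."""
    pts = [normalize(p) for p in points]
    # Deterministic farthest-first seeding: start from the top-ranked result, then
    # repeatedly add the point least similar to every centroid chosen so far.
    centroids = [pts[0]]
    while len(centroids) < k:
        far = min(range(len(pts)), key=lambda j: max(dot(pts[j], c) for c in centroids))
        centroids.append(pts[far])
    labels = []
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in pts]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels


def more_like_this(query, vocab, A, labels, cluster, n_terms=2):
    """Hypothetical expansion heuristic: append the cluster's most frequent new terms."""
    query_terms = query.lower().split()
    scores = {t: sum(A[i][j] for j, lab in enumerate(labels) if lab == cluster)
              for i, t in enumerate(vocab) if t not in query_terms}
    top = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return query + " " + " ".join(top)
```

A usage example on six toy snippets for the ambiguous query "jaguar", three about cars and three about the animal:

```python
snippets = ["jaguar car engine", "jaguar engine speed", "jaguar speed car",
            "jaguar cat jungle", "jaguar jungle prey", "jaguar prey cat"]
vocab, A = term_document_matrix(snippets)
labels = spherical_kmeans(lsa_document_vectors(A, k=2), k=2)
print(labels)                                         # [0, 0, 0, 1, 1, 1]
print(more_like_this("jaguar", vocab, A, labels, 0))  # jaguar car engine
```

Here the first LSA dimension captures the term shared by every result (the query itself), while the second separates the two senses, so cosine clustering in the reduced space groups the snippets by meaning rather than by surface word overlap alone.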
Description: B.SC.ICT(HONS)ARTIFICIAL INTELLIGENCE
URI: https://www.um.edu.mt/library/oar/handle/123456789/91449
Appears in Collections: Dissertations - FacICT - 2014; Dissertations - FacICTAI - 2002-2014

Files in This Item:
File: B.SC.(HONS)ICT_Mercieca_Daniel_2014.PDF (Restricted Access)
Size: 7.66 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.