Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/103266
Title: | Experiments with document retrieval from small text collections using latent semantic analysis or term similarity with query coordination and automatic relevance feedback |
Other Titles: | Semantic keyword-based search on structured data sources. IKC 2016. Lecture notes in computer science |
Authors: | Layfield, Colin; Azzopardi, Joel; Staff, Chris |
Keywords: | Log-linear models -- Computer programs; Semantics; Information retrieval; Latent semantic indexing |
Issue Date: | 2017 |
Publisher: | Springer International Publishing AG |
Citation: | Layfield, C., Azzopardi, J., & Staff, C. (2017). Experiments with document retrieval from small text collections using latent semantic analysis or term similarity with query coordination and automatic relevance feedback. In A. Calì, D. Gorgan, & M. Ugarte (Eds.), Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. (pp. 25-36). Cham: Springer. |
Abstract: | Users face the Vocabulary Gap problem when attempting to retrieve relevant textual documents from small databases, especially when there are only a small number of relevant documents, as it is likely that different terms are used in queries and relevant documents to describe the same concept. To enable comparison of results of different approaches to semantic search in small textual databases, the PIKES team constructed an annotated test collection and Gold Standard comprising 35 search queries and 331 articles. We present two different possible solutions. In one, we index an unannotated version of the PIKES collection using Latent Semantic Analysis (LSA), retrieving relevant documents using a combination of query coordination and automatic relevance feedback. Although we outperform prior work, this approach is dependent on the underlying collection, and is not necessarily scalable. In the second approach, we use an LSA Model generated by SEMILAR from a Wikipedia dump to generate a Term Similarity Matrix (TSM). Queries are automatically expanded with related terms from the TSM and are submitted to a term-by-document matrix Vector Space Model of the PIKES collection. Coupled with a combination of query coordination and automatic relevance feedback, we also outperform prior work with this approach. The advantage of the second approach is that it is independent of the underlying document collection. |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/103266 |
Appears in Collections: | Scholarly Works - FacICTAI |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Experiments_with_document_retrieval_from_small_text_collections_using_latent_semantic_analysis_or_term_similarity_with_query_coordination_and_automatic_relevance_feedback_2017.pdf (Restricted Access) | | 194.12 kB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.