Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/108346
Title: Educational question answering using information retrieval and machine reading comprehension techniques
Authors: Caruana, Roberta (2022)
Keywords: Natural language processing (Computer science)
Machine learning
Information storage and retrieval systems
Issue Date: 2022
Citation: Caruana, R. (2022). Educational question answering using information retrieval and machine reading comprehension techniques (Master's dissertation).
Abstract: Question answering systems focus on providing concise and accurate answers in response to natural language queries. A branch of such systems appears in education, aiming to improve the students’ learning experience while reducing the professors’ workload. In contrast to search engines, these systems aim to reduce information overload by presenting focused answers based on lecture resources. Despite research efforts on educational question answering systems, those based on recent text-based approaches are limited. Recent retriever-reader approaches leveraging information retrieval and machine reading comprehension techniques revolutionised open-domain question answering research, making these techniques of interest to explore within the educational domain. An open-domain Retriever-Ranker-Reader question answering pipeline system using Wikipedia as the main knowledge source was considered for the initial phase of this work. Inspired by previous work, traditional information retrieval models along with neural paragraph ranker and document reader models were chosen with the aim of evaluating them in the overall pipeline to determine the best configuration. The neural models were trained on the SQuAD dataset, and exact match (EM), F1 and answer recall metrics were used to assess model performance. Furthermore, the test experiments covered different embedding representations (GloVe, FastText and ELMo), use of the multitask dataset, and tuning of the answer aggregation parameters. In addition to SQuADopen, the pipeline systems were evaluated on the CuratedTREC, WebQuestions and WikiMovies datasets. The empirical results were then analysed through statistical testing, which helped in determining the best performing open-domain model. The question answering research reviewed lacked significance testing, an important step for establishing that the experimental results presented are not coincidental.
Analysis of the pipeline experiments showed that the best performing pipeline, which achieved significant improvements, consisted of the Anserini framework retrieving the top-116 paragraphs, a ranker trained using FastText embeddings and a reader trained using ELMo embeddings. This pipeline achieved improved EM scores over the baseline system on SQuAD (best EM: 36.18; baseline EM: 30.20) and CuratedTREC (best EM: 36.46; baseline EM: 35.40), while performance decreased on WebQuestions (best EM: 13.04; baseline EM: 19.90) and WikiMovies (best EM: 28.82; baseline EM: 39.10). However, in the context of the prototype system, the multitask pipeline retrieving the top-29 paragraphs using Anserini, with both the ranker and reader trained on GloVe embeddings, was identified as the best pipeline configuration. The latter’s capabilities were tested on the Natural Language Processing study-unit. The NlpQA prototype system deployed for this investigation aimed to answer student queries related to this field, using Wikipedia, Stack Overflow and a popular NLP textbook as the main knowledge sources. Feedback obtained from 6 students who asked a total of 16 queries indicated acceptable system performance, with Stack Overflow and the textbook being the preferred knowledge sources. Despite the system’s limitations, it has been successful within the scope of educational question answering.
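The EM and F1 scores quoted in the abstract are typically computed per question and then averaged. The sketch below follows the widely used SQuAD-style definitions (lowercasing, stripping punctuation and English articles, then comparing tokens). It is an illustration only and is not taken from the dissertation. The function names `normalize`, `exact_match` and `f1_score` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between a predicted answer and one gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Assumption: if either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, the prediction "Eiffel Tower in Paris" against the gold answer "the Eiffel Tower" gets an EM of 0 but a non-zero F1, because two of its four tokens match. When a question has several gold answers, the usual convention is to take the maximum score over them.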
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/108346
Appears in Collections: Dissertations - FacICT - 2022
Dissertations - FacICTAI - 2022

Files in This Item:
File: 2219ICTICS520000005163_1.PDF
Size: 3.24 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.