Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/91702
Title: Towards polyglot machines : cross-lingual natural language inference
Authors: Dalli, Jake J. (2021)
Keywords: Natural language processing (Computer science); Semantics -- Data processing; Deep learning (Machine learning); Neural networks (Computer science)
Issue Date: 2021
Citation: Dalli, J. J. (2021). Towards polyglot machines : cross-lingual natural language inference (Master's dissertation).
Abstract: Inference is a central aspect of Natural Language Processing (NLP); the Natural Language Inference task (NLI), also called Recognizing Textual Entailment (RTE), is the task of determining whether a hypothesis text fragment corroborates (positively entails), contradicts (negatively entails) or bears no relation to (no entailment) a premise text fragment. Prior work on this task has focused almost exclusively on monolingual English inference; in this study, we aim to address cross-lingual NLI. We study cross-lingual natural language inference by addressing two different formulations of the task: cross-lingual transfer, where we explore how an inference model trained for English can be fine-tuned to perform inference in another language; and purely cross-lingual inference, where we train a model to detect inference for sentence pairs in different languages. Within our study, we experiment with two neural network architectures to address these tasks, a bidirectional LSTM and a decomposable attention model, employing aligned word embeddings to represent language. Results show that the bidirectional LSTM neural network performs best across all tasks. We also show that employing machine translation to deal with cross-lingual NLI provides the best results. Although the use of word embeddings to encode sentences does not perform as well as sentence embeddings, our proposed architecture using word embeddings requires significantly fewer computational resources due to the lower dimensionality of the embeddings. Our approach achieves results with less than a 10% loss of accuracy, and as little as a 5% loss in the best case, while using a fraction of the computational resources required by solutions employing sentence embeddings.
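To illustrate the kind of architecture the abstract describes, the sketch below shows a bidirectional LSTM sentence encoder over frozen, pre-aligned cross-lingual word embeddings feeding a three-way entailment classifier (entailment / contradiction / neutral). This is an assumed, minimal PyTorch reconstruction, not the dissertation's actual implementation; the class name, dimensions, and feature combination are illustrative.

```python
# Minimal sketch, assuming aligned cross-lingual embeddings (e.g. fastText
# vectors mapped into a shared space) and a standard NLI feature combination.
import torch
import torch.nn as nn

class BiLSTMNLI(nn.Module):
    def __init__(self, aligned_embeddings: torch.Tensor,
                 hidden_dim: int = 300, num_classes: int = 3):
        super().__init__()
        # Frozen aligned word embeddings, so premise and hypothesis may come
        # from different languages while sharing one embedding space.
        self.embed = nn.Embedding.from_pretrained(aligned_embeddings, freeze=True)
        self.encoder = nn.LSTM(aligned_embeddings.size(1), hidden_dim,
                               batch_first=True, bidirectional=True)
        # Combine sentence vectors as [p; h; |p - h|; p * h] before classifying.
        self.classifier = nn.Sequential(
            nn.Linear(8 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Max-pool BiLSTM states into a fixed-size sentence representation.
        states, _ = self.encoder(self.embed(token_ids))
        return states.max(dim=1).values

    def forward(self, premise_ids: torch.Tensor,
                hypothesis_ids: torch.Tensor) -> torch.Tensor:
        p, h = self.encode(premise_ids), self.encode(hypothesis_ids)
        features = torch.cat([p, h, (p - h).abs(), p * h], dim=-1)
        return self.classifier(features)  # logits over the three NLI labels

# Toy usage with random vectors standing in for an aligned bilingual space.
vocab, dim = 1000, 300
model = BiLSTMNLI(torch.randn(vocab, dim))
logits = model(torch.randint(0, vocab, (2, 12)),   # premise token ids
               torch.randint(0, vocab, (2, 9)))    # hypothesis token ids
print(logits.shape)  # torch.Size([2, 3])
```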
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/91702
Appears in Collections: Dissertations - FacICT - 2021; Dissertations - FacICTAI - 2021
Files in This Item:
File | Description | Size | Format
---|---|---|---
21MAIPT009.pdf | | 1.52 MB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.