Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/114786
Title: Exploring the impact of transliteration on NLP performance : treating Maltese as an Arabic dialect
Authors: Micallef, Kurt
Eryani, Fadhl
Habash, Nizar
Bouamor, Houda
Borg, Claudia
Keywords: Text processing (Computer science) -- Malta
Transliteration
Artificial intelligence
Translating and interpreting -- Technological innovations
Issue Date: 2023
Publisher: Association for Computational Linguistics
Citation: Micallef, K., Eryani, F., Habash, N., Bouamor, H. & Borg, C. (2023). Exploring the Impact of Transliteration on NLP Performance: Treating Maltese as an Arabic Dialect. Workshop on Computation and Written Language (CAWL 2023), Toronto. 22-32.
Abstract: Multilingual models such as mBERT have been demonstrated to exhibit impressive cross-lingual transfer for a number of languages. Despite this, the performance drops for lower-resourced languages, especially when they are not part of the pre-training setup and when there are script differences. In this work we consider Maltese, a low-resource language of Arabic and Romance origins written in Latin script. Specifically, we investigate the impact of transliterating Maltese into Arabic script on a number of downstream tasks: Part-of-Speech Tagging, Dependency Parsing, and Sentiment Analysis. We compare multiple transliteration pipelines ranging from deterministic character maps to more sophisticated alternatives, including manually annotated word mappings and non-deterministic character mappings. For the latter, we show that selection techniques using n-gram language models of Tunisian Arabic, the dialect with the highest degree of mutual intelligibility to Maltese, yield better results on downstream tasks. Moreover, our experiments highlight that the use of an Arabic pre-trained model paired with transliteration outperforms mBERT. Overall, our results show that transliterating Maltese can be considered an option to improve the cross-lingual transfer capabilities.
URI: https://www.um.edu.mt/library/oar/handle/123456789/114786
Appears in Collections: Scholarly Works - FacICTAI
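
Illustrative sketch (not taken from the paper): the abstract contrasts a deterministic character map with non-deterministic character mappings whose candidates are ranked by an n-gram language model. The Python below is a minimal toy version of that idea under loud assumptions. The `CANDIDATES` table, the vowel-dropping simplification, the three-word training list standing in for a Tunisian Arabic corpus, and all function names are hypothetical and chosen only for illustration; they are not the authors' mappings, data, or code.

```python
import itertools
import math
from collections import Counter

# Hypothetical candidate table (an assumption, not the paper's mapping).
# Each Maltese grapheme lists the Arabic letters it might correspond to.
# The first candidate doubles as the "deterministic" choice.
CANDIDATES = {
    "għ": ["ع", "غ"], "ħ": ["ح", "خ"], "t": ["ت", "ط"],
    "s": ["س", "ص"], "d": ["د", "ض"], "b": ["ب"], "k": ["ك"],
    "l": ["ل"], "m": ["م"], "n": ["ن"], "r": ["ر"], "q": ["ق"],
    "x": ["ش"], "ż": ["ز"], "j": ["ي"], "w": ["و"], "f": ["ف"],
    # Toy simplification: vowels are simply dropped.
    "a": [""], "e": [""], "i": [""], "o": [""], "u": [""],
}


def segment(word):
    """Greedy longest-match split of a word into known graphemes."""
    keys = sorted(CANDIDATES, key=len, reverse=True)
    graphemes, i = [], 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                graphemes.append(key)
                i += len(key)
                break
        else:
            graphemes.append(word[i])  # unknown character: pass through
            i += 1
    return graphemes


def transliterate_deterministic(word):
    """Always pick the first candidate for every grapheme."""
    return "".join(CANDIDATES.get(g, [g])[0] for g in segment(word))


class CharBigramLM:
    """Character bigram model with add-one smoothing (for ranking only)."""

    def __init__(self, words):
        self.bigrams, self.contexts, self.vocab = Counter(), Counter(), set()
        for w in words:
            chars = ["<w>"] + list(w) + ["</w>"]
            self.vocab.update(chars[1:])
            for a, b in zip(chars, chars[1:]):
                self.bigrams[a, b] += 1
                self.contexts[a] += 1

    def log_prob(self, word):
        chars = ["<w>"] + list(word) + ["</w>"]
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[a, b] + 1) / (self.contexts[a] + v))
            for a, b in zip(chars, chars[1:])
        )


def transliterate_with_lm(word, lm):
    """Enumerate all candidate spellings and keep the one the LM prefers."""
    options = [CANDIDATES.get(g, [g]) for g in segment(word)]
    spellings = ("".join(c) for c in itertools.product(*options))
    return max(spellings, key=lm.log_prob)


# Tiny stand-in corpus (hypothetical; a real setup would use far more text).
lm = CharBigramLM(["حليب", "حلو", "خبز"])
print(transliterate_deterministic("ħobż"))  # حبز
print(transliterate_with_lm("ħobż", lm))    # خبز
```

In this toy run the deterministic map always renders "ħ" as "ح", while the language model prefers "خ" for "ħobż" because that spelling matches a word it has seen. Exhaustive enumeration is only practical for short words; a longer input would call for beam search or a similar pruning strategy.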

Files in This Item:
File: 2023.cawl-1.4.pdf
Size: 449.16 kB
Format: Adobe PDF

