Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/119975
Title: Cross-lingual transfer from related languages : treating low-resource Maltese as multilingual code-switching
Authors: Micallef, Kurt
Habash, Nizar
Borg, Claudia
Eryani, Fadhl
Bouamor, Houda
Keywords: Natural language processing (Computer science)
Transliteration
Computational linguistics
Translating and interpreting
Artificial intelligence
Issue Date: 2024-03
Publisher: Association for Computational Linguistics
Citation: Micallef, K., Habash, N., Borg, C., Eryani, F. and Bouamor, H. (2024). Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching. In Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), Malta.
Abstract: Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, performance on downstream tasks degrades when there is a script disparity with the languages used in the multilingual model's pre-training data. Transliteration offers a straightforward yet effective means to align the script of a resource-rich language with a target language, thereby enhancing cross-lingual transfer capabilities. However, for mixed languages, this approach is suboptimal, since only a subset of the language benefits from the cross-lingual transfer while the remainder is impeded. In this work, we focus on Maltese, a Semitic language with substantial influences from Arabic, Italian, and English, and notably written in Latin script. We present a novel dataset annotated with word-level etymology. We use this dataset to train a classifier that enables us to make informed decisions regarding the appropriate processing of each token in the Maltese language. We contrast indiscriminate transliteration or translation with mixed processing pipelines that transliterate only words of Arabic origin, thereby resulting in text with a mixture of scripts. We fine-tune on the processed data for four downstream tasks and show that conditional transliteration based on word etymology yields the best results, surpassing fine-tuning with raw Maltese or Maltese processed with non-selective pipelines.
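Illustrative sketch (not from the paper): the conditional transliteration idea described in the abstract can be pictured as a per-token decision, where a token is transliterated into Arabic script only when it is labelled as being of Arabic origin. The example below is a minimal Python sketch under loud assumptions: the lexicon `TOY_ETYMOLOGY` is a hypothetical stand-in for the trained etymology classifier, `TOY_CHAR_MAP` is a toy character map invented for this example, and none of the function names come from the authors' system.

```python
from typing import Callable, Dict, List

# ASSUMPTION: a tiny hand-written lexicon standing in for the trained
# word-level etymology classifier mentioned in the abstract.
TOY_ETYMOLOGY: Dict[str, str] = {
    "kelb": "arabic",   # "dog"
    "kbir": "arabic",   # "big"
    "skola": "other",   # "school"
}

# ASSUMPTION: a toy Latin-to-Arabic character map, for illustration only.
TOY_CHAR_MAP: Dict[str, str] = {
    "k": "\u0643",  # ARABIC LETTER KAF
    "l": "\u0644",  # ARABIC LETTER LAM
    "b": "\u0628",  # ARABIC LETTER BEH
    "r": "\u0631",  # ARABIC LETTER REH
    "i": "\u064a",  # ARABIC LETTER YEH
}


def classify(token: str) -> str:
    """Return a toy etymology label; unknown tokens default to 'other'."""
    return TOY_ETYMOLOGY.get(token.lower(), "other")


def transliterate(token: str) -> str:
    """Map each character via the toy table; unmapped characters are dropped."""
    return "".join(TOY_CHAR_MAP.get(ch, "") for ch in token.lower())


def conditional_transliterate(
    tokens: List[str],
    classify_fn: Callable[[str], str] = classify,
    translit_fn: Callable[[str], str] = transliterate,
) -> List[str]:
    """Transliterate only tokens labelled 'arabic'; pass the rest through."""
    return [
        translit_fn(tok) if classify_fn(tok) == "arabic" else tok
        for tok in tokens
    ]


if __name__ == "__main__":
    # The output mixes scripts: Arabic-origin tokens change, others do not.
    print(conditional_transliterate(["kelb", "kbir", "skola"]))
```

Passing the classifier and transliterator as function arguments keeps the selection logic separate from the components it relies on, so either placeholder could be swapped for a real model without touching the per-token rule.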
URI: https://www.um.edu.mt/library/oar/handle/123456789/119975
Appears in Collections: Scholarly Works - FacICTAI

Files in This Item:
File: EACL_2024___Maltese_Etymology.pdf
Size: 255.23 kB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.