Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/120551
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2024-04-09T05:50:52Z | -
dc.date.available | 2024-04-09T05:50:52Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Samin, A.M. (2023). Exploring parameter-efficient adapters for low-resource automatic speech recognition (Master's dissertation). | en_GB
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/120551 | -
dc.description | M.Sc. (HLST)(Melit.) | en_GB
dc.description.abstract | Parameter-efficient adapter modules have been leveraged in pre-trained speech models for speech processing tasks such as automatic speech recognition (ASR) in recent years. An adapter, integrated into these pre-trained speech models, typically consists of two feed-forward layers that are trained while the pre-trained backbone is kept frozen. Despite their emergence for ASR, a comprehensive exploration of adapters remains lacking, leaving several research questions unanswered. In this thesis, we employ adapter-based tuning on two state-of-the-art pre-trained models, XLS-R and MMS, and compare it with full fine-tuning. Our study investigates the data requirements for adapter-tuning and reveals that, while adapters are unsuited for few-shot learning, they perform competitively with full fine-tuning when at least 10 hours of labeled speech data are available. We also demonstrate that adapter-tuning the larger XLS-R model with 2 billion parameters outperforms fine-tuning the entire XLS-R 2B model. This likely arises from the susceptibility of larger models to overfitting during full fine-tuning, a problem avoided by training only the adapters while leveraging the pre-trained knowledge. Moreover, our experiments reveal that more pre-training data might help adapter-tuning work well. Additionally, we perform separate experiments on transfer learning with adapters and on scaling the adapter modules with more feed-forward layers, yielding valuable insights. To the best of our knowledge, this exhaustive study is pioneering in its exploration of adapters for ASR, contributing significant insights to this evolving technology. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | Automatic speech recognition | en_GB
dc.subject | Neural networks (Computer science) | en_GB
dc.subject | Feedforward control systems | en_GB
dc.title | Exploring parameter-efficient adapters for low-resource automatic speech recognition | en_GB
dc.type | masterThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Samin, Ahnaf Mozib (2023) | -
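The abstract above describes an adapter as two feed-forward layers trained on top of a frozen pre-trained backbone such as XLS-R or MMS. The sketch below is a minimal, illustrative PyTorch bottleneck adapter under those assumptions; it is not taken from the dissertation, and the module names, the bottleneck width of 256, and the freeze_backbone helper are hypothetical choices for illustration only.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: two feed-forward layers (down- and up-projection) with a residual connection."""
        def __init__(self, hidden_dim: int, bottleneck_dim: int = 256):
            super().__init__()
            self.down = nn.Linear(hidden_dim, bottleneck_dim)  # first feed-forward layer (down-projection)
            self.act = nn.GELU()
            self.up = nn.Linear(bottleneck_dim, hidden_dim)    # second feed-forward layer (up-projection)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection: the adapter output is added to the frozen backbone's representation.
            return x + self.up(self.act(self.down(x)))

    def freeze_backbone(backbone: nn.Module, adapters: nn.ModuleList) -> None:
        # Freeze every backbone parameter; only the adapter parameters stay trainable.
        for p in backbone.parameters():
            p.requires_grad = False
        for p in adapters.parameters():
            p.requires_grad = True

In adapter-tuning of this kind, one such module is typically inserted after each transformer block of the pre-trained encoder, and only the adapter (plus output-layer) parameters are updated during ASR training, which is what keeps the approach parameter-efficient.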
Appears in Collections:
Dissertations - FacICT - 2023
Dissertations - FacICTAI - 2023

Files in This Item:
File | Description | Size | Format
2318ICTCSA531005079269_1.PDF (Restricted Access) | - | 1.12 MB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.