MASRI
We are interested in a broad variety of problems in speech recognition for under-resourced languages, especially Maltese.
Aħna interessati f'diversi sfidi konnessi mat-traskrizzjoni awtomatika b'lingwi bi ftit riżorsi, speċjalment il-Malti.
Support | Fondi
MASRI is supported by a University of Malta Research Fund Excellence grant.
MASRI għandu l-fondi mill-Fond tar-Riċerka tal-Università ta' Malta.
Data
We are creating speech corpora and investigating data augmentation techniques.
Qed noħolqu korpora tat-taħdit u ninvestigaw mezzi biex nawmentaw id-data.
Technologies | Teknoloġiji
We investigate neural techniques for speech-to-text, forced alignment, etc.
Qed ninvestigaw teknoloġija bbażata fuq xbiek newrali għat-traskrizzjoni awtomatika, forced alignment, eċċ.
Neħtieġu l-għajnuna tiegħek
Biex nibnu sistemi li jużaw il-vuċi aħjar neħtieġu għadd kbir ta' recordings ta' taħdit bil-Malti. Għaldaqstant, qed nużaw il-Common Voice, proġett tal-Mozilla mfassal sabiex jgħin fil-ħolqien ta' sistemi tat-taħdit għal-lingwi kollha mitkellma madwar id-dinja. Permezz ta' dan, qed niġbru kampjuni ta' taħdit minn kelliema madwar Malta u Għawdex.
- Agħfas il-buttuna biex iżżur il-paġna ta' Common Voice.
- Qis li tagħżel il-Malti bħala l-lingwa li trid tuża.
- Jekk trid, oħloq kont. Dan iħallik tippersonalizza l-esperjenza u tneħħi kwalunkwe data li tkun tajtna.
- Ibda rrekordja l-vuċi tiegħek billi taqra ftit sentenzi kuljum.
- Isma' r-rekordings ta' ħaddieħor u għidilna humiex tajbin jew le.
We need your help
To build better voice-activated systems, we need large amounts of spoken data in Maltese. We are therefore using Common Voice, a project by Mozilla designed to help in the construction of voice-activated systems for all the languages of the world. Through it, we are collecting speech samples from speakers all over the Maltese islands.
- Click the button to visit the Common Voice page.
- Make sure you choose Maltese (Malti) as your language.
- If you want, you can create an account. This allows you to personalise the experience, but also to delete your data should you wish to.
- Start recording your voice by reading a few sentences a day.
- Listen to recordings made by other people and tell us if they're good or not.
Data
Data and resources created in the MASRI project.
Dati
Data u riżorsi maħluqa fil-proġett MASRI.
MASRI Data Repo
MASRI has released several corpora of paired text and speech. For full details of the datasets, see our GitHub repo.
Currently, this data is only being released for non-commercial purposes.
Il-proġett MASRI nieda diversi datasets ta' taħdit bit-traskrizzjonijiet. Għal aktar informazzjoni ara d-dettalji fuq ir-repożitorju tal-GitHub tal-proġett.
Bħalissa, id-data hija disponibbli biss għal skopijiet mhux kummerċjali.
Download | Niżżel
To download the data, please read our terms and conditions.
Biex tniżżel id-data, aqra t-termini u l-kundizzjonijiet.
Ready to download? Click the button below.
Lest biex tniżżel? Agħfas il-buttuna hawn taħt.
G2P Tool for Maltese
Sistema G2P għall-Malti
A Python 3 grapheme-to-phoneme (G2P) tool. The code is available on GitHub. A demo and a web service interface are hosted on the Maltese Language Resource Server.
Implimentazzjoni bil-Python 3 ta' programm għat-traskrizzjoni fonetika. Kodiċi disponibbli fuq GitHub. Tista' tara demo u tikseb aċċess permezz ta' web service mis-Server għar-Riżorsi Lingwistiċi Maltin.
MASRI-HEADSET
Corpus of 8 hours of paired Maltese speech and text.
To obtain the dataset, kindly contact us.
Note that this dataset is superseded by our full data release.
Korpus ta' 8 sigħat ta' taħdit bil-Malti, flimkien mat-traskrizzjonijiet.
Dan id-dataset huwa sorpassat, mindu nidejna d-data kompluta.
MASRI-HEADSET splits
Taqsimiet tal-MASRI-HEADSET
These files reproduce the train/test splits of MASRI-HEADSET used in the experiments described in this paper.
Dawn il-fajls jirriproduċu t-taqsimiet fil-MASRI-HEADSET użati fl-esperimenti rrappurtati f'dan l-artiklu.
Publications
Project research papers
Pubblikazzjonijiet
Artikli dwar ir-riċerka tal-proġett.
Mena, C; Gatt, A; DeMarco, A; Borg, C; van der Plas, L; Muscat, A and Padovani, I. (2020). MASRI-HEADSET: A Maltese corpus for speech recognition. Proceedings of the 12th edition of the Language Resources and Evaluation Conference (LREC'20). Marseille, France: ELRA.
People
Project members and associated research assistants.
Membri
Membri tal-proġett u assistenti tar-riċerka.
Albert Gatt
Andrea DeMarco
Claudia Borg
Lonneke van der Plas
Carlos Mena
Alexandra Vella
Amanda Muscat
Ian Padovani
Kirsty Azzopardi
Ayrton Didier Brincat
Contact
Ikkuntattjana
Institute of Linguistics and Language Technology
University of Malta
Msida MSD2080
masri {at} um.edu.mt