BERTu: New Corpus & BERT Models for Maltese
The talk will be given by Mr Kurt Micallef (RSO, University of Malta).
Large pre-trained language models have become a core component in many Natural Language Processing (NLP) tasks. BERT is one such model that has gained popularity, owing to its state-of-the-art performance on a variety of downstream tasks and the relative simplicity of fine-tuning it for a particular task. Although the original model is specific to English, BERT variants have been released for other languages (e.g. CamemBERT for French, AraBERT for Arabic), as well as models covering multiple languages at once (e.g. multilingual BERT).
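To illustrate how little is needed to adapt such a model to a new task, the sketch below loads a pre-trained multilingual BERT checkpoint with the Hugging Face transformers library and adds a task-specific classification head; the checkpoint name, label count, and example sentence are illustrative assumptions, not details from the talk.

```python
# Minimal sketch: fine-tuning a pre-trained BERT encoder for classification.
# Checkpoint, num_labels and the example sentence are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"  # multilingual BERT (mBERT)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# A thin classification head is placed on top of the pre-trained encoder;
# all other weights are reused from pre-training and only refined during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

inputs = tokenizer("Dan huwa eżempju.", return_tensors="pt")  # "This is an example."
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels)
```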
In this talk, I will introduce newly developed language models for Maltese — BERTu and mBERTu. BERTu was trained on a new version of the Korpus Malti, containing approximately 466 million tokens (2.52 GB), which was also developed as part of this work.
We will go over the conceptual ideas behind how models of this kind make use of large corpora to learn language representations. I will present the state-of-the-art results that the new models obtain on various syntactic tagging benchmarks, as well as an evaluation on a sentiment analysis dataset. Furthermore, I will provide insights into how different pre-training data sizes and domains affect downstream performance.
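As a rough illustration of the pre-training idea, the sketch below uses masked-language-modelling inference: the model predicts a token hidden from a raw Maltese sentence, which is the kind of self-supervised signal such models learn from. The checkpoint name "MLRS/BERTu" and the example sentence are assumptions for illustration only.

```python
# Sketch of the masked-language-modelling objective behind BERT-style models:
# representations are learned by predicting tokens hidden from raw text.
# The checkpoint name "MLRS/BERTu" is an assumption about where the model is published.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="MLRS/BERTu")

# Build a Maltese sentence with the tokenizer's own mask token
# (avoids assuming whether it is "[MASK]" or "<mask>").
masked = f"Malta hija {fill_mask.tokenizer.mask_token} sabiħa."  # "Malta is a beautiful ___."
for prediction in fill_mask(masked):
    print(prediction["token_str"], round(prediction["score"], 3))
```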
This event will take place in a hybrid manner: if you wish to attend on campus, the venue is the ICT Boardroom; if you are unable to attend in person, the event can be joined via the Zoom link below.
Meeting ID: 983 0665 5581
Passcode: 914155
References:
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., NAACL 2019)
This seminar is hosted by the Institute of Linguistics & Language Technology.