Title: Automatic transcription and summarisation of multi-speaker meetings
Authors: Dimech, Mikea (2022)
Keywords: Transcription; Automatic speech recognition; Natural language processing (Computer science); Transfer learning (Machine learning); Deep learning (Machine learning)
Issue Date: 2022
Citation: Dimech, M. (2022). Automatic transcription and summarisation of multi-speaker meetings (Bachelor's dissertation).
Abstract: The automatic transcription of dialogue in multi-speaker meetings, and the subsequent summarisation of the key points discussed, is broadly valuable to businesses and organisations. This work applies recent advances in Natural Language Processing (NLP) to this problem, implementing a pipeline of modules for Speaker Diarisation, Automatic Speech Recognition (ASR), Text Enhancement and Text Summarisation. Following an in-depth analysis of relevant state-of-the-art models and techniques based on Transfer Learning, a system was implemented that successfully transcribes and summarises recordings from a corpus of business meetings. Transcription proved challenging due to non-ideal acoustic conditions and frequent speech overlaps, both common in meeting scenarios. The summarisation task was also non-trivial because of the irregular structure of conversational text, with the main points scattered across several speakers and utterances. Two Transformer-based Deep Learning (DL) models were trained for the summarisation task on a dataset of meeting transcripts augmented with summaries generated by the GPT-3 model. This was done to approach the performance of a high-resource model with significantly less data, and it also increased the robustness of the summarisation model to noise and errors in the input transcript. Subsequently, the system was evaluated as a whole through a series of experiments, altering various parts of the pipeline and examining the effect of these changes on the overall output of the system. These tests were carried out to identify the key difficulties and viable approaches to addressing them. The prevalence of transcript noise, excessive transcript lengths, and the limited capacity of summarisation models to generalise to domains beyond those seen in the dataset were amongst the issues identified. We highlight the lack of large-scale, multi-domain meeting corpora as a hindrance to the development of robust systems in this field. Apart from obtaining such corpora, future work should focus on end-to-end neural approaches as well as the pre-training of large Transformer models on data that incorporates dialogue and non-conventional linguistic structures.
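To make the pipeline described in the abstract more concrete, the sketch below shows a minimal ASR-to-summarisation flow using the Hugging Face transformers library. It is an illustration only, not the dissertation's implementation: the model names (wav2vec2, BART), the audio file name, and the word-window chunking strategy for long transcripts are assumptions, and the Speaker Diarisation and Text Enhancement stages are omitted for brevity.

```python
# Illustrative sketch of an ASR -> abstractive summarisation pipeline.
# Models, file names, and chunking parameters are assumptions, not the
# dissertation's actual configuration.
from transformers import pipeline

# Stage 1: Automatic Speech Recognition - transcribe the meeting audio to text.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
transcript = asr("meeting.wav", chunk_length_s=30)["text"]  # chunk long audio

# Stage 2: Abstractive summarisation with a Transformer encoder-decoder model.
summariser = pipeline("summarization", model="facebook/bart-large-cnn")

def summarise_long_text(text: str, window_words: int = 700) -> str:
    """Split an over-length transcript into word windows, summarise each
    window, and join the partial summaries (a simple workaround for the
    model's input-length limit)."""
    words = text.split()
    windows = [" ".join(words[i:i + window_words])
               for i in range(0, len(words), window_words)]
    parts = [summariser(w, max_length=150, min_length=30, truncation=True)[0]["summary_text"]
             for w in windows]
    return " ".join(parts)

print(summarise_long_text(transcript))
```

The window-and-merge step reflects the excessive-transcript-length issue noted in the abstract; a full system would instead segment by speaker turns produced by the diarisation stage.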
Description: B.Sc. IT (Hons) (Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/107910
Appears in Collections: Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022
Files in This Item:
File | Description | Size | Format
---|---|---|---
2208ICTICT390905069138_1.PDF (Restricted Access) | | 4.18 MB | Adobe PDF