Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/94032
Title: Automatic sentence compression system (ASCS)
Authors: Galea, Matthew (2008)
Keywords: Natural language processing (Computer science)
Data structures (Computer science)
Issue Date: 2008
Citation: Galea, M. (2008). Automatic sentence compression system (ASCS) (Bachelor's dissertation).
Abstract: Information is abundant, and information overload makes it very hard to find the right information. Information overload is the state of having too much information and being unable to make a good selection. Summarisers reduce information overload and can be applied to text or speech. Sentence compression is the task of shortening sentences so that important information is kept and the compressed sentence remains coherent and grammatically correct. Sentence compression components are essential for good automatic summarisers. In this project a sentence compression system was designed that employs three word-based algorithms, which compress sentences without changing the word order. The algorithms make use of rules extracted from parallel corpora. The highest mean sentence score achieved was 80.94%. The sentence score is a value based on grammaticality, fluency, understandability and faithfulness. Two human judges were asked to evaluate a subset of the testing data and score the compressions against these criteria. This dissertation gives a detailed description of previous work in the area, the implemented system, and the improvements that could be made to the system.
Description: B.SC. ICT (HONS) ARTIFICIAL INTELLIGENCE
URI: https://www.um.edu.mt/library/oar/handle/123456789/94032
Appears in Collections: Dissertations - FacICT - 1999-2009
Dissertations - FacICTAI - 2002-2014

Files in This Item:
File: B.SC.(HONS)IT_Galea_Matthew_2008.PDF (Restricted Access)
Size: 20.96 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.