Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/95231
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2022-05-06T07:44:13Z | - |
dc.date.available | 2022-05-06T07:44:13Z | - |
dc.date.issued | 2011 | - |
dc.identifier.citation | Tanti, A. (2011). Automated identification of plots and story structure in unstructured documents (Bachelor's dissertation). | en_GB |
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/95231 | - |
dc.description | B.SC.(HONS)IT | en_GB |
dc.description.abstract | Modern information retrieval systems such as search engines are adopting the philosophy of "the less information, the better". This means that if, for example, a user queries the date of birth of the President of Malta, he or she should be given the actual date of birth rather than a list of documents about Dr. George Abela. A whole research area, referred to as information extraction (IE), deals with locating such specific information within a document. Text classification (TC) is another increasingly popular area of human language technology (HLT); it deals with assigning a label to a piece of text according to distinctive features found in that text. The research in this thesis investigates how techniques from these two branches can be applied to literary works written in English. In a nutshell, the system takes an English-language novel as input and returns information such as the main characters and their types, the plot type of the story, and the character interactions found in it. To our knowledge, no research of this sort had been done before, which made the task more challenging since all the ideas had to be designed from scratch. The system was evaluated against a human gold standard under two distinct scenarios. The first scenario counted pronouns that refer to characters (identified through anaphora resolution) as valid character instances, thus increasing the frequency of occurrence of each character in the text; the second scenario excluded pronouns. Considering the lack of research papers on the subject, the combined F-measure results of the main tasks are fairly satisfactory: 52% when including pronouns and 57% when excluding them. | en_GB |
dc.language.iso | en | en_GB |
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB |
dc.subject | Information technology | en_GB |
dc.subject | Computer simulation | en_GB |
dc.title | Automated identification of plots and story structure in unstructured documents | en_GB |
dc.type | bachelorThesis | en_GB |
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB |
dc.publisher.institution | University of Malta | en_GB |
dc.publisher.department | Faculty of Information and Communication Technology | en_GB |
dc.description.reviewed | N/A | en_GB |
dc.contributor.creator | Tanti, Alex (2011) | - |
Appears in Collections: | Dissertations - FacICT - 2011 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
BSC(HONS)ICT_Tanti, Alex_2011.PDF (Restricted Access) | | 14.34 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.