Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/94043
Title: | Application and viability of document classification techniques |
Authors: | Mercieca, Joanna (2008) |
Keywords: | Semantic Web; Electronic records; Machine learning |
Issue Date: | 2008 |
Citation: | Mercieca, J. (2008). Application and viability of document classification techniques (Bachelor's dissertation). |
Abstract: | Many businesses today are finding it more feasible to transform the paper-based elements of their workflows into fully electronic workflows. Soft copies of documents are becoming the primary source of information. This 'soft-archiving' is being done to address cost and regulatory concerns. It has resulted in a larger volume of documents that have to be sorted by individual employees. Besides being time-consuming, this sorting process also tends to be error-prone if done manually. For the company it is also an expensive process, since employees are diverted from their primary tasks to do chores outside their real job description. An automated system that takes on this secondary yet important classification process would thus be ideal, potentially leading to an increase in productivity and efficiency. Software solutions using artificial intelligence and natural language processing techniques are emerging to classify documents into the right categories. Each of these techniques has demonstrated merits and limitations. This thesis provides a broad overview of the various methodologies to classify documents by investigating natural language techniques, more specifically Text Classification (TC) algorithms. Subsequently, based on this overview, three methods are adopted and implemented into one comprehensive 'document categoriser'. Contrary to previous implementations, the resulting system is not limited by application. In this instance, the resulting system is tested in a company business framework. The various merits and limitations of the different TC algorithms are discussed and compared with other implementations. |
Description: | B.Sc. IT (Hons)(Melit.) |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/94043 |
Appears in Collections: | Dissertations - FacICT - 1999-2009; Dissertations - FacICTCS - 2008 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
B.SC.(HONS)IT_Mercieca_Joanna_2008.pdf | Restricted Access | 13.98 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.