Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/93295
Title: Process fact extraction from the web
Authors: Buhagiar, Stephanie (2008)
Keywords: Natural language processing (Computer science)
Data logging
Computer networks -- Security measures
Ontologies (Information retrieval)
Issue Date: 2008
Citation: Buhagiar, S. (2008). Process fact extraction from the web (Bachelor's dissertation).
Abstract: The web contains large quantities of information regarding processes running on our computers. In this work, we apply Ontology-based Information Extraction to the task of building profiles of executable files and dynamic link libraries. A profile is made up of facts about the process as well as a list of keywords or concepts describing its behavior. We combine evidence on a single webpage to extract facts and categorize processes reliably. By assigning a concept to each relevant sentence in the process description, we obtain a summary of relevant concepts for each process. Such concepts include "System", "Backdoor", "Keylogger" and "Mail Propagation", among others. We use Natural Language Processing techniques combined with Machine Learning techniques for Ontologies to classify each description. A Naive Bayes classifier assigns a category to each profile, based on the information extracted. Similar process descriptions are combined, so that redundancy increases reliability rather than causing information overload. We then evaluate each module in isolation and measure the effectiveness and accuracy of the entire system by comparing it to a human performing the same extraction task.
Description: B.SC. ICT (HONS) ARTIFICIAL INTELLIGENCE
URI: https://www.um.edu.mt/library/oar/handle/123456789/93295
Appears in Collections: Dissertations - FacICT - 1999-2009
Dissertations - FacICTAI - 2002-2014

Files in This Item:
File: B.SC.(HONS)IT_Buhagiar_Stephanie_2008.pdf
Description: Restricted Access
Size: 14.28 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.