Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/16972
Title: | Integrating information to bootstrap information extraction from web sites |
Authors: | Ciravegna, Fabio; Dingli, Alexiei; Guthrie, David; Wilks, Yorick |
Keywords: | Information organization; Information retrieval -- Automation; Data mining; Digital libraries; Databases |
Issue Date: | 2005 |
Publisher: | International Joint Conferences on Artificial Intelligence Organization |
Citation: | Ciravegna, F., Dingli, A., Guthrie, D., & Wilks, Y. (2003). Integrating information to bootstrap information extraction from web sites. IJCAI-03 Workshop on Information Integration on the Web, 2003. 1-6. |
Abstract: | In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention. Learning is seeded by integrating information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to train more complex IE engines. All the corpora for training the IE engines are produced automatically by integrating information from different sources such as available corpora and services (e.g. databases or digital libraries). User intervention is limited to providing an initial URL and adding information missed by the different modules when the computation has finished. The information added or deleted by the user can then be reused to provide further training, and therefore to obtain more information (recall) and/or more precision. We are currently applying this methodology to mining web sites of Computer Science departments. |
URI: | https://www.um.edu.mt/library/oar//handle/123456789/16972 |
Appears in Collections: | Scholarly Works - FacICTAI |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
OA - Integrating Information to Bootstrap Information Extraction from Web Sites.2-7.pdf | Integrating information to bootstrap information extraction from web sites | 109.78 kB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.