Integrating information to bootstrap information extraction from web sites

Ciravegna, Fabio; Dingli, Alexiei; Guthrie, David; Wilks, Yorick

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/16972

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ciravegna, Fabio
dc.contributor.author	Dingli, Alexiei
dc.contributor.author	Guthrie, David
dc.contributor.author	Wilks, Yorick
dc.date.accessioned	2017-03-04T20:00:18Z
dc.date.available	2017-03-04T20:00:18Z
dc.date.issued	2005
dc.identifier.citation	Ciravegna, F., Dingli, A., Guthrie, D., & Wilks, Y. (2003). Integrating information to bootstrap information extraction from web sites. IJCAI-03 Workshop on Information Integration on the Web, 2003. 1-6.	en_GB
dc.identifier.uri	https://www.um.edu.mt/library/oar//handle/123456789/16972
dc.description.abstract	In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention. Learning is seeded by integrating information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to train more complex IE engines. All the corpora for training the IE engines are produced automatically by integrating information from different sources such as available corpora and services (e.g. databases or digital libraries). User intervention is limited to providing an initial URL and adding information missed by the different modules when the computation has finished. The information added or deleted by the user can then be reused to provide further training and therefore to obtain more information (recall) and/or more precision. We are currently applying this methodology to mining web sites of Computer Science departments.	en_GB
dc.language.iso	en	en_GB
dc.publisher	International Joint Conferences on Artificial Intelligence Organization	en_GB
dc.rights	info:eu-repo/semantics/openAccess	en_GB
dc.subject	Information organization	en_GB
dc.subject	Information retrieval -- Automation	en_GB
dc.subject	Data mining	en_GB
dc.subject	Digital libraries	en_GB
dc.subject	Databases	en_GB
dc.title	Integrating information to bootstrap information extraction from web sites	en_GB
dc.type	conferenceObject	en_GB
dc.rights.holder	The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.	en_GB
dc.bibliographicCitation.conferencename	IJCAI-03 Workshop on Information Integration on the Web	en_GB
dc.bibliographicCitation.conferenceplace	Acapulco, Mexico, 9-10/08/2003	en_GB
dc.description.reviewed	peer-reviewed	en_GB
Appears in Collections:	Scholarly Works - FacICTAI

Files in This Item:

File	Description	Size	Format
OA - Integrating Information to Bootstrap Information Extraction from Web Sites.2-7.pdf	Integrating information to bootstrap information extraction from web sites	109.78 kB	Adobe PDF
