Visual Segmentation-Based Data Record Extraction From Web Documents

M. Weatherston, A. Obregon, Longzhuang Li, Yonghuai Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding (Non-Journal item)

29 Citations (Scopus)

Abstract

Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method to extract data records from such Web pages. The VSDR method first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; neighboring and non-neighboring data records are then extracted using a compress-and-collapse technique. Experimental results show that, unlike existing methods, which generate good results only on their own test domains, VSDR is a general data record extraction method that produces stable, good results on a wide range of Web pages.
Original language: English
Title of host publication: International Conference on Information Reuse and Integration
Pages: 502-507
Number of pages: 6
ISBN (Electronic): 1-4244-1500-4
Publication status: Published - 13 Aug 2007
Event: International Conference on Information Reuse and Integration - Las Vegas, United States
Duration: 13 Aug 2007 to 15 Aug 2007

Conference

Conference: International Conference on Information Reuse and Integration
Country/Territory: United States
City: Las Vegas
Period: 13 Aug 2007 to 15 Aug 2007
