Abstract
Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method to extract data records from such Web pages. VSDR first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; neighboring and non-neighboring data records are then extracted using a compress-and-collapse technique. Experimental results show that, unlike existing methods, which only generate good results on their own test domains, VSDR is a general data record extraction method that produces stable and good results on a wide range of Web pages.
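To make the segmentation idea concrete, the following is a minimal Python sketch of grouping rendered blocks by spatial closeness and visual resemblance. It is not the VSDR algorithm from the paper: the `Block` fields, the pixel thresholds (`tol`, `max_gap`), and all function names are assumptions made for illustration, and the compress-and-collapse step is not reproduced because the abstract does not describe it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A rendered page block. All fields are hypothetical for this sketch."""
    x: int        # left edge in pixels
    y: int        # top edge in pixels
    width: int
    height: int
    font: str     # crude stand-in for visual style


def visually_similar(a: Block, b: Block, tol: int = 5) -> bool:
    # Assumed resemblance test: same left edge, width, and font (within tol pixels).
    return (abs(a.x - b.x) <= tol
            and abs(a.width - b.width) <= tol
            and a.font == b.font)


def spatially_close(a: Block, b: Block, max_gap: int = 20) -> bool:
    # Assumed closeness test: b starts at most max_gap pixels below the bottom of a.
    gap = b.y - (a.y + a.height)
    return 0 <= gap <= max_gap


def group_blocks(blocks: List[Block]) -> List[List[Block]]:
    # Walk the blocks top to bottom, extending the current group while the
    # next block is both similar to and close to the group's last block.
    groups: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.y):
        if (groups
                and visually_similar(groups[-1][-1], block)
                and spatially_close(groups[-1][-1], block)):
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups


def candidate_record_regions(blocks: List[Block]) -> List[List[Block]]:
    # A run of two or more similar, adjacent blocks is treated as a
    # candidate region of neighboring data records.
    return [group for group in group_blocks(blocks) if len(group) >= 2]


# Toy page: a header, three product-like rows, and a footer.
page = [
    Block(0, 0, 800, 50, "bold"),
    Block(100, 60, 600, 80, "regular"),
    Block(100, 150, 600, 80, "regular"),
    Block(100, 240, 600, 80, "regular"),
    Block(0, 400, 800, 30, "small"),
]
regions = candidate_record_regions(page)
```

This sketch only finds neighboring records that sit directly below one another. Non-neighboring records, which the abstract says VSDR also extracts, would need a further step that is outside what this example attempts.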
| Original language | English |
|---|---|
| Title of host publication | International Conference on Information Reuse and Integration |
| Pages | 502-507 |
| Number of pages | 6 |
| ISBN (Electronic) | 1-4244-1500-4 |
| Publication status | Published - 13 Aug 2007 |
| Event | International Conference on Information Reuse and Integration - Las Vegas, United States. Duration: 13 Aug 2007 → 15 Aug 2007 |
Conference
| Conference | International Conference on Information Reuse and Integration |
|---|---|
| Country/Territory | United States |
| City | Las Vegas |
| Period | 13 Aug 2007 → 15 Aug 2007 |