Tree-structured template generation for Web pages

Shui Lung Chuang*, Jane Yung Jen Hsu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

13 Scopus citations

Abstract

As the Web becomes an increasingly important source of information, tools for modeling, searching, and extracting information from Web pages are indispensable. By modeling the structure of a Web page as defined by its markup tags, one can easily extract target information using structural templates. This paper introduces the Tree Template Automatic Generator (TTAG), which learns tree-structured templates from training Web pages. TTAG was applied to both query-based and frequently updated Web sites and produced effective templates from a small number of examples. The experiments show that TTAG is a powerful extraction tool for semi-structured information sources.
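To make the idea of a structural template concrete, the following is a minimal sketch, not TTAG itself: the abstract does not describe the learning algorithm, so nothing here reproduces it. The sketch assumes well-formed markup and represents a template as a hand-written path of (tag, index) steps through the tag tree. The names `Node`, `TreeBuilder`, `parse`, and `extract` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class Node:
    """One element in the tag tree (hypothetical helper type)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Move back up only when the end tag closes the current element.
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(root, template):
    """Follow a template, given as a list of (tag, index) steps.

    Each step selects the index-th child with the given tag.
    Returns the text of the node reached, or None if the path does not exist.
    """
    node = root
    for tag, index in template:
        matches = [child for child in node.children if child.tag == tag]
        if index >= len(matches):
            return None
        node = matches[index]
    return node.text


page = (
    "<html><body><table>"
    "<tr><td>Title</td><td>TTAG</td></tr>"
    "<tr><td>Year</td><td>2004</td></tr>"
    "</table></body></html>"
)
# Hand-written template: second cell of the second table row.
year_template = [("html", 0), ("body", 0), ("table", 0), ("tr", 1), ("td", 1)]
print(extract(parse(page), year_template))  # prints 2004
```

In this sketch the template is written by hand; a system such as TTAG would instead learn templates from training pages, by a method the abstract does not specify.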

Original language: English
Title of host publication: Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004
Editors: N. Zhong, H. Tirri, Y. Yao, L. Zhou
Pages: 327-333
Number of pages: 7
State: Published - 2004
Externally published: Yes
Event: Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004 - Beijing, China
Duration: 20 Sep 2004 to 24 Sep 2004
