Effective and Enhanced method for Template Extraction from Heterogeneous Web Pages

IJSRP, Volume 3, Issue 9, September 2013 Edition [ISSN 2250-3153]

P. Rajeswari, A. Gandhirajan and R. Senthil

Abstract: To achieve high publishing productivity, web pages are automatically populated using common templates with contents. The templates give readers easy access to the contents, guided by consistent structures. We cluster web documents based on the similarity of the underlying template structures in the documents, so that the template for each cluster is extracted simultaneously. We propose to represent a document and a template as a set of paths in a DOM (Document Object Model) tree. As validated by XPath, the most popular XML query language, paths are sufficient to express tree structures and are useful for querying. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to state-of-the-art template detection algorithms.
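
The path-set representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Jaccard similarity, the greedy single-pass clustering, the 0.6 threshold, and the names PathCollector, dom_paths, jaccard and cluster_pages are all assumptions made for illustration. Each page is reduced to the set of root-to-element tag paths in its DOM tree, pages with similar path sets are grouped, and the paths shared by every page in a group are taken as a rough stand-in for that group's template.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags currently open, outermost first
        self.paths = set()  # e.g. {"/html", "/html/body", ...}

    def handle_starttag(self, tag, attrs):
        self.paths.add("/" + "/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def dom_paths(html):
    """Return the set of tag paths found in an HTML string."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.6):
    """Greedy clustering: a page joins the first cluster whose shared
    paths are similar enough, otherwise it starts a new cluster."""
    clusters = []
    for name, html in pages.items():
        paths = dom_paths(html)
        for cluster in clusters:
            if jaccard(paths, cluster["template"]) >= threshold:
                cluster["members"].append(name)
                # Keep only the paths common to all members.
                cluster["template"] &= paths
                break
        else:
            clusters.append({"members": [name], "template": set(paths)})
    return clusters


pages = {
    "a": "<html><body><div><h1>A</h1><p>x</p></div></body></html>",
    "b": "<html><body><div><h1>B</h1><p>y</p><p>z</p></div></body></html>",
    "c": "<html><body><table><tr><td>1</td></tr></table></body></html>",
}
for cluster in cluster_pages(pages):
    print(cluster["members"], sorted(cluster["template"]))
```

In this toy input, pages "a" and "b" share the same path set and fall into one cluster, while "c" has a table-based structure and starts its own. Tag paths here ignore attributes, text, and sibling order, which is a deliberate simplification.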

Reference this Research Paper (copy & paste below code):

P. Rajeswari, A. Gandhirajan and R. Senthil (2013); Effective and Enhanced method for Template Extraction from Heterogeneous Web Pages; Int J Sci Res Publ 3(9) (ISSN: 2250-3153). http://www.ijsrp.org/research-paper-0913.php?rp=P211805