International Journal of Scientific and Research Publications

IJSRP, Volume 3, Issue 9, September 2013 Edition [ISSN 2250-3153]


Effective and Enhanced method for Template Extraction from Heterogeneous Web Pages
P. Rajeswari, A. Gandhirajan and R. Senthil
Abstract: To achieve high publishing productivity, web pages are automatically populated using common templates with contents. The templates give readers easy access to the contents, guided by consistent structures. We cluster web documents based on the similarity of the underlying template structures in the documents, so that the template for each cluster is extracted simultaneously. We propose to represent a document and a template as a set of paths in a DOM (Document Object Model) tree. As validated by XPath, the most popular XML query language, paths are sufficient to express tree structures and are useful for querying. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to state-of-the-art template detection algorithms.
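
The path-set representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the clustering criterion, so Jaccard similarity is used here purely as an assumed stand-in, only tag paths are collected (text content is ignored), and the names PathCollector, dom_paths, and jaccard, as well as the three sample pages, are hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags currently open, root first
        self.paths = set()  # distinct paths, e.g. "html/body/div"

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def dom_paths(html):
    """Return the set of tag paths for one document."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two path sets (assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical pages: A and B share a template, C uses a different one.
page_a = "<html><body><div><h1>News</h1><p>Story A</p></div></body></html>"
page_b = "<html><body><div><h1>News</h1><p>Story B</p></div></body></html>"
page_c = "<html><body><table><tr><td>Price</td></tr></table></body></html>"

sim_ab = jaccard(dom_paths(page_a), dom_paths(page_b))
sim_ac = jaccard(dom_paths(page_a), dom_paths(page_c))
print(sim_ab, sim_ac)
```

Pages built from the same template yield identical or nearly identical path sets, while pages from different templates share only the paths near the root. A clustering step could group documents on that signal, and the paths common to a cluster would then approximate its template.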

Reference this Research Paper:

P. Rajeswari, A. Gandhirajan and R. Senthil (2013); Effective and Enhanced method for Template Extraction from Heterogeneous Web Pages; Int J Sci Res Publ 3(9) (ISSN: 2250-3153). http://www.ijsrp.org/research-paper-0913.php?rp=P211805