IJSRP, Volume 5, Issue 11, November 2015 Edition [ISSN 2250-3153]
Srinath T K, Mr. Joby George
Abstract:
Extracting information from semi-structured documents is a hard task, and it is becoming more critical as the amount of digital information available on the Internet grows. Indeed, documents are often so large that the dataset returned in answer to a query may be too big to convey interpretable knowledge. In this work we describe an approach based on Tree-based Association Rules (TARs): mined rules that provide approximate, intensional information on both the structure and the contents of XML documents and that can themselves be stored in XML format. This mined knowledge is later used to provide (i) a concise idea (the gist) of both the structure and the content of the XML document, (ii) quick, approximate answers to queries, and (iii) output without redundancy, null tags, or empty tags. In this work we focus on the second and third features. A prototype system and experimental results demonstrate the effectiveness of the approach.
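As an illustration of feature (iii) only, the following minimal Python sketch shows one way an XML answer could be post-processed to drop empty tags and repeated sibling subtrees. It is not the prototype described in this work: the function names (`is_empty`, `prune`, `clean_answer`) are hypothetical, and it assumes that a "null" or "empty" tag means an element with no attributes, no children, and no non-blank text, and that "redundancy" means sibling subtrees that serialize identically.

```python
import xml.etree.ElementTree as ET


def is_empty(elem):
    """Assumed definition: no children, no attributes, no non-blank text."""
    return len(elem) == 0 and not elem.attrib and not (elem.text or "").strip()


def prune(elem):
    """Recursively remove empty children and exact duplicate sibling subtrees."""
    seen = set()
    for child in list(elem):  # iterate over a copy, since we remove from elem
        prune(child)  # clean the subtree first, so newly emptied tags are caught
        child.tail = None  # ignore whitespace between siblings when comparing
        key = ET.tostring(child, encoding="unicode")
        if is_empty(child) or key in seen:
            elem.remove(child)
        else:
            seen.add(key)
    return elem


def clean_answer(xml_text):
    """Parse an XML answer, prune it, and return the serialized result."""
    root = ET.fromstring(xml_text)
    return ET.tostring(prune(root), encoding="unicode")
```

For example, `clean_answer("<r><a>1</a><a>1</a><b/><c><d></d></c><e>2</e></r>")` returns `<r><a>1</a><e>2</e></r>`: the second `a` is a duplicate, `b` is empty, and `c` becomes empty once its empty child `d` is removed.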