IJSRP, Volume 4, Issue 7, July 2014 Edition [ISSN 2250-3153]
Sankalp Mitra, Suchit Bande, Shreyas Kudale, Advait Kulkarni, Asst. Prof. Leena A. Deshpande
Abstract:
As an important part of discovering association rules, frequent itemset mining plays a key role in mining associations, correlations, causality, and other important data mining tasks. Traditional frequent itemset mining algorithms handle datasets made up of massive numbers of small files poorly, suffering from high memory cost, high I/O overhead, and low computing performance. In this paper, we propose an improved Parallel FP-Growth (IPFP) algorithm and discuss its applications. In particular, we introduce a small-files processing strategy for massive small-file datasets to compensate for the low read/write speed and low processing efficiency of Hadoop on such data. Moreover, we use MapReduce to parallelize the FP-Growth algorithm, thereby improving the overall performance of frequent itemset mining. The experimental results show that the IPFP algorithm is feasible and valid, with good speedup and higher mining efficiency, and that it can meet the rapidly growing needs of frequent itemset mining for massive small-file datasets.
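The two ideas named in the abstract, packing many small files into one larger input and counting item supports in a map/reduce style, can be illustrated with a minimal single-process sketch. It is not the IPFP implementation: the function names (`merge_small_files`, `map_phase`, `reduce_phase`, `frequent_items`), the in-memory "files", and the round-robin sharding are all assumptions made for illustration. It covers only the first pass of a parallel FP-Growth (finding frequent single items), not FP-tree construction or mining.

```python
from collections import Counter
from itertools import chain


def merge_small_files(files):
    """Pack the transactions of many small files into one list.

    Hypothetical stand-in for a small-files strategy: each file holds one
    transaction per line, with items separated by whitespace.
    """
    return [
        line.split()
        for content in files.values()
        for line in content.splitlines()
        if line.strip()
    ]


def map_phase(shard):
    """Emit (item, 1) once per distinct item in each transaction."""
    return [(item, 1) for transaction in shard for item in set(transaction)]


def reduce_phase(pairs):
    """Sum the emitted counts per item to obtain support counts."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def frequent_items(files, min_support, num_shards=2):
    """Return items whose support count is at least min_support."""
    transactions = merge_small_files(files)
    # Round-robin sharding simulates splitting the input across mappers.
    shards = [transactions[i::num_shards] for i in range(num_shards)]
    pairs = chain.from_iterable(map_phase(shard) for shard in shards)
    counts = reduce_phase(pairs)
    return {item: c for item, c in counts.items() if c >= min_support}


# Illustrative data only; these are not the datasets used in the paper.
small_files = {
    "a.txt": "bread milk\nbread butter",
    "b.txt": "milk butter bread",
    "c.txt": "milk",
}
result = frequent_items(small_files, min_support=3)
print(result)
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the merged input would live in HDFS. The sketch only shows why merging first helps: the mappers then work on a few large shards instead of one task per tiny file.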