IJSRP, Volume 5, Issue 12, December 2015 Edition [ISSN 2250-3153]
Prajakta Chandgude, Ashwini Bhagwat, Mayuri Autade, Anjali Pansare
Abstract:
k-means is one of the most widely used clustering algorithms because it is simple to understand and efficient. However, it is highly sensitive to the choice of initial centers, so a poor initialization can lead to a poor final solution. To address this problem, k-means++ chooses the centers one at a time, sampling each new center with probability proportional to its squared distance from the nearest center already chosen; in expectation this gives a clustering cost within an O(log k) factor of the optimum. However, because k-means++ needs k sequential passes over the data, it scales poorly as the size of the data increases. To improve its scalability and efficiency, k-means++ can be combined with MapReduce so that a single MapReduce job obtains all k centers, which reduces the number of MapReduce jobs required. The standard k-means++ initialization algorithm is run in the first phase, the Mapper phase, and a weighted k-means++ initialization algorithm is then run in the Reducer phase on the centers produced by the mappers. Because this MapReduce k-means++ method replaces iterations spread across multiple machines with iterations on a single machine, it can significantly reduce communication and I/O costs.
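The two-phase scheme described above can be illustrated with a minimal single-process sketch. It simulates one MapReduce job in plain Python and does not use any Hadoop API. In weighted k-means++, each new center x is sampled with probability w(x)D(x)^2 / Σ w(y)D(y)^2, where D is the distance to the nearest chosen center; with all weights equal to 1 this reduces to ordinary k-means++. The functions `mapper`, `reducer` and `mapreduce_kmeanspp` are hypothetical names. The abstract does not say how the mapper weights its centers, so this sketch assumes each local center is weighted by the number of points in its split that are closest to it.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp(points: Sequence[Point], weights: Sequence[float],
                      k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding in which each point counts `weight` times.
    With all weights equal to 1 this is ordinary k-means++."""
    # First center: sampled with probability proportional to weight.
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:  # every point coincides with a chosen center
            break
        c = rng.choices(points, weights=probs, k=1)[0]
        centers.append(c)
        d2 = [min(d, sq_dist(p, c)) for p, d in zip(points, d2)]
    return centers


def mapper(split: Sequence[Point], k: int,
           rng: random.Random) -> List[Tuple[Point, int]]:
    """Hypothetical Mapper: plain k-means++ on one data split, emitting
    (center, weight) pairs. ASSUMPTION: weight = number of split points
    whose nearest local center is this one."""
    centers = weighted_kmeanspp(split, [1.0] * len(split), k, rng)
    counts = [0] * len(centers)
    for p in split:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        counts[nearest] += 1
    return list(zip(centers, counts))


def reducer(pairs: Sequence[Tuple[Point, int]], k: int,
            rng: random.Random) -> List[Point]:
    """Hypothetical Reducer: weighted k-means++ over all mapper candidates,
    run on a single machine to produce the final k centers."""
    pts = [c for c, _ in pairs]
    ws = [float(w) for _, w in pairs]
    return weighted_kmeanspp(pts, ws, k, rng)


def mapreduce_kmeanspp(data: Sequence[Point], k: int, num_splits: int,
                       seed: int = 0) -> List[Point]:
    """Simulate one MapReduce job: split the data, map each split,
    then send every emitted pair to a single reducer."""
    rng = random.Random(seed)
    splits = [data[i::num_splits] for i in range(num_splits)]
    emitted = [pair for s in splits if s for pair in mapper(s, k, rng)]
    return reducer(emitted, k, rng)


if __name__ == "__main__":
    gen = random.Random(42)
    # Three 2-D blobs around (0, 0), (10, 10) and (20, 0).
    data = [(cx + gen.gauss(0, 1), cy + gen.gauss(0, 1))
            for cx, cy in [(0, 0), (10, 10), (20, 0)] for _ in range(100)]
    print(mapreduce_kmeanspp(data, k=3, num_splits=4))
```

In this design the reducer works only with at most `num_splits * k` weighted candidates rather than the full dataset. That is why a single job, and a single reducer machine, is enough to produce the final k centers.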