IJSRP, Volume 4, Issue 5, May 2014 Edition [ISSN 2250-3153]
Phuc Nhan Minh
A duplicate bug report describes a problem for which a report already exists in the bug repository. In many open source projects, duplicate reports make up a significant percentage of the repository, so automatic identification of duplicates is important: it saves the time a triager would otherwise spend searching for duplicates of each incoming report. In this paper we present a novel approach that improves duplicate bug report identification. The approach has two novel features: first, it uses n-gram features for duplicate bug report detection; second, it applies a cluster shrinkage technique to improve detection performance. We evaluated the approach in empirical studies on three popular open source projects: Apache, ArgoUML, and SVN. The experimental results show that the proposed scheme effectively improves detection performance compared with previous methods.
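To make the two ideas concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes character-level n-grams (the abstract does not say whether n-grams are taken over characters or words) and a simple centroid-based form of shrinkage in which each report vector is moved toward the mean of its known duplicate cluster with weight `lam`. All function names and the parameter `lam` are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased,
    whitespace-normalized string (assumed feature definition)."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def shrink(vector, center, lam=0.5):
    """Assumed shrinkage rule: (1 - lam) * vector + lam * center."""
    keys = set(vector) | set(center)
    return {
        k: (1 - lam) * vector.get(k, 0.0) + lam * center.get(k, 0.0)
        for k in keys
    }


# Two reports already known to be duplicates form one cluster.
cluster = ["Crash when saving file", "Application crashes on file save"]
vectors = [char_ngrams(report) for report in cluster]
center = centroid(vectors)
shrunk = [shrink(v, center) for v in vectors]

before = cosine(vectors[0], vectors[1])
after = cosine(shrunk[0], shrunk[1])
```

Under these assumptions, shrinking pulls the members of a duplicate cluster toward each other, so `after` is never smaller than `before`. An incoming report would then be compared against the shrunk vectors and the most similar existing reports proposed to the triager as duplicate candidates.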