SCIENTIA SINICA Informationis, Volume 49, Issue 10: 1283-1298(2019) https://doi.org/10.1360/N112019-00001

Static duplicate bug-report identification for compilers

More info
  • Received: Mar 3, 2019
  • Accepted: Sep 3, 2019
  • Published: Oct 16, 2019

Abstract

Compiler bug reports are important for ensuring compiler quality; however, duplicate bug reports incur extra costs. We propose IdenDup, a static approach to identifying duplicate compiler bug reports. IdenDup targets two scenarios (fuzz testing and the bug-management system) and uses static text and program information, including lexical features, syntax features, and newly proposed dataflow features that describe variable-usage paths (i.e., how variables are used and in what order). We evaluated IdenDup empirically on GCC and LLVM; the results show that IdenDup effectively identified duplicate bug reports in both scenarios and outperformed existing approaches.


Funded by

National Key Research and Development Program of China (2017YFB1001803)

National Natural Science Foundation of China (61672047, 61872008, 61861130363, 61922003, 61828201)


Copyright 2019 Science China Press Co., Ltd. All rights reserved.