SCIENCE CHINA Information Sciences, Volume 62, Issue 10: 200102 (2019) https://doi.org/10.1007/s11432-018-1465-6

A manual inspection of Defects4J bugs and its implications for automatic program repair

  • Received: Oct 25, 2018
  • Accepted: Jun 21, 2019
  • Published: Sep 6, 2019

Abstract

Automatic program repair techniques, which aim to generate correct patches for real-world defects automatically, have gained a lot of attention in the last decade. Many different techniques and tools have been proposed and developed. However, even the most sophisticated automatic program repair techniques can repair only a small portion of defects while producing a large number of incorrect patches. A possible reason for the low performance is that the test suites of real-world programs are usually too weak to guarantee the behavior of a program. To understand to what extent defects can be fixed with existing test suites, we manually analyzed 50 real-world defects from Defects4J, of which a large portion (82%) were correctly fixed. This result suggests that there is much room for current automatic program repair techniques to improve. Furthermore, we summarized seven fault localization strategies and seven patch generation strategies that are useful in localizing and fixing these defects, and compared those strategies with current techniques. The results indicate potential directions for improving automatic program repair in the future.


Acknowledgment

This work was supported by National Key Research and Development Program of China (Grant No. 2017YFB1001803) and National Natural Science Foundation of China (Grant No. 61672045).


  • Figure 1

    (Color online) The call graph of Chart-2.

  • Figure 2

    (Color online) Stack trace of defect Lang-1. Lines 469 and 472 are real faulty conditions.

  • Table 1   Comparison of our analysis results with existing automatic repair techniques on our dataset$^{\rm~a)}$
    Project | jGenProg | jKali | Nopol | ACS | HDR | ssFix | ELIXIR | JAID | CapGen | SimFix | Manual
    Chart | 0/4 | 0/2 | 1/1 | 0/0 | 2/– | 1/3 | 3/1 | 0/2 | 2/0 | 3/0 | 7/3
    Closure | –/– | –/– | –/– | –/– | 1/– | 0/1 | –/– | 0/1 | –/– | 0/0 | 8/1
    Lang | 0/0 | 0/0 | 0/0 | 1/0 | 2/– | 1/1 | 1/0 | 0/0 | 1/0 | 0/1 | 10/0
    Math | 2/1 | 1/1 | 0/0 | 3/0 | 1/– | 0/4 | 1/1 | 1/0 | 1/0 | 1/3 | 7/2
    Time | 0/1 | 0/1 | 0/0 | 0/0 | 0/– | 0/1 | 1/0 | 0/0 | 0/0 | 1/0 | 9/0
    Total | 2/6 | 1/4 | 1/1 | 4/0 | 6/– | 2/10 | 6/2 | 1/3 | 4/0 | 5/4 | 41/6
  • Table 2   Strategies applied to locate faulty methods in our analysis
    Strategy | Description | Defects$^{\rm~a)}$
    Excluding unexecuted statements | Exclude statements not executed by the failing test | All defects
    Excluding unlikely candidates | Filter out unrelated candidates based on their functionality and complexity | L-1, 2, 4, 7, 9; M-5, 10; Ch-2; Cl-9; T-1, 4, 10
    Stack trace analysis | Locate faulty locations based on the stack trace thrown by failing test cases | L-1, 5, 6; M-3, 4, 8; Ch-4, 9; Cl-2; T-2, 5, 7, 8, 10
    Locating undesirable value changes | Locate statements that change the input values into the final faulty values of failing test cases | L-8; Cl-1, 3, 5, 7, 8, 10; T-3, 9
    Checking programming practice | Identify code that obviously violates programming principles, based on previous programming experience | L-6, 8; Ch-1, 7, 8
    Predicate switching | Invert conditional statements to obtain the expected output; the inverted condition is the faulty location | L-3; Ch-1, 9; Cl-10
    Program understanding | Understand the logic of the faulty program and the functionality of relevant objects and methods | L-10; M-6, 9; Ch-3; Cl-9; T-3, 9
    a) L, M, Ch, Cl and T denote the Lang, Math, Chart, Closure and Time projects, respectively.
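
The first and third strategies in Table 2 lend themselves to a small illustration. The sketch below is not the procedure used in the study, which was carried out manually; it only shows, under assumed inputs, how candidates can be filtered by failing-test coverage and then ordered by stack-trace position. The `Statement` type, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    method: str  # fully qualified method name (hypothetical format)
    line: int    # source line number


def exclude_unexecuted(candidates, executed_by_failing_test):
    """Keep only the candidate statements that the failing test executed."""
    return [s for s in candidates if s in executed_by_failing_test]


def rank_by_stack_trace(candidates, stack_trace_methods):
    """Put statements whose method is nearer the top of the stack trace first.

    Statements in methods that do not appear in the trace are placed last.
    """
    depth = {}
    for i, method in enumerate(stack_trace_methods):
        depth.setdefault(method, i)  # keep the topmost frame of a repeated method
    missing = len(stack_trace_methods)
    return sorted(candidates, key=lambda s: (depth.get(s.method, missing), s.line))


# Hypothetical data: three candidates, two of them covered by the failing test.
candidates = [
    Statement("Parser.parse", 12),
    Statement("Parser.reset", 40),
    Statement("Lexer.next", 7),
]
executed = {Statement("Parser.parse", 12), Statement("Lexer.next", 7)}
trace = ["Lexer.next", "Parser.parse"]  # topmost frame first

suspects = rank_by_stack_trace(exclude_unexecuted(candidates, executed), trace)
print(suspects)
```

Filtering before ranking keeps the ranking step cheap, and it mirrors the table, where excluding unexecuted statements applies to all defects while stack trace analysis applies only to some.
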
  • Table 3   Strategies used to generate patches in our analysis
    Strategy | Description | Defects
    Add NullPointer checker | Add a null check before using an object to avoid a NullPointerException | M-4; Ch-4; Cl-2
    Return expected output | Return the expected value according to the assertions | L-2, 7, 9; M-3, 5, 10; T-1, 3
    Replace an identifier with a similar one | Replace an identifier with another one in scope that has a similar name and the same type | L-6, 8; Ch-7, 8
    Compare test executions | Generate patches by comparing failing tests with passing tests that have similar inputs | L-2, 5
    Interpret comments | Generate patches by directly interpreting comments written in natural language | M-9; Cl-1, 5, 7, 9; T-8, 9
    Imitate similar code element | Imitate code that is near the faulty location and has a similar structure | L-4, 5; M-6, 8; Ch-1, 2, 7, 9; Cl-3, 8, 10; T-5, 7, 10
    Fix by program understanding | Generate patches by understanding the functionality of the program | L-1, 3, 9, 10; M-6, 9; Ch-2, 3; Cl-3, 8; T-1, 2, 4, 10
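
As one possible illustration of the "Replace an identifier with a similar one" strategy in Table 3, the sketch below proposes replacement candidates that share the type of the suspicious identifier and ranks them by name similarity. It is an assumption-laden example rather than the authors' method: the function name, the `scope` mapping from identifier to type, the similarity threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all choices made here for illustration.

```python
from difflib import SequenceMatcher


def similar_identifiers(target, scope, min_ratio=0.5):
    """Return same-typed identifiers in scope, most similar name first.

    scope maps each identifier name to its declared type (as a string).
    min_ratio is a hypothetical cut-off on the SequenceMatcher ratio.
    """
    target_type = scope[target]
    scored = []
    for name, declared_type in scope.items():
        if name == target or declared_type != target_type:
            continue  # skip the identifier itself and anything of another type
        ratio = SequenceMatcher(None, target, name).ratio()
        if ratio >= min_ratio:
            scored.append((ratio, name))
    # Highest similarity first; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical scope at the faulty location.
scope = {
    "startIndex": "int",
    "endIndex": "int",
    "count": "int",
    "name": "String",
}
print(similar_identifiers("startIndex", scope))
```

Each returned name would still have to be substituted into the program and validated against the test suite; the sketch covers only candidate generation.
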
