This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1003900) and in part by the National Natural Science Foundation of China (Grant Nos. 61722202, 61772107).
[1] Fowler M. Refactoring: improving the design of existing code. Addison-Wesley Professional, 2018.
[2] Fontana F A, Ferme V, Zanoni M, et al. Automatic metric thresholds derivation for code smell detection. In: Proceedings of the 6th International Workshop on Emerging Trends in Software Metrics, Florence, 2015. 44--53.
[3] Ouni A, Kessentini M, Inoue K. Search-based web service antipatterns detection. IEEE Trans Serv Comput, 2017, 10: 603--617.
[4] Palomba F. Textual analysis for code smell detection. In: Proceedings of the 37th International Conference on Software Engineering, Volume 2, 2015. 769--771.
[5] Deng C W, Huang G B, Xu J. Extreme learning machines: new trends and applications. Sci China Inf Sci, 2015, 58: 1--16.
[6] Zhou Z H. Abductive learning: towards bridging machine learning and logical reasoning. Sci China Inf Sci, 2019, 62: 76101.
[7] Khomh F, Vaucher S, Guéhéneuc Y G, et al. A Bayesian approach for the detection of code and design smells. In: Proceedings of the 9th International Conference on Quality Software, 2009. 305--314.
[8] Arcelli Fontana F, Mäntylä M V, Zanoni M. Comparing and experimenting machine learning techniques for code smell detection. Empir Software Eng, 2016, 21: 1143--1191.
[9] Kaur A, Jain S, Goel S. A support vector machine based approach for code smell detection. In: Proceedings of the International Conference on Machine Learning and Data Science (MLDS), 2017. 9--14.
[10] Kreimer J. Adaptive detection of design flaws. Electron Notes Theor Comput Sci, 2005, 141: 117--136.
[11] Vaucher S, Khomh F, Moha N, et al. Tracking design smells: lessons from a study of god classes. In: Proceedings of the 16th Working Conference on Reverse Engineering, 2009. 145--154.
[12] Linders B. Refactoring and code smells: a journey toward cleaner code. InfoQ, 2016.
[13] Cristina M, Radu M, Mihancea F, et al. iPlasma: an integrated platform for quality assessment of object-oriented design. In: Proceedings of the International Conference on Software Maintenance (ICSM), 2005. 77--80.
[14] Singh P, Singh H. DynaMetrics. SIGSOFT Softw Eng Notes, 2008, 33: 1--6.
[15] Eaddy M, Aho A, Murphy G C. Identifying, assigning, and quantifying crosscutting concerns. In: Proceedings of the 1st International Workshop on Assessment of Contemporary Modularization Techniques, 2007. 2.
[16] Chidamber S R, Kemerer C F. A metrics suite for object oriented design. IEEE Trans Software Eng, 1994, 20: 476--493.
[17] Padilha J, Pereira J, Figueiredo E, et al. On the effectiveness of concern metrics to detect code smells: an empirical study. In: Proceedings of the International Conference on Advanced Information Systems Engineering, 2014. 656--671.
[18] Witten I H, Frank E, Hall M A, et al. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann, 2016.
[19] Staelin C. Parameter selection for support vector machines. Hewlett-Packard Company Technical Report HPL-2002-354R1, 2003.
[20] Palomba F, Di Nucci D, Tufano M, et al. Landfill: an open dataset of code smells with public evaluation. In: Proceedings of the IEEE/ACM 12th Working Conference on Mining Software Repositories, 2015. 482--485.
[21] Amorim L, Costa E, Antunes N, et al. Experience report: evaluating the effectiveness of decision trees for detecting code smells. In: Proceedings of the IEEE 26th International Symposium on Software Reliability Engineering (ISSRE), 2015. 261--269.
[22] Reshi J A, Singh S. Predicting software defects through SVM: an empirical approach. ArXiv, 2018.
[23] Soltanifar B, Akbarinasaji S, Caglayan B, et al. Software analytics in practice: a defect prediction model using code smells. In: Proceedings of the 20th International Database Engineering & Applications Symposium, 2016. 148--155.
[24] Moha N, Gueheneuc Y G, Duchien L. DECOR: a method for the specification and detection of code and design smells. IEEE Trans Software Eng, 2010, 36: 20--36.
[25] Fokaefs M, Tsantalis N, Chatzigeorgiou A. JDeodorant: identification and removal of feature envy bad smells. In: Proceedings of the 2007 IEEE International Conference on Software Maintenance, 2007. 519--520.
[26] Boussaa M, Kessentini W, Kessentini M, et al. Competitive coevolutionary code-smells detection. In: Proceedings of the International Symposium on Search Based Software Engineering. Berlin: Springer, 2013. 50--65.
[27] Saranya G, Nehemiah H K, Kannan A. Hybrid particle swarm optimisation with mutation for code smell detection. Int J Bio-Inspired Comput, 2018, 12: 186--195.
[28] Kessentini M, Kessentini W, Sahraoui H, et al. Design defects detection and correction by example. In: Proceedings of the IEEE 19th International Conference on Program Comprehension, 2011. 81--90.
[29] Khomh F, Vaucher S, Guéhéneuc Y G. BDTEX: a GQM-based Bayesian approach for the detection of antipatterns. J Syst Software, 2011, 84: 559--572.
[30] Di Nucci D, Palomba F, Tamburri D A, et al. Detecting code smells using machine learning techniques: are we there yet? In: Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018. 612--621.
[31] Maneerat N, Muenchaisri P. Bad-smell prediction from software design model using machine learning techniques. In: Proceedings of the 8th International Joint Conference on Computer Science and Software Engineering (JCSSE), 2011. 331--336.
[32] Hassaine S, Khomh F, Guéhéneuc Y G, et al. IDS: an immune-inspired approach for the detection of software design smells. In: Proceedings of the 7th International Conference on the Quality of Information and Communications Technology, 2010. 343--348.
Figure 1: Approach overview.
Figure 2: Correlation analysis results of Divergent Change.
Figure 4: Correlation analysis results of Parallel Inheritance.
Figure 5: The decision tree generated for Divergent Change.
Figure 7: The decision tree generated for Parallel Inheritance.
Terminology | Description
CVParameterSelection | A Weka meta-classifier for parameter optimization that searches a specified range for the optimal value of one or more parameters via cross-validation.
Filter | A function used to manipulate attributes and instances during the data preprocessing phase.
ReplaceMissingValues | A filter that replaces all missing values of nominal and numeric attributes in a dataset with the modes and means computed from the training data.
AttributeSelection | A supervised attribute filter that can be used to select features.
InfoGainAttributeEval | An evaluator that measures the worth of an attribute by its information gain with respect to the class.
Ranker | A search strategy that ranks attributes by their individual evaluations.
ArrayEditor | A module for entering parameter values during parameter optimization.
BatchSize | The preferred number of instances to process when batch prediction is performed.
DoNotCheckCapabilities | An option that controls whether the classifier's capabilities are checked before it is built.
NominalToBinaryFilter | A filter that converts nominal attributes into numeric (binary) attributes.
Ridge Estimation | A biased estimation regression method for analyzing collinear data.
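Weka implements all of the components above internally; as a rough stdlib-only sketch of what ReplaceMissingValues (mean imputation for numeric attributes) and InfoGainAttributeEval (information gain) compute, assuming a single numeric metric column and binary smell labels (the function names and toy data are ours, not Weka's API):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impute_means(column):
    """Replace None values with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def info_gain(column, labels, threshold):
    """Information gain of splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(column, labels) if x <= threshold]
    right = [y for x, y in zip(column, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Toy data: one metric column with a missing value, binary smell labels.
metric = [1.0, 2.0, None, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]

filled = impute_means(metric)          # None -> mean of observed values (6.0)
gain = info_gain(filled, labels, 7.0)  # perfect split -> gain equals H(labels) = 1.0
print(filled, round(gain, 3))
```

Ranker then simply sorts the attributes by such gain scores in descending order.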
Category | Name | Definition
Basic | Address | Address of the class
Basic | BUR | Usage ratio
Basic | Class_name | Name of the class
Basic | CRIX | Children ratio
Basic | GREEDY | Raised exceptions
Basic | is static | Is a static class
Basic | is leaf-class | Is a leaf class
Basic | is interface | Is an interface
Basic | is root-class | Is a root class
Basic | is abstract | Is an abstract class
Basic | Package | Package of the class
Basic | Project | Project of the class
Basic | PNAS | Public number of accessors
Basic | PS | Package size
Code Size | LOC | Lines of code in the class
Code Size | LOCC | Lines of code of classes
Code Size | LOC_M | Lines of code in the method
Code Size | LOC_Pa | Lines of code in the package
Code Size | LOC_Pr | Lines of code in the project
Code Size | NOCON | Number of constructors
Code Size | NOM | Number of methods
Code Size | NOM_Pr | Number of methods in the project
Code Size | NOM_Pa | Number of methods in the package
Code Size | NOA | Number of ancestors
Code Size | NAM | Number of accessor methods
Code Size | NOD | Number of descendants
Code Size | NOIS | Number of import statements
Code Size | NOO | Number of operations
Code Size | NCC | Number of client classes
Code Size | NOIC | Number of internal clients
Code Size | PProtM | Percentage of protected members
Code Size | PPubM | Percentage of public members
Code Size | PPrivM | Percentage of private members
Inheritance | DIT | Depth of inheritance tree
Inheritance | DOIH | Depth of inheritance hierarchy
Inheritance | HIT | Height of inheritance tree
Inheritance | NOC | Number of children
Inheritance | NOCC | Number of child classes
Inheritance | NOC_Pr | Number of children in the project
Inheritance | NOC_Pa | Number of children in the package
Complexity | AMW | Average method weight
Complexity | AC | Attribute complexity
Complexity | CC | Cyclomatic complexity
Complexity | EC | Essential complexity
Complexity | NORM | Number of remote methods
Complexity | NOAV | Number of accessed variables
Complexity | NOLV | Number of local variables
Complexity | NOEU | Number of external variables
Complexity | NOP_M | Number of parents in the method
Complexity | NOP_Pr | Number of parents in the project
Complexity | RFC | Response for class
Complexity | WOC | Weight of a class
Complexity | WMC | Weighted method count
Complexity | WMPC1 | Weighted methods per class 1
Complexity | WMPC2 | Weighted methods per class 2
Coupling | AOFD | Access of foreign data
Coupling | ATFD | Access to foreign data
Coupling | CE | Efferent coupling
Coupling | ChC | Changing classes
Coupling | CM | Changing methods
Coupling | CBO | Coupling between objects
Coupling | DD | Dependency dispersion
Coupling | DAC | Data abstraction coupling
Coupling | FDP | Foreign data providers
Coupling | FANOUT | Number of classes referenced
Coupling | FANIN | Number of classes that reference the class
Coupling | MIC | Method invocation coupling
Coupling | NOEC | Number of external clients
Coupling | NRSS | Number of static calls
Coupling | NOED | Number of external dependencies
Coupling | WCM | Weighted changing methods
Cohesion | ALD | Access of local data
Cohesion | AID | Access of import data
Cohesion | TCC | Tight class cohesion
Encapsulation | LAA | Locality of attribute accesses
Encapsulation | NOPA | Number of public attributes
Encapsulation | NOAM | Number of added methods
Encapsulation | NOOM | Number of overridden methods
Concern | CDC | Concern diffusion over components
Concern | CDO | Concern diffusion over operations
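Most of these metrics are simple counts over a class model. As a minimal sketch, assuming a hypothetical single-inheritance hierarchy represented as plain dictionaries (the class names and data structures are illustrative, not from the studied projects), WMC and DIT from the table can be computed as:

```python
# Hypothetical mini-model of a class hierarchy; the metric definitions follow
# Chidamber-Kemerer [16], the data structures are ours.

class_methods = {                      # class -> cyclomatic complexity per method
    "Shape":  [1, 2],
    "Circle": [1, 3, 4],
}
parents = {"Shape": None, "Circle": "Shape"}  # single inheritance for simplicity

def wmc(cls):
    """WMC: sum of the cyclomatic complexities of the class's methods."""
    return sum(class_methods[cls])

def dit(cls):
    """DIT: number of ancestors on the path to the root class."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth

print(wmc("Circle"), dit("Circle"))  # 8 1
```

Real extractors such as iPlasma [13] compute these counts from parsed source code rather than a hand-written model.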
Project | KLOC | Packages | Classes | Divergent Change | Parallel Inheritance | Shotgun Surgery |
Aardvark | 25 | 11 | 103 | 0 | 0 | 0 |
AndEngine | 20 | 90 | 596 | 0 | 0 | 0
frameworks-base | 770 | 253 | 2766 | 2 | 3 | 0 |
cassandra | 117 | 43 | 826 | 3 | 0 | 0 |
commons-codec | 23 | 7 | 103 | 0 | 0 | 0 |
commons-logging | 23 | 17 | 61 | 1 | 3 | 0 |
derby | 166 | 194 | 1746 | 0 | 0 | 0 |
Eclipse Core | 162 | 843 | 1190 | 0 | 7 | 0 |
Google Guava | 16 | 25 | 153 | 0 | 0 | 0 |
HealthWatcher | 6 | 26 | 132 | 12 | 0 | 7 |
James Mime4j | 280 | 26 | 250 | 1 | 0 | 0 |
MobileMedia | 3 | 10 | 51 | 4 | 0 | 3 |
sdk | 54 | 198 | 268 | 1 | 12 | 0 |
support | 59 | 22 | 246 | 1 | 0 | 1 |
telephony | 75 | 17 | 223 | 0 | 0 | 0 |
Tomcat | 336 | 154 | 1284 | 1 | 10 | 1 |
tool-base | 119 | 69 | 532 | 0 | 0 | 0 |
Machine learning algorithm | Divergent Change training time (s) | Divergent Change testing time (s) | Shotgun Surgery training time (s) | Shotgun Surgery testing time (s) | Parallel Inheritance training time (s) | Parallel Inheritance testing time (s)
SMO | 0.08 | 0.05 | 0.04 | 0.02 | 0.48 | 0.14 |
NaiveBayes | 0.10 | 0.05 | 0.04 | 0.03 | 0.17 | 0.13 |
J48 | 0.08 | 0.05 | 0.04 | 0.01 | 0.10 | 0.05 |
JRip | 0.08 | 0.05 | 0.04 | 0.03 | 0.09 | 0.05 |
RandomForest | 0.07 | 0.05 | 0.04 | 0.02 | 0.10 | 0.05 |
Logistic Regression | 0.08 | 0.04 | 0.04 | 0.02 | 0.09 | 0.04 |
Machine learning algorithm | Divergent Change: Precision (%) | Recall (%) | F-measure (%) | ROC area | Shotgun Surgery: Precision (%) | Recall (%) | F-measure (%) | ROC area | Parallel Inheritance: Precision (%) | Recall (%) | F-measure (%) | ROC area
SMO | 72.7 | 72.7 | 72.7 | 0.817 | 100 | 83.3 | 90.9 | 0.917 | 81.8 | 81.8 | 81.8 | 0.887 |
NaiveBayes | 60.0 | 81.8 | 69.2 | 0.901 | 83.3 | 83.3 | 83.3 | 0.949 | 60.0 | 54.5 | 57.1 | 0.788 |
J48 | 60.0 | 81.8 | 69.2 | 0.838 | 71.4 | 83.3 | 76.9 | 0.821 | 69.2 | 81.8 | 75.0 | 0.885 |
JRip | 64.3 | 81.8 | 72.0 | 0.885 | 75.0 | 50.0 | 60.0 | 0.712 | 88.9 | 72.7 | 80.0 | 0.853 |
RandomForest | 77.8 | 63.6 | 70.0 | 0.903 | 100 | 75.0 | 85.7 | 0.817 | 77.8 | 63.6 | 70.0 | 0.937 |
Logistic Regression | 81.8 | 81.8 | 81.8 | 0.895 | 100 | 83.3 | 90.9 | 1.000 | 81.8 | 81.8 | 81.8 | 0.935 |
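The F-measure column is the harmonic mean of the precision and recall columns. For instance, the Logistic Regression row for Shotgun Surgery (precision 100%, recall 83.3%) yields an F-measure of 90.9%:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# Logistic Regression on Shotgun Surgery: precision 1.000, recall 0.833.
f = f_measure(1.000, 0.833)
print(round(100 * f, 1))  # 90.9
```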
Machine learning algorithm | Divergent Change (p-value) | Shotgun Surgery (p-value) | Parallel Inheritance (p-value)
SMO | 0.0251 | 0.7081 | 0.7359 |
NaiveBayes | 0.2688 | 0.1895 | 0.0121 |
J48 | 0.1411 | 0.0209 | 0.2769 |
JRip | 0.2585 | 0.0054 | 0.6659 |
RandomForest | 0.7943 | 0.2768 | 0.2802 |
Machine learning algorithm | Divergent Change: Precision (%) | Recall (%) | F-measure (%) | ROC area | Shotgun Surgery: Precision (%) | Recall (%) | F-measure (%) | ROC area | Parallel Inheritance: Precision (%) | Recall (%) | F-measure (%) | ROC area
SMO-o | 66.7 | 30.8 | 42.1 | 0.621 | 50.0 | 33.3 | 40.0 | 0.635 | 66.7 | 42.9 | 52.2 | 0.679 |
NaiveBayes-o | 66.7 | 22.2 | 33.3 | 0.595 | 50.0 | 33.3 | 40.0 | 0.750 | 57.1 | 57.1 | 57.1 | 0.776 |
J48-o | 50.0 | 44.4 | 47.1 | 0.513 | 12.5 | 33.3 | 18.2 | 0.375 | 66.7 | 42.9 | 52.2 | 0.679 |
JRip-o | 66.7 | 22.2 | 33.3 | 0.596 | 12.5 | 33.3 | 18.2 | 0.375 | 62.5 | 35.7 | 45.5 | 0.643 |
RandomForest-o | 66.7 | 44.4 | 53.3 | 0.809 | 40.0 | 66.7 | 50.0 | 0.646 | 66.7 | 28.6 | 40.0 | 0.619 |
Logistic Regression-o | 50.0 | 33.3 | 40.0 | 0.552 | 40.0 | 66.7 | 50.0 | 0.646 | 56.3 | 64.3 | 60.0 | 0.757 |
Machine learning algorithm | Divergent Change (p-value) | Shotgun Surgery (p-value) | Parallel Inheritance (p-value)
SMO-o | 0.0309 | 0.0009 | 0.0059 |
NaiveBayes-o | 0.0539 | 0.0088 | 0.9619 |
J48-o | 0.0046 | 0.0002 | 0.0275 |
JRip-o | 0.0387 | 0.0033 | 0.0085 |
RandomForest-o | 0.2013 | 0.0098 | 0.0514 |
Logistic Regression-o | 0.0003 | 0.0021 | 0.0069 |
Code smell | Precision (%) | Recall (%) | F-measure (%) | ROC area | P-value |
Divergent Change | 76.9 | 90.9 | 83.3 | 0.903 | 0.6856 |
Shotgun Surgery | 100 | 83.3 | 90.9 | 1.000 | 1.0000 |
Parallel Inheritance | 81.8 | 81.8 | 81.8 | 0.943 | 0.9642 |
Code smell | Precision (%) | Recall (%) | F-measure (%) | ROC area | P-value |
Divergent Change | 81.8 | 81.8 | 81.8 | 0.881 | 0.8927 |
Shotgun Surgery | 100 | 83.3 | 90.9 | 1.000 | 1.0000 |
Parallel Inheritance | 81.8 | 81.8 | 81.8 | 0.937 | 0.9908 |
Number | Divergent Change metric | Coefficient | Shotgun Surgery metric | Coefficient | Parallel Inheritance metric | Coefficient
1 | is interface | 1.4067 | TCC | 46.5202 | BUR | 2.4688 |
2 | is root | 1.2052 | is static | 7.6223 | is static | 1.6495 |
3 | PNAS | 1.1199 | DOIH | 2.1524 | is leaf class | 1.3356 |
4 | is leaf class | 1.0874 | DIT | 1.6331 | is interface | 1.1756 |
5 | BUR | 1.0834 | NAM | 1.3989 | MIC | 1.1132 |
6 | AMW | 1.0525 | NOC | 1.3080 | NOCON | 1.1101 |
7 | DAC | 1.0422 | FDP | 1.1255 | CRIX | 1.0605 |
8 | CDC | 1.0333 | AMW | 1.1252 | DAC | 1.0536 |
9 | CRIX | 1.0265 | ChC | 1.0140 | NAM | 1.0448 |
10 | MIC | 1.0203 | CM | 1.0035 | NOC_Pa | 1.0007 |
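The rankings above are magnitudes of logistic-regression coefficients, and the terminology table lists ridge estimation as the stabilizer for collinear metrics. A from-scratch sketch of ridge-penalized logistic regression fitted by gradient descent, on toy data of our own choosing (an illustration of the technique, not the paper's implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ridge_logistic(X, y, lam=0.1, lr=0.5, steps=2000):
    """Gradient descent on average logistic loss plus an L2 (ridge) penalty
    (lam/2)*||w||^2; the intercept b is left unpenalized, as is standard."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0  # start from the ridge gradient lam*w
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj / n for g, xj in zip(gw, xi)]
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy data: the first metric separates the classes; the second is constant noise.
X = [[0.0, 0.5], [0.2, 0.5], [0.8, 0.5], [1.0, 0.5]]
y = [0, 0, 1, 1]
w, b = fit_ridge_logistic(X, y)
print(w[0] > abs(w[1]))  # the informative metric receives the larger weight
```

Ranking metrics by the resulting coefficient magnitudes, as in the table above, then highlights the most discriminative ones; the ridge term keeps those magnitudes finite even when metrics are collinear.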
Code smell | Precision (%) | Recall (%) | F-measure (%) | ROC area | P-value |
Divergent Change | 100 | 50.0 | 66.7 | 0.833 | 0.4549 |
Shotgun Surgery | 100 | 66.7 | 80.0 | 1.000 | 0.4786 |
Parallel Inheritance | 60.0 | 30.0 | 40.0 | 0.603 | 0.0037 |