
SCIENTIA SINICA Informationis, Volume 50, Issue 8: 1148-1177 (2020) https://doi.org/10.1360/SSI-2019-0149

Android malware detection: a survey

  • Received: Jul 12, 2019
  • Accepted: Feb 3, 2020
  • Published: Jul 31, 2020

Abstract

Over the past ten years, Android has become the most popular mobile operating system, owing to three main advantages: open source code, a wide choice of hardware, and millions of applications (apps). It is no surprise that Android has also become the major target of malware. The rapid growth in Android malware poses serious threats to smartphone users, such as financial charges, information collection, and remote control. In-depth study of the security issues of mobile apps is therefore of great importance to the sound development of the smartphone ecosystem. We first introduce the existing problems and challenges of malware analysis and then summarize the widely used benchmark datasets. Next, we divide existing malware analysis methods into three categories: signature-based methods, machine learning-based methods, and behavior-based methods. We summarize the techniques used in each category and compare the advantages and disadvantages of the different techniques. Finally, drawing on our own research in malware analysis, we discuss future research directions and challenges.


Funded by

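National Key Research and Development Program of China (2016YFB1000903)

National Natural Science Foundation of China (61902306, 61632015, U1766215, 61772408, 61833015)

Innovative Research Group of the National Natural Science Foundation of China (61721002)

Innovation Research Team of the Ministry of Education (IRT_17R86)

Special Pre-Station Grant of the China Postdoctoral Science Foundation (2019TQ0251)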


Acknowledgment

Special thanks to the "Yanqi Lake Meeting on Opportunities and Challenges of Software Automation in the Big Data Era".


References

[1] Wang H Y, Liu Z, Liang J Y, et al. Beyond google play: a large-scale comparative study of chinese android app markets. In: Proceedings of the Internet Measurement Conference (IMC), Boston, 2018. 293--307. Google Scholar

[2] Avdiienko V, Kuznetsov K, Gorla A, et al. Mining apps for abnormal usage of sensitive data. In: Proceedings of IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE), Florence, 2015. 426--436. Google Scholar

[3] Chen K, Liu P, Zhang Y J. Achieving accuracy and scalability simultaneously in detecting application clones on android markets. In: Proceedings of the IEEE/ACM 36th International Conference on Software Engineering (ICSE), Hyderabad, 2014. 175--186. Google Scholar

[4] Li M H, Wang W, Wang P, et al. Libd: scalable and precise third-party library detection in android markets. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 335--346. Google Scholar

[5] Feng Y, Anand S, Dillig I, et al. Apposcopy: Semantics-based detection of android malware through static analysis. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), Hong Kong, 2014. 576--587. Google Scholar

[6] Liu J, Wu D Y, Xue J L. TDroid: Exposing app switching attacks in Android with control flow specialization. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), Montpellier, 2018. 236--247. Google Scholar

[7] Yan J W, Deng X, Wang P, et al. Characterizing and identifying misexposed activities in android applications. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), Montpellier, 2018. 691--701. Google Scholar

[8] Zhou Y J, Jiang X X. Dissecting android malware: characterization and evolution. In: Proceedings of the IEEE symposium on security and privacy, San Francisco, 2012. 95--109. Google Scholar

[9] Octeau D, McDaniel P, Jha S, et al. Effective inter-component communication mapping in android with epicc: an essential step towards holistic security analysis. In: Proceedings of the 22nd USENIX Security Symposium, Washington, 2013. 543--558. Google Scholar

[10] Chen K, Wang P, Lee Y, et al. Finding unknown malice in 10 seconds: mass vetting for new threats at the google-play scale. In: Proceedings of the 24th USENIX Security Symposium, Washington, 2015. 659--674. Google Scholar

[11] Xue L, Zhou Y J, Chen T, et al. Malton: towards on-device non-invasive mobile malware analysis for ART. In: Proceedings of the 26th USENIX Security Symposium, Vancouver, 2017. 289--306. Google Scholar

[12] Qu Z Y, Rastogi V, Zhang X Y, et al. Autocog: measuring the description-to-permission fidelity in android applications. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, 2014. 1354--1365. Google Scholar

[13] Zhu Z Y, Dumitras T. FeatureSmith: automatically engineering features for malware detection by mining the security literature. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, 2016. 767--778. Google Scholar

[14] Au K W, Zhou Y F, Huang Z, et al. Pscout: analyzing the android permission specification. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Raleigh, 2012. 217--228. Google Scholar

[15] Arp D, Spreitzenbarth M, Hubner M, et al. DREBIN: effective and explainable detection of Android malware in your pocket. In: Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2014. Google Scholar

[16] Feng Y, Bastani O, Martins R, et al. Automated synthesis of semantic malware signatures using maximum satisfiability. In: Proceedings of the 24th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2017. Google Scholar

[17] Mariconti E, Onwuzurike L, Andriotis P, et al. Mamadroid: detecting android malware by building Markov chains of behavioral models. In: Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2016. Google Scholar

[18] Fan M, Liu J, Wang W. DAPASA: Detecting Android Piggybacked Apps Through Sensitive Subgraph Analysis. IEEE TransInformForensic Secur, 2017, 12: 1772-1785 CrossRef Google Scholar

[19] Wang W, Wang X, Feng D. Exploring Permission-Induced Risk in Android Applications for Malicious Application Detection. IEEE TransInformForensic Secur, 2014, 9: 1869-1882 CrossRef Google Scholar

[20] Rastogi V, Chen Y, Jiang X. Catch Me If You Can: Evaluating Android Anti-Malware Against Transformation Attacks. IEEE TransInformForensic Secur, 2014, 9: 99-108 CrossRef Google Scholar

[21] Liu J, Su P R, Yang M, et al. Software and Cyber Security - A Survey. Journal of Software, 2018, 29(1):42-68 DOI: 10.13328/j.cnki.jos.005320. Google Scholar

[22] Qing S H. Research Progress on Android Security. Journal of Software, 2016, 27(1):45-71 DOI: 10.13328/j.cnki.jos.004914. Google Scholar

[23] Zhang Y Q, Wang K, Yang H, et al. Survey of Android OS Security. Journal of Computer Research and Development, 2014, 51(7):1385-1396 DOI: 10.7544/issn1000-1239.2014.20140098. Google Scholar

[24] Nan Y Z, Yang M, Yang Z M, et al. UIPicker: user-input privacy identification in mobile applications. In: Proceedings of the 24th USENIX Security Symposium, Washington, 2015. 993--1008. Google Scholar

[25] Jiang X X. Security Alert: New Stealthy Android Spyware--Plankton--Found in Official Android Market. 2011. https://www.csc2.ncsu.edu/faculty/xjiang4/Plankton/. Google Scholar

[26] Fan M, Liu J, Luo X. Android Malware Familial Classification and Representative Sample Selection via Frequent Subgraph Analysis. IEEE TransInformForensic Secur, 2018, 13: 1890-1905 CrossRef Google Scholar

[27] Fan M, Liu J, Luo X P, et al. Frequent subgraph based familial classification of android malware. In: Proceedings of IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, 2016. 24--35. Google Scholar

[28] Zhang M, Duan Y, Yin H, et al. Semantics-aware android malware classification using weighted contextual API dependency graphs. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, 2014. 1105--1116. Google Scholar

[29] Tian Z, Liu T, Zheng Q. Exploiting thread-related system calls for plagiarism detection of multithreaded programs. J Syst Software, 2016, 119: 136-148 CrossRef Google Scholar

[30] Tian Z, Liu T, Zheng Q. Reviving Sequential Program Birthmarking for Multithreaded Software Plagiarism Detection. IIEEE Trans Software Eng, 2018, 44: 491-511 CrossRef Google Scholar

[31] Li L, Bissyande, T, Octeau D, et al. Droidra: taming reflection to support whole-program analysis of android apps. In: Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA), Saarbrucken, 2016. 318--329. Google Scholar

[32] Xue L, Luo X P, Yu L, et al. Adaptive unpacking of Android apps. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 358--369. Google Scholar

[33] Kalysch A, Milisterfer O, Protsenko M, et al. Tackling Androids native library malware with robust, efficient and accurate similarity measures. In: Proceedings of the 13th International Conference on Availability, Reliability and Security, Hamburg, 2018. 1--10. Google Scholar

[34] Qian C X, Luo X P, Shao Y R, et al. On tracking information flows through jni in Android applications. In: Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, 2014. 180--191. Google Scholar

[35] Xue L, Qian C, Zhou H. NDroid: Toward Tracking Information Flows Across Multiple Android Contexts. IEEE TransInformForensic Secur, 2019, 14: 814-828 CrossRef Google Scholar

[36] Dong S K, Li M H, Diao W R, et al. Understanding Android obfuscation techniques: a large-scale investigation in the wild. In: Proceedings of the Security and Privacy in Communication Networks (SecureComm), Singapore, 2018. 172--192. Google Scholar

[37] Wang P, Bao Q K, Wang L, et al. Software protection on the Go: a large-scale empirical study on mobile app obfuscation. In: Proceedings of the 40th International Conference on Software Engineering (ICSE), Gothenburg, 2018. 26--36. Google Scholar

[38] Rastogi V, Chen Y, Jiang X X. Droidchameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Berlin, 2013. 329--334. Google Scholar

[39] Son D. AVPASS-tool for leaking and bypassing Android malware detection system. 2017. https://www.kitploit.com/2017/08/avpass-tool-for-leaking-and-bypassing.html?m=1. Google Scholar

[40] Jordaney R, Sharad K, Dash S K, et al. Transcend: detecting concept drift in malware classification models. In: Proceedings of the 26th USENIX Security Symposium, Vancouver, 2017. 625--642. Google Scholar

[41] Liu Q, Li P, Zhao W. A Survey on Security Threats and Defensive Techniques of Machine Learning: A Data Driven View. IEEE Access, 2018, 6: 12103-12117 CrossRef Google Scholar

[42] Guidotti R, Monreale A, Ruggieri S. A Survey of Methods for Explaining Black Box Models. ACM Comput Surv, 2019, 51: 1-42 CrossRef Google Scholar

[43] Wei F G, Li Y P, Roy S, et al. Deep ground truth analysis of current android malware. In: Proceedings of International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, 2017. 252--276. Google Scholar

[44] Wang H Y, Si J J, Li H, et al. RmvDroid: towards a reliable Android malware dataset with app metadata. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), Montreal, 2019. 404--408. Google Scholar

[45] Allix K, Bissyande T F, Klein J, et al. Androzoo: collecting millions of android apps for the research community. In: Proceedings of IEEE/ACM 13rd International Conference on Mining Software Repositories (MSR), Austin, 2016. 468--471. Google Scholar

[46] Meng G Z, Xue Y X, Siow J K, et al. Androvault: constructing knowledge graph from millions of android apps for automated analysis. 2017,. arXiv Google Scholar

[47] Sebastian M, Rivera R, Kotzias P, et al. Avclass: a tool for massive malware labeling. In: Proceedings of 19th International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), Paris, 2016. 230--253. Google Scholar

[48] Apktool. a tool for reverse engineering Android apk files. 2019. https://ibotpeaches.github.io/Apktool/. Google Scholar

[49] Xue L, Luo X P, Yu L, et al. Adaptive unpacking of Android apps. In: Proceedings of the 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 358--369. Google Scholar

[50] Zhang Y Q, Luo X P, Yin H Y. Dexhunter: toward extracting hidden code from packed android applications. In: Proceedings of the 20th European Symposium on Research in Computer Security (ESORICS), Vienna, 2015. 293--311. Google Scholar

[51] Duan Y, Zhang M, Bhaskar A V, et al. Things you may not know about android (un)packers: a systematic study based on whole-system emulation. In: Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2018. Google Scholar

[52] Ristad E S, Yianilos P N. Learning string-edit distance. IEEE Trans Pattern Anal Machine Intell, 1998, 20: 522-532 CrossRef Google Scholar

[53] Enck W, Ongtang M, McDaniel P. On lightweight mobile phone application certification. In: Proceedings of the ACM Conference on Computer and Communications Security, Chicago, 2009. 235--245. Google Scholar

[54] Zhou Y J, Wang Z, Zhou W, et al. Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Proceedings of the 19th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2012. Google Scholar

[55] Seo S H, Gupta A, Mohamed Sallam A. Detecting mobile malware threats to homeland security through static analysis. J Network Comput Appl, 2014, 38: 43-53 CrossRef Google Scholar

[56] Zheng M, Sun M S, Lui J. Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware. In: Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Melbourne, 2013. 163--171. Google Scholar

[57] Afonso V, Bianchi A, Fratantonio Y, et al. Going native: using a large-scale analysis of android apps to create a practical native-code sandboxing policy. In: Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2016. Google Scholar

[58] Sun M T, Tan G. Nativeguard: protecting android applications from third-party native libraries. In: Proceedings of the 7th ACM Conference on Security & Privacy in Wireless and Mobile Networks, Oxford, 2014. 165--176. Google Scholar

[59] Alam S, Qu Z, Riley R. DroidNative: Automating and optimizing detection of Android native code malware variants. Comput Security, 2017, 65: 230-246 CrossRef Google Scholar

[60] Alam S, Horspool R N, Traore I. MAIL: malware analysis intermediate language: a step towards automating and optimizing malware detection. In: Proceedings of the 6th International Conference on Security of Information and Networks, Aksaray, 2013. 233--240. Google Scholar

[61] Sanz B, Santos I, Laorden C, et al. Puma: permission usage to detect malware in android. In: Proceedings of International Joint Conference CISIS, Ostrava, 2012. 289--298. Google Scholar

[62] Moonsamy V, Rong J, Liu S. Mining permission patterns for contrasting clean and malicious android applications. Future Generation Comput Syst, 2014, 36: 122-132 CrossRef Google Scholar

[63] Aung Z, Zaw W. Permission-based Android malware detection. Int J Sci Technol Res, 2013, 2: 228--234. Google Scholar

[64] Liu X, Liu J Q. A two-layered permission-based Android malware detection scheme. In: Proceedings of 2nd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, Oxford, 2014. 142--148. Google Scholar

[65] Li J, Sun L, Yan Q. Significant Permission Identification for Machine-Learning-Based Android Malware Detection. IEEE Trans Ind Inf, 2018, 14: 3216-3225 CrossRef Google Scholar

[66] Aafer Y, Du W L, Yin H. Droidapiminer: mining api-level features for robust malware detection in Android. In: Proceedings of the International Conference on Security and Privacy in Communication Networks, Sydney, 2013. 86--103. Google Scholar

[67] Zhao M, Ge F B, Zhang T, et al. AntiMalDroid: an efficient SVM-based malware detection framework for android. In: Proceedings of the 2nd International Conference, Qinhuangdao, 2011. 158--166. Google Scholar

[68] Isohara T, Takemori K, Kubota A. Kernel-based behavior analysis for Android malware detection. In: Proceedings of the Seventh International Conference on Computational Intelligence and Security (CIS), Sanya, 2011. 1011--1015. Google Scholar

[69] Peiravian N, Zhu X Q. Machine learning for android malware detection using permission and api calls. In: Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence, Herndon, 2013. 300--305. Google Scholar

[70] Chan P P, Song W K. Static detection of Android malware by using permissions and API calls. In: Proceedings of the International Conference on Machine Learning and Cybernetics, LanZhou, 2014. 82--87. Google Scholar

[71] Wu D J, Mao C H, Wei T E, et al. Droidmat: Android malware detection through manifest and api calls tracing. In: Proceedings of the Seventh Asia Joint Conference on Information Security, Kaohsiung, 2012. 62--69. Google Scholar

[72] Zhang L S, Niu Y, Wu X, et al. A3: automatic analysis of android malware. In: Proceedings of the 1st International Workshop on Cloud Computing and Information Security, 2013. Google Scholar

[73] Sanz B, Santos I, Xabier U P, et al. Anomaly detection using string analysis for android malware detection. In: Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications, Bilbao, 2014. 469--478. Google Scholar

[74] Wang X, Wang W, He Y. Characterizing Android apps' behavior for effective detection of malapps at large scale. Future Generation Comput Syst, 2017, 75: 30-45 CrossRef Google Scholar

[75] Tang A, Sethumadhavan S, Stolfo S J. Unsupervised anomaly-based malware detection using hardware features. In: Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Gothenburg, 2014. 109--129. Google Scholar

[76] Garcia J, Hammad M, Malek S. Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware. ACM Trans Softw Eng Methodol, 2018, 26: 1-29 CrossRef Google Scholar

[77] Tian Z, Zheng Q, Liu T. Software Plagiarism Detection with Birthmarks Based on Dynamic Key Instruction Sequences. IIEEE Trans Software Eng, 2015, 41: 1217-1235 CrossRef Google Scholar

[78] Canfora G, De L A, Medvet E, et al. Effectiveness of opcode ngrams for detection of multi family android malware. In: Proceedings of 10th International Conference on Availability, Reliability and Security, Toulouse, 2015. 333--340. Google Scholar

[79] Zhang B, Xiao W, Xiao X. Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes. Future Generation Comput Syst, 2019, CrossRef Google Scholar

[80] Suarez-Tangil G, Tapiador J E, Peris-Lopez P. Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families. Expert Syst Appl, 2014, 41: 1104-1117 CrossRef Google Scholar

[81] Teufl P, Ferk M, Fitzek A. Malware detection by applying knowledge discovery processes to application metadata on the Android Market (Google Play). Security Comm Networks, 2016, 9: 389-419 CrossRef Google Scholar

[82] Grampurohit V, Grampurohit V, Rawat S, et al. Category based malware detection for Android. In: Proceedings of the International Symposium on Security in Computing and Communication, Delhi, 2014. 239--249. Google Scholar

[83] Wang W, Li Y, Wang X. Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers. Future Generation Comput Syst, 2018, 78: 987-994 CrossRef Google Scholar

[84] Gorla A, Tavecchia I, Gross F, et al. Checking app behavior against app descriptions. In: Proceedings of the 36th International Conference on Software Engineering (ICSE), Hyderabad, 2014. 1025--1035. Google Scholar

[85] Fan M, Luo X, Liu J. CTDroid: Leveraging a Corpus of Technical Blogs for Android Malware Analysis. IEEE Trans Rel, 2020, 69: 124-138 CrossRef Google Scholar

[86] Gascon H, Yamaguchi F, Arp D, et al. Structural detection of Android malware using embedded call graphs. In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security (AiSec), Berlin, 2013. 45--54. Google Scholar

[87] Hu W J, Tao J, Ma X B, et al. MIGDroid: detecting app-repackaging android malware via method invocation graph. In: Proceedings of the 23rd International Conference on Computer Communication and Networks (ICCCN), Shanghai, 2014. 1--7. Google Scholar

[88] Marastoni N, Continella A, Quarta D, et al. GroupDroid: automatically grouping mobile malware by extracting code similarities. In: Proceedings of the 7th Software Security, Protection, and Reverse Engineering/Software Security and Protection Workshop, Orlando, 2017. 1--12. Google Scholar

[89] Sun X, Zhongyang Y B, Xin Z, et al. Detecting code reuse in android applications using component-based control flow graph. In: Proceedings of the 23rd USENIX Security Symposium, San Diego, 2014. 142--155. Google Scholar

[90] Meng G Z, Xue Y X, Xu Z Z, et al. Semantic modelling of android malware for effective malware comprehension, detection, and classification. In: Proceedings of the 25th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Saarbrucken, 2016. 306--317. Google Scholar

[91] Crussell J, Gibler C, Chen H. Attack of the clones: detecting cloned applications on Android markets. In: Proceedings of the 17th European Symposium on Research in Computer Security (ESORICS), Pisa, 2012. 37--54. Google Scholar

[92] Wolfe B, Elish K O, Yao D F. Comprehensive behavior profiling for proactive Android malware detection. In: Proceedings of the 17th International Conference Information Security and Cryptology, Seoul, 2014. 328--344. Google Scholar

[93] Zhang F F, Huang H Q, Zhu S C, et al. ViewDroid: towards obfuscation-resilient mobile application repackaging detection. In: Proceedings of the 7th ACM Conference on Security & Privacy in Wireless and Mobile Networks (WiSec), Oxford, 2014. 25--36. Google Scholar

[94] Shao Y R, Luo X P, Qian C X, et al. Towards a scalable resource-driven approach for detecting repackaged Android applications. In: Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC), New Orleans, 2014. 56--65. Google Scholar

[95] Zheng C, Zhu S X, Dai S F, et al. Smartdroid: an automatic system for revealing ui-based trigger conditions in android applications. In: Proceedings of the 2nd ACM workshop on Security and Privacy in Smartphones and Mobile Devices, Raleigh, 2012. 93--104. Google Scholar

[96] Zhou W, Zhou Y J, Grace M, et al. Fast, scalable detection of piggybacked mobile applications. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy, San Antonio, 2013. 185--196. Google Scholar

[97] Tian K, Yao D F, Ryder B G, et al. Analysis of code heterogeneity for high-precision classification of repackaged malware. In: Proceedings of the IEEE Security and Privacy Workshops, Austin, 2016. 262--271. Google Scholar

[98] Deshotels L, Notani V, Lakhotia A. Droidlegacy: automated familial classification of Android malware. In: Proceedings of the Program Protection and Reverse Engineering Workshop, New Orleans, 2014. 1--12. Google Scholar

[99] Hou S F, Ye Y F, Song Y Q, et al. Hindroid: an intelligent android malware detection system based on structured heterogeneous information network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Halifax, 2017. 1507--1515. Google Scholar

[100] Rasthofer S, Arzt S, Bodden E. A machine-learning approach for classifying and categorizing Android sources and sinks. In: Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2014. Google Scholar

[101] Hanna S, Huang L, Wu E, et al. Juxtapp: a scalable system for detecting code reuse among android applications. In: Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Heraklion, 2012. 62--81. Google Scholar

[102] Zhou W, Zhou Y J, Jiang X X, et al. Detecting repackaged smartphone applications in third-party android marketplaces. In: Proceedings of the Second ACM Conference on Data and Application Security and Privacy, San Antonio, 2012. 317--326. Google Scholar

[103] Narayanan A, Chandramohan M, Chen L H, et al. subgraph2vec: learning distributed representations of rooted sub-graphs from large graphs. 2016,. arXiv Google Scholar

[104] Fan M, Luo X P, Liu J, et al. Graph embedding based familial analysis of Android malware using unsupervised learning. In: Proceedings of the 41st International Conference on Software Engineering (ICSE), Montreal, 2019. 771--782. Google Scholar

[105] Ribeiro L F, Saverese P H, Figueiredo D R. struc2vec: learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, 2017. 385--394. Google Scholar

[106] Lin Y D, Lai Y C, Chen C H. Identifying android malicious repackaged applications by thread-grained system call sequences. Comput Security, 2013, 39: 340-350 CrossRef Google Scholar

[107] Kang H, Jang J, Mohaisen A. Detecting and Classifying Android Malware Using Static Analysis along with Creator Information. Int J Distributed Sens Networks, 2015, 11: 479174 CrossRef Google Scholar

[108] Allix K, Bissyandé T F, Jérome Q. Empirical assessment of machine learning-based malware detectors for Android. Empir Software Eng, 2016, 21: 183-211 CrossRef Google Scholar

[109] Zhang Y, Yang M, Xu B Q, et al. Vetting undesirable behaviors in android apps with permission use analysis. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Berlin, 2013. 611--622. Google Scholar

[110] Enck W, Gilbert P, Han S, et al. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans Comput Syst, 2014, 32: 5. Google Scholar

[111] Hornyack P, Han S, Jung J, et al. These aren't the droids you're looking for: retrofitting android to protect data from imperious applications. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Chicago, 2011. 639--652. Google Scholar

[112] Arzt S, Rasthofer S, Fritz C, et al. Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Edinburgh, 2014. 259--269. Google Scholar

[113] Klieber W, Flynn L, Bhosale A, et al. Android taint flow analysis for app sets. In: Proceedings of the ACM SIGPLAN International Workshop on the State Of the Art in Java Program Analysis (SOAP), Edinburgh, 2014. 1--6. Google Scholar

[114] Li L, Bartel A, Bissyande T, et al. Iccta: Detecting inter-component privacy leaks in Android apps. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, 2015. 280--291. Google Scholar

[115] Octeau D, Luchaup D, Dering M, et al. Composite constant propagation: Application to android inter-component communication analysis. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, 2015. 77--88. Google Scholar

[116] Huang J J, Li Z C, Xiao X S, et al. SUPOR: precise and scalable sensitive user input detection for Android apps. In: Proceedings of the USENIX Security Symposium, Austin, 2015. 977--992. Google Scholar

[117] Felt A P, Chin E, Hanna S, et al. Android permissions demystified. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Chicago, 2011. 627--638. Google Scholar

[118] Chin E, Felt A, Greenwood K, et al. Analyzing inter-application communication in Android. In: Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys), Bethesda, 2011. 239--252. Google Scholar

[119] Lu L, Li Z C, Wu Z Y, et al. Chex: statically vetting android apps for component hijacking vulnerabilities. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Raleigh, 2012. 229--240. Google Scholar

[120] Kantola D, Chin E, He W, et al. Reducing attack surfaces for intra-application communication in android. In: Proceedings of the Workshop on Security and Privacy in Smartphones and Mobile Devices (SPSM), Raleigh, 2012. 69--80. Google Scholar

[121] Pandita R, Xiao X S, Yang W, et al. WHYPER: towards automating risk assessment of mobile applications. In: Proceedings of the USENIX Security Symposium, Washington, 2013. 527--542. Google Scholar

[122] Yu L, Luo X, Qian C. Enhancing the Description-to-Behavior Fidelity in Android Apps with Privacy Policy. IIEEE Trans Software Eng, 2018, 44: 834-854 CrossRef Google Scholar

[123] Yu L, Luo X P, Liu X L, et al. Can we trust the privacy policies of Android apps? In: Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, 2016. 538--549. Google Scholar

[124] Yu L, Luo X, Chen J. PPChecker: Towards Accessing the Trustworthiness of Android Apps' Privacy Policies. IIEEE Trans Software Eng, 2018, : 1-1 CrossRef Google Scholar

[125] Slavin R, Wang X Y, Hosseini M, et al. Toward a framework for detecting privacy policy violations in android application code. In: Proceedings of the 38th International Conference on Software Engineering (ICSE), Austin, 2016. 25--36. Google Scholar

[126] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Comput, 2006, 18: 1527--1554. Google Scholar

[127] Pascanu R, Stokes J W, Sanossian H, et al. Malware classification with recurrent networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Queensland, 2015. 1916--1920. Google Scholar

[128] David O E, Netanyahu N S. Deepsign: deep learning for automatic malware signature generation and classification. In: Proceedings of the International Joint Conference on Neural Networks, Killarney, 2015. 1--8. Google Scholar

[129] Saxe J, Berlin K. Deep neural network based malware detection using two dimensional binary program features. In: Proceedings of the 10th International Conference on Malicious and Unwanted Software, Fajardo, 2015. 11--20. Google Scholar

[130] Yuan Z, Lu Y, Xue Y. Droiddetector: android malware characterization and detection using deep learning. Tinshhua Sci Technol, 2016, 21: 114-123 CrossRef Google Scholar

[131] McLaughlin N, Martinez R J, Kang B, et al. Deep Android malware detection. In: Proceedings of the Conference on Data and Application Security and Privacy, Scottsdale, 2017. 301--308. Google Scholar

[132] Fereidooni H, Conti M, Yao D F, et al. ANASTASIA: Android malware detection using static analysis of applications. In: Proceedings of the 8th IFIP International Conference on New Technologies, Mobility and Security, Larnaca, 2016. 1--5. Google Scholar

[133] Kim T G, Kang B J, Rho M. A Multimodal Deep Learning Method for Android Malware Detection Using Various Features. IEEE TransInformForensic Secur, 2019, 14: 773-788 CrossRef Google Scholar

[134] Tan S, Caruana R, Hooker G, et al. Learning global additive explanations for neural nets using model distillation. 2018,. arXiv Google Scholar

[135] Ribeiro M T, Singh S, Guestrin C. Why should I trust you? explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, 2016. 1135--1144. Google Scholar

[136] Fratantonio Y, Bianchi A, Robertson W, et al. Triggerscope: towards detecting logic bombs in android applications. In: Proceedings of the IEEE Symposium on Security and Privacy, San Jose, 2016. 377--396.

[137] Suciu O, Coull S E, Johns J. Exploring adversarial examples in malware detection. arXiv, 2018.

[138] Grosse K, Papernot N, Manoharan P, et al. Adversarial examples for malware detection. In: Proceedings of the European Symposium on Research in Computer Security, Oslo, 2017. 62--79.

[139] Al-Dujaili A, Huang A, Hemberg E, et al. Adversarial deep learning for robust detection of binary encoded malware. In: Proceedings of the IEEE Security and Privacy Workshops (SPW), Gothenburg, 2018. 76--82.

[140] Shao R, Rastogi V, Chen Y. Understanding In-App Ads and Detecting Hidden Attacks through the Mobile App-Web Interface. IEEE Trans Mobile Comput, 2018, 17: 2675--2688.

[141] Crussell J, Stevens R, Chen H. Madfraud: investigating ad fraud in Android applications. In: Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), Bretton Woods, 2014. 123--134.

[142] Dong F, Wang H Y, Li L, et al. Frauddroid: automated ad fraud detection for android apps. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE), Lake Buena Vista, 2018. 257--268.

[143] Hu Y, Wang H, Zhou Y. Dating with Scambots: Understanding the Ecosystem of Fraudulent Dating Applications. IEEE Trans Dependable Secure Comput, 2019: 1--1.

  • Figure 1

    Overview of the machine learning-based malware detection methods

  • Figure 2

    Overview of three feature analysis techniques

  • Figure 3

    (Color online) An example of data flow analysis

  • Table 1   The ability of three kinds of methods to handle existing challenges
    Method | Diversity of malicious code | Huge samples to analyze | Difficulty of labeling | Bad interpretability
    Signature-based method | Weak | Strong | Weak | Medium
    Machine learning-based method | Strong | Strong | Weak | Weak
    Behavior-based method | Strong | Weak | Strong | Strong
  • Table 2   Descriptions of datasets
    Dataset | #Sample | #Family | Average file size (MB) | Time
    Genome dataset [8] | 1260 | 49 | 1.3 | 2011--2012
    Drebin dataset [15] | 5560 | 179 | 1.3 | 2011--2014
    FalDroid dataset [26] | 8407 | 36 | 1.9 | 2013--2014
    DroidBench dataset | 119 | | 0.2 | 2014--2016
    AMD dataset [43] | 24553 | 71 | 2.1 | 2010--2016
    RmvDroid dataset [44] | 9133 | 56 | 4.8 | 2014--2018
  • Table 3   Part of the family label dictionary
    Family label | Other similar labels
    basebridge | bridge
    droiddreamlight | ddlight/lightdd/drdlightd/
    droidkungfu | kungf/gongf/droidkungf/droidkungfu2
    fakeinst | fakeinstall/fakeins
    plankton | planktonc/plangton
    geinimi | geinim/geinimia/geinimix
  • Table 4   Descriptions of the graph feature-based methods
    Graph model | Typical method | Node type | Granularity
    Function call graph | Adagio [86], MaMaDroid [17], MIGDroid [87], DAPASA [18], FalDroid [26] | Function name | Medium
    Control flow graph | Centroid [3], GroupDroid [88], ADAM [89], SMART [90] | Basic block | Fine
    Data flow graph | DNADroid [91], PVCS [92] | Statement | Fine
    UI graph | ViewDroid [93], ResDroid [94], MassVet [10], SmartDroid [95] | View | Coarse
    Package dependency graph | PiggyApp [96] | Package name | Coarse
    Class dependency graph | DR-Droid [97], Droidlegacy [98] | Class name | Coarse
    API dependency graph | DroidSIFT [28] | API | Medium
    Heterogeneous information network | HinDroid [99] | API, app | Medium
  • Table 5   Performance of existing machine learning-based methods a)
    Method | Time | Task | #B | #M | Detection performance (%)
    Puma [61] | 2012 | MD | 1811 | 249 | TPR = 91, FPR = 19
    DroidMat [71] | 2012 | MD | 1500 | 238 | TPR = 87, FPR = 0.4
    SCSdroid [106] | 2013 | MD | 100 | 49 | Precision = 95.97
    DroidAPIMiner [66] | 2013 | MD | 16000 | 3987 | TPR = 99, FPR = 2
    Adagio [86] | 2013 | MD | 135792 | 12158 | TPR = 89, FPR = 1
    PVCS [92] | 2014 | MD | 2436 | 1433 | TPR = 96.52, FPR = 1
    V.Grampurohit [82] | 2014 | MD | 24335 | 1530 | TPR = 91.8, FPR = 11.4
    W.Wang [19] | 2014 | MD | 310926 | 4868 | TPR = 94.62, FPR = 0.6
    Drebin [15] | 2014 | MD | 123453 | 5560 | TPR = 94, FPR = 1
    Droidlegacy [98] | 2014 | MD/FC | 48 | 1052 | Precision = 97, ACC = 92.9
    DroidSIFT [28] | 2014 | MD/FC | 13500 | 2200 | TPR = 98, FPR = 5.15, ACC = 93
    Dendroid [80] | 2014 | FC | | 1260 | ACC = 94.2
    MUDFLOW [2] | 2015 | MD | 2866 | 15338 | TPR = 86.4, FPR = 18.7
    AndroidTracker [107] | 2015 | MD | 51179 | 4554 | Precision = 90
    K.Allix [108] | 2016 | MD | 51800 | 1200 | Precision = 94
    SMART [90] | 2016 | MD | 223170 | 5560 | Precision = 97
    DAPASA [18] | 2017 | MD | 44921 | 2551 | TPR = 95, FPR = 0.7
    X.Wang [74] | 2017 | MD | 166365 | 18363 | TPR = 96, FPR = 0.06
    MaMaDroid [17] | 2017 | MD | 8500 | 35500 | F-measure = 99
    HinDroid [99] | 2017 | MD | 15000 | 15000 | TPR = 98.33, FPR = 0.87
    FalDroid [26] | 2018 | FC | | 8407 | ACC = 94.2
    W.Wang [83] | 2018 | MD | 107327 | 8701 | Precision = 99.39
    RevealDroid [76] | 2018 | MD/FC | 24679 | 30203 | Precision = 98, ACC = 95

    a) MD denotes the malware detection task; FC denotes the familial identification task; #B denotes the number of benign samples; #M denotes the number of malicious samples; and ACC denotes the prediction accuracy of FC.

Copyright 2020 CHINA SCIENCE PUBLISHING & MEDIA LTD. All rights reserved.
