SCIENCE CHINA Information Sciences, Volume 61, Issue 3: 032114 (2018) https://doi.org/10.1007/s11432-017-9288-4

An adaptive system for detecting malicious queries in web attacks

More info
  • Received Aug 1, 2017
  • Accepted Oct 27, 2017
  • Published Feb 2, 2018


Web request query strings (queries), which pass parameters to a referenced resource, are frequently manipulated by attackers to retrieve sensitive data and even to take full control of victim web servers and web applications. However, existing malicious query detection approaches in the literature cannot cope with changing web attacks. In this paper, we introduce a novel adaptive system (AMOD) that detects web-based code injection attacks, which make up the majority of web attacks, by analyzing queries. We also present a new adaptive learning strategy, called SVM HYBRID, which our system uses to minimize manual work. In the evaluation, an up-to-date detection model is trained on a ten-day query dataset collected from an academic institute's web server logs. The evaluation shows that our approach outperforms existing approaches in two respects. First, AMOD outperforms existing web attack detection methods with an $F$-value of 99.50% and a false positive (FP) rate of 0.001%. Second, the total number of malicious queries obtained by SVM HYBRID is 3.07 times that obtained by the popular support vector machine adaptive learning (SVM AL) method. The malicious queries obtained can be used to update the web application firewall (WAF) signature library.
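As a rough illustration of what a "query" is in this setting, the sketch below pulls the query string and its parameters out of a request URL using Python's standard `urllib.parse` module. The function names `extract_query` and `query_parameters` are hypothetical and the sample URLs are taken from Table 1; this is not the AMOD preprocessing pipeline, only a minimal example of isolating the part of a request that a detector would analyze.

```python
from typing import List, Tuple
from urllib.parse import urlsplit, parse_qsl


def extract_query(url: str) -> str:
    """Return the raw query string, i.e. everything after the first '?'."""
    return urlsplit(url).query


def query_parameters(url: str) -> List[Tuple[str, str]]:
    """Return decoded (name, value) pairs, keeping parameters with blank values."""
    return parse_qsl(extract_query(url), keep_blank_values=True)


# Sample URLs copied from Table 1 (normal, directory traversal, remote file inclusion).
samples = {
    "Normal": "http://site/index.php?postID=123",
    "DT": "http://site/index.php?postID=../../../../etc/passwd",
    "RFI": "http://site/index.php?postID=http://mal_site/hack.txt?ls",
}

for label, url in samples.items():
    print(label, query_parameters(url))
```

In all three cases the resource (`/index.php`) and the parameter name (`postID`) are identical; only the parameter value distinguishes the benign request from the attacks, which is why the query is a natural unit of analysis.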


This work was supported in part by the National Key Research and Development Program of China (Grant No. 2016YFB0800703), in part by the National Natural Science Foundation of China (Grant Nos. 61272481, 61572460), in part by the Open Project Program of the State Key Laboratory of Information Security (Grant Nos. 2017-ZD-01, 2016-MS-02), and in part by the National Information Security Special Project of the National Development and Reform Commission of China (Grant No. (2012)1424). We would also like to thank Dr. Xinyu Xing at Pennsylvania State University for his help with this work.


[1] Symantec. Internet security threat report. 2016. https://www.symantec.com/security-center/threat-report.

[2] Fonseca J, Vieira M, Madeira H. Evaluation of Web Security Mechanisms Using Vulnerability & Attack Injection. IEEE Trans Dependable Secure Comput, 2014, 11: 440-453.

[3] Imperva. Web application attack report. 2015. https://www.imperva.com/docs/HII_Web_Application_Attack_Report_Ed6.pdf.

[4] WhiteHat. Web application security statistic report. 2016. https://info.whitehatsec.com/rs/675-YBI-674/images/WH-2016-Stats-Report-FINAL.pdf.

[5] Lawal M, Sultan A B M, Shakiru A O. Systematic literature review on SQL injection attack. Int J Soft Comput, 2016, 11: 26-35.

[6] Symantec. Team ghostshell hacking group back with a bang. 2015. https://www.symantec.com/connect/blogs/team-ghostshell-hacking-group-back-bang.

[7] Aleroud A, Zhou L. Phishing environments, techniques, and countermeasures: A survey. Comput Security, 2017, 68: 160-196.

[8] Fang Z, Liu Q, Zhang Y. A static technique for detecting input validation vulnerabilities in Android apps. Sci China Inf Sci, 2017, 60: 052111.

[9] Prokhorenko V, Choo K K R, Ashman H. Web application protection techniques: A taxonomy. J Network Comput Appl, 2016, 60: 95-112.

[10] Kruegel C, Vigna G, Robertson W. A multi-model approach to the detection of web-based attacks. Comput Networks, 2005, 48: 717-738.

[11] Robertson W K, Vigna G, Kruegel C, et al. Using generalization and characterization techniques in the anomaly-based detection of web attacks. In: Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS'06), San Diego, 2006.

[12] Song Y, Keromytis A D, Stolfo S J. Spectrogram: a mixture-of-markov-chains model for anomaly detection in web traffic. In: Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS'09), San Diego, 2009. 121-135.

[13] Kozakevicius A, Cappo C, Mozzaquatro B A. URL query string anomaly sensor designed with the bidimensional Haar wavelet transform. Int J Inf Secur, 2015, 14: 561-581.

[14] Juvonen A, Sipola T, Hämäläinen T. Online anomaly detection using dimensionality reduction techniques for HTTP log analysis. Comput Networks, 2015, 91: 46-56.

[15] Xie Y, Tang S, Huang X. Detecting latent attack behavior from aggregated Web traffic. Comput Commun, 2013, 36: 895-907.

[16] Fan W K G. An adaptive anomaly detection of web-based attacks. In: Proceedings of the 7th International Conference on Computer Science & Education (ICCSE'12), Melbourne, 2012. 690-694.

[17] Pinzón C, De Paz J F, Bajo J, et al. AIIDA-SQL: an adaptive intelligent intrusion detector agent for detecting SQL injection attacks. In: Proceedings of the 10th International Conference on Hybrid Intelligent Systems (HIS'10), Atlanta, 2010. 73-78.

[18] Meng Y, Kwok L F. Adaptive blacklist-based packet filter with a statistic-based approach in network intrusion detection. J Network Comput Appl, 2014, 39: 83-92.

[19] Wang W, Guyet T, Quiniou R. Autonomic intrusion detection: Adaptively detecting anomalies over unlabeled audit data streams in computer networks. Knowledge-Based Syst, 2014, 70: 103-117.

[20] Zhang J, Li H, Gao Q. Detecting anomalies from big network traffic data using an adaptive detection approach. Inf Sci, 2015, 318: 91-110.

[21] AlEroud A F, Karabatis G. Queryable Semantics to Detect Cyber-Attacks: A Flow-Based Detection Approach. IEEE Trans Syst Man Cybern Syst, 2018, 48: 207-223.

[22] Aleroud A, Karabatis G. Contextual information fusion for intrusion detection: a survey and taxonomy. Knowl Inf Syst, 2017, 52: 563-619.

[23] Sousa A F M, Prudêncio R B C, Ludermir T B. Active learning and data manipulation techniques for generating training examples in meta-learning. Neurocomputing, 2016, 194: 45-55.

[24] Rossi A L D, de Carvalho A C P L F, Soares C. MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data. Neurocomputing, 2014, 127: 52-64.

[25] Folino G, Sabatino P. Ensemble based collaborative and distributed intrusion detection systems: A survey. J Network Comput Appl, 2016, 66: 1-16.

[26] The HTTP dataset CSIC 2010. http://www.isi.csic.es/dataset/.

[27] Zheng Y H, Zhang X Y. Path sensitive static analysis of web applications for remote code execution vulnerability detection. In: Proceedings of the 35th International Conference on Software Engineering (ICSE'13), San Francisco, 2013. 652-661.

[28] Jamdagni A, Tan Z, He X. RePIDS: A multi tier Real-time Payload-based Intrusion Detection System. Comput Networks, 2013, 57: 811-824.

[29] Garcia-Teodoro P, Diaz-Verdejo J E, Tapiador J E. Automatic generation of HTTP intrusion signatures by selective identification of anomalies. Comput Security, 2015, 55: 159-174.

[30] Zhong Y, Asakura H, Takakura H, et al. Detecting malicious inputs of web application parameters using character class sequences. In: Proceedings of the 39th Annual Computer Software and Applications Conference (COMPSAC'15), Taichung, 2015. 525-532.

[31] Ariu D, Tronci R, Giacinto G. HMMPayl: An intrusion detection system based on Hidden Markov Models. Comput Security, 2011, 30: 221-241.

[32] Wang K, Stolfo S J. Anomalous payload-based network intrusion detection. In: Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID'04), Sophia Antipolis, 2004. 203-222.

[33] Wang K, Parekh J J, Stolfo S J. Anagram: a Content Anomaly Detector Resistant to Mimicry Attack. Berlin: Springer, 2006.

[34] Oza A, Ross K, Low R M. HTTP attack detection using n-gram analysis. Comput Security, 2014, 45: 242-254.

[35] Perdisci R, Ariu D, Fogla P. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Comput Networks, 2009, 53: 864-881.

[36] Swarnkar M, Hubballi N. OCPAD: One class Naive Bayes classifier for payload based anomaly detection. Expert Syst Appl, 2016, 64: 330-339.

[37] Duessel P, Gehl C, Flegel U, et al. Detecting zero-day attacks using context-aware anomaly detection at the application-layer. Int J Inf Secur, 2016, 16: 475-490.

[38] Vapnik V, Kotz S. Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 2006.

[39] Guo H S, Wang W J. An active learning-based SVM multi-class classification model. Pattern Recogn, 2015, 4: 1577-1597.

[40] Kremer J, Steenstrup Pedersen K, Igel C. Active learning with support vector machines. WIREs Data Min Knowl Discov, 2014, 4: 313-326.

[41] Gao F, Lv W, Zhang Y. A novel semisupervised support vector machine classifier based on active learning and context information. Multidim Syst Sign Process, 2016, 27: 969-988.

[42] Wang M, Min F, Zhang Z H. Active learning through density clustering. Expert Syst Appl, 2017, 85: 305-317.

[43] Aghaee A, Ghadiri M, Baghshah M S, et al. Active distance-based clustering using K-medoids. In: Proceedings of the 20th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'16), Auckland, 2016. 253-264.

[44] Baram Y, Ran E Y, Luz K. Online choice of active learning algorithms. J Mach Learn Res, 2012, 5: 255-291.

[45] Wolpert D H. Stacked generalization. Neural Networks, 1992, 5: 241-259.

[46] Hillstone Networks. Hillstone e-series next-generation firewalls. http://www.hillstonenet.com/our-products/next-gen-firewalls-e-series/.

[47] Fielding R, Gettys J, Mogul J, et al. RFC 2616: hypertext transfer protocol-HTTP/1.1. Comput Sci Commun Dict, 1999, 7: 3969-3973.

[48] Ambusaidi M A, He X, Nanda P. Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm. IEEE Trans Comput, 2016, 65: 2986-2998.

[49] Ben-Hur A, Weston J. A user's guide to support vector machines. In: Data Mining Techniques for the Life Sciences. Berlin: Springer, 2010. 223-239.

[50] Xiong C, Johnson D M, Corso J J. Active Clustering with Model-Based Uncertainty Reduction. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 5-17.

[51] Prandl S, Lazarescu M, Pham D S. A study of web application firewall solutions. In: Proceedings of the 11th International Conference on Information Systems Security (ICISS'15), Kolkata, 2015. 501-510.

[52] Trustwave. Modsecurity core rule set. 2016. https://www.owasp.org/index.php/Category:OWASP_ModSecurity_Core_Rule_Set_Project.

[53] Kantchelian A, Afroz S, Huang L, et al. Approaches to adversarial drift. In: Proceedings of the 6th ACM Workshop on Artificial Intelligence and Security (AISec'13), Berlin, 2013. 99-110.

  • Figure 1

    Sample entry in web server access logs.

  • Figure 2

    Taxonomy of recent academic research on anomaly detection methods for web attacks and HTTP attacks.

  • Figure 3

    (Color online) System overview.

  • Figure 4

    (Color online) Illustrative graph of SVM HYBRID.

  • Figure 5

    (Color online) (a) SS vs. SVM AL; (b) comparisons of $F$-values; (c) #Mal. queries obtained.

  • Table 1   Comparison of normal queries and malicious queries in web-based code injection attacks
    No. | Type | Query in a web request URL
    (a) | Normal | http://site/index.php?postID=123
    (b) | SQLI | http://site/index.php?postID='unionselect0,
    (c) | XSS | http://site/index.php?postID=<script>document.location
    (d) | DT | http://site/index.php?postID=../../../../etc/passwd
    (e) | RFI | http://site/index.php?postID=http://mal_site/hack.txt?ls
  • Table 2   Description of web server logs and data preprocessing
    Raw logs: Duration (days) | Raw logs: Log size (GB) | Data preprocessing: #Original requests | Data preprocessing: #Cleaned queries | Data preprocessing: #Normalized queries | Data preprocessing: #Filtered queries
    10 | 31.1 | 4764598 | 1882901 | 1189228 | 1123497
  • Table 3   Statistics of query dataset
    #Queries | Benign | Malicious | Total
    Initial set | 900 | 100 | 1000
    Ten unlabeled sets | 990000 | 10000 | 1000000
  • Table 4   Determining the best setting
    Feature reduction: IG | Feature reduction: PCA | Stacking model: Base classifiers | Stacking model: Meta classifier | SVM HYBRID: Partition | SVM HYBRID: #Obtained queries (/day)
    800 | 80 | RF, Logistic, MLP | SVM-RBF | 1:1 | 150
  • Table 5   Detection performance and runtime of web attack detection methods
    Method | CSIC 2010: $F$-value (%) | CSIC 2010: TPR (%) | CSIC 2010: FPR (%) | CSIC 2010: Time (s) | Institute: $F$-value (%) | Institute: TPR (%) | Institute: FPR (%) | Institute: Time (s)
    Linear combination [11] | 97.12 | 95.57 | 1.23 | 16.2 | 91.18 | 93.00 | 0.01 | 26.7
    Wavelet transform [13] | 96.60 | 94.19 | 0.81 | 18.3 | 85.42 | 82.00 | 0.01 | 28.1
    Dimension reduction [14] | 96.73 | 94.13 | 0.49 | 7.9 | 88.89 | 84.00 | 0.005 | 12.2
    Adaptive learning (AMOD) | 99.96 | 99.95 | 0.03 | 4.4 | 99.50 | 100.00 | 0.001 | 6.7
    (CSIC 2010 = HTTP dataset CSIC 2010 [26]; Institute = institute dataset.)
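
Table 5 reports $F$-value, TPR, and FPR. The sketch below shows how these three quantities relate to confusion-matrix counts, assuming the $F$-value is the balanced F1 score (harmonic mean of precision and TPR); the page itself does not define it. The function name `detection_metrics` and the counts used are hypothetical: they are chosen only because they happen to reproduce the AMOD figures for the institute dataset, and they are not the paper's actual confusion matrix.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute TPR, FPR, precision, and F1 from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate (recall)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0          # false positive rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + tpr
    f_value = 2 * precision * tpr / denom if denom else 0.0  # balanced F1 (assumed)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f_value": f_value}


# Hypothetical counts: 100 malicious queries all detected, 1 false alarm
# among 100000 benign queries. Not taken from the paper.
m = detection_metrics(tp=100, fp=1, tn=99999, fn=0)

print(f"TPR     = {m['tpr']:.2%}")      # 100.00%
print(f"FPR     = {m['fpr']:.3%}")      # 0.001%
print(f"F-value = {m['f_value']:.2%}")  # 99.50%
```

The example also shows why a very low FPR matters when benign traffic dominates: with benign queries outnumbering malicious ones by 1000 to 1 here, a single false alarm already lowers precision to 100/101 and the F-value to 200/201.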
