
SCIENTIA SINICA Informationis, Volume 48, Issue 12: 1681-1696(2018) https://doi.org/10.1360/N112018-00138

A personalized mail re-filtering system based on the client

  • Received: May 28, 2018
  • Accepted: Aug 22, 2018
  • Published: Dec 4, 2018

Abstract

Email is an essential communication tool, but large volumes of spam can seriously disrupt users' work and life and can even cause financial loss. Because interests differ from user to user, what counts as spam can vary widely, so personalized spam filtering has become an important problem in the field. When an email is misclassified, the user has to correct it manually, which degrades the user experience. To address these problems and to provide both personalized filtering and automatic correction of mis-filtered emails, this paper combines rule-based and statistical methods in a personalized email re-filtering system based on the client (PRFC). Many existing spam filters do not account for differences in class prior probability or for class imbalance, and they filter mail only online. The proposed system first processes the mail entering the inbox and the junk folder, and then uses two mutually learned filters, designed on the multi-task learning principle, to automatically correct mis-filtered emails in the inbox and in the junk folder. To keep the filters effective as user interests and the data distribution of mail change over time, a multi-window learning framework that incorporates importance weights was designed so that the filters adapt dynamically. Finally, the proposed system was evaluated on the TREC 2006c and 2007p data sets, where it achieved significant filtering efficiency.
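The abstract does not spell out how the multi-window framework or its importance weights are computed, so the following is only a minimal sketch of the general idea, not the PRFC implementation. It assumes a multinomial naive Bayes filter in which every labeled email contributes to the counts in proportion to the weight of the window it came from. The function names, the `"spam"`/`"ham"` labels, and the hand-picked weights are all hypothetical.

```python
import math
from collections import Counter


def train_weighted_nb(windows, weights):
    """Fit weighted naive Bayes counts.

    windows: list of windows, each a list of (tokens, label) pairs,
             with label in {"spam", "ham"}.
    weights: one non-negative float per window (assumed given here;
             the paper estimates importance weights, this sketch does not).
    """
    counts = {"spam": Counter(), "ham": Counter()}
    class_mass = {"spam": 0.0, "ham": 0.0}
    for window, w in zip(windows, weights):
        for tokens, label in window:
            class_mass[label] += w          # weighted class prior
            for t in tokens:
                counts[label][t] += w       # weighted token count
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, class_mass, vocab


def spam_log_odds(tokens, model):
    """Return log P(spam | tokens) - log P(ham | tokens), up to a constant."""
    counts, class_mass, vocab = model
    total = sum(class_mass.values())
    score = 0.0
    for label, sign in (("spam", 1.0), ("ham", -1.0)):
        # Laplace-smoothed prior and likelihoods.
        score += sign * math.log((class_mass[label] + 1.0) / (total + 2.0))
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:
                score += sign * math.log((counts[label][t] + 1.0) / denom)
    return score


# An older window in which the user treated "offer" mail as spam,
# and a recent window in which the same user keeps it.
old_window = [(["offer"], "spam"), (["meeting"], "ham")]
new_window = [(["offer"], "ham"), (["lottery"], "spam")]

recent_first = train_weighted_nb([old_window, new_window], [0.2, 1.0])
print(spam_log_odds(["offer"], recent_first) < 0)  # recent preference dominates
```

Weighting the recent window more heavily lets the filter follow a change in the user's preferences without discarding older evidence. Because the class prior is built from the same weighted counts, it shifts along with the windows.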


Funded by

National Natural Science Foundation of China (Grant Nos. 61672281, 61472186)



