logo

SCIENCE CHINA Information Sciences, Volume 55, Issue 9: 2008-2030(2012) https://doi.org/10.1007/s11432-012-4638-z

A new algorithm for fast mining frequent itemsets using N-lists

More info
  • AcceptedMar 7, 2012
  • PublishedAug 7, 2012

Abstract

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.


References

[1] Han J W, Pei J, Yin Y W. Mining frequent itemsets without candidate generation. In: The 2000 ACM SIGMOD International Conference on Management of data (SIGMOD’00), New York, 2000. 1-12. Google Scholar

[2] Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: The 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD’93), Washington, 1993. 207-216. Google Scholar

[3] Han J, Cheng H, Xin D, et al. Frequent itemset mining: current status and future directions. Data Min Knowl Discov,2007, 15: 55-86. CrossRef Google Scholar

[4] Baralis E, Cerquitelli T, Chiusano S. IMine: index support for item set mining. IEEE TKDE J, 2009, 21: 493-506. Google Scholar

[5] Zaki M J, Gouda K. Fast vertical mining using diffsets, In: The 9th ACM SIGKDD International Conference on. Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 326-335. Google Scholar

[6] Deng Z H, Wang Z H. A new fast vertical method for mining frequent itemsets. Int J Comput Intell Syst, 2010, 3:733-744. CrossRef Google Scholar

[7] Agrawal R, Srikant R. Fast algorithm for mining Association rules. In: The 20th International Conference on Very Large Data Bases (VLDB’94), Santiago de Chile, 1994. 487-499. Google Scholar

[8] Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: The21th International Conference on Very Large Data Bases (VLDB’95), Zurich, 1995. 432-443. Google Scholar

[9] Shenoy P, Haritsa J R, Sundarshan S, et al. Turbo-charging vertical mining of large databases. In: ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD’00), Dallas, 2000.22-33. Google Scholar

[10] Zaki M J. Scalable algorithms for association mining. IEEE TKDE J, 2000, 12: 372-390. Google Scholar

[11] Pei J, Han J, Lu H, et al. H-mine: Hyper-structure mining of frequent itemsets in large databases. In: IEEE International Conference on Data Mining (ICDM’01), San Jose, 2001. 441-448. Google Scholar

[12] Liu G, Lu H, Lou W, et al. Efficient mining of frequent itemsets using ascending frequency ordered prefix-tree. DMKD J, 2004, 9: 249-274. Google Scholar

[13] Grahne G, Zhu J. Fast algorithms for frequent itemset mining using FP-Trees. IEEE TKDE J, 2005, 17: 1347-1362. Google Scholar

[14] Woon Y K, Ng W K, Lim E P. A support-ordered trie for fast frequent itemset discovery. IEEE TKDE J, 2004, 16:875-879. Google Scholar

[15] Ananthanarayana V S, Murty N M, Subramanian D K. An incremental data mining algorithm for compact realization of prototypes. Pattern Recognit, 2001, 34: 2249-2251. Google Scholar

[16] Grust T. Accelerating xpath location steps, In: The 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02), Madison, 2002. 109-120. Google Scholar

[17] Kleinberg J, Tardos E. Algorithm design. Boston: Addison Wesley, 2005. Google Scholar

[18] Bayardo Jr R J. Efficiently mining long itemsets from databases. In: ACM SIGMOD International Conference on Management of Data (SIGMOD’98), Seattle, 1998. 85-93. Google Scholar

[19] Burdick D, Calimlim M, Flannick J, et al. Mafia: A maximal frequent itemset algorithm. IEEE TKDE J, 2005, 17:1490-1504. Google Scholar

[20] Wang J Y, Han J, Pei J. CLOSET+: Searching for the Best Strate-gies for Mining frequent closed itemsets. In: The9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington,2003. 236-245. Google Scholar

[21] Lee A J T, Wang C S, Weng W Y, et al. An efficient algorithm for mining closed inter-transaction itemsets. Data Knowle Eng, 2008, 66: 68-91. CrossRef Google Scholar

[22] Xin D, Han J, Yan X, et al. On compressing frequent itemsets. Data Knowl Eng, 2007, 60: 5-29. CrossRef Google Scholar

[23] Li H, Chen H. Mining non-derivable frequent itemsets over data stream. Data Knowl Eng, 2009, 68: 481-498. CrossRef Google Scholar

[24] Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In: The SIAM Data Mining (SDM’04), Florida, 2004. 482-486. Google Scholar

[25] Yao H, Hamilton H J. Mining itemset utilities from transaction databases. Data Knowl Eng, 2006, 59: 603-626. CrossRef Google Scholar

[26] Cao L B, Zhao Y C, Zhang C Q. Mining impact-targeted activity itemsets in imbalanced data. IEEE TKDE, 2008,20: 1053-1066. Google Scholar

[27] Chang J H, Lee W S. Finding recent frequent itemsets adaptively over online data streams. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 487-492. Google Scholar

[28] Li X, Deng Z H. Mining frequent itemsets from network flows for monitoring network. Expert Syst Appl, 2010, 37:8850-8860. CrossRef Google Scholar

[29] Agrawal R, Srikant R. Mining sequential itemsets. In: The 11th International Conference on Data Engineering (ICDE’95), Taiwan, 2003. 3-14. Google Scholar

[30] Yan X, Han J. gSpan: Graph-based substructure pattern mining. In: The 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi, 2002. 721-724. Google Scholar

[31] Cao L B, Zhang H F, Zhao Y C, et al. Combined mining: Discovering informative knowledge in complex data. IEEE Trans Syst Man Cybern Part B-Cybern, 2011, 41: 699-712. CrossRef Google Scholar

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1