logo

SCIENCE CHINA Information Sciences, Volume 60, Issue 1: 012101(2017) https://doi.org/10.1007/s11432-015-0935-3

Strict pattern matching under non-overlapping condition

More info
  • ReceivedFeb 14, 2016
  • AcceptedMay 11, 2016
  • PublishedNov 15, 2016

Abstract

Pattern matching (or string matching) is an essential task in computer science, especially in sequential pattern mining, since pattern matching methods can be used to calculate the support (or the number of occurrences) of a pattern and then to determine whether the pattern is frequent or not. A state-of-the-art sequential pattern mining with gap constraints (or flexible wildcards) uses the number of non-overlapping occurrences to denote the frequency of a pattern. Non-overlapping means that any two occurrences cannot use the same character of the sequence at the same position of the pattern. In this paper, we investigate strict pattern matching under the non-overlapping condition. We show that the problem is in P at first. Then we propose an algorithm, called NETLAP-Best, which uses Nettree structure. NETLAP-Best transforms the pattern matching problem into a Nettree and iterates to find the rightmost root-leaf path, to prune the useless nodes in the Nettree after removing the rightmost root-leaf path. We show that NETLAP-Best is a complete algorithm and analyse the time and space complexities of the algorithm. Extensive experimental results demonstrate the correctness and efficiency of NETLAP-Best.


Funded by

Youth Foundation of Education Commission of Hebei Province(QN2014192)

Graduate Student Innovation Program of Hebei Province(220056)

National Natural Science Foundation of China(61571180)

National Natural Science Foundation of China(61370144)

National Natural Science Foundation of China(61229301)

Natural Science Foundation of Hebei Province(F2013202138)

Natural Science Foundation of Hebei Province(G2014202031)


Acknowledgment

Acknowledgments

The work was supported by National Natural Science Foundation of China (Grant Nos. 61229301, 61571180, 61370144), Natural Science Foundation of Hebei Province (Grant Nos. F2013202138, G2014202031), Graduate Student Innovation Program of Hebei Province (Grant No. 220056), and Youth Foundation of Education Commission of Hebei Province (Grant No. QN2014192).


References

[1] Li C, Yang Q Y, Wang J Y, et al. Efficient mining of gap-constrained subsequences and its various applications. ACM Trans Knowl Discov Data, 2012, 6: 2 Google Scholar

[2] Wang P, Xu B W, Wu Y R, et al. Link prediction in social networks: the state-of-the-art. Sci China Inf Sci, 2015, 58: 011101 Google Scholar

[3] Liu J, Ma Z M, Feng X. Answering ordered tree pattern queries over fuzzy XML data. Knowl Inf Syst, 2015, 43: 473-495 CrossRef Google Scholar

[4] Xuan J F, Jiang H, Hu Y, et al. Towards effective bug triage with software data reduction techniques. IEEE Trans Knowl Data Eng, 2015, 27: 264-280 CrossRef Google Scholar

[5] Cook D, Krishnan N C, Rashidi P. Activity discovery and activity recognition: a new partnership. IEEE Trans Cybern, 2013, 43: 820-828 CrossRef Google Scholar

[6] Weng L N, Zhang P, Feng Z Y, et al. Short-term link quality prediction using nonparametric time series analysis. Sci China Inf Sci, 2015, 58: 082308-828 Google Scholar

[7] Rajpathak D, De S. A data-and ontology-driven text mining-based construction of reliability model to analyze and predict component failures. Knowl Inf Syst, 2016, 46: 87-113 CrossRef Google Scholar

[8] Navarro G. Spaces, trees, and colors: the algorithmic landscape of document retrieval on sequences. ACM Comput Surv, 2014, 46: 52-113 Google Scholar

[9] Jiang H, Xuan J F, Ren Z L, et al. Misleading classification. Sci China Inf Sci, 2014, 57: 052106-113 Google Scholar

[10] Le H, Prasanna V K. A memory-efficient and modular approach for large-scale string pattern matching. IEEE Trans Comput, 2013, 62: 844-857 CrossRef Google Scholar

[11] Claude F, Navarro G, Peltola H, et al. String matching with alphabet sampling. J Discrete Algorithms, 2012, 11: 37-50 CrossRef Google Scholar

[12] Wandelt S, Deng D, Gerdjikov S, et al. State-of-the-art in string similarity search and join. ACM SIGMOD Rec, 2014, 43: 64-76 CrossRef Google Scholar

[13] Li Z, Ge T J. Online windowed subsequence matching over probabilistic sequences. In: Proceedings of ACM International Conference on Management of Data. New York: ACM, 2012. 277--288. Google Scholar

[14] Chen K-H, Huang G-S, Lee R C-T. Bit-parallel algorithms for exact circular string matching. Comput J, 2014, 57: 731-743 CrossRef Google Scholar

[15] Hu H, Wang H Z, Li J Z, et al. An efficient pruning strategy for approximate string matching over suffix tree. Knowl Inf Syst, 2016, 49: 121-141 CrossRef Google Scholar

[16] Li F F, Yao B, Tang M W, et al. Spatial approximate string search. IEEE Trans Knowl Data Eng, 2013, 25: 1394-1409 CrossRef Google Scholar

[17] Wu X D, Qiang J P, Xie F. Pattern matching with flexible wildcards. J Comput Sci Technol, 2014, 29: 740-750 CrossRef Google Scholar

[18] Wu Y X, Wu X D, Min F, et al. A Nettree for pattern matching with flexible wildcard constraints. In: Proceeding of IEEE International Conference on Information Reuse and Integration, Las Vegas, 2010. 109--114. Google Scholar

[19] Retwitzer M D, Polishchuk M, Churkin E, et al. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps. Nucleic Acids Res, 2015, doi: 10-750 Google Scholar

[20] Wang X M, Duan L, Dong G Z, et al. Efficient mining of density-aware distinguishing sequential patterns with gap constraints. In: Proceedings of International Conference Database Systems for Advanced Applications, Bali, 2014. 372--387. Google Scholar

[21] Liao V C-C, Chen M-S. Efficient mining gapped sequential patterns for motifs in biological sequences. BMC Syst Biol, 2013, 7: S7-750 Google Scholar

[22] Ding B L, Lo D, Han J W, et al. Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of IEEE International Conference on Data Engineering, Shanghai, 2009. 1024--1035. Google Scholar

[23] Yang H, Duan L, Hu B, et al. Mining top-$k$ distinguishing sequential patterns with gap constraint. J Softw, 2015, 26: 2994-3009 Google Scholar

[24] Crochemore M, Iliopoulos C, Makris C, et al. Approximate string matching with gaps. Nordic J Comput, 2002, 9: 54-65 Google Scholar

[25] Cantone D, Cristofaro S, Faro S. New efficient bit-parallel algorithms for the ($\delta$, $\alpha$)-matching problem with applications in music information retrieval. Int J Found Comput Sci, 2009, 20: 1087-1108 CrossRef Google Scholar

[26] Cole J, Chai B, Farris R, et al. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res, 2005, 33: 294-296 Google Scholar

[27] Cole R, Gottlieb L, Lewenstein M. Dictionary matching and indexing with errors and don't care. In: Proceeding of Symposium on Theory of Computing, Chicago, 2004. 91--100. Google Scholar

[28] Zhang M H, Kao B, Cheung D W, et al. Mining periodic patterns with gap requirement from sequences. ACM Trans Knowl Discov Data, 2007, 1: 7-296 CrossRef Google Scholar

[29] Wu Y X, Wang L L, Ren J D, et al. Mining sequential patterns with periodic wildcard gaps. Appl Intell, 2014, 41: 99-116 CrossRef Google Scholar

[30] Wu X D, Zhu X Q, He Y, et al. PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med, 2013, 43: 481-492 CrossRef Google Scholar

[31] Ibrahim A, Sastry S, Sastry P S. Discovering compressing serial episodes from event sequences. Knowl Inf Syst, 2016, 47: 405-432 CrossRef Google Scholar

[32] Lam H, Mörchen F, Fradkin D, et al. Mining compressing sequential patterns. Stat Anal Data Min, 2013, 7: 34-52 Google Scholar

[33] El-Ramly M, Stroulia E, Sorenson P. From run-time behavior to usage scenarios: an interaction-pattern mining approach. In: Proceeding of ACM International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002. 315--324. Google Scholar

[34] Bille P, G\o rtz I, Vildh\o j H W, et al. String matching with variable length gaps. Theor Comput Sci, 2012, 443: 25-34 CrossRef Google Scholar

[35] Wu Y X, Fu S, Jiang H, et al. Strict approximate pattern matching with general gaps. Appl Intell, 2015, 42: 566-580 CrossRef Google Scholar

[36] Wu Y X, Tang Z Q, Jiang H, et al. Approximate pattern matching with gap constraints. J Inf Sci, 2016, 42: 639-658 CrossRef Google Scholar

[37] Chai X, Jia X F, Wu Y X, et al. Strict pattern matching with general gaps and one-off condition (in Chinese). J Softw, 2015, 26: 1096-1112 Google Scholar

[38] Guo D, Hu X G, Xie F, et al. Pattern matching with wildcards and gap-Length constraints based on a centrality-degree graph. Appl Intell, 2013, 39: 57-74 CrossRef Google Scholar

[39] Wu Y X, Wu X D, Jiang H, et al. A heuristic algorithm for MPMGOOC. Chin J Comput, 2011, 34: 1452-1462 CrossRef Google Scholar

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1