Pattern matching (or string matching) is an essential task in computer science, especially in sequential pattern mining, since pattern matching methods can be used to calculate the support (or the number of occurrences) of a pattern and then to determine whether the pattern is frequent or not. A stateoftheart sequential pattern mining with gap constraints (or flexible wildcards) uses the number of nonoverlapping occurrences to denote the frequency of a pattern. Nonoverlapping means that any two occurrences cannot use the same character of the sequence at the same position of the pattern. In this paper, we investigate strict pattern matching under the nonoverlapping condition. We show that the problem is in P at first. Then we propose an algorithm, called NETLAPBest, which uses Nettree structure. NETLAPBest transforms the pattern matching problem into a Nettree and iterates to find the rightmost rootleaf path, to prune the useless nodes in the Nettree after removing the rightmost rootleaf path. We show that NETLAPBest is a complete algorithm and analyse the time and space complexities of the algorithm. Extensive experimental results demonstrate the correctness and efficiency of NETLAPBest.
The work was supported by National Natural Science Foundation of China (Grant Nos. 61229301, 61571180, 61370144), Natural Science Foundation of Hebei Province (Grant Nos. F2013202138, G2014202031), Graduate Student Innovation Program of Hebei Province (Grant No. 220056), and Youth Foundation of Education Commission of Hebei Province (Grant No. QN2014192).
Figure 2
A Nettree for strict pattern matching with gap constraints. (a) A Nettree; (b) the new Nettree after pruning $\langle$9,10,12,14$\rangle$; (c) the new Nettree after pruning $n_3^{11}$; (d) a Nettree with minroot and maxroot. Note: The grey nodes are pruned.
Set $\textit~I^{\rm~g}$  Set $\textit~I^{\rm~g[0,2]c}$  Set $\textit~{I}^{\rm~g[0,2]c[0,2]g}$ 
$\langle1\rangle$  $\langle1,2\rangle$  $\langle1,2,3\rangle$ 
$\langle3\rangle$  $\langle3,4\rangle$  $\langle3,4,5\rangle$ 
$\langle5\rangle$  
$\text{sup}$(rm g) = 3  $\text{sup}$(rm g[0,2]c) = 2  $\text{sup}$(rm g[0,2]c[0,2]g) = 2 




[b]
Set $\textit~I^{\rm~g}$  Set $\textit~I^{\rm~g[0,1]c}$  Set $\textit~I^{\rm~g[0,1]c[0,1]g}$ 
$\langle1\rangle$  $\langle1,2\rangle$  
$\langle5\rangle$  
$\text{sup}$(rm g) = 2  $\text{sup}$(rm g[0,1]c) = 1  $\text{sup}$(rm g[0,1]c[0,1]g) = 0 
Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有
京ICP备18024590号1