SCIENTIA SINICA Informationis, Volume 48, Issue 9: 1227-1241(2018) https://doi.org/10.1360/N112017-00284

## Multiview-based group behavior analysis in optical image sequence

• Accepted Feb 22, 2018
• Published Aug 23, 2018

### Abstract

Group behavior analysis is an active topic in intelligent video surveillance and has attracted growing interest in the field of artificial intelligence. Groups are the basic components of a crowd system and provide a high-level representation of crowd phenomena. By investigating the motion dynamics within each image patch, this paper proposes a multiview-based group behavior analysis method that divides the patches into different groups. The main contributions are threefold: (1) the correlation between image patches is captured from four views (interaction, distance, motion direction, and motion transition); (2) a multiview clustering method with diversity regularization is proposed to exploit the complementary information in the multiview data and alleviate the influence of redundant features; and (3) a cluster merging strategy is designed to combine highly correlated clusters and determine the final groups automatically. Experimental results on several benchmark datasets validate the effectiveness of the proposed method.
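Contributions (1) and (2) start from one affinity graph per view over the image patches, combined with per-view weights. The sketch below illustrates two of the four views under stated assumptions: the function names, the Gaussian kernel on patch centroids for the distance view, and the clipped cosine similarity of mean optical-flow vectors for the direction view are hypothetical choices, not the paper's definitions. The interaction and transition views and the diversity regularizer are not modeled, and `fuse_views` is only a weighted-sum stand-in for the graph the paper actually learns. The weights are kept on the probability simplex, matching the constraint $w \ge 0$, $\sum_v w_v = 1$ that appears in Algorithm 1.

```python
import numpy as np

def distance_view(centroids, sigma=1.0):
    """Hypothetical spatial-distance view: Gaussian kernel on patch centroids (n x 2)."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def direction_view(flows):
    """Hypothetical motion-direction view: cosine similarity of mean flow vectors (n x 2).

    Opposing directions give negative cosines, which are clipped to 0.
    """
    norms = np.linalg.norm(flows, axis=1, keepdims=True)
    unit = flows / np.maximum(norms, 1e-12)  # guard against zero-motion patches
    return np.clip(unit @ unit.T, 0.0, 1.0)

def fuse_views(graphs, weights):
    """Weighted sum of view graphs; weights must lie on the probability simplex."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * g for w, g in zip(weights, graphs))
```

For instance, two nearby patches moving right get high affinity in both views, while a distant patch moving left gets a near-zero distance affinity and exactly zero direction affinity.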

### References

[1] Wang Q, Fang J W, Yuan Y. Multi-cue based tracking. Neurocomputing, 2014, 131: 227-236

[2] Zhang Y Y, Zhou D S, Chen S Q, et al. Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 589-597

[3] Wang W Y, Lin W Y, Chen Y Z, et al. Finding coherent motions and semantic regions in crowd scenes: a diffusion and clustering approach. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 756-771

[4] Yuan Y, Fang J W, Wang Q. Online anomaly detection in crowd scenes via structure analysis. IEEE Trans Cybern, 2015, 45: 548-561

[5] Zhou B L, Tang X O, Wang X G. Coherent filtering: detecting coherent motions from crowd clutters. In: Proceedings of European Conference on Computer Vision, Florence, 2012. 857-871

[6] Shao J, Loy C C, Wang X G. Scene-independent group profiling in crowd. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 2227-2234

[7] Wu Y P, Ye Y D, Zhao C Y. Coherent motion detection with collective density clustering. In: Proceedings of ACM Conference on Multimedia, Brisbane, 2015. 361-370

[8] Zhou B, Tang X O, Zhang H. Measuring crowd collectiveness. IEEE Trans Pattern Anal Mach Intell, 2014, 36: 1586-1599

[9] Zhou B L, Wang X G, Tang X O. Random field topic model for semantic region analysis in crowded scenes from tracklets. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, 2011. 3441-3448

[10] Wang Q, Chen M L, Li X L. Quantifying and detecting collective motion by manifold learning. In: Proceedings of AAAI Conference on Artificial Intelligence, San Francisco, 2017. 4292-4298

[11] Chen M L, Wang Q, Li X L. Anchor-based group detection in crowd scenes. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing, New Orleans, 2017. 1378-1382

[12] Ali S, Shah M. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, 2007

[13] Li X L, Chen M L, Nie F P, et al. A multiview-based parameter free framework for group detection. In: Proceedings of AAAI Conference on Artificial Intelligence, San Francisco, 2017. 4147-4153

[14] Chen M, Wang Q, Li X. Patch-based topic model for group detection. Sci China Inf Sci, 2017, 60: 113101

[15] Sharma R, Guha T. A trajectory clustering approach to crowd flow segmentation in videos. In: Proceedings of IEEE International Conference on Image Processing, Phoenix, 2016. 1200-1204

[16] Kumar A, Rai P, Daumé H. Co-regularized multi-view spectral clustering. In: Proceedings of Advances in Neural Information Processing Systems, Granada, 2011. 1413-1421

[17] Cai X, Nie F P, Huang H, et al. Heterogeneous image feature integration via multi-modal spectral clustering. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, 2011. 1977-1984

[18] Li Y Q, Nie F P, Huang H, et al. Large-scale multi-view spectral clustering via bipartite graph. In: Proceedings of AAAI Conference on Artificial Intelligence, Texas, 2015. 2750-2756

[19] Xia R K, Pan Y, Du L, et al. Robust multi-view spectral clustering via low-rank and sparse decomposition. In: Proceedings of AAAI Conference on Artificial Intelligence, Quebec, 2014. 2149-2155

[20] Liu X W, Zhou S H, Wang Y Q, et al. Optimal neighborhood kernel clustering with multiple kernels. In: Proceedings of AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2262-2272

[21] Liu X W, Dou Y, Yin J P, et al. Multiple kernel kmeans clustering with matrix-induced regularization. In: Proceedings of AAAI Conference on Artificial Intelligence, Phoenix, 2016. 1888-1894

[22] Cao X C, Zhang C Q, Fu H Z, et al. Diversity-induced multi-view subspace clustering. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 586-594

[23] Nie F P, Li J, Li X L. Self-weighted multiview clustering with multiple graphs. In: Proceedings of International Joint Conference on Artificial Intelligence, Melbourne, 2017. 2564-2570

[24] Achanta R, Shaji A, Smith K. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell, 2012, 34: 2274-2282

[25] Senst T, Eiselein V, Sikora T. Robust local optical flow for feature tracking. IEEE Trans Circ Syst Video Technol, 2012, 22: 1377-1387

[26] Geng Q C, Zhou Z, Cao X C. Survey of recent progress in semantic image segmentation with CNNs. Sci China Inf Sci, 2018, 61: 051101

[27] Wang J H, Liu B, Xu K. Semantic segmentation of high-resolution images. Sci China Inf Sci, 2017, 60: 123101

[28] Ballerini M, Cabibbo N, Candelier R. From the cover: interaction ruling animal collective behavior depends on topological rather than metric distance: evidence from a field study. Proc Natl Acad Sci USA, 2008, 105: 1232-1237

[29] Kullback S. On the convergence of discrimination information (corresp.). IEEE Trans Inf Theory, 1968, 14: 765-766

[30] Shumway R H, Stoffer D S. An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal, 1982, 3: 253-264

[31] Mohar B, Alavi Y, Chartrand G, et al. The Laplacian spectrum of graphs. Graph Theory Combin Appl, 1991, 18: 871-898

[32] Nie F P, Wang X Q, Jordan M, et al. The constrained Laplacian rank algorithm for graph-based clustering. In: Proceedings of AAAI Conference on Artificial Intelligence, Phoenix, 2016. 1969-1976

[33] Nie F P, Wang H, Huang H. Joint Schatten $p$-norm and $\ell_p$-norm robust matrix completion for missing value recovery. Knowl Inf Syst, 2015, 42: 525-544

[34] Winn J, Jojic N. LOCUS: learning object classes with unsupervised segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Beijing, 2005. 756-763

[35] Breukelen M, Duin R, Tax D, et al. Handwritten digit recognition by combined classifiers. Kybernetika, 1998, 34: 381-386

[36] Li F F, Fergus R, Perona P. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput Vision Image Und, 2007, 106: 59-70

• Figure 1

(Color online) The pipeline of the proposed method. First, the image data is segmented into patches. Then, similarity graphs are constructed for the image patches from four views: interaction, spatial distance, motion direction, and motion transition. Next, a multiview clustering method groups the patches into clusters. Finally, the clusters with high consistency are merged into the final groups.
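The final merging step can be pictured as a union-find pass over pairs of clusters. In this hypothetical sketch, two clusters are merged when the mean affinity between their patches in a graph `S` reaches a threshold `th`; both the consistency measure and the parameter name are assumptions (Figure 2 shows the paper tunes a parameter $\mathrm{th}_2$, whose exact definition is not given here). Because merges are transitive, the number of final groups is determined by the data rather than fixed in advance.

```python
import numpy as np

def merge_clusters(S, labels, th):
    """Merge clusters whose mean cross-cluster affinity in S is at least th.

    Returns new labels renumbered 0..k-1. The affinity criterion is illustrative.
    """
    labels = np.asarray(labels)
    ids = list(np.unique(labels))
    parent = {c: c for c in ids}

    def find(c):
        # Follow parent links to the set representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            block = S[np.ix_(labels == a, labels == b)]  # affinities between the two clusters
            if block.mean() >= th:
                parent[find(a)] = find(b)  # union the two sets

    roots = np.array([find(c) for c in labels])
    _, merged = np.unique(roots, return_inverse=True)  # compact relabeling
    return merged
```

For example, with labels `[0, 0, 1, 2]` where clusters 0 and 1 are strongly connected and cluster 2 is not, the result is `[0, 0, 0, 1]`: three clusters collapse into two groups.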

• Figure 2

(Color online) Influence of $\mathrm{th}_2$ on group detection: (a) ACC and (b) F-score on the training data

• Figure 3

Representative results on the CUHK Crowd dataset. (a) Ground truth; (b) image patches; (c) multiview clustering results; (d) clusters after merging; (e) groups detected by the proposed method; (f) groups detected by CF; (g) weight distribution over the views

• Figure 4

(Color online) Clustering accuracy with different $\beta$. (a) MSRC-v1; (b) Caltech101-7

• Figure 5

(Color online) F-score with different $\beta$. (a) MSRC-v1; (b) Caltech101-7

• Figure 6

(Color online) The convergence curves on different datasets. (a) MSRC-v1; (b) Caltech101-7

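• Table 1   Experimental results on the CUHK Crowd dataset$^{\rm a)}$

| Metric | CF | CT | MCC | CDC | MPF | MGBA | V-inter | V-dist | V-direc | V-trans |
|---|---|---|---|---|---|---|---|---|---|---|
| ACC | 0.70 | 0.75 | 0.68 | 0.67 | 0.80 | 0.85 | 0.74 | 0.75 | 0.73 | 0.76 |
| F-score | 0.67 | 0.74 | 0.67 | 0.67 | 0.79 | 0.83 | 0.74 | 0.73 | 0.72 | 0.75 |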
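In the clustering literature, ACC is usually computed by finding the one-to-one mapping between predicted clusters and ground-truth classes that maximizes the number of correctly matched samples. The minimal sketch below assumes that definition (the paper's exact protocol is not stated here) and searches all mappings by brute force, which is only practical for a handful of clusters; for larger problems the same matching is normally solved with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples correctly labeled under the best one-to-one cluster mapping."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    n = max(len(true_ids), len(pred_ids))  # pad so unmatched clusters map to empty classes
    t_index = {t: i for i, t in enumerate(true_ids)}
    p_index = {p: j for j, p in enumerate(pred_ids)}
    # counts[j][i] = number of samples in predicted cluster j with true class i
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        counts[p_index[p]][t_index[t]] += 1
    # Brute-force search over all assignments of predicted clusters to classes.
    best = max(sum(counts[j][perm[j]] for j in range(n)) for perm in permutations(range(n)))
    return best / len(y_true)
```

Label names do not matter: `clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])` is 1.0, because the mapping absorbs the renaming.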
• Algorithm 1   Algorithm for solving problem (20)

Set $1 < \rho < 2$; initialize $\mu$ and $\Lambda$;

repeat

Update $\tilde{w}$ by $\tilde{w} = w - \frac{1}{\mu}(A^{\mathrm{T}} w + \Lambda)$;

Update $w$ by solving $\min_{w \ge 0,\ \sum_v w_v = 1} \left\| w - \tilde{w} + \frac{1}{\mu}\Lambda + \frac{Av - 2b}{\mu} \right\|_2^2$ with an efficient optimization method [32];

Update $\mu$ by $\mu = \rho\mu$;

Update $\Lambda$ by $\Lambda = \Lambda + \mu(w - \tilde{w})$;

until convergence.
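Algorithm 1 is an augmented-Lagrangian (ALM) loop with a split variable $\tilde{w}$. The Python sketch below assumes problem (20) has the form $\min_{w \ge 0,\, \mathbf{1}^{\mathrm T} w = 1} w^{\mathrm T} A w - 2 w^{\mathrm T} b$ (problem (20) itself is not reproduced here) and derives every step from the augmented Lagrangian $w^{\mathrm T} A \tilde{w} - 2 w^{\mathrm T} b + \frac{\mu}{2}\|w - \tilde{w} + \Lambda/\mu\|_2^2$. This reproduces the $w$- and $\Lambda$-updates of Algorithm 1, reading its $Av$ term as $A\tilde{w}$ (an interpretation). Under this convention the $\tilde{w}$-step comes out as $\tilde{w} = w + (\Lambda - A^{\mathrm T} w)/\mu$; the sign in front of $\Lambda$ depends on how the multiplier term is written. Two further choices are assumptions of this sketch: the simplex projection uses the standard sort-based algorithm in place of the method of [32], and $\mu$ is capped at `mu_max` so that the steps do not shrink to zero before the iterates settle.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    r = k[u - (css - 1.0) / k > 0][-1]  # largest index whose shifted entry stays positive
    theta = (css[r - 1] - 1.0) / r
    return np.maximum(v - theta, 0.0)

def solve_simplex_qp(A, b, mu=10.0, rho=1.05, mu_max=100.0, tol=1e-12, max_iter=5000):
    """ALM sketch for min_{w >= 0, 1^T w = 1} w^T A w - 2 w^T b with the split w = w_tilde."""
    n = b.size
    w = np.full(n, 1.0 / n)  # start at the simplex barycenter
    lam = np.zeros(n)
    for _ in range(max_iter):
        w_prev = w
        # w_tilde step: unconstrained minimizer of w^T A w_tilde + (mu/2)||w - w_tilde + lam/mu||^2
        w_tilde = w + (lam - A.T @ w) / mu
        # w step: complete the square, then project onto the simplex
        w = project_simplex(w_tilde - lam / mu - (A @ w_tilde - 2.0 * b) / mu)
        mu = min(rho * mu, mu_max)  # growth as in Algorithm 1, but capped (assumption)
        lam = lam + mu * (w - w_tilde)  # multiplier update
        if np.max(np.abs(w - w_prev)) < tol:
            break
    return w
```

As a check, with `A = np.diag([1.0, 2.0, 3.0])` and `b = np.zeros(3)` the exact minimizer is proportional to $(1, 1/2, 1/3)$, that is $(6/11, 3/11, 2/11)$.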

• Algorithm 2   Algorithm for solving objective function (14)

Require: Graphs $\{G_v\}_{v=1}^{n_v}$, parameters $c$, $\beta$, and $\lambda$;

Output: Optimal graph $S$;

Initialize $S$ and $w$;

repeat

Update $F$ with (13);

Update $S$ by solving problem (15);

Update $w$ by solving problem (20);

until convergence.
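Algorithm 2 takes the number of clusters $c$ and outputs a graph $S$. References [31, 32] rest on a standard fact: the multiplicity of the eigenvalue 0 of a graph Laplacian equals the number of connected components, so a graph with exactly $c$ components yields $c$ groups directly. The sketch below checks this on a given $S$ and reads the groups off its components. Taking $F$ as the $c$ eigenvectors of the Laplacian with the smallest eigenvalues is the usual choice in constrained-Laplacian-rank methods; that update (13) has this form here is an assumption, and all function names are hypothetical.

```python
import numpy as np

def laplacian(S):
    """Unnormalized Laplacian L = D - W of the symmetrized graph W = (S + S^T) / 2."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W

def spectral_embedding(S, c):
    """Return all Laplacian eigenvalues (ascending) and the c eigenvectors with the smallest ones."""
    vals, vecs = np.linalg.eigh(laplacian(S))  # eigh returns eigenvalues in ascending order
    return vals, vecs[:, :c]

def connected_components(S, eps=1e-12):
    """Label nodes by connected component; i and j are linked when S_ij + S_ji > eps."""
    n = S.shape[0]
    adj = (S + S.T) > eps
    labels = -np.ones(n, dtype=int)
    k = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]  # depth-first search from an unvisited node
        labels[start] = k
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = k
                    stack.append(j)
        k += 1
    return labels
```

For a graph made of two disconnected edges, the Laplacian eigenvalues are 0, 0, 2, 2, and the components give two groups.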

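• Table 2   Performance of different methods on multiview clustering$^{\rm a)}$

| Dataset | ACC: Co-reg | ACC: RMSC | ACC: MMSC | ACC: SMC | ACC: MCD | F-score: Co-reg | F-score: RMSC | F-score: MMSC | F-score: SMC | F-score: MCD |
|---|---|---|---|---|---|---|---|---|---|---|
| MSRC-v1 | 0.70 | 0.67 | 0.71 | 0.70 | 0.75 | 0.59 | 0.59 | 0.61 | 0.60 | 0.73 |
| Digits | 0.79 | 0.77 | 0.84 | 0.88 | 0.90 | 0.72 | 0.69 | 0.79 | 0.86 | 0.88 |
| Caltech101-7 | 0.43 | 0.59 | 0.68 | 0.68 | 0.81 | 0.45 | 0.56 | 0.69 | 0.64 | 0.76 |
| Caltech101-20 | 0.48 | 0.51 | 0.51 | 0.60 | 0.64 | 0.39 | 0.46 | 0.41 | 0.42 | 0.51 |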
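In multiview clustering work the F-score is commonly computed over pairs of samples: a pair is a positive when both samples share a cluster, and precision and recall compare the predicted positives with the ground-truth positives. The sketch below assumes this pair-counting definition; the exact variant used for these tables is not stated here.

```python
from itertools import combinations

def pairwise_f_score(y_true, y_pred):
    """Pair-counting F-score between a predicted clustering and ground-truth labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # placed together but belongs apart
        elif same_true:
            fn += 1  # belongs together but split apart
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Lumping everything into one cluster keeps recall at 1 but drops precision: for `y_true = [0, 0, 1, 1]` and `y_pred = [0, 0, 0, 0]`, 2 of the 6 pairs are true positives, so precision is 1/3 and the F-score is 0.5.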

Copyright 2020 Science China Press Co., Ltd. (《中国科学》杂志社有限责任公司). All rights reserved.