SCIENCE CHINA Information Sciences, Volume 62, Issue 12: 220102(2019) https://doi.org/10.1007/s11432-019-2675-3

## Uncertainty-optimized deep learning model for small-scale person re-identification

• Accepted: Sep 5, 2019
• Published: Nov 15, 2019

### Abstract

In recent years, deep learning has developed rapidly and is widely used in fields such as computer vision, speech recognition, and natural language processing. For end-to-end person re-identification, most deep learning methods rely on large-scale datasets; relatively few work with small-scale datasets, where insufficient training samples significantly degrade neural network accuracy. This problem limits the practical application of person re-identification. For small-scale person re-identification, the uncertainty of person representation and the overfitting problem associated with deep learning remain to be solved. Quantifying this uncertainty is difficult owing to complex network structures and the large number of hyperparameters. In this study, we consider the uncertainty of pedestrian representation for small-scale person re-identification. To reduce the impact of uncertain person representations, we transform the network parameters into distributions and draw multiple samples using multilevel dropout during testing. We design an improved Monte Carlo strategy that considers both the average distance and the shortest distance for matching and ranking. Compared with state-of-the-art methods, the proposed method significantly improves accuracy on two small-scale person re-identification datasets and remains robust on four large-scale datasets.
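As a toy illustration of the core idea (treating dropout as test-time sampling), the following pure-Python sketch shows how repeated stochastic passes through a dropout layer yield a distribution over feature values whose variance quantifies representation uncertainty. The `dropout_forward` function and the feature vector are hypothetical stand-ins, not the paper's network:

```python
import random
import statistics

random.seed(7)

def dropout_forward(x, p=0.3):
    # One stochastic pass through a single dropout "layer": each unit is
    # zeroed with probability p and the survivors rescaled by 1/(1-p),
    # so the expected output equals the input.
    return [v / (1 - p) if random.random() >= p else 0.0 for v in x]

features = [0.8, 0.2, 0.5, 0.9]          # toy embedding (hypothetical)
samples = [dropout_forward(features) for _ in range(1000)]

# Per-unit statistics across samples approximate the predictive
# distribution of the representation; larger variance = more uncertainty.
means = [statistics.fmean(col) for col in zip(*samples)]
variances = [statistics.pvariance(col) for col in zip(*samples)]
```

Units with larger activations show proportionally larger sampling variance here; in the paper's setting the analogous statistics come from the CNN's multilevel dropout layers rather than a single toy layer.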

### Acknowledgment

This work was supported by National Natural Science Foundation of China (Grant Nos. 61673299, 61203247, 61573259, 61573255, 61876218), Fundamental Research Funds for the Central Universities, and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR). The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions.

### References

[1] Zheng L, Shen L Y, Tian L, et al. Scalable person re-identification: a benchmark. In: Proceedings of the IEEE International Conference on Computer Vision, 2015. 1116--1124.

[2] Li W, Zhao R, Xiao T, et al. DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014. 152--159.

[3] Gou M, Karanam S, Liu W, et al. DukeMTMC4ReID: a large-scale multi-camera person re-identification dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017. 1425--1434.

[4] Wei L H, Zhang S L, Gao W, et al. Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 79--88.

[5] Gray D, Tao H. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2008. 262--275.

[6] Ma B P, Su Y, Jurie F. Local descriptors encoded by Fisher vectors for person re-identification. In: Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2012. 413--422.

[7] Matsukawa T, Okabe T, Suzuki E, et al. Hierarchical Gaussian descriptor for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1363--1372.

[8] Pala F, Satta R, Fumera G. Multimodal person reidentification using RGB-D cameras. IEEE Trans Circuits Syst Video Technol, 2016, 26: 788--799.

[9] Bai S, Tang P, Torr P H S, et al. Re-ranking via metric fusion for object retrieval and person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 740--749.

[10] Yu R, Zhou Z C, Bai S, et al. Divide and fuse: a re-ranking approach for person re-identification. 2017. arXiv.

[11] Davis J V, Kulis B, Jain P, et al. Information-theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007. 209--216.

[12] Köstinger M, Hirzer M, Wohlhart P, et al. Large scale metric learning from equivalence constraints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012. 2288--2295.

[13] Xiong F, Gou M, Camps O, et al. Person re-identification using kernel-based metric learning methods. In: Proceedings of the European Conference on Computer Vision, 2014. 1--16.

[14] Varior R R, Haloi M, Wang G. Gated Siamese convolutional neural network architecture for human re-identification. In: Proceedings of the European Conference on Computer Vision, 2016. 791--808.

[15] Zheng L, Huang Y J, Lu H C, et al. Pose invariant embedding for deep person re-identification. 2017. arXiv.

[16] Cho Y J, Yoon K J. PaMM: pose-aware multi-shot matching for improving person re-identification. 2017. arXiv.

[17] Lin Y T, Zheng L, Zheng Z D, et al. Improving person re-identification by attribute and identity learning. 2017. arXiv.

[18] Geng M Y, Wang Y W, Xiang T, et al. Deep transfer learning for person re-identification. 2016. arXiv.

[19] Jin H B, Wang X B, Liao S C, Li S. Deep person re-identification with improved embedding and efficient training. In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB). New York: IEEE, 2017. 261--267.

[20] Zhu J, Zeng H, Du Y. Joint feature and similarity deep learning for vehicle re-identification. IEEE Access, 2018, 6: 43724--43731.

[21] Imani Z, Soltanizadeh H. Histogram of the node strength and histogram of the edge weight: two new features for RGB-D person re-identification. Sci China Inf Sci, 2018, 61: 092108.

[22] Liao S C, Hu Y, Zhu X Y, et al. Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 2197--2206.

[23] Wei L H, Zhang S L, Yao H T, et al. GLAD: global-local-alignment descriptor for pedestrian retrieval. In: Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017. 420--428.

[24] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.

[25] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1--9.

[26] Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009. 248--255.

[27] Ahmed E, Jones M, Marks T K. An improved deep learning architecture for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3908--3916.

[28] Zhang X, Luo H, Fan X, et al. AlignedReID: surpassing human-level performance in person re-identification. 2017. arXiv.

[29] Sun Y F, Zheng L, Yang Y, et al. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 480--496.

[30] Wang G S, Yuan Y F, Chen X, et al. Learning discriminative features with multiple granularities for person re-identification. In: Proceedings of the 2018 ACM Multimedia Conference. New York: ACM, 2018. 274--282.

[31] Bai S, Bai X, Tian Q. Scalable person re-identification on supervised smoothed manifold. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 2530--2539.

[32] Yu R, Dou Z Y, Bai S, et al. Hard-aware point-to-set deep metric for person re-identification. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 188--204.

[33] Zheng Z D, Zheng L, Yang Y. A discriminatively learned CNN embedding for person re-identification. ACM Trans Multim Comput Commun Appl, 2017, 14: 13.

[34] Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification. 2017. arXiv.

[35] Zhong Z, Zheng L, Cao D L, et al. Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1318--1327.

[36] Wu L, Hong R C, Wang Y, et al. Cross-entropy adversarial view adaptation for person re-identification. IEEE Trans Circuits Syst Video Technol, 2019. doi: 10.1109/TCSVT.2019.2909549.

[37] Liu Z, Wang Y, Li A. Hierarchical integration of rich features for video-based person re-identification. IEEE Trans Circuits Syst Video Technol, 2018.

[38] Zhu Z, Huang T T, Shi B G, et al. Progressive pose attention transfer for person image generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2347--2356.

[39] Hou R B, Ma B P, Chang H, et al. VRSTC: occlusion-free video person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 7183--7192.

[40] Chen W H, Chen X T, Zhang J G, et al. A multi-task deep network for person re-identification. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017.

[41] Zheng Z D, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 3754--3762.

[42] Bui T, Hernández-Lobato D, Hernández-Lobato J, et al. Deep Gaussian processes for regression using approximate expectation propagation. In: Proceedings of the International Conference on Machine Learning, 2016. 1472--1481.

[43] Gal Y, Ghahramani Z. Bayesian convolutional neural networks with Bernoulli approximate variational inference. 2015. arXiv.

[44] Kwon J, Lee K M. Adaptive visual tracking with minimum uncertainty gap estimation. IEEE Trans Pattern Anal Mach Intell, 2016, 39: 18--31.

[45] Shen F, Yang Y, Zhou X. Face identification with second-order pooling in single-layer networks. Neurocomputing, 2016, 187: 11--18.

[46] Li Z, Tang J. Weakly supervised deep matrix factorization for social image understanding. IEEE Trans Image Process, 2017, 26: 276--288.

[47] Xu Y, Fang X, Li X, et al. Data uncertainty in face recognition. IEEE Trans Cybern, 2014, 44: 1950--1961.

[48] Blundell C, Cornebise J, Kavukcuoglu K, et al. Weight uncertainty in neural networks. 2015. arXiv.

[49] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the International Conference on Machine Learning, 2016. 1050--1059.

[50] Minka T P. A family of algorithms for approximate Bayesian inference. Cambridge: Massachusetts Institute of Technology, 2001.

[51] Gray D, Brennan S, Tao H. Evaluating appearance models for recognition, reacquisition, and tracking. In: Proceedings of the IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), 2007. 3: 1--7.

[52] Ren S, He K, Girshick R. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 1137--1149.

[53] Bolle R M, Connell J H, Pankanti S, et al. The relation between the ROC curve and the CMC. In: Proceedings of the 4th IEEE Workshop on Automatic Identification Advanced Technologies (AutoID'05), 2005. 15--20.

[54] Cormack G V, Lynam T R. Statistical precision of information retrieval evaluation. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006. 533--540.

[55] Ketkar N. Introduction to PyTorch. In: Deep Learning With Python. Berkeley: Apress, 2017. 195--208.

• Figure 1

(Color online) Typical examples of uncertainty in person representation.

• Figure 2

(Color online) Overall framework when testing.

• Figure 3

(Color online) Example images from the CUHK01 dataset.

• Figure 4

(Color online) Example images from the VIPeR dataset.

• Figure 5

(Color online) CMC curves on CUHK01 dataset.

• Figure 6

(Color online) CMC curves on VIPeR dataset.

• Figure 7

(Color online) Robustness of the proposed model to the number of samples on large-scale datasets of (a) Market1501, (b) CUHK03, (c) DukeMTMC, and (d) MSMT17.

•

Algorithm 1 Uncertainty-optimized testing process

Input: probe $P$ and gallery images $G_{i}$; number of repetitions $N$; trade-off parameter $\lambda$;

Output: ranking list $L^{*}\left(P,~G_{i}\right)$;

Initialize $t=0$, $d^{*}\left(P,~G_{i}\right)=0$, $d_{\rm~min}\left(P,~G_{i}\right)=\infty$ for all $G_{i}$;

while $t<N$ do

$t\Leftarrow~t+1$;

for all input images $P,G_{i}$ do

Compute feature embedding $x_{i}$ by forward propagation (with multilevel dropout active);

Compute $d\left(P,~G_{i}\right)$ as the Euclidean distance between $x_{P}$ and $x_{G_{i}}$;

$d^{*}\left(P,~G_{i}\right)~+=~d\left(P,~G_{i}\right)$;

if $d\left(P,~G_{i}\right)<d_{\rm~min}\left(P,~G_{i}\right)$ then

$d_{\rm~min}\left(P,~G_{i}\right)=d\left(P,~G_{i}\right)$;

end if

end for

end while

$d^{*}\left(P,~G_{i}\right)=\lambda/N\times~d^{*}\left(P,~G_{i}\right)+\left(1-\lambda\right)\times~d_{\rm~min}\left(P,~G_{i}\right)$;

$L^{*}\left(P,~G_{i}\right)=\operatorname{sort}\left(d^{*}\left(P,~G_{i}\right)\right)$ for each $P$.
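Algorithm 1 can be sketched in plain Python. Here `embed` is a hypothetical stand-in for the CNN forward pass with multilevel dropout kept active at test time, and the probe/gallery vectors are toy data; `N` and `lam` correspond to the algorithm's $N$ and $\lambda$:

```python
import math
import random

random.seed(0)

def embed(image, dropout=0.1):
    # Hypothetical stochastic embedding: a Bernoulli dropout mask applied
    # to the input vector, standing in for a forward pass through a network
    # whose dropout layers remain active during testing.
    keep = 1.0 - dropout
    return [x / keep if random.random() >= dropout else 0.0 for x in image]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(probe, gallery, N=50, lam=0.5):
    # Accumulate the average and the shortest probe-gallery distance over
    # N stochastic passes, then combine them with trade-off parameter lam.
    d_sum = [0.0] * len(gallery)
    d_min = [math.inf] * len(gallery)
    for _ in range(N):
        xp = embed(probe)
        for i, g in enumerate(gallery):
            d = euclidean(xp, embed(g))
            d_sum[i] += d
            d_min[i] = min(d_min[i], d)
    d_star = [lam / N * s + (1.0 - lam) * m for s, m in zip(d_sum, d_min)]
    return sorted(range(len(gallery)), key=lambda i: d_star[i])

probe = [1.0, 0.0, 1.0, 0.0]
gallery = [[1.0, 0.0, 1.0, 0.0],   # same identity as the probe
           [0.0, 1.0, 0.0, 1.0],   # clearly different identity
           [1.0, 0.1, 0.9, 0.0]]   # near match
ranking = rank_gallery(probe, gallery)  # best match first
```

Averaging alone can be dominated by unlucky dropout masks; blending in the shortest observed distance rewards gallery images that match the probe closely under at least one sampled representation.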

• Table 1   Structure of our backbone network

| Name | Patch size/stride | Output size | #1$\times$1 | #3$\times$3 reduce | #3$\times$3 | #5$\times$5 reduce | #5$\times$5 | Pool + proj | Dropout ratio |
|---|---|---|---|---|---|---|---|---|---|
| Input | – | 3$\times$224$\times$224 | – | – | – | – | – | – | – |
| Conv1/ReLU | 7$\times$7/2 | 64$\times$112$\times$112 | – | – | – | – | – | – | 0.1 |
| Pool1 | 3$\times$3/2 | 64$\times$56$\times$56 | – | – | – | – | – | Max | – |
| Conv2/ReLU | 3$\times$3/1 | 192$\times$56$\times$56 | – | – | – | – | – | – | 0.1 |
| Pool2 | 3$\times$3/2 | 192$\times$28$\times$28 | – | – | – | – | – | Max | – |
| Inception 3a | – | 256$\times$28$\times$28 | 64 | 96 | 128 | 16 | 32 | Max+32 | 0.1 |
| Inception 3b | – | 480$\times$28$\times$28 | 128 | 128 | 192 | 16 | 32 | Max+64 | 0.1 |
| Pool3 | 3$\times$3/2 | 480$\times$14$\times$14 | – | – | – | – | – | Max | – |
| Inception 4a | – | 512$\times$14$\times$14 | 192 | 96 | 208 | 16 | 48 | Max+64 | 0.2 |
| Inception 4b | – | 512$\times$14$\times$14 | 160 | 112 | 224 | 24 | 64 | Max+64 | 0.2 |
| Inception 4c | – | 512$\times$14$\times$14 | 128 | 128 | 256 | 24 | 64 | Max+64 | 0.2 |
| Inception 4d | – | 512$\times$14$\times$14 | 112 | 144 | 288 | 32 | 48 | Max+64 | 0.2 |
| Inception 4e | – | 512$\times$14$\times$14 | 256 | 160 | 320 | 32 | 128 | Max+64 | 0.3 |
| Pool4 | 3$\times$3/2 | 832$\times$7$\times$7 | – | – | – | – | – | Max | – |
| Inception 5a | – | 832$\times$7$\times$7 | 256 | 160 | 320 | 32 | 128 | Max+128 | 0.3 |
| Inception 5b | – | 1024$\times$7$\times$7 | 384 | 192 | 384 | 48 | 128 | Max+128 | 0.3 |
| Pool5 | 7$\times$7/1 | 1024$\times$1$\times$1 | – | – | – | – | – | Average | – |
| fc | – | 1024 | – | – | – | – | – | – | 0.3 |
• Table 2   Performance comparison on CUHK01 dataset

| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-15 (%) | Rank-20 (%) |
|---|---|---|---|---|---|
| KISSME [12] | 52.6 | 75.2 | 82.5 | 84.5 | 88.0 |
| XQDA [7] | 55.8 | 78.6 | 85.7 | 90.5 | 93.1 |
| MGN [30] | 44.7 | 56.5 | 63.2 | 77.7 | 82.6 |
| PCB [29] | 49.8 | 58.4 | 67.9 | 80.8 | 84.4 |
| Part-net [16] | 55.1 | 77.7 | 84.6 | 89.8 | 91.1 |
| GLAD [23] | 58.9 | 80.9 | 86.9 | 92.4 | 93.8 |
| ResNet50 (baseline) | 55.2 | 78.1 | 85.6 | 91.5 | 95.3 |
| ResNet50 + re-ranking [35] | 60.0 | 80.8 | 86.7 | 92.2 | 97.3 |
| Ours | 55.2 | 89.4 | 94.2 | 96.3 | 99.5 |
• Table 3   Performance comparison on VIPeR dataset

| Method | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-15 (%) | Rank-20 (%) |
|---|---|---|---|---|---|
| KISSME [12] | 32.3 | 64.9 | 77.9 | 83.8 | 85.2 |
| XQDA [7] | 39.0 | 69.3 | 81.3 | 85.1 | 88.9 |
| MGN [30] | 26.7 | 53.8 | 68.5 | 72.1 | 75.3 |
| PCB [29] | 30.4 | 59.2 | 72.5 | 77.9 | 81.2 |
| Pose [15] | 35.4 | 67.9 | 81.0 | 86.2 | 89.5 |
| GLAD [23] | 39.5 | 70.2 | 82.4 | 87.7 | 91.4 |
| ResNet50 (baseline) | 29.4 | 55.7 | 70.9 | 76.3 | 79.3 |
| ResNet50 + re-ranking [35] | 36.8 | 61.4 | 78.5 | 83.2 | 90.0 |
| Ours | 53.3 | 72.3 | 85.2 | 89.4 | 92.1 |
• Table 4   Performance on large-scale datasets

| Dataset | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | Rank-20 (%) | mAP (%) |
|---|---|---|---|---|---|
| Market1501 (Ours) | 86.2 | 94.6 | 97.1 | 83.8 | 67.8 |
| Market1501 (Baseline) | 82.3 | 89.9 | 95.4 | 97.9 | 60.3 |
| CUHK03 (Ours) | 80.4 | 92.5 | 94.3 | 97.0 | 59.6 |
| CUHK03 (Baseline) | 78.3 | 93.3 | 97.0 | 98.7 | 61.1 |
| DukeMTMC (Ours) | 76.9 | 84.5 | 87.5 | 90.2 | 62.0 |
| DukeMTMC (Baseline) | 73.5 | 78.6 | 81.8 | 85.1 | 57.4 |
| MSMT17 (Ours) | 68.4 | 78.8 | 82.6 | 88.4 | 40.2 |
| MSMT17 (Baseline) | 68.3 | 81.4 | 85.9 | 92.5 | 45.6 |
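Tables 2–4 report rank-k accuracies from the Cumulative Matching Characteristic (CMC). For readers unfamiliar with the metric, a minimal sketch of how rank-k matching rates are computed, using toy rankings and hypothetical identity labels:

```python
def cmc(rankings, true_ids, ks=(1, 5, 10, 20)):
    # Cumulative Matching Characteristic: the fraction of probes whose
    # correct gallery identity appears within the top-k of the ranking.
    hits = {k: 0 for k in ks}
    for ranking, gt in zip(rankings, true_ids):
        pos = ranking.index(gt)  # 0-based rank of the true match
        for k in ks:
            if pos < k:
                hits[k] += 1
    n = len(rankings)
    return {k: 100.0 * hits[k] / n for k in ks}

# Toy gallery of 10 identities and 4 probes (hypothetical data).
rankings = [
    [3, 1, 4, 0, 2, 5, 6, 7, 8, 9],  # true id 3 at rank 1
    [0, 2, 1, 3, 4, 5, 6, 7, 8, 9],  # true id 1 at rank 3
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],  # true id 0 at rank 10
    [1, 0, 2, 3, 4, 5, 6, 7, 8, 9],  # true id 1 at rank 1
]
true_ids = [3, 1, 0, 1]
scores = cmc(rankings, true_ids)  # {1: 50.0, 5: 75.0, 10: 100.0, 20: 100.0}
```

By construction the curve is non-decreasing in k, which is why rank-20 should never fall below rank-10 for the same method and dataset.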

Copyright 2020 Science China Press Co., Ltd. All rights reserved.