
SCIENTIA SINICA Informationis, Volume 50, Issue 7: 1110-1120 (2020) https://doi.org/10.1360/SSI-2020-0046

Mask-wearing recognition in the wild

  • Received Mar 6, 2020
  • Accepted Apr 16, 2020
  • Published Jun 23, 2020

Abstract

Wearing a mask is one of the most important means of preventing infection and protecting public health and safety. Masks also protect heavy-industry workers from certain diseases during manufacturing. To meet the demand for automatic mask-wearing recognition in everyday scenes, we propose a recognition algorithm based on face detection and face attribute recognition. The face detection model combines a fused feature pyramid, a spatial and channel attention mechanism, and a segmentation branch for weakly supervised learning. Each detected face is then passed to a classifier for fast recognition. To improve robustness, we use nearly 200000 images together with attention mechanisms, data augmentation, and other techniques. The technology has been widely deployed in Didi Chuxing's inspection systems, where it achieves 99.50% accuracy. Both the service and the key algorithms have been opened to the public to maximize their social and application value.
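
The two-stage flow described in the abstract (detect faces, then classify each detected face) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `recognize_masks`, `MaskResult`, the toy detector and classifier, and the 0.5 threshold are all hypothetical stand-ins, whereas the real system uses trained neural networks for both stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]           # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2); x2 and y2 are exclusive


@dataclass
class MaskResult:
    box: Box
    label: str     # "Mask" or "NoMask"
    score: float   # classifier confidence for `label`


def crop(image: Image, box: Box) -> Image:
    """Cut the face region out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def recognize_masks(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    classify_face: Callable[[Image], float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[MaskResult]:
    """Stage 1: detect faces. Stage 2: classify each face crop."""
    results = []
    for box in detect_faces(image):
        p_mask = classify_face(crop(image, box))  # probability of "Mask"
        if p_mask >= threshold:
            results.append(MaskResult(box, "Mask", p_mask))
        else:
            results.append(MaskResult(box, "NoMask", 1.0 - p_mask))
    return results


# Hypothetical stand-ins for the trained detector and classifier.
def toy_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def toy_classifier(face: Image) -> float:
    pixels = [p for row in face for p in row]
    return sum(pixels) / (255 * len(pixels))  # brighter crop -> "Mask"


toy_image = [[255, 255, 0, 0],
             [255, 255, 0, 0]]
results = recognize_masks(toy_image, toy_detector, toy_classifier)
for r in results:
    print(r.box, r.label, r.score)
```

Passing the detector and classifier in as callables keeps the two stages independent, so either model could be swapped without touching the control flow.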


References

[1] Zou Z X, Shi Z W, Guo Y H, et al. Object detection in 20 years: a survey. 2019. arXiv

[2] Ren S, He K, Girshick R. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 1137-1149

[3] Dai J F, Li Y, He K M, et al. R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of Conference on Advances in Neural Information Processing Systems, Barcelona, 2016. 379-387

[4] Lin T, Dollar P, Girshick R B, et al. Feature pyramid networks for object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 936-944

[5] Redmon J, Farhadi A. YOLOv3: an incremental improvement. 2018. arXiv

[6] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 21-37

[7] Lin T Y, Goyal P, Girshick R. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell, 2020, 42: 318-327

[8] Zhang K, Zhang Z, Li Z. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett, 2016, 23: 1499-1503

[9] Wang H, Li Z F, Ji X, et al. Face R-CNN. 2017. arXiv

[10] Najibi M, Samangouei P, Chellappa R, et al. SSH: single stage headless face detector. In: Proceedings of IEEE International Conference on Computer Vision (ICCV), Venice, 2017. 4885-4894

[11] Wang J F, Yuan Y, Yu G. Face attention network: an effective face detector for occluded faces. 2017. arXiv

[12] Tang X, Du D K, He Z, et al. PyramidBox: a context-assisted single shot face detector. In: Proceedings of European Conference on Computer Vision (ECCV), Munich, 2018. 797-813

[13] Pang Y W, Xie J, Khan M H, et al. Mask-guided attention network for occluded pedestrian detection. In: Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 2019. 4966-4974

[14] Xie J, Pang Y W, Cholakkal H, et al. PSC-Net: learning part spatial co-occurrence for occluded pedestrian detection. 2020. arXiv

[15] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Proceedings of International Conference on Neural Information Processing Systems, Lake Tahoe, 2012. 1097-1105

[16] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego, 2015

[17] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 2015

[18] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 770-778

[19] Tian W X, Wang Z X, Shen H F, et al. Learning better features for face detection with feature fusion and segmentation supervision. 2018. arXiv

[20] Qian Y, Ding X, Liu T. Identification method of user's travel consumption intention in chatting robot. Sci Sin-Inf, 2017, 47: 997-1007

  • Figure 1

    (Color online) Block diagram of face mask recognition

  • Figure 2

    (Color online) Architecture of DFS face detection algorithm

  • Figure 3

    (Color online) Structure diagram of the mask recognition model based on ResNet50

  • Table 1   Comparison of DFS and other algorithms on the WIDER FACE validation set
    Algorithm      Easy   Medium   Hard
    MTCNN          84.8     82.5   59.8
    Face R-CNN     93.7     92.1   83.1
    SSH            93.1     92.1   84.5
    FAN            95.3     94.2   88.8
    PyramidBox     96.1     95.0   88.9
    DFS            96.9     95.9   91.2
  • Table 2   Experimental data sources and quantity distributions (in thousands of images)
    Data source         Mask   NoMask   Total
    Collected online       5       30      35
    In-car                40       65     105
    Phone                 40       10      50
    Recorded               2        3       5
    Total                 87      108     195
  • Table 3   Face mask recognition results on mobile phone images
    Test set / metric     NoMask      Mask
    Designated drive (100 Mask + 105 NoMask)
      Precision (%)       100.00     98.99
      Recall (%)           99.06    100.00
      Accuracy (%)              99.51
    Designated drive (3247 Mask + 1111 NoMask)
      Precision (%)        93.33     98.63
      Recall (%)           96.03     97.66
      Accuracy (%)              97.24
    Car hailing driver (1455 Mask + 25 NoMask)
      Precision (%)        96.15    100.00
      Recall (%)          100.00     99.73
      Accuracy (%)              99.73
  • Table 4   Face mask recognition results on vehicle monitoring images
    Test set / metric     NoMask      Mask
    Test 2k (1k Mask + 1k NoMask)
      Precision (%)        99.10    100.00
      Recall (%)          100.00     99.11
      Accuracy (%)              99.55
    Test 1.5k (little Mask)
      Precision (%)        99.71     89.83
      Recall (%)           98.86     97.25
      Accuracy (%)              98.71
    Night bad case (263 Mask)
      Precision (%)          NAN    100.00
      Recall (%)             NAN     98.46
      Accuracy (%)              98.46
    Day 1.1k (117 Mask + 1037 NoMask)
      Precision (%)        96.72     70.34
      Recall (%)           96.62     70.94
      Accuracy (%)              94.02
    Night 3.6k (419 Mask + 3246 NoMask)
      Precision (%)        96.27     76.15
      Recall (%)           97.13     70.88
      Accuracy (%)              94.13
  • Table 5   Comparative experiment results on mobile phone images
                         Data collected online           In-car data   Attention mechanism
    Test set / metric        NoMask       Mask     NoMask       Mask     NoMask       Mask
    Designated drive (100 Mask + 105 NoMask)
      Precision (%)           91.70     100.00     100.00      98.99     100.00      98.99
      Recall (%)              73.30      65.00      99.06     100.00      99.06     100.00
      Accuracy (%)               96.60                 99.51                 99.51
    Designated drive (3247 Mask + 1111 NoMask)
      Precision (%)           77.49      99.46      91.26      99.02      93.33      98.63
      Recall (%)              98.56      90.21      97.20      96.83      96.03      97.66
      Accuracy (%)               92.34                 96.92                 97.24
  • Table 6   Comparative experiment results on vehicle monitoring images
                         Data collected online           In-car data   Attention mechanism
    Test set / metric        NoMask       Mask     NoMask       Mask     NoMask       Mask
    Test 2k (1k Mask + 1k NoMask)
      Precision (%)           98.31     100.00      99.10     100.00      99.10     100.00
      Recall (%)             100.00      98.31     100.00      99.11     100.00      99.11
      Accuracy (%)               99.15                 99.55                 99.55
    Test 1.5k (little Mask)
      Precision (%)           99.14      84.03      99.62      88.98      99.71      89.83
      Recall (%)              98.20      91.74      98.77      96.33      98.86      97.25
      Accuracy (%)               97.59                 98.54                 98.71
    Night bad case (263 Mask)
      Precision (%)             NAN     100.00        NAN     100.00        NAN     100.00
      Recall (%)                NAN      61.24        NAN      98.06        NAN      98.46
      Accuracy (%)               61.24                 98.06                 98.46
  • Table 7   Comparative experiment results on difficult vehicle monitoring samples
                                   In-car data   Attention mechanism
    Test set / metric        NoMask       Mask     NoMask       Mask
    Day 1.1k (117 Mask + 1037 NoMask)
      Precision (%)           96.76      62.69      96.72      70.34
      Recall (%)              95.18      71.79      96.62      70.94
      Accuracy (%)               92.18                 94.02
    Night 3.6k (419 Mask + 3246 NoMask)
      Precision (%)           96.63      63.93      96.27      76.15
      Recall (%)              94.58      74.46      97.13      70.88
      Accuracy (%)               92.28                 94.13
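
The per-class precision and recall and the overall accuracy reported in the result tables can be related through the usual confusion-matrix definitions. The sketch below assumes those standard definitions; the function name `mask_metrics`, its parameter names, and all counts are hypothetical and are not taken from the paper. It also illustrates one way a metric becomes undefined: a class with no ground-truth samples in the test set has no meaningful recall, returned here as NaN.

```python
import math


def pct(num: int, den: int) -> float:
    """Return num/den as a percentage, or NaN when the denominator is zero."""
    return 100.0 * num / den if den else math.nan


def mask_metrics(mask_as_mask: int, mask_as_nomask: int,
                 nomask_as_mask: int, nomask_as_nomask: int) -> dict:
    """Compute per-class precision/recall and overall accuracy.

    Each argument is a count of test faces named `<truth>_as_<prediction>`.
    """
    total = mask_as_mask + mask_as_nomask + nomask_as_mask + nomask_as_nomask
    return {
        # Precision: of the faces predicted as a class, how many were right.
        "mask_precision": pct(mask_as_mask, mask_as_mask + nomask_as_mask),
        "nomask_precision": pct(nomask_as_nomask,
                                nomask_as_nomask + mask_as_nomask),
        # Recall: of the faces that truly belong to a class, how many were found.
        "mask_recall": pct(mask_as_mask, mask_as_mask + mask_as_nomask),
        "nomask_recall": pct(nomask_as_nomask,
                             nomask_as_nomask + nomask_as_mask),
        # Accuracy: a single value over both classes.
        "accuracy": pct(mask_as_mask + nomask_as_nomask, total),
    }


# Hypothetical counts for a balanced test set (100 Mask + 100 NoMask).
balanced = mask_metrics(mask_as_mask=90, mask_as_nomask=10,
                        nomask_as_mask=5, nomask_as_nomask=95)

# Hypothetical Mask-only test set: NoMask recall has a zero denominator.
mask_only = mask_metrics(mask_as_mask=98, mask_as_nomask=2,
                         nomask_as_mask=0, nomask_as_nomask=0)

print(balanced)
print(mask_only)
```

On a heavily imbalanced test set, accuracy is dominated by the majority class, so the per-class precision and recall columns are the more informative ones to compare.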

Copyright 2020 China Science Publishing & Media Ltd. All rights reserved.
