
SCIENCE CHINA Information Sciences, Volume 63, Issue 2: 120113 (2020) https://doi.org/10.1007/s11432-019-2723-1

MDSSD: multi-scale deconvolutional single shot detector for small objects

  • Received: Aug 28, 2019
  • Accepted: Nov 20, 2019
  • Published: Jan 13, 2020

Abstract

There is no abstract available for this article.


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61822701, 61672469, 61772474, 61802351, 61872324).


Supplement

Appendixes A–C.



  • Figure 1

    (Color online) The architecture of MDSSD. First, we simultaneously apply deconvolution layers to the high-level semantic feature maps at different scales (i.e., conv8_2, conv9_2, and conv10_2). Then we build skip connections with the lower layers (conv3_3, conv4_3, and conv7) through a Fusion Block, forming three new fusion layers (Module 1, Module 2, and Module 3). Predictions are made simultaneously on both the new fusion layers (Module 1, Module 2, and Module 3) and the original SSD layers (conv8_2, conv9_2, conv10_2, and conv11_2).
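The Figure 1 caption describes three steps: upsample high-level maps with deconvolution, fuse each with a lower layer, and predict on both the fused and the original SSD layers. The pure-Python sketch below illustrates only the mechanics under loudly stated assumptions. It assumes a 300×300 input with the standard SSD300 feature-map sizes, assumes the high-level and low-level layers are paired in the order the caption lists them, and uses a single-channel transposed convolution with kernel size equal to stride. The crop-then-add `fuse_sum` is a hypothetical stand-in, since the caption does not say what happens inside the Fusion Block. All helper names (`min_stride`, `fuse_sum`, and so on) are illustrative, not the authors' code.

```python
from typing import List

Grid = List[List[float]]

# Assumption: standard SSD300 feature-map sizes (square maps, side length only).
SSD300_SIZES = {
    "conv3_3": 75, "conv4_3": 38, "conv7": 19,
    "conv8_2": 10, "conv9_2": 5, "conv10_2": 3, "conv11_2": 1,
}

# Assumption: (high-level source, low-level target) pairs in caption order -> Modules 1-3.
PAIRS = [("conv8_2", "conv3_3"), ("conv9_2", "conv4_3"), ("conv10_2", "conv7")]

# Layers that receive prediction heads, as listed in the caption.
PREDICTION_LAYERS = ["Module 1", "Module 2", "Module 3",
                     "conv8_2", "conv9_2", "conv10_2", "conv11_2"]


def deconv_output_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output side length of a transposed convolution: (n - 1) * stride - 2 * padding + kernel."""
    return (n - 1) * stride - 2 * padding + kernel


def min_stride(src: int, dst: int) -> int:
    """Hypothetical helper: smallest stride s (kernel = s, no padding) whose output covers dst."""
    s = 1
    while deconv_output_size(src, kernel=s, stride=s) < dst:
        s += 1
    return s


def transposed_conv2d(x: Grid, w: Grid, stride: int) -> Grid:
    """Single-channel 2D transposed convolution, no padding, square kernel.

    Each input value scatters a scaled copy of the kernel into the output;
    copies are placed `stride` pixels apart and overlapping entries add up.
    """
    h, wd, k = len(x), len(x[0]), len(w)
    out_h = deconv_output_size(h, k, stride)
    out_w = deconv_output_size(wd, k, stride)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(wd):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


def fuse_sum(high_up: Grid, low: Grid) -> Grid:
    """Hypothetical fusion: crop the upsampled map to the lower map's size, then add element-wise."""
    n = len(low)
    if len(high_up) < n or len(high_up[0]) < n:
        raise ValueError("upsampled map is smaller than the lower-layer map")
    return [[high_up[i][j] + low[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for high, low in PAIRS:
        src, dst = SSD300_SIZES[high], SSD300_SIZES[low]
        s = min_stride(src, dst)
        print(f"{high} ({src}x{src}) -> {low} ({dst}x{dst}): "
              f"stride {s}, output {deconv_output_size(src, s, s)}, cropped to {dst}")
    # Tiny demo: a 2x2 map upsampled by an all-ones 2x2 kernel with stride 2 gives a 4x4 map.
    up = transposed_conv2d([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]], stride=2)
    print(fuse_sum(up, [[1.0] * 3 for _ in range(3)]))
```

Choosing kernel equal to stride keeps the scattered kernel copies from overlapping, so each input value maps to a clean block; a learned, multi-channel deconvolution layer as used in a real detector would mix neighbouring values instead. The crop step is one simple way to reconcile sizes such as 80 versus 75 that do not divide evenly.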

Copyright 2020 CHINA SCIENCE PUBLISHING & MEDIA LTD. All rights reserved.
