SCIENCE CHINA Information Sciences, Volume 63, Issue 2: 120113 (2020) https://doi.org/10.1007/s11432-019-2723-1

MDSSD: multi-scale deconvolutional single shot detector for small objects

  • Received Aug 28, 2019
  • Accepted Nov 20, 2019
  • Published Jan 13, 2020

Abstract

There is no abstract available for this article.


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61822701, 61672469, 61772474, 61802351, 61872324).


Supplement

Appendixes A–C.


  • Figure 1

    (Color online) The architecture of MDSSD. First, we apply deconvolution layers simultaneously to the high-level semantic feature maps at different scales (conv8_2, conv9_2, and conv10_2). Then we build skip connections to the lower layers (conv3_3, conv4_3, and conv7) through Fusion Blocks, forming three new fusion layers (Module 1, Module 2, and Module 3). Predictions are made on both the new fusion layers (Module 1, Module 2, and Module 3) and the original SSD layers (conv8_2, conv9_2, conv10_2, and conv11_2) at the same time.
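
The wiring described in the Figure 1 caption can be sketched in a few lines of plain Python. This is only an illustrative layout under stated assumptions, not the authors' implementation. The layer names come from the caption. The pairing of low-level and high-level layers in caption order, the names `FusionModule`, `build_fusion_modules`, `prediction_sources`, and `deconv_output_size`, and the kernel and stride values in the usage line are all hypothetical. The size formula is the standard output-size rule for a transposed convolution with dilation 1.

```python
from dataclasses import dataclass

# Layer names as listed in the Figure 1 caption.
HIGH_LEVEL = ["conv8_2", "conv9_2", "conv10_2"]   # inputs to the deconvolution layers
LOW_LEVEL = ["conv3_3", "conv4_3", "conv7"]       # sources of the skip connections
ORIGINAL_SSD = ["conv8_2", "conv9_2", "conv10_2", "conv11_2"]


@dataclass(frozen=True)
class FusionModule:
    """One fusion layer: a low-level map fused with an upsampled high-level map."""
    name: str
    low: str
    high: str


def build_fusion_modules():
    # ASSUMPTION: layers are paired in the order the caption lists them
    # (conv3_3 with conv8_2, conv4_3 with conv9_2, conv7 with conv10_2).
    pairs = zip(LOW_LEVEL, HIGH_LEVEL)
    return [FusionModule(f"Module {i}", low, high)
            for i, (low, high) in enumerate(pairs, start=1)]


def prediction_sources():
    # Predictions use the new fusion layers and the original SSD layers together.
    return [m.name for m in build_fusion_modules()] + ORIGINAL_SSD


def deconv_output_size(size, kernel, stride, padding=0, output_padding=0):
    # Standard transposed-convolution output size (dilation = 1).
    return (size - 1) * stride - 2 * padding + kernel + output_padding


if __name__ == "__main__":
    for module in build_fusion_modules():
        print(module)
    print(prediction_sources())
    # Hypothetical kernel/stride values, chosen only to show the upsampling effect.
    print(deconv_output_size(10, kernel=2, stride=2))
```

Under these assumptions the detector predicts from seven sources: the three fusion modules plus the four original SSD layers.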

Copyright 2020 Science China Press Co., Ltd. All rights reserved.
