
SCIENCE CHINA Information Sciences, Volume 62, Issue 4: 042301 (2019) https://doi.org/10.1007/s11432-017-9405-6

A coupled convolutional neural network for small and densely clustered ship detection in SAR images

  • Received: Sep 22, 2017
  • Accepted: Apr 2, 2018
  • Published: Sep 19, 2018

Abstract

Ship detection from synthetic aperture radar (SAR) imagery plays a significant role in global marine surveillance. However, detecting small and densely clustered ship targets remains difficult, and a desirable performance is rarely achieved. Recently, convolutional neural networks (CNNs) have shown strong detection power in computer vision and remain flexible under complex background conditions, where traditional methods have limited ability. However, CNNs struggle to detect small and densely clustered targets, which are widespread in SAR images. To address this problem while preserving the robustness to complex backgrounds, we develop a coupled CNN for small and densely clustered SAR ship detection. The proposed method mainly consists of two subnetworks: an exhaustive ship proposal network (ESPN) that generates ship-like regions from multiple layers with multiple receptive fields, and an accurate ship discrimination network (ASDN) that eliminates false alarms by referring to the context information of each proposal generated by the ESPN. The ESPN is designed to generate as many ship proposals as possible, whereas the ASDN aims to produce the final results accurately. The method is evaluated on two data sets: one collected from 60 wide-swath Sentinel-1 images and the other from 20 GaoFen-3 (GF-3) images. Both data sets contain many ships that are small and densely clustered. Compared with the multi-step constant false alarm rate (CFAR-MS) method, the proposed method improves the average precision (AP) and $F_1$ score by 0.4028 and 0.3045 on the Sentinel-1 data set, and by 0.2033 and 0.1522 on the GF-3 data set. In addition, the new method is demonstrated to be more efficient than CFAR-MS.
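
For concreteness, the following is a minimal PyTorch-style sketch of the coupled structure, not the authors' implementation: the ESPN branches are small convolutions attached to shared feature maps, and the ASDN pools each proposal together with its context region before classification and box regression. The use of torchvision's roi_pool and the feature stride are illustrative assumptions; layer names and sizes follow Tables 1 and 2.

```python
# A minimal sketch of the coupled CNN (assumptions noted above, not the
# authors' original implementation).
import torch
import torch.nn as nn
from torchvision.ops import roi_pool


class ESPNBranch(nn.Module):
    """One proposal branch: a k x k convolution (k = 3, 5, 7 in Table 1)
    attached to a shared feature map (Conv4_3, Conv5_3, or Conv6_1)."""

    def __init__(self, in_ch=512, k=3, out_ch=6):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, feat):
        return self.conv(feat)  # 6 output channels per position, as in Table 2


class ASDN(nn.Module):
    """Discrimination head: RoI-pool proposal and context features, concatenate,
    fuse with a 3 x 3 convolution (7x7 -> 5x5), then classify and regress."""

    def __init__(self, feat_ch=512):
        super().__init__()
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, 3)  # "RoI_concat" in Table 2
        self.fc = nn.Linear(feat_ch * 5 * 5, 4096)      # 12800 x 4096 weights
        self.fc_cls = nn.Linear(4096, 2)                # ship vs. background
        self.fc_bbr = nn.Linear(4096, 8)                # bounding-box regression

    def forward(self, feat, rois, ctx_rois, stride=8.0):
        # rois / ctx_rois: Tensor[K, 5] of (batch index, x1, y1, x2, y2)
        a = roi_pool(feat, rois, output_size=7, spatial_scale=1.0 / stride)
        b = roi_pool(feat, ctx_rois, output_size=7, spatial_scale=1.0 / stride)
        x = torch.relu(self.fuse(torch.cat([a, b], dim=1)))
        x = torch.relu(self.fc(x.flatten(1)))
        return self.fc_cls(x), self.fc_bbr(x)
```

With this layout, the flattened $5\times5\times512$ fused map feeding a 4096-way FC layer reproduces the $52428.8$k parameter count reported for the FC layer in Table 2.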


Acknowledgment

This work was partially supported by National Natural Science Foundation of China (Grant No. 61331015) and China Postdoctoral Science Foundation (Grant No. 2015M581618). The authors are grateful to Prof. T. K. Truong for his helpful comments and suggestions, which significantly improved this manuscript.



  • Figure 1

Overview of the proposed method, mainly including an ESPN and an ASDN, both of which share convolutional layers for feature learning. In ASDN, “RoI” represents the regions of interest and the “FC” layer indicates the fully connected layer.

  • Figure 2

    Ship proposal generation strategy of (a) RPN and (b) ESPN.

  • Figure 3

Ship proposals and their contextual regions are both passed through the RoI pooling layer. The upper row and the lower row show an example of a ship proposal and its context region, respectively. Both are pooled by the RoI pooling operation, and the resulting feature maps are concatenated into one fused feature map to improve the representation capability.
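
The exact extent of the context region is defined in the full paper; purely as an illustration, the sketch below enlarges each proposal box around its center by a fixed factor before both boxes are RoI-pooled. The factor of 2 and the image size are hypothetical placeholders.

```python
def context_region(box, factor=2.0, img_w=1024, img_h=768):
    """Enlarge a proposal box (x1, y1, x2, y2) around its center to obtain a
    context region like the one pooled alongside the proposal in Figure 3.
    The enlargement factor is an illustrative assumption."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * factor, (y2 - y1) * factor
    return (max(0.0, cx - w / 2), max(0.0, cy - h / 2),
            min(float(img_w), cx + w / 2), min(float(img_h), cy + h / 2))

print(context_region((100, 100, 140, 160)))  # -> (80.0, 70.0, 160.0, 190.0)
```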

  • Figure 4

Illustration of the image cropping strategy. The white grids are blocks without overlap. The red rectangles represent divided blocks with a $50$-pixel overlap. In both cases, blocks without ship targets are discarded and the remaining ones are used for training and testing.
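
A minimal sketch of the overlapped cropping in Figure 4, assuming square blocks: the $50$-pixel overlap matches the caption, while the block size is a placeholder (Table 2 lists a $1024\times768$ network input).

```python
import numpy as np

def crop_blocks(image, block=1024, overlap=50):
    """Divide a large SAR image into block x block crops whose neighbors share
    `overlap` pixels, as in Figure 4; crops at the right/bottom edge are
    shifted back so every crop is full-sized. Blocks without ship targets
    would then be discarded. The block size is an illustrative assumption."""
    step = block - overlap
    h, w = image.shape[:2]
    ys = sorted({min(y, h - block) for y in range(0, h, step)})
    xs = sorted({min(x, w - block) for x in range(0, w, step)})
    return [((x, y), image[y:y + block, x:x + block]) for y in ys for x in xs]

crops = crop_blocks(np.zeros((2048, 2048), dtype=np.float32))
print(len(crops), crops[0][1].shape)  # 9 crops, each (1024, 1024)
```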

  • Figure 5

    Performance curves over eight Sentinel-1 images. (a) Recall vs. IoU curve for each method; (b) precision vs. recall curve for each method.
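
In (a), a detection counts as correct at a given threshold when its intersection over union (IoU) with a ground-truth box exceeds that threshold. A standard IoU computation, with boxes given as $(x_1, y_1, x_2, y_2)$, is sketched below for reference.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...: inter = 50, union = 150
```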

  • Figure 6

Ship detection results in offshore areas of Sentinel-1 images. (a-1)–(a-4) show the detection results of the proposed $\text{Coupled-CNN}\_\text{E}\_\text{A}$ method; (b-1)–(b-4) show those of the CFAR-MS method. The green boxes indicate the correctly detected targets, the red ones indicate false alarms, and the blue ones represent the ground-truth.

  • Figure 7

Ship detection results with (a) $\text{Coupled-CNN}\_\text{E}\_\text{A}$ and (b) CFAR-MS for an image block cropped from the wide-swath Sentinel-1 SAR imagery over the Strait of Malacca, Singapore. In both subfigures, two areas (highlighted by the yellow boxes) are enlarged for a clear visual effect. The green boxes indicate the correctly detected targets, the red ones indicate false alarms, and the blue ones represent the ground-truth.

  • Figure 8

    Performance curves over four GF-3 images. (a) Recall vs. IoU curve for each method; (b) precision vs. recall curve for each method.

  • Figure 9

Ship detection results with (a) $\text{Coupled-CNN}\_\text{E}\_\text{A}$ and (b) CFAR-MS for an image block cropped from the GF-3 SAR imagery. In both subfigures, two areas (highlighted by the yellow boxes) are enlarged for a clear visual effect. The green boxes indicate the correctly detected targets, the red ones indicate false alarms, and the blue ones represent the ground-truth.

  • Figure 10

Ship detection results in offshore areas of GF-3 images. (a-1)–(a-4) show the detection results of the proposed $\text{Coupled-CNN}\_\text{E}\_\text{A}$ method; (b-1)–(b-4) show those of the CFAR-MS method. The green boxes indicate the correctly detected targets, the red ones indicate false alarms, and the blue ones represent the ground-truth.

  • Table 1   Parameter configurations for the three proposal branches in ESPN

    Layer name | Filter size (pixel) | Anchor height (pixel) | Height-to-width ratios
    Conv4_3 | $3\times3$ / $5\times5$ / $7\times7$ | $10$ / $16$ / $22$ | 1:2, 1:1, 2:1
    Conv5_3 | $3\times3$ / $5\times5$ / $7\times7$ | $28$ / $34$ / $40$ | 1:2, 1:1, 2:1
    Conv6_1 | $3\times3$ / $5\times5$ / $7\times7$ | $46$ / $52$ / $58$ | 1:2, 1:1, 2:1
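
    Reading Table 1, each feature layer carries three filter sizes, each filter size is paired with one anchor height, and every anchor takes all three height-to-width ratios, i.e., $3\times3\times3=27$ anchor shapes overall. A small enumeration sketch (placing anchors on feature-map cells is omitted; widths follow from height and ratio):

```python
# Enumerate the 27 ESPN anchor shapes implied by Table 1. A height-to-width
# ratio of 1:2 means the width is twice the height.
ANCHOR_HEIGHTS = {                   # layer -> {filter size k: anchor height}
    "Conv4_3": {3: 10, 5: 16, 7: 22},
    "Conv5_3": {3: 28, 5: 34, 7: 40},
    "Conv6_1": {3: 46, 5: 52, 7: 58},
}
RATIOS = [(1, 2), (1, 1), (2, 1)]    # (height, width) proportions

def espn_anchor_shapes():
    shapes = []
    for layer, per_filter in ANCHOR_HEIGHTS.items():
        for k, h in per_filter.items():
            for rh, rw in RATIOS:
                shapes.append((layer, k, h, round(h * rw / rh)))
    return shapes                     # (layer, filter size, height, width)

print(len(espn_anchor_shapes()))      # 27
```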
  • Table 2   Detailed structure, number of parameters, and multiply-accumulate operations (MAC) for each layer of the $\text{Coupled-CNN}\_\text{E}\_\text{A}$ method with a $1024\times768$ input

    Part | Name | Type | Stride | Output | #Params. | MAC
    Shared CNN layers | Conv1_1 | $3\times3$ convolution | 1 | $1024\times768\times64$ | $1.9$k | $1359.0$M
    | Conv1_2 | $3\times3$ convolution | 1 | $1024\times768\times64$ | $41.0$k | $28991.0$M
    | Pool1 | $2\times2$ max pooling | 2 | $512\times384\times64$ | |
    | Conv2_1 | $3\times3$ convolution | 1 | $512\times384\times128$ | $81.9$k | $14495.5$M
    | Conv2_2 | $3\times3$ convolution | 1 | $512\times384\times128$ | $163.8$k | $28991.0$M
    | Pool2 | $2\times2$ max pooling | 2 | $256\times192\times128$ | |
    | Conv3_1 | $3\times3$ convolution | 1 | $256\times192\times256$ | $327.7$k | $14495.5$M
    | Conv3_2 | $3\times3$ convolution | 1 | $256\times192\times256$ | $655.4$k | $28991.0$M
    | Conv3_3 | $3\times3$ convolution | 1 | $256\times192\times256$ | $655.4$k | $28991.0$M
    | Pool3 | $2\times2$ max pooling | 2 | $128\times96\times256$ | |
    | Conv4_1 | $3\times3$ convolution | 1 | $128\times96\times512$ | $1310.7$k | $14495.5$M
    | Conv4_2 | $3\times3$ convolution | 1 | $128\times96\times512$ | $2621.4$k | $28991.0$M
    | Conv4_3 | $3\times3$ convolution | 1 | $128\times96\times512$ | $2621.4$k | $28991.0$M
    | Pool4 | $2\times2$ max pooling | 2 | $64\times48\times512$ | |
    | Conv5_1 | $3\times3$ convolution | 1 | $64\times48\times512$ | $2621.4$k | $7247.8$M
    | Conv5_2 | $3\times3$ convolution | 1 | $64\times48\times512$ | $2621.4$k | $7247.8$M
    | Conv5_3 | $3\times3$ convolution | 1 | $64\times48\times512$ | $2621.4$k | $7247.8$M
    | Pool5 | $2\times2$ max pooling | 2 | $32\times24\times512$ | |
    | Conv6_1 | $3\times3$ convolution | 1 | $32\times24\times512$ | $2621.4$k | $18119$M
    ESPN | SPN4_3 | $3\times3$ convolution | 1 | $128\times96\times6$ | $30.7$k | $339.7$M
    | SPN4_5 | $5\times5$ convolution | 1 | $128\times96\times6$ | $79.9$k | $943.7$M
    | SPN4_7 | $7\times7$ convolution | 1 | $128\times96\times6$ | $153.6$k | $1849.7$M
    | SPN5_3 | $3\times3$ convolution | 1 | $64\times48\times6$ | $30.7$k | $84.9$M
    | SPN5_5 | $5\times5$ convolution | 1 | $64\times48\times6$ | $79.9$k | $235.9$M
    | SPN5_7 | $7\times7$ convolution | 1 | $64\times48\times6$ | $153.6$k | $462.4$M
    | SPN6_3 | $3\times3$ convolution | 1 | $32\times24\times6$ | $30.7$k | $21.2$M
    | SPN6_5 | $5\times5$ convolution | 1 | $32\times24\times6$ | $79.9$k | $59.0$M
    | SPN6_7 | $7\times7$ convolution | 1 | $32\times24\times6$ | $153.6$k | $115.6$M
    ASDN | RoIPooling1 | $7\times7$ RoI pooling | | $7\times7\times512$ | |
    | RoIPooling2 | $7\times7$ RoI pooling | | $7\times7\times512$ | |
    | RoI_concat | $3\times3$ convolution | | $5\times5\times512$ | $5242.9$k | $118.0$M
    | FC | FC | | 4096 | $52428.8$k | $52.4$M
    | FC_cls | FC | | 2 | $8.2$k | $32.8$K
    | FC_bbr | FC | | 8 | $32.8$k | $32.8$K
    Total | | | | | $\textbf{75.66M}$ | $\textbf{256.77B}$
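
    The #Params. and MAC entries of the convolution rows follow from standard layer arithmetic: a $k\times k$ convolution from $c_{in}$ to $c_{out}$ channels has $k^2 c_{in} c_{out} + c_{out}$ weights and biases, and performs one $k^2 c_{in}$-element dot product per output activation. A quick check against two rows of Table 2:

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def conv_mac(c_in, c_out, k, out_h, out_w):
    """Multiply-accumulates: one k*k*c_in dot product per output activation."""
    return out_h * out_w * c_out * k * k * c_in

# Conv1_1 of Table 2: 3-channel image in, 64 channels out, 1024x768 output.
print(conv_params(3, 64, 3))           # 1792          ~ 1.9k
print(conv_mac(3, 64, 3, 768, 1024))   # 1358954496    ~ 1359.0M
# Conv4_2: 512 channels in and out on a 128x96 map.
print(conv_mac(512, 512, 3, 96, 128))  # 28991029248   ~ 28991.0M
```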
  • Table 3   Number of parameters and MAC for each part of the $\text{Coupled-CNN}\_\text{E}\_\text{A}$ method with a $1024\times768$ input

    Part | #Params. | MAC
    Shared CNN layers | $\underline{18966.2{\rm~k}}$ | $\underline{258653.9{\rm~M}}$
    ESPN | $792.6$k | $4112.1$M
    ASDN | $57712.7$k | $170.4$M
    Total | $\textbf{75.66M}$ | $\textbf{256.77B}$
  • Table 4   Information of the Sentinel-1 imagery used in this study

    Satellite | Imaging mode | Band | Polarization | Product type | Resolution (rg$\times$az, m) | Pixel spacing (rg$\times$az, m) | Average size per image (rg$\times$az, pixel)
    Sentinel-1 | IW | C | VH | GRD | $20\times22$ | $10\times10$ | $25000\times18000$
  • Table 5   Information of the GF-3 imagery used in this study

    Satellite | Imaging mode | Band | Polarization | Pixel spacing (rg$\times$az, m) | Average size per image (rg$\times$az, pixel)
    GF-3 | NSC | C | VH | $20\times5$ | $8800\times21000$
  • Table 6   Performance comparison of different methods for the Sentinel-1 data set$^{\rm~a)}$

    Methods | Ground truth | True positive | False positive | Recall | Precision | Average precision | $F_1$ score | Average time (s) per image
    CFAR-MS | $6814$ | $2710$ | $751$ | $0.3977$ | $0.7830$ | $0.3123$ | $0.5275$ | $2550$
    FRCN | $6814$ | $4544$ | $845$ | $0.6669$ | $0.8432$ | $0.5812$ | $0.7447$ | $105$
    $\text{Coupled-CNN\_E}$ | $6814$ | $4843$ | $560$ | $\underline{0.7107}$ | $\underline{0.8964}$ | $\underline{0.6519}$ | $\underline{0.7928}$ | $113$
    $\text{Coupled-CNN\_A}$ | $6814$ | $4656$ | $823$ | $0.6833$ | $0.8498$ | $0.6069$ | $0.7575$ | $108$
    $\text{Coupled-CNN\_E\_A}$ | $6814$ | $5260$ | $570$ | $\textbf{0.7719}$ | $\textbf{0.9022}$ | $\textbf{0.7151}$ | $\textbf{0.8320}$ | $115$

    a) The bold numbers denote the optimal values in each column. The underlined numbers denote the suboptimal values.
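
    The recall, precision, and $F_1$ columns follow directly from the count columns (AP additionally requires the full precision-recall curve and cannot be recomputed from the table alone). A quick check for the $\text{Coupled-CNN}\_\text{E}\_\text{A}$ row:

```python
def detection_metrics(num_gt, num_tp, num_fp):
    """Recall, precision, and F1 score from detection counts."""
    recall = num_tp / num_gt
    precision = num_tp / (num_tp + num_fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Coupled-CNN_E_A row of Table 6: 6814 ground-truth ships, 5260 TP, 570 FP.
r, p, f1 = detection_metrics(6814, 5260, 570)
print(f"recall={r:.4f} precision={p:.4f} F1={f1:.4f}")
# -> recall=0.7719 precision=0.9022 F1=0.8320
```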

  • Table 7   Performance comparison of different methods on the GF-3 data set$^{\rm~a)}$

    Methods | Ground truth | True positive | False positive | Recall | Precision | Average precision | $F_1$ score | Average time (s) per image
    CFAR-MS | $1757$ | $981$ | $316$ | $0.5582$ | $0.7562$ | $0.4832$ | $0.6423$ | $1630$
    FRCN | $1757$ | $1179$ | $373$ | $0.6710$ | $0.7597$ | $0.5772$ | $0.7126$ | $85$
    $\text{Coupled-CNN\_E}$ | $1757$ | $1306$ | $346$ | $\underline{0.7433}$ | $\underline{0.7906}$ | $\underline{0.6784}$ | $\underline{0.7662}$ | $86$
    $\text{Coupled-CNN\_A}$ | $1757$ | $1210$ | $364$ | $0.6887$ | $0.7687$ | $0.5997$ | $0.7265$ | $89$
    $\text{Coupled-CNN\_E\_A}$ | $1757$ | $1324$ | $252$ | $\textbf{0.7536}$ | $\textbf{0.8401}$ | $\textbf{0.6865}$ | $\textbf{0.7945}$ | $90$

    a) The bold numbers denote the optimal values in each column. The underlined numbers denote the suboptimal values.
