
SCIENCE CHINA Information Sciences, Volume 64, Issue 2: 122301 (2021) https://doi.org/10.1007/s11432-020-3077-5

Reciprocal translation between SAR and optical remote sensing images with cascaded-residual adversarial networks

  • Received: Apr 30, 2020
  • Accepted: Sep 30, 2020
  • Published: Jan 21, 2021

Abstract


Acknowledgment

This work was supported in part by the National Key R&D Program of China (Grant No. 2017YFB0502703) and the Natural Science Foundation of China (Grant Nos. 61822107, 61571134).


Supplement

Appendixes A and B.


References

[1] Isola P, Zhu J, Zhou T, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5967--5976.

[2] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 2242--2251.

[3] Jin K H, McCann M T, Froustey E. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Trans Image Process, 2017, 26: 4509--4522.

[4] Zhu J Y, Zhang R, Pathak D, et al. Toward multimodal image-to-image translation. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 465--476.

[5] Heusel M, Ramsauer H, Unterthiner T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 6626--6637.

[6] Byun Y, Choi J, Han Y. An Area-Based Image Fusion Scheme for the Integration of SAR and Optical Satellite Imagery. IEEE J Sel Top Appl Earth Observations Remote Sens, 2013, 6: 2212--2220.

[7] Garzelli A. Wavelet-based fusion of optical and SAR image data over urban area. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2002, 34: 59--62.

[8] Fan J, Wu Y, Li M. SAR and Optical Image Registration Using Nonlinear Diffusion and Phase Congruency Structural Descriptor. IEEE Trans Geosci Remote Sens, 2018, 56: 5368--5379.

[9] Liu J, Gong M, Qin K. A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images. IEEE Trans Neural Netw Learning Syst, 2018, 29: 545--559.

[10] Merkle N, Auer S, Muller R. Exploring the Potential of Conditional Adversarial Networks for Optical and SAR Image Matching. IEEE J Sel Top Appl Earth Observations Remote Sens, 2018, 11: 1811--1820.

[11] He W, Yokoya N. Multi-Temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation. ISPRS Int J Geo-Inf, 2018, 7: 389.

[12] Schmitt M, Hughes L H, Zhu X X. The SEN1-2 dataset for deep learning in SAR-optical data fusion. In: Proceedings of ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-1, 2018. 141--146.

[13] Wang P, Patel V M. Generating high quality visible images from SAR images using CNNs. In: Proceedings of 2018 IEEE Radar Conference (RadarConf18), 2018. 0570--0575.

[14] Enomoto K, Sakurada K, Wang W, et al. Image translation between SAR and optical imagery with generative adversarial nets. In: Proceedings of IGARSS IEEE International Geoscience and Remote Sensing Symposium, 2018. 1752--1755.

[15] Wang L, Xu X, Yu Y. SAR-to-Optical Image Translation Using Supervised Cycle-Consistent Adversarial Networks. IEEE Access, 2019, 7: 129136.

[16] Li Y, Fu R, Meng X. A SAR-to-Optical Image Translation Method Based on Conditional Generation Adversarial Network (cGAN). IEEE Access, 2020, 8: 60338--60343.

[17] Fuentes Reyes M, Auer S, Merkle N. SAR-to-Optical Image Translation Based on Conditional Generative Adversarial Networks-Optimization, Opportunities and Limits. Remote Sens, 2019, 11: 2067.

[18] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015. 234--241.

[19] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 2672--2680.

[20] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. 2017. ArXiv:1701.07875.

[21] Cozzolino D, Parrilli S, Scarpa G. Fast Adaptive Nonlocal SAR Despeckling. IEEE Geosci Remote Sens Lett, 2014, 11: 524--528.

[22] Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 801--818.

[23] Lee J S, Pottier E. Polarimetric Radar Imaging: From Basics to Applications. Boca Raton: CRC Press, 2009.

[24] Jin Y Q, Xu F. Polarimetric Scattering and SAR Information Retrieval. Hoboken: John Wiley & Sons, 2013.

  • Figure 1

    (Color online) Schematic diagram of the translation network, inspired by the Pix2Pix network [1], during training. A pair of translators are trained together; each translator consists of an encoder and a decoder. The two discriminators are trained separately. `SAR', `OPT', `Synthesized OPT' and `Synthesized SAR' denote the true SAR image, the true optical image, the fake optical image and the fake SAR image, respectively. The two vertical lines connecting `SAR' and `Synthesized SAR' indicate that the network is constrained to make them equal by an L1 norm loss.

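    As a minimal sketch of the generator-side update implied by Figure 1 (PyTorch-style; the names G_so, G_os, D_o, D_s, sar, opt and the weight lam are illustrative assumptions, and lam = 100 follows the common Pix2Pix default rather than a value reported here):

      import torch
      import torch.nn.functional as F

      def generator_step(G_so, G_os, D_o, D_s, sar, opt, lam=100.0):
          # Translate in both directions with the paired translators.
          fake_opt = G_so(sar)   # SAR -> synthesized optical
          fake_sar = G_os(opt)   # optical -> synthesized SAR

          # Adversarial terms: each translator tries to fool its discriminator,
          # which outputs a patch-wise probability map (see Figure 3).
          p_opt, p_sar = D_o(fake_opt), D_s(fake_sar)
          adv = F.binary_cross_entropy(p_opt, torch.ones_like(p_opt)) \
              + F.binary_cross_entropy(p_sar, torch.ones_like(p_sar))

          # L1 terms: the "make them equal" constraint drawn as two vertical lines in Figure 1.
          l1 = F.l1_loss(fake_opt, opt) + F.l1_loss(fake_sar, sar)

          return adv + lam * l1

    In a full training loop, each discriminator would also be updated separately with the standard real/fake classification loss.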
  • Figure 2

    (Color online) Translator network architecture with cascaded-residual connections. The input data size is $256\times256\times n_i$ and the output data size is $256\times256\times n_o$. The first two numbers give the spatial size of the feature maps and the third gives the number of channels. The symbols $n_i$ and $n_o$ denote the channel numbers of the input and output images respectively, set to 1 for SAR images and 3 for optical images. The symbol $n_g$ denotes the benchmark number of feature maps in the generator, i.e., the number of feature maps in the first layer. The concatenations from the encoder and the input to the decoder are indicated by lines with arrows.

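    A minimal encoder-decoder sketch with the skip concatenations described above, assuming PyTorch; only three levels are shown, and the full cascaded-residual wiring of CRAN is not reproduced here:

      import torch
      import torch.nn as nn

      def down(c_in, c_out):
          # Stride-2 convolution halves the spatial size (256 -> 128 -> 64 -> 32).
          return nn.Sequential(nn.Conv2d(c_in, c_out, 4, 2, 1),
                               nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

      def up(c_in, c_out):
          # Stride-2 transposed convolution doubles the spatial size.
          return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, 2, 1),
                               nn.BatchNorm2d(c_out), nn.ReLU())

      class Translator(nn.Module):
          def __init__(self, n_i=1, n_o=3, n_g=64):
              super().__init__()
              self.e1 = down(n_i, n_g)          # 256x256xn_i -> 128x128xn_g
              self.e2 = down(n_g, 2 * n_g)      # -> 64x64x(2n_g)
              self.e3 = down(2 * n_g, 4 * n_g)  # -> 32x32x(4n_g)
              self.d3 = up(4 * n_g, 2 * n_g)    # -> 64x64x(2n_g)
              self.d2 = up(4 * n_g, n_g)        # input: d3 output concatenated with e2 output
              self.d1 = nn.Sequential(nn.ConvTranspose2d(2 * n_g, n_o, 4, 2, 1), nn.Tanh())

          def forward(self, x):
              f1 = self.e1(x)
              f2 = self.e2(f1)
              f3 = self.e3(f2)
              g3 = self.d3(f3)
              g2 = self.d2(torch.cat([g3, f2], dim=1))    # skip connection from the encoder
              return self.d1(torch.cat([g2, f1], dim=1))  # 256x256xn_o output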
  • Figure 3

    (Color online) Discriminator network architecture. The input data size is $256\times256\times n_i$ and the output probability map size is $32\times32\times1$. The first two numbers give the spatial size of the feature maps and the third gives the number of channels. The symbol $n_i$ denotes the channel number of the input image, and $n_d$ denotes the benchmark number of feature maps in the discriminator, set to 64 here.

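    A rough PatchGAN-style sketch consistent with the sizes above ($256\times256\times n_i$ in, $32\times32\times1$ out), assuming PyTorch; the layer count and normalization are assumptions rather than the exact architecture:

      import torch.nn as nn

      class Discriminator(nn.Module):
          def __init__(self, n_i=3, n_d=64):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Conv2d(n_i, n_d, 4, 2, 1), nn.LeakyReLU(0.2),                                   # -> 128x128
                  nn.Conv2d(n_d, 2 * n_d, 4, 2, 1), nn.BatchNorm2d(2 * n_d), nn.LeakyReLU(0.2),      # -> 64x64
                  nn.Conv2d(2 * n_d, 4 * n_d, 4, 2, 1), nn.BatchNorm2d(4 * n_d), nn.LeakyReLU(0.2),  # -> 32x32
                  nn.Conv2d(4 * n_d, 1, 3, 1, 1), nn.Sigmoid(),                                      # -> 32x32x1 probability map
              )

          def forward(self, x):
              return self.net(x)

    Each entry of the output map judges one local patch of the input image rather than the image as a whole.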
  • Figure 4

    (Color online) Different qualities of translated optical images induced by different losses. The first column shows (a) the input SAR images; the middle three columns show (b) translated optical images with the L1-only loss, (c) translated optical images with the GAN-only loss and (d) translated optical images with the L1+GAN loss; the last column shows (e) the corresponding optical ground truths.

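    For reference, the combined objective compared in Figure 4 is the one defined in [1]: an adversarial term and an L1 reconstruction term,
    $$\mathcal{L}_{\rm cGAN}(G,D)=\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x}[\log(1-D(x,G(x)))],\qquad \mathcal{L}_{L1}(G)=\mathbb{E}_{x,y}\big[\|y-G(x)\|_{1}\big],$$
    $$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{\rm cGAN}(G,D)+\lambda\,\mathcal{L}_{L1}(G),$$
    so column (b) corresponds to the L1 term alone, (c) to the adversarial term alone, and (d) to the weighted sum.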
  • Figure 5

    (Color online) Modified network scheme for unsupervised learning with CycleGAN loops [2].

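    For reference, the loops in Figure 5 impose the cycle-consistency constraint of [2]: with translators $G$ (SAR to optical) and $F$ (optical to SAR),
    $$\mathcal{L}_{\rm cyc}(G,F)=\mathbb{E}_{x\sim p_{\rm SAR}}\big[\|F(G(x))-x\|_{1}\big]+\mathbb{E}_{y\sim p_{\rm OPT}}\big[\|G(F(y))-y\|_{1}\big],$$
    which is minimized together with the two adversarial losses, so that unpaired images can still supervise both translators.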
  • Figure 6

    (Color online) Example translation images with UAVSAR (test samples). Images in each row from left to right are the real SAR image ((a1), (a2)) and its translated optical image ((b1), (b2)), the real optical image ((c1), (c2)) and its translated SAR image ((d1), (d2)). The first row is chosen from the 6 m UAVSAR dataset, and the second row from the 10 m UAVSAR dataset.

  • Figure 7

    (Color online) Example translation images with GF-3 data. Images in each row from left to right are the real SAR image ((a1), (a2)) and its translated optical image ((b1), (b2)), the real optical image ((c1), (c2)) and its translated SAR image ((d1), (d2)).

  • Figure 8

    (Color online) Images in each row are, in order, (a) the optical ground truth, (b) its translated single-pol SAR image and (c) translated full-pol SAR image, (d) the single-pol SAR ground truth and (e) the optical image translated from the single-pol SAR image, (f) the full-pol SAR ground truth and (g) the optical image translated from the full-pol SAR image. The rows show, from top to bottom, different types of earth surface: waters, vegetation, farmlands and buildings.

  • Figure 9

    (Color online) Comparison of SAR-optical translation by different methods. Images in each row from left to right are (a) the real optical image, (b) the input SAR image, (c) the optical image translated by CycleGAN, (d) the optical image translated by Pix2Pix and (e) the optical image translated by CRAN. The rows show, from top to bottom, different types of earth surface: buildings, buildings, farmlands and roads.

  • Figure 10

    (Color online) Comparison of SAR-optical translation by different methods, illustrating that the evaluation metrics are not entirely reliable. The left three columns are chosen from the 6 m full-pol UAVSAR dataset: (a) optical ground truth, (b) optical image translated by CRAN, (c) optical image translated by CycleGAN; the right three columns are chosen from the 10 m single-pol UAVSAR dataset: (d) optical ground truth, (e) optical image translated by CRAN, (f) optical image translated by Pix2Pix.

  • Figure 11

    (Color online) Translated images further refined with unsupervised learning. Images in each row from left to right are (a) the input SAR image, (b) the translated optical image and (c) the optical image further refined by unsupervised learning, (d) the input optical image, (e) its translated SAR image and (f) the SAR image further refined by unsupervised learning. The rows show, from top to bottom, different types of earth surface: waters, vegetation, farmlands and buildings.

  • Figure 12

    (Color online) A contrastive experiment on SAR image segmentation. Images in each row from left to right are (a) optical ground truth, (b) segmentation ground truth, (c) input SAR image, (d) map segmented from (c), (e) optical image generated from (c) by CRAN and (f) map segmented from (e). In the segmentation maps, red, green, blue, yellow and black represent buildings, vegetation, waters, roads and others, respectively.

  • Figure 13

    (Color online) Airplane synthesis by CRAN. Images in each row from left to right are optical ground truth ((a1), (a2)), input SAR image ((b1), (b2)), and translated optical images ((c1), (c2)).

  • Table 1  

    Table 1  FIDs of different datasets

                0.51 m single-pol GF-3   6 m single-pol UAVSAR   10 m single-pol UAVSAR   6 m full-pol UAVSAR
    Optical     154.8                    106.4                   138.4                    85.6
    SAR         53.0                     56.0                    64.7                     52.8
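
    For reference, FID here is the Fréchet inception distance of [5]: with $(\mu_r,\Sigma_r)$ and $(\mu_g,\Sigma_g)$ the mean and covariance of Inception features of real and generated images,
    $${\rm FID}=\|\mu_r-\mu_g\|_{2}^{2}+{\rm Tr}\big(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2}\big),$$
    so lower values indicate closer distributions. Because the statistics are estimated from samples, the value also depends on the number of samples used, which Table 2 illustrates.
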
  • Table 2  

    Table 2  Decreasing FID with an increasing number of samples (for the case of 6 m full-pol UAVSAR in Table 1)

    Number of samples   500     1000    2048    3000    4000    5000    6000    7000    8000    9000    10000
    Optical             125.0   102.9   85.6    81.2    77.9    75.9    74.8    74.1    73.4    72.7    72.1
    SAR                 86.9    68.8    52.8    49.4    46.8    45.9    44.5    43.2    42.5    42.0    41.9
  • Table 3  

    Table 3  Result comparisons of different methods on different datasets using different evaluation metrics$^{\rm a)}$

    Dataset                  Method     SSIM (SAR)   SSIM (Opt.)   PSNR (SAR)   PSNR (Opt.)   FID (SAR)   FID (Opt.)
    0.51 m single-pol GF-3   CycleGAN   0.2535       0.2656        15.7171      14.9675       62.1420     185.3181
                             Pix2Pix    0.2194       0.2317        15.4978      14.4686       77.6901     212.5304
                             CRAN       0.2595       0.2799        15.9172      15.5820       53.0067     154.7532
    6 m single-pol UAVSAR    CycleGAN   0.3585       0.3005        19.5424      16.1030       50.5496     132.1710
                             Pix2Pix    0.3407       0.3081        19.6044      15.7463       48.5541     99.7782
                             CRAN       0.3640       0.3092        20.2907      16.1323       56.0201     106.3988
    10 m single-pol UAVSAR   CycleGAN   0.2879       0.2973        18.5911      16.2957       53.2890     113.288
                             Pix2Pix    0.2917       0.3072        18.3707      16.0357       63.5519     146.7449
                             CRAN       0.2819       0.3346        18.3092      16.4238       64.7359     138.3651
    6 m full-pol UAVSAR      CycleGAN   0.3418       0.3254        18.3431      16.0414       46.0073     95.69
                             Pix2Pix    0.3716       0.3308        19.5295      16.0421       65.1980     94.9724
                             CRAN       0.3768       0.3109        19.2188      16.1489       52.7645     85.5704

    a)

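    For reference on reading Table 3: higher SSIM and PSNR indicate better reconstructions, while lower FID indicates closer distributions. The standard definitions (with MAX the peak pixel value, MSE the mean squared error, and $c_1$, $c_2$ the usual stabilizing constants) are
    $${\rm PSNR}=10\log_{10}\frac{{\rm MAX}^{2}}{{\rm MSE}},\qquad {\rm SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^{2}+\mu_y^{2}+c_1)(\sigma_x^{2}+\sigma_y^{2}+c_2)}.$$
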
  • Table 4  

    Table 4  FIDs of results by supervised and unsupervised learning

              Supervised learning   Unsupervised learning
    Optical   107.8                 88.9
    SAR       58.1                  41.2
  • Table 5  

    Table 5  Number of trainable parameters and operations in the three translation networks

    Model                Number of parameters   Number of operations (FLOPs)
    CycleGAN generator   113.73 M               152.39 G
    Pix2Pix generator    107.16 M               89.50 G
    CRAN generator       107.49 M               79.41 G
    Discriminator        5.35 M                 6.53 G
  • Table 6  

    Table 6  FCN-scores for the two segmentation schemes

    Scheme                            Per-pixel accuracy   Per-class accuracy   Class IoU
    SAR-Segmentation                  0.5988               0.4927               0.3742
    Translated Optical-Segmentation   0.5052               0.4395               0.2923
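
    For reference, the FCN-scores above follow the standard semantic segmentation metrics: with $n_{ij}$ the number of pixels of class $i$ predicted as class $j$, $t_i=\sum_j n_{ij}$ and $C$ the number of classes,
    $$\text{per-pixel accuracy}=\frac{\sum_i n_{ii}}{\sum_i t_i},\qquad \text{per-class accuracy}=\frac{1}{C}\sum_i\frac{n_{ii}}{t_i},\qquad \text{class IoU}=\frac{1}{C}\sum_i\frac{n_{ii}}{t_i+\sum_j n_{ji}-n_{ii}}.$$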