This work was supported in part by National Key RD Program of China (Grant No. 2017YFB0502703) and Natural Science Foundation of China (Grant Nos. 61822107, 61571134).
Appendixes A and B.
[1] Isola P, Zhu J, Zhou T, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5967--5976. Google Scholar
[2] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 2242--2251. Google Scholar
[3] Jin K H, McCann M T, Froustey E. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Trans Image Process, 2017, 26: 4509-4522 CrossRef ADS arXiv Google Scholar
[4] Zhu J Y, Zhang R, Pathak D, et al. Toward multimodal image-to-image translation. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 465--476. Google Scholar
[5] Heusel M, Ramsauer H, Unterthiner T, et al. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 6626--6637. Google Scholar
[6] Younggi Byun , Jaewan Choi , Youkyung Han . An Area-Based Image Fusion Scheme for the Integration of SAR and Optical Satellite Imagery. IEEE J Sel Top Appl Earth Observations Remote Sens, 2013, 6: 2212-2220 CrossRef ADS Google Scholar
[7] Garzelli A. Wavelet-based fusion of optical and sar image data over urban area. International Archives of Photogrammetry Remote Sensing and Spatial Information Sciences, 2002, 34: 59--62. Google Scholar
[8] Fan J, Wu Y, Li M. SAR and Optical Image Registration Using Nonlinear Diffusion and Phase Congruency Structural Descriptor. IEEE Trans Geosci Remote Sens, 2018, 56: 5368-5379 CrossRef ADS Google Scholar
[9] Liu J, Gong M, Qin K. A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images. IEEE Trans Neural Netw Learning Syst, 2018, 29: 545-559 CrossRef Google Scholar
[10] Merkle N, Auer S, Muller R. Exploring the Potential of Conditional Adversarial Networks for Optical and SAR Image Matching. IEEE J Sel Top Appl Earth Observations Remote Sens, 2018, 11: 1811-1820 CrossRef ADS Google Scholar
[11] He W, Yokoya N. Multi-Temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation. IJGI, 2018, 7: 389 CrossRef ADS arXiv Google Scholar
[12] Schmitt M, Hughes L H, Zhu X X. The sen1-2 dataset for deep learning in sar-optical data fusion. In: Proceedings of ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-1, 2018. 141--146. Google Scholar
[13] Wang P, Patel V M. Generating high quality visible images from sar images using cnns. In: Proceedings of 2018 IEEE Radar Conference (RadarConf18), 2018. 0570--0575. Google Scholar
[14] Enomoto K, Sakurada K, Wang W, et al. Image translation between sar and optical imagery with generative adversarial nets. In: Proceedings of IGARSS IEEE International Geoscience and Remote Sensing Symposium, 2018. 1752--1755. Google Scholar
[15] Wang L, Xu X, Yu Y. SAR-to-Optical Image Translation Using Supervised Cycle-Consistent Adversarial Networks. IEEE Access, 2019, 7: 129136 CrossRef Google Scholar
[16] Li Y, Fu R, Meng X. A SAR-to-Optical Image Translation Method Based on Conditional Generation Adversarial Network (cGAN). IEEE Access, 2020, 8: 60338-60343 CrossRef Google Scholar
[17] Fuentes Reyes M, Auer S, Merkle N. SAR-to-Optical Image Translation Based on Conditional Generative Adversarial Networks-Optimization, Opportunities and Limits. Remote Sens, 2019, 11: 2067 CrossRef ADS Google Scholar
[18] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-assisted Intervention. Berlin: Springer, 2015. 234--241. Google Scholar
[19] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 2672--2680. Google Scholar
[20] Arjovsky M, Chintala S, Bottou L. Wasserstein gan. 2017,. arXiv Google Scholar
[21] Cozzolino D, Parrilli S, Scarpa G. Fast Adaptive Nonlocal SAR Despeckling. IEEE Geosci Remote Sens Lett, 2014, 11: 524-528 CrossRef ADS Google Scholar
[22] Chen L C, Zhu Y, Papandreou G, et al. Encoderdecoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 801--818. Google Scholar
[23] Lee J S, Pottier E. Polarimetric Radar Imaging: From Basics to Applications. Boca Raton: CRC Press, 2009. Google Scholar
[24] Jin Y Q, Xu F. Polarimetric Scattering and SAR Information Retrieval. Hoboken: John Wiley Sons, 2013. Google Scholar
Figure 1
(Color online) Schematic diagram of the translation network inspired by Pix2Pix network
Figure 2
(Color online) Translator network architecture with cascaded-residual connections. The input data size is $256\times256\times~n_i$ and the output data size is $256\times256\times~n_o$. The first two numbers represent the size of the feature maps and the third number represents the channel of the feature map. The symbols $n_i$ and $n_o$ denote the channel number of the input and output image respectively, set to 1 for SAR images and 3 for optical images. The symbol $n_g$ denotes the benchmark number of feature maps in the generator, that is, the number of feature maps of first layer. The concatenation from the encoder and the input to the decoder is signified by lines with arrows.
Figure 3
(Color online) Discriminator network architecture. The input data size is $256\times256\times~n_i$ and the output probability map size is $32\times32\times1$. The first two numbers represent the size of the feature maps and the third number represents the channel of the feature map. The symbol $n_i$ denotes the channel number of the input image, and $n_d$ denotes the benchmark number of feature maps in the discriminator, set to 64 here.
Figure 4
(Color online) Different qualities of translated optical images are induced by different losses. The first column lists (a) the input SAR images; the intermediate three columns are respectively (b) translated optical images with L1-only loss, protect łinebreak (c) translated optical images with GAN-only loss and (d) translated optical images with L1+GAN loss; the last column are (e) the corresponding optical ground truths.
Figure 5
(Color online) Modified network scheme for unsupervised learning with CycleGAN loops
Figure 6
(Color online) Example translation images with UAVSAR (test samples). Images in each row from left to right are the real SAR image ((a1), (a2)) and its translated optical image ((b1), (b2)), the real optical image ((c1), (c2)) and its translated SAR image ((d1), (d2)). The first row is chosen from 6 m UAVSAR dataset, and the latter is from 10 m UAVSAR dataset.
Figure 7
(Color online) Example translation images with GF-3 data. Images in each row from left to right are the real SAR image ((a1), (a2)) and its translated optical image ((b1), (b2)), the real optical image ((c1), (c2)) and its translated SAR image ((d1), (d2)).
Figure 8
(Color online) Images listed above in each row are (a) the optical ground truth and (b) its translated single-pol SAR image and (c) translated full-pol SAR image, (d) the single-pol SAR ground truth and (e) the optical image translated by single-pol SAR image, (f) the full-pol SAR ground truth and (g) the optical image translated by full-pol SAR image in order. Each row lists a kind of earth surfaces: waters, vegetation, farmlands and buildings.
Figure 9
(Color online) Comparison of SAR-optical translation by different methods. Images in each row from left to right are (a) the real optical image, (b) the input SAR image, (c) its translated optical image by CycleGAN, (d) the translated optical image by Pix2Pix and (e) the translated optical image by CRAN. Each row lists a kind of earth surfaces: buildings, buildings, farmlands and roads.
Figure 10
(Color online) Comparison of SAR-optical translation by different methods to verify the not exactly correctness of evaluation metrics. The left three columns are chosen from 6 m full-pol UAVSAR dataset: (a) optical ground truth, (b) translated optical image by CRAN, (c) translated optical image by CycleGAN; the right columns are chosen from 10 m single-pol UAVSAR dataset: protect łinebreak (d) optical ground truth, (e) translated optical image by CRAN, (f) translated optical image by Pix2Pix.
Figure 11
(Color online) Translated images further refined with unsupervised learning. Images in each row from left to right are (a) the input SAR image, (b) the translated optical image and (c) the further refined optical image by unsupervised learning, protect łinebreak (d) the input optical image and (e) its translated SAR image and (f) the further refined SAR image by unsupervised learning. Each row lists a kind of earth surfaces: waters, vegetation, farmlands and buildings.
Figure 12
(Color online) A contrastive experiment on SAR image segmentation. Images in each row from left to right are protect łinebreak (a) optical ground truth, (b) segmentation ground truth, (c) input SAR image, (d) map segmented from (c), (e) optical image generated from (c) by CRAN and (f) map segmented from (e). For segmentation maps, colors red, green, blue, yellow, and black represent buildings, vegetation, waters, roads, and others respectively.
Figure 13
(Color online) Airplane synthesis by CRAN. Images in each row from left to right are optical ground truth ((a1), (a2)), input SAR image ((b1), (b2)), and translated optical images ((c1), (c2)).
0.51 m single-pol GF-3 | 6 m single-pol UAVSAR | 10 m single-pol UAVSAR | 6 m full-pol UAVSAR | |
Optical | 154.8 | 106.4 | 138.4 | 85.6 |
SAR | 53.0 | 56.0 | 64.7 | 52.8 |
500 | 1000 | 2048 | 3000 | 4000 | 5000 | 6000 | 7000 | 8000 | 9000 | 10000 | |
Optical | 125.0 | 102.9 | 85.6 | 81.2 | 77.9 | 75.9 | 74.8 | 74.1 | 73.4 | 72.7 | 72.1 |
SAR | 86.9 | 68.8 | 52.8 | 49.4 | 46.8 | 45.9 | 44.5 | 43.2 | 42.5 | 42.0 | 41.9 |
2*Dataset | |||||||
2*Method | SSIM | PSNR | FID | ||||
SAR | Opt. | SAR | Opt. | SAR | Opt. | ||
3*0.51 m single-pol GF-3 | CycleGAN | 0.2535 | 0.2656 | 15.7171 | 14.9675 | 62.1420 | 185.3181 |
Pix2Pix | 0.2194 | 0.2317 | 15.4978 | 14.4686 | 77.6901 | 212.5304 | |
CRAN | |||||||
3*6 m single-pol UAVSAR | CycleGAN | 0.3585 | 0.3005 | 19.5424 | 16.1030 | 50.5496 | 132.1710 |
Pix2Pix | 0.3407 | 0.3081 | 19.6044 | 15.7463 | |||
CRAN | 56.0201 | 106.3988 | |||||
3*10 m single-pol UAVSAR | CycleGAN | 0.2879 | 0.2973 | 16.2957 | |||
Pix2Pix | 0.3072 | 18.3707 | 16.0357 | 63.5519 | 146.7449 | ||
CRAN | 0.2819 | 18.3092 | 64.7359 | 138.3651 | |||
3*6 m full-pol UAVSAR | CycleGAN | 0.3418 | 0.3254 | 18.3431 | 16.0414 | 95.69 | |
Pix2Pix | 0.3716 | 16.0421 | 65.1980 | 94.9724 | |||
CRAN | 0.3109 | 19.2188 | 52.7645 |
a
Supervised learning | Unsupervised learning | |
Optical | 107.8 | 88.9 |
SAR | 58.1 | 41.2 |
Model | Number of parameters | Number of operations/FLOPs |
CycleGAN generator | 113.73 M | 152.39 G |
Pix2Pix Generator | 107.16 M | 89.50 G |
CRAN Generator | 107.49 M | 79.41 G |
Discriminator | 5.35 M | 6.53 G |
Scheme | Per-pixel accuary | Per-class accuary | Class IOU |
SAR-Segmentation | 0.5988 | 0.4927 | 0.3742 |
Translated Optical-Segmentation | 0.5052 | 0.4395 | 0.2923 |