
SCIENTIA SINICA Informationis, Volume 51, Issue 1: 104 (2021). https://doi.org/10.1360/SSI-2020-0153

Robotic cross-modal generative adversarial network based on variational Bayesian Gaussian mixture noise model

  • Received May 30, 2020
  • Accepted Aug 13, 2020
  • Published Dec 16, 2020

Abstract


Funded by

National Natural Science Foundation of China (61903175, 61663027, 91648206)

National Key R&D Program of China, "Cloud-Client Fused Natural Interaction Devices and Tools" (2016YFB1001300)

Jiangxi Province Academic and Technical Leaders Program for Major Disciplines (20204BCJ23006)


  • Figure 3

    BGM-CGAN cross-modal image generation process with Bayesian Gaussian mixture noise (a noise-sampling sketch follows this figure list)

  • Figure 6

    Comparison of the cross-modal generated images with the real images of an object

  • Figure 7

    The 40 selected objects

  • Figure 8

    Visual-tactile fusion ConvLSTM network
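
Figure 3 refers to the core idea of BGM-CGAN: the generator noise is drawn from a variational Bayesian Gaussian mixture fitted to the data rather than from a single Gaussian. The following is a minimal sketch of that sampling step only, assuming scikit-learn's BayesianGaussianMixture and a placeholder conditional generator G(z, c); the latent dimension, component count, and batch size are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Fit a variational Bayesian Gaussian mixture to encoded training samples.
# Random data stands in here for real latent codes of visual/tactile images.
rng = np.random.default_rng(0)
latent_codes = rng.normal(size=(500, 64))  # hypothetical 64-D codes

bgm = BayesianGaussianMixture(
    n_components=10,                 # upper bound; unneeded components get near-zero weight
    covariance_type="diag",
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
bgm.fit(latent_codes)

# Draw generator noise from the fitted mixture instead of N(0, I).
z, _ = bgm.sample(n_samples=32)      # (32, 64) noise batch

# The noise z, together with a condition c (e.g. the source-modality image),
# would then be passed to the conditional generator: fake = G(z, c).
```

Because the variational treatment shrinks the weights of unneeded components, the effective number of mixture components adapts to the data, which is the usual motivation for a Bayesian mixture noise model over a fixed single Gaussian or fixed-size mixture (cf. SGM-CGAN and MOG-CGAN in the tables below).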

  • Table 1   Inception score (IS) comparison of various algorithms (metric computation sketches follow the tables)
    Algorithm Visual-to-tactile Tactile-to-visual
    CGAN 1.02 1.12
    SGM-CGAN 1.01 1.19
    MOG-CGAN 1.02 1.36
    BGM-CGAN 1.05 1.60
  • Table 2   Fréchet inception distance (FID) comparison of various algorithms
    Algorithm Visual-to-tactile Tactile-to-visual
    CGAN 1.99 4.25
    SGM-CGAN 2.57 4.12
    MOG-CGAN 2.30 4.43
    BGM-CGAN 2.32 3.69
  • Table 3   Structural similarity (SSIM) comparison of various algorithms
    Algorithm Visual-to-tactile Tactile-to-visual
    CGAN 0.94 0.76
    SGM-CGAN 0.94 0.74
    MOG-CGAN 0.93 0.75
    BGM-CGAN 0.93 0.78
  • Table 4   Comparison of the running time (min) of various algorithms
    Algorithm Visual-to-tactile Tactile-to-visual
    CGAN 20.2 20.2
    SGM-CGAN 21.2 21.3
    MOG-CGAN 21.7 21.7
    BGM-CGAN 22.5 22.4
  • Table 5   Comparison of slip detection results on the test objects before and after data augmentation (a fusion-network sketch follows the tables)
    Object Raw data (%) Augmented data (%)
    Object 1 69.2 71.4
    Object 2 54.5 58.3
    Object 3 70.0 72.7
    Object 4 66.7 68.4
    Object 5 58.3 61.5
    Object 6 64.2 66.7
    Object 7 70.0 72.7
    Object 8 63.6 66.7
    Object 9 75.0 77.8
    Object 10 52.9 55.6
    Mean value 64.4 67.2
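
Tables 1-3 report Inception Score (IS), Fréchet Inception Distance (FID), and structural similarity (SSIM) for the generated images; higher IS and SSIM and lower FID indicate better generation quality. The following is a minimal sketch of how these three metrics are commonly computed (not the paper's evaluation code), assuming class probabilities and feature statistics have already been extracted by an Inception-style network.

```python
import numpy as np
from scipy import linalg
from skimage.metrics import structural_similarity

def inception_score(probs, eps=1e-12):
    """IS = exp(mean_x KL(p(y|x) || p(y))); probs is an (N, n_classes) softmax array."""
    p_y = probs.mean(axis=0, keepdims=True)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}) over feature statistics."""
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can introduce tiny imaginary parts
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

def mean_ssim(real_imgs, fake_imgs):
    """Average SSIM over paired real/generated grayscale images scaled to [0, 1]."""
    return float(np.mean([
        structural_similarity(r, f, data_range=1.0)
        for r, f in zip(real_imgs, fake_imgs)
    ]))
```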
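
Figure 8 and Table 5 concern the downstream slip detection task: a visual-tactile fusion ConvLSTM network is trained on sequences of visual and tactile frames, and the comparison in Table 5 is run before and after augmenting the training data. The sketch below is only one plausible two-stream layout in Keras; the sequence length, image size, and layer widths are invented for illustration and are not the paper's architecture.

```python
from tensorflow.keras import layers, models

def stream(name):
    """One ConvLSTM stream over a sequence of 8 single-channel 64x64 frames (hypothetical shapes)."""
    inp = layers.Input(shape=(8, 64, 64, 1), name=name)
    x = layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False)(inp)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

# Separate streams for the visual frames and the tactile (GelSight-style) frames.
vis_in, vis_feat = stream("visual_seq")
tac_in, tac_feat = stream("tactile_seq")

# Fuse the two modality features and classify slip / no-slip.
fused = layers.Concatenate()([vis_feat, tac_feat])
out = layers.Dense(1, activation="sigmoid", name="slip")(fused)

model = models.Model(inputs=[vis_in, tac_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

In the augmented setting of Table 5, the training set would presumably be enlarged with cross-modal samples generated by BGM-CGAN; that augmentation step itself is not part of the sketch above.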