SCIENCE CHINA Information Sciences, Volume 64 , Issue 2 : 122101(2021) https://doi.org/10.1007/s11432-020-3024-5

Neural compositing for real-time augmented reality rendering in low-frequency lighting environments

  • Received: Jun 13, 2020
  • Accepted: Jul 29, 2020
  • Published: Jan 5, 2021



Appendix A.


[1] Sloan P P, Kautz J, Snyder J. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. ACM Trans Graph, 2002, 21: 527-536

[2] Reinhard E, Ashikhmin M, Gooch B. Color transfer between images. IEEE Comput Graph Appl, 2001, 21: 34-41

[3] Pitié F, Kokaram A. The linear Monge-Kantorovitch linear colour mapping for example-based colour transfer. In: Proceedings of the 4th European Conference on Visual Media Production, 2007. 1--9

[4] Sunkavalli K, Johnson M K, Matusik W. Multi-scale image harmonization. ACM Trans Graph, 2010, 29: 1-10

[5] Pérez P, Gangnet M, Blake A. Poisson image editing. ACM Trans Graph, 2003, 22: 313-318

[6] Tao M W, Johnson M K, Paris S. Error-tolerant image compositing. In: Proceedings of European Conference on Computer Vision. Berlin: Springer, 2010. 31--44

[7] Johnson M K, Dale K, Avidan S. CG2Real: improving the realism of computer generated images using a large collection of photographs. IEEE Trans Visual Comput Graphics, 2011, 17: 1273-1285

[8] Lalonde J F, Efros A A. Using color compatibility for assessing image realism. In: Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, 2007. 1--8

[9] Tsai Y H, Shen X, Lin Z, et al. Deep image harmonization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 3789--3797

[10] Cong W, Zhang J, Niu L, et al. DoveNet: deep image harmonization via domain verification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. 8394--8403

[11] Debevec P. Rendering synthetic objects into real scenes: bridging traditional and image-based graphics with global illumination and high dynamic range photography. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. New York: Association for Computing Machinery, 1998. 189--198

[12] Agusanto K, Li L, Chuangui Z, et al. Photorealistic rendering for augmented reality using environment illumination. In: Proceedings of the 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality, 2003. 208--216

[13] Karsch K, Sunkavalli K, Hadap S. Automatic scene inference for 3D object compositing. ACM Trans Graph, 2014, 33: 1-15

[14] Aittala M. Inverse lighting and photorealistic rendering for augmented reality. Vis Comput, 2010, 26: 669-678

[15] Boivin S, Gagalowicz A. Image-based rendering of diffuse, specular and glossy surfaces from a single image. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001. 107--116

[16] Hold-Geoffroy Y, Sunkavalli K, Hadap S, et al. Deep outdoor illumination estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017

[17] Hold-Geoffroy Y, Athawale A, Lalonde J F. Deep sky modeling for single image outdoor lighting estimation. 2019. arXiv

[18] LeGendre C, Ma W C, Fyffe G, et al. DeepLight: learning illumination for unconstrained mobile mixed reality. In: Proceedings of ACM SIGGRAPH 2019 Talks. New York: Association for Computing Machinery, 2019

[19] Song S, Funkhouser T. Neural illumination: lighting prediction for indoor environments. 2019. arXiv

[20] Garon M, Sunkavalli K, Hadap S, et al. Fast spatially-varying indoor lighting estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 6908--6917

[21] Sengupta S, Gu J, Kim K, et al. Neural inverse rendering of an indoor scene from a single image. In: Proceedings of the International Conference on Computer Vision (ICCV), 2019

[22] Li X, Dong Y, Peers P. Modeling surface appearance from a single photograph using self-augmented convolutional neural networks. ACM Trans Graph, 2017, 36: 1-11

[23] Li Z, Xu Z, Ramamoorthi R, et al. Learning to reconstruct shape and spatially-varying reflectance from a single image. In: Proceedings of SIGGRAPH Asia 2018 Technical Papers. New York: ACM, 2018. 269

[24] Gao D, Li X, Dong Y. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Trans Graph, 2019, 38: 1-15

[25] Meka A, Maximov M, Zollhoefer M, et al. LIME: live intrinsic material estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 6315--6324

[26] Karsch K, Hedau V, Forsyth D. Rendering synthetic objects into legacy photographs. ACM Trans Graph, 2011, 30: 1-12

[27] Kán P, Kaufmann H. High-quality reflections, refractions, and caustics in augmented reality and their contribution to visual coherence. In: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2012. 99--108

[28] Kán P, Kaufmann H. Differential irradiance caching for fast high-quality light transport between virtual and real worlds. In: Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2013. 133--141

[29] Meshry M, Goldman D B, Khamis S, et al. Neural rerendering in the wild. 2019. arXiv

[30] Thies J, Zollhöfer M, Nießner M. Deferred neural rendering. ACM Trans Graph, 2019, 38: 1-12

[31] Li T M, Aittala M, Durand F. Differentiable Monte Carlo ray tracing through edge sampling. ACM Trans Graph, 2018, 37: 1-11

[32] Che C, Luan F, Zhao S, et al. Inverse transport networks. 2018. arXiv

[33] Zhang C, Wu L, Zheng C. A differential theory of radiative transfer. ACM Trans Graph, 2019, 38: 1-16

[34] Loper M M, Black M J. OpenDR: an approximate differentiable renderer. In: Computer Vision -- ECCV 2014. Berlin: Springer, 2014. 154--169

[35] Kato H, Ushiku Y, Harada T. Neural 3D mesh renderer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

[36] Dobashi Y, Iwasaki W, Ono A. An inverse problem approach for automatically adjusting the parameters for rendering clouds using photographs. ACM Trans Graph, 2012, 31: 1-10

[37] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2015. Berlin: Springer, 2015. 234--241

[38] Walter B, Marschner S, Li H, et al. Microfacet models for refraction through rough surfaces. In: Proceedings of the Eurographics Symposium on Rendering, 2007. 195--206

[39] Gardner M A, Sunkavalli K, Yumer E. Learning to predict indoor illumination from a single image. ACM Trans Graph, 2017, 36: 1-14

[40] Wald I, Woop S, Benthin C. Embree: a kernel framework for efficient CPU ray tracing. ACM Trans Graph, 2014, 33: 1-8

[41] Kingma D P, Ba J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego, 2015

[42] Bojanowski P, Joulin A, Lopez-Paz D, et al. Optimizing the latent space of generative networks. 2017. arXiv

[43] Ng R, Ramamoorthi R, Hanrahan P. All-frequency shadows using non-linear wavelet lighting approximation. In: Proceedings of ACM SIGGRAPH 2003 Papers. New York: Association for Computing Machinery, 2003. 376--381

  • Figure 1

    (Color online) Our real-time AR system. Left: a virtual armadillo rendered into a photograph. The system correctly shows a reflection on the table while casting a shadow only on the blue paper. Right: a user manipulating a virtual coffee cup on a laptop at 25 fps.

  • Figure 2

    (Color online) The pipeline of our method and network architectures. Some images have been enhanced to improve legibility. The pipeline is composed of three U-Net [37]-like networks (a roughness network, a reflection network, and a shadow network), a light network resembling DeepLight [18], a conventional real-time renderer, and several fixed-function blending steps.

  • Figure 3

    (Color online) Intermediate results in a compositing process. From left to right: the input image, the reflection mask, the color layer (with the estimated SH lighting shown as inset), the refined reflection layer, the refined shadow layer, and the final output image.

  • Figure 4

    (Color online) Example training scenes. We use an image resolution of $128\times~128$. Background images are shown in the upper row and the final images in the lower row.

  • Figure 5

    (Color online) Ablation study. The “All effects” column runs our full pipeline. The “Raw shadow” column disables the shadow network and blends with the raw shadow layer. The “Raw refl.” column disables both the reflection network and the roughness network and adds the specular reflection directly. The background images in the first three rows are real-world photos while that in the fourth row is synthetically rendered.

  • Figure 6

    (Color online) Visualization of raw shadow layers and refined shadow layers. The “Foreground” column shows the rendered virtual object, the “Result” column shows the compositing result of our pipeline, the “Raw shadow” column shows the directly added shadow layers, and the “Refined shadow” column shows the shadow layers refined by our network. A gamma correction ($\gamma=2.2$) is applied to both layers to better visualize details. The shadow on the glossy surface is slightly reduced by the shadow network. Note that occlusion by non-planar background objects is also learnt automatically by our network.
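    The gamma correction mentioned in the caption can be sketched in a few lines; the function name and sample values below are illustrative, not from the paper's code.

```python
import numpy as np

# Gamma correction as used to visualize the shadow layers (gamma = 2.2).
# Raising linear-light values to 1/gamma lifts dark pixels toward
# mid-gray, which makes subtle shadow detail visible on a display.
def gamma_correct(layer, gamma=2.2):
    """Map a linear-light layer in [0, 1] to display space."""
    return np.clip(layer, 0.0, 1.0) ** (1.0 / gamma)

# Dark shadow values are lifted; 0 and 1 stay fixed.
dark = np.array([0.01, 0.1, 0.5, 1.0])
display = gamma_correct(dark)
```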

  • Figure 7

    (Color online) Ablation study of reflection mask. Background images and reflection masks are shown in the first column. Compositing results with and without reflection masks are shown in the second and the third columns.

  • Figure 8

    (Color online) Comparison with ground truth (a real photo). (a) Background; (b) our result; (c) real photo.

  • Figure 9

    (Color online) Comparison with ground truth (a synthetic image). The estimated lighting and the ground truth lighting are also shown as insets. (a) Background; (b) our result; (c) reference.

  • Figure 10

    (Color online) Comparison with the work of Karsch et al. [13].

  • Figure 11

    (Color online) Comparison with DeepLight [18]. The bunny is virtual in (b) and (c).

  • Figure 12

    (Color online) Comparisons with Tsai et al. [9] and Garon et al. [20]. The virtual objects in the first two rows are the dragon and Lucy. Multiple virtual objects (bunny, dragon, and Buddha) are inserted in the last row to verify our method's capacity in a more general case.

  • Figure 13

    (Color online) From left to right: (a) problematic lighting estimation leads to inconsistent shading; (b) over-blurred reflection caused by upsampling from the limited network resolution; (c) reflection overlapping artifact. The virtual objects are the Buddha, dragon, and mobile phone.

  • Table 1  

    Table 1  Rendered layers$^{\rm~a)}$

    Pass Term Description
    BG $B$ Background layer
    FG $A$ Foreground alpha
    FG $\tilde{C}^{l,m}$ Foreground color under SH basis lighting $Y_{l,m}$
    FG $\tilde{R}_\mathrm{spec}^{l,m}$ Specular reflection under $Y_{l,m}$
    FG $\tilde{R}_\mathrm{gloss}^{l,m}$ Glossy reflection under $Y_{l,m}$
    FG $\tilde{S}_\mathrm{spec}^{l,m}\circ~B$ Specular shadow under $Y_{l,m}$
    FG $\tilde{S}_\mathrm{gloss}^{l,m}\circ~B$ Glossy shadow under $Y_{l,m}$
    FG $\tilde{S}_\mathrm{diff,n}^{l,m}$ Diffuse shadow numerator under $Y_{l,m}$
    FG $\tilde{S}_\mathrm{diff,d}^{l,m}$ Diffuse shadow denominator under $Y_{l,m}$
    ALL $I$ Ground truth final image
    ALL $R$ Ground truth floor reflection
    ALL $S$ Ground truth shadow
    ALL $M$ Pre-training reflection mask

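    Because rendering is linear in the lighting, each layer in Table 1 rendered under a single SH basis light $Y_{l,m}$ can be scaled by the corresponding estimated SH coefficient and summed to produce that layer under the estimated lighting, as in precomputed radiance transfer [1]. A minimal sketch, assuming 3 SH bands (9 coefficients) and illustrative array shapes, not the paper's actual code:

```python
import numpy as np

# Relight a stack of per-SH-basis layers by a weighted sum: since the
# renderer is linear in the lighting, layer(L) = sum_{l,m} c_{l,m} * layer^{l,m}.
def relight(basis_layers, sh_coeffs):
    """basis_layers: (n_sh, H, W, 3) array, one layer per SH basis light Y_{l,m}.
    sh_coeffs: (n_sh,) estimated lighting coefficients c_{l,m}.
    Returns the (H, W, 3) layer under the estimated lighting."""
    return np.tensordot(sh_coeffs, basis_layers, axes=1)

n_sh = 9                                 # 3 SH bands -> 9 basis lights (assumed order)
layers = np.random.rand(n_sh, 4, 4, 3)   # e.g. the foreground color layers under each Y_{l,m}
coeffs = np.random.rand(n_sh)            # e.g. estimated by the light network
color_layer = relight(layers, coeffs)    # shape (4, 4, 3)
```

    The same weighted sum applies to the reflection and shadow layers in the table; only the blending of the relit layers differs.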

  • Table 2  

    Table 2  Quantitative comparisons of L1 and L2 losses on the synthetic dataset$^{\rm~a)}$

    Loss All effects Raw shadow Raw refl.
    L1 0.0088 0.0301 0.0135
    L2 0.0247 0.0840 0.0384

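    The table does not spell out how the two losses are defined; since the reported L2 values exceed the L1 values, a root-mean-square form is one plausible reading (plain mean-squared error would typically be smaller than L1 for values in [0, 1]). A hedged sketch under that assumption:

```python
import numpy as np

# One plausible reading of the Table 2 losses (an assumption, not the
# paper's stated definition): mean absolute error and root-mean-square
# error, averaged over all pixels and channels.
def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

def l2_loss(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```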

  • Table 3  

    Table 3  The user study results$^{\rm~a)}$