
SCIENTIA SINICA Informationis, Volume 48, Issue 8: 1076-1082(2018) https://doi.org/10.1360/N112018-00025

Digital retina: revolutionizing camera systems for the smart city

  • Received: Jan 31, 2018
  • Accepted: Mar 3, 2018
  • Published: May 21, 2018

Abstract

The primary viewpoints presented in this article are as follows. (1) A key problem in smart-city development is how to gather and aggregate, in real time, all kinds of urban big data, especially image and video data from video surveillance networks, and then analyze and mine these data in the city brain so as to effectively support urban operation and management. (2) Recently, several city brains have been established to mine large visual data sources for valuable insights into city activities (e.g., urban traffic status). However, compression inevitably affects visual feature extraction and consequently degrades subsequent analysis and retrieval performance. More importantly, it is impractical to aggregate the video streams from hundreds of thousands of cameras distributed across a city into one city brain for big-data analysis and retrieval. These issues and challenges are rooted in the camera framework currently in use. (3) To address these challenges, a new camera framework should be developed, inspired by the fact that the retina encodes both pixels and features. Such a retina-like camera, referred to directly as a digital retina, is typically equipped with a globally unified timer and an accurate positioner, and outputs two streams simultaneously: a compressed video stream for online/offline viewing and data storage, and a compact feature stream, extracted from the original image/video signals, for visual analysis and search. By feeding only the feature streams into the city brain in real time, these digital retinas form a compound-eye camera system for the smart city. (4) To promote the wide application of digital retinas in the smart city, several tasks should be addressed in the near future, including standardization, hardware implementation, open-source software development, and the deployment of large-scale testbeds.
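As a minimal illustration of the dual-stream output described in point (3), the sketch below models one processing step of a digital retina. It is not the authors' implementation: the packet types, the `digital_retina_step` function, and the toy `encode`/`extract` callables are hypothetical stand-ins for a hardware codec (e.g., AVS2) and a feature extractor.

```python
from dataclasses import dataclass
import time

@dataclass
class VideoPacket:
    camera_id: str
    timestamp: float      # from the globally unified timer
    payload: bytes        # compressed bitstream

@dataclass
class FeaturePacket:
    camera_id: str
    timestamp: float
    location: tuple       # (lat, lon) from the accurate positioner
    descriptor: bytes     # compact feature for analysis and search

def digital_retina_step(raw_frame, camera_id, location, encode, extract):
    """Produce the two synchronized output streams for one frame."""
    ts = time.time()  # stand-in for the unified timer
    video = VideoPacket(camera_id, ts, encode(raw_frame))
    feature = FeaturePacket(camera_id, ts, location, extract(raw_frame))
    return video, feature

# Toy stand-ins for the codec and the feature extractor.
video, feature = digital_retina_step(
    b"frame-bytes", "cam-042", (39.9, 116.4),
    encode=lambda f: f[:4],                    # pretend compression
    extract=lambda f: bytes([len(f) % 256]),   # pretend descriptor
)
assert video.timestamp == feature.timestamp    # both streams share one clock
```

Only the small `FeaturePacket` would travel to the city brain in real time; the `VideoPacket` stays in local storage.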


Funded by

National Key R&D Program of China, "Cloud Computing and Big Data" Key Project (2017YFB1002400)

National Basic Research Program of China (973 Program) (2015CB351800)

National Natural Science Foundation of China, Big Data Science Center Project (U1611461)



  • Figure 1

    (Color online) The effect of video compression on different analysis and retrieval tasks, including (a) visual search, (b) face recognition, and (c) person re-identification. In the experiments, we selected one benchmark dataset for each task and utilized the state-of-the-art AVS2 codec to obtain reconstructed images and videos with different quantization parameters (QPs). The reconstructed images and videos were then used to evaluate the performance of each task.

  • Figure 2

    (Color online) The compound-eye camera system for the smart city, formed by connecting a large number of digital retinas. In this system, feature streams are aggregated into the city brain in real time, while video streams are saved in local storage and pulled to the city brain only on demand.
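The on-demand pull behavior in the caption can be sketched as follows; `CityBrain`, the dict-based packets, and `camera_store` are hypothetical illustrations under the assumption that each camera exposes its local storage through a simple keyed lookup, not an actual interface from the article.

```python
class CityBrain:
    """Sketch of the aggregation side: feature packets arrive in real
    time, while video clips stay in each camera's local storage and are
    fetched only when a query actually needs them."""

    def __init__(self):
        self.features = []       # real-time feature index
        self.video_fetches = 0   # clips pulled on demand

    def ingest_feature(self, packet):
        self.features.append(packet)

    def query(self, match, camera_store):
        """Search the feature index; pull only the matching video clips."""
        hits = [p for p in self.features if match(p)]
        clips = [camera_store[(p["camera_id"], p["timestamp"])] for p in hits]
        self.video_fetches += len(clips)
        return clips

brain = CityBrain()
brain.ingest_feature({"camera_id": "cam-1", "timestamp": 100.0, "score": 0.9})
brain.ingest_feature({"camera_id": "cam-2", "timestamp": 101.0, "score": 0.2})
local_storage = {("cam-1", 100.0): b"clip-A", ("cam-2", 101.0): b"clip-B"}
clips = brain.query(lambda p: p["score"] > 0.5, local_storage)
# Only one clip crosses the network; the other video never leaves its camera.
```

The design point: bandwidth into the city brain scales with the compact feature streams, while full-resolution video moves only for the small fraction of content a query selects.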

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
