logo

SCIENTIA SINICA Informationis, Volume 50, Issue 5: 621-636(2020) https://doi.org/10.1360/SSI-2019-0189

The design of Apache IoTDB distributed framework

Tianan LI1,2,3, Xiangdong HUANG1,2,3,*, Jianmin WANG1,2,3, Dongfang MAO1,2,3, Yi XU1,2,3, Jun YUAN1,2,3
More info
  • ReceivedAug 30, 2019
  • AcceptedDec 17, 2019
  • PublishedApr 27, 2020

Abstract

Apache Internet of Things Database (IoTDB) is a new open-source timeseries database management system. A distributed data management system not only needs to solve the problem of metadata synchronization between nodes caused by data partition and multiple replicas, but also needs to support efficient query request processing. To solve the problem of metadata synchronization among nodes, we propose a dual-layer granularity metadata management strategy. Based on the consistency hash partitioning method and Raft protocol, we designed a distributed framework that supports both strong consistency query and eventual consistency query. Based on the single-machine version of Apache IoTDB, we carried out the system implementation and experimental test. Compared with the single-level granularity management strategy, the test results showed that the two-level granularity metadata management strategy takes less memory resources and improves the write performance by $5%\sim10%$. Also, the results showed that the read and write performance of the distributed Apache IoTDB increases linearly with the extension of cluster size.


Funded by

国家重点研发计划(2016YFB1000701)

国家自然科学基金(61802224,71690231)


References

[1] Fu T. A review on time series data mining. Eng Appl Artificial Intelligence, 2011, 24: 164-181 CrossRef Google Scholar

[2] Stonebraker M, Çetintemel U, Zdonik S. The 8 requirements of real-time stream processing. SIGMOD Rec, 2005, 34: 42-47 CrossRef Google Scholar

[3] Ongaro D, Ousterhout J. In search of an understandable consensus algorithm. In: Proceedings of USENIX Annual Technical Conference, 2014. 305--319. Google Scholar

[4] Karger D. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In: Proceedings of ACM Symposium on Theory of Computing, 1997. Google Scholar

[5] Kwon Y C, Ren K, Balazinska M, et al. Managing Skew in Hadoop. IEEE Data Eng Bull, 2013, 36: 24-33. Google Scholar

[6] Shvachko K, Kuang H, Radia S, et al. The hadoop distributed file system. In: Proceedings of IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010. 1--10. Google Scholar

[7] Dubreuil M, Gagne C, Parizeau M. Analysis of a master-slave architecture for distributed evolutionary computations.. IEEE Trans Syst Man Cybern B, 2006, 36: 229-235 CrossRef PubMed Google Scholar

[8] Lakshman A, Malik P. Cassandra. SIGOPS Oper Syst Rev, 2010, 44: 35 CrossRef Google Scholar

[9] Stoica I, Morris R, Liben-Nowell D. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans Networking, 2003, 11: 17-32 CrossRef Google Scholar

[10] Chen P M, Lee E K, Gibson G A. RAID: high-performance, reliable secondary storage. ACM Comput Surv, 1994, 26: 145-185 CrossRef Google Scholar

[11] Aguilera M K. Stumbling Over Consensus Research: Misunderstandings and Issues. Berlin: Springer, 2010. 59--72. Google Scholar

[12] Stonebraker M. Retrospection on a database system. ACM Trans Database Syst, 1980, 5: 225-240 CrossRef Google Scholar

[13] Naqvi S N Z, Yfantidou S, Zimanyi E. Time Series Databases and InfluxDB. Studienarbeit, Universite Libre de Bruxelles, 2017. Google Scholar

[14] Prasad S, Avinash S B. Smart meter data analytics using OpenTSDB and Hadoop. In: Proceedings of IEEE Innovative Smart Grid Technologies-Asia, 2013. 1--6. Google Scholar

[15] Vora M N. Hadoop-HBase for large-scale data. In: Proceedings of International Conference on Computer Science and Network Technology, 2011. 1: 601--605. Google Scholar

[16] Van Renesse R, Dumitriu D, Gough V, et al. Efficient reconciliation and flow control for anti-entropy protocols. In: proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware, 2008. 6. Google Scholar

[17] Chang F, Dean J, Ghemawat S. Bigtable. ACM Trans Comput Syst, 2008, 26: 1-26 CrossRef Google Scholar

[18] DeCandia G, Hastorun D, Jampani M. Dynamo. SIGOPS Oper Syst Rev, 2007, 41: 205-220 CrossRef Google Scholar

[19] Chen Z, Yang S, Tan S, et al. Hybrid Range Consistent Hash Partitioning Strategy--A New Data Partition Strategy for NoSQL Database. In: Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2013. 1161--1169. Google Scholar

[20] Hunt P, Konar M, Junqueira F P, et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems. In: Proceedings of USENIX Annual Technical Conference, 2010. 8. Google Scholar

[21] Lamport L. Paxos made simple. ACM Sigact News, 2001, 32: 18--25. Google Scholar

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1       京公网安备11010102003388号