SCIENCE CHINA Information Sciences, Volume 61, Issue 10: 100303(2018) https://doi.org/10.1007/s11432-018-9499-0

Storage and repair bandwidth tradeoff for distributed storage systems with clusters and separate nodes$^\dagger$

• AcceptedJun 19, 2018
• PublishedAug 20, 2018
Share
Rating

Abstract

The optimal tradeoff between node storage and repair bandwidth is an important issue for distributed storage systems (DSSs). For realistic DSSs with clusters, while repairing a failed node, downloading more data from intra-cluster nodes than from cross-cluster nodes is effective. Therefore, differentiating the repair bandwidth from intra-cluster and cross-cluster is useful. For cluster DSSs, the tradeoff is considered with special repair assumptions where all alive nodes are used for repairing a failed node. In this paper, we investigate the optimal tradeoff for the cluster DSSs under more general storage/repair parameters. Furthermore, we propose a regenerating code construction strategy that achieves the points in the optimal tradeoff curve for the cluster DSSs with specific parameters as a numerical example. Moreover, we consider the influence of separate nodes for the tradeoff for the DSSs with clusters and separated nodes.

References

[1] Hu Y C, Li X L, Zhang M. Optimal repair layering for erasure-coded data centers. ACM Trans Storage, 2017, 13: 1-24 CrossRef Google Scholar

[2] Zhang H Y, Li H, Li R S. Repair tree: fast repair for single failure in erasure-coded distributed storage systems. IEEE Trans Parallel Distrib Syst, 2017, 28: 1728-1739 CrossRef Google Scholar

[3] Huang C, Chen M H, Li J. Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. In: Proceedings of the 6th IEEE International Symposium on Network Computing and Applications, Cambridge, 2007. Google Scholar

[4] Dimakis A G, Godfrey P B, Wu Y N. Network coding for distributed storage systems. IEEE Trans Inf Theory, 2010, 56: 4539-4551 CrossRef Google Scholar

[5] Ernvall T, Rouayheb S E, Hollanti C. Capacity and security of heterogeneous distributed storage systems. IEEE J Sel Areas Commun, 2013, 31: 2701-2709 CrossRef Google Scholar

[6] Yu Q, Shum K W, Sung C W. Tradeoff between storage cost and repair cost in heterogeneous distributed storage systems. Trans Emerg Telecommun Technol, 2015, 26: 1201-1211 CrossRef Google Scholar

[7] Akhlaghi S, Kiani A, Ghanavati M R. Cost-bandwidth tradeoff in distributed storage systems. Comput Commun, 2010, 33: 2105-2115 CrossRef Google Scholar

[8] Ford D, Labelle F, Popovici F I, et al. Availability in globally distributed storage systems. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, Vancouver, 2010. Google Scholar

[9] Hu Y C, Lee P P C, Zhang X Y. Double regenerating codes for hierarchical data centers. In: Proceedings of IEEE International Symposium on Information Theory (ISIT), Barcelona, 2016. 245--249. Google Scholar

[10] Hou H X, Lee P P C, Shum K W, et al. Rack-aware regenerating codes for data centers. 2018,. arXiv Google Scholar

[11] Prakash N, Abdrashitov V, Médard M. The storage vs repair-bandwidth trade-off for clustered storage systems. IEEE Trans Inf Theory, 2017,. arXiv Google Scholar

[12] Pernas J, Yuen C, Gastón B, et al. Non-homogeneous two-rack model for distributed storage systems. In: Proceedings of IEEE International Symposium on Information Theory (ISIT), Istanbul, 2013. 1237--1241. Google Scholar

[13] Sohn J Y, Choi B, Yoon S W, et al. Capacity of clustered distributed storage. In: Proceedings of IEEE International Conference on Communications (ICC), Paris, 2017. Google Scholar

[14] Kubiatowicz J, Weimer W, Wells C. OceanStore: an architecture for global-scale persistent storage. SIGPLAN Notice, 2000, 35: 190-201 CrossRef Google Scholar

[15] Dimakis A G, Ramchandran K, Yunnan Wu K. A survey on network codes for distributed storage. Proc IEEE, 2011, 99: 476-489 CrossRef Google Scholar

[16] Rashmi K V, Shah N B, Kumar P V. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Trans Inf Theory, 2011, 57: 5227-5239 CrossRef Google Scholar

[17] Li S Y R, Yeung R W, Ning Cai R W. Linear network coding. IEEE Trans Inf Theory, 2003, 49: 371-381 CrossRef Google Scholar

[18] Ahlswede R, Cai N, Li R S. Network information flow. IEEE Trans Inf Theory, 2000, 46: 1204-1216 CrossRef Google Scholar

[19] Smith D K, Ahuja R K, Magnanti T L. Network flows: theory, algorithms, and applications. J Oper Res Soc, 1994, 45: 1340 CrossRef Google Scholar

[20] Goparaju S, Fazeli A, Vardy A. Minimum storage regenerating codes for all parameters. IEEE Trans Inf Theory, 2017, 63: 6318-6328 CrossRef Google Scholar

• Figure 1

System model for CSN-DSS.

• Figure 2

The IFGs of DSS with one cluster and separate nodes. (a) One node in cluster 1 and one separate node have failed; (b) two nodes in cluster 1 have failed.

• Figure 5

(Color online) The numbered nodes are selected nodes. There are two cluster orders $\boldsymbol{\pi}^*$ and $\boldsymbol{\pi}$ for a selected node distribution ${\boldsymbol~s}=(0,4,3,1)$. (a) $\boldsymbol{\pi}^*={(1,2,3,1,2,1,2,1)}$; (b) $\boldsymbol{\pi}={(1,2,1,2,1,2,1,3)}$.

•

Algorithm 1 Vertical-order algorithm

Require:${\boldsymbol~s}=(s_0,s_1,\ldots,s_L).$ Initial cluster label $j~\leftarrow~1$.

Output:$\boldsymbol{\pi}^*=(\pi_1^*,\ldots,\pi_k^*).$

for $i=1$ to $k$

if textthe $i$-th selected node is a separate node then

$\pi^*_i\gets~0;$ continue;

end if

if $s_j=0$ then

$j\gets~1;$

else

$\pi_i^*\gets~j$; $s_j\gets~s_j-1$; $j\gets~(j~{\rm~mod}~L)+1$;

end if

end for

• 2

Citations

• Altmetric

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有