SCIENCE CHINA Information Sciences, Volume 63 , Issue 2 : 122301(2020) https://doi.org/10.1007/s11432-019-9937-7

## Storage and repair bandwidth tradeoff for heterogeneous cluster distributed storage systems

Yuan LUO 1,*,
• AcceptedJul 3, 2019
• PublishedJan 15, 2020
Share
Rating

### Abstract

The storage and repair bandwidth tradeoff is an important issue in distributed storage systems (DSSs) where large scale data are stored in multiple nodes with erasure coding to ensure reliability. There are lots of studies on the DSS model with multiple clusters where the repair bandwidths from intra-cluster and cross-cluster nodes are differentiated to improve repair efficiency based on the realistic network topological structures. At the same time, separate nodes are also prevalent due to the variety of practical networks, but the work on the cluster DSS model with multiple separate nodes is insufficient, which is a main motivation of this paper. We formulate the tradeoff bound between storage repair bandwidth for a heterogeneous DSS model consisting of clusters and separate nodes by analyzing the min-cuts of heterogeneous information flow graphs corresponding to the orders of failed nodes. Additionally, the tradeoff bounds are investigated in multiple aspects when the repair bandwidth constraints and the amount of separate nodes vary, respectively. Moreover, a class of regenerating codes are illustrated to achieve the tradeoff in the heterogeneous cases.

### Acknowledgment

This work was partially supported by National Natural Science Foundation of China (Grant No. 61571293), China Program of International S$\&$T Cooperation (Grant No. 2016YFE0100300), and SJTU-CUHK Joint Research Collaboration Fund 2018.

### References

[1] Hu Y C, Li X L, Zhang M. Optimal Repair Layering for Erasure-Coded Data Centers. ACM Trans Storage, 2017, 13: 1-24 CrossRef Google Scholar

[2] Huang C, Chen M H, Li J. Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. ACM Transactions on Storage, 2013, 9(1): 79--86 DOI: 10.1109/NCA.2007.37. Google Scholar

[3] Zhang H Y, Li H, Li S Y R. Repair Tree: Fast Repair for Single Failure in Erasure-Coded Distributed Storage Systems. IEEE Trans Parallel Distrib Syst, 2017, 28: 1728-1739 CrossRef Google Scholar

[4] Balaji S B, Krishnan M N, Vajha M. Erasure coding for distributed storage: an overview. Sci China Inf Sci, 2018, 61: 100301 CrossRef Google Scholar

[5] Hou H X, Han Y S. A class of binary MDS array codes with asymptotically weak-optimal repair. Sci China Inf Sci, 2018, 61: 100302 CrossRef Google Scholar

[6] Dimakis A G, Godfrey P B, Wu Y N. Network Coding for Distributed Storage Systems. IEEE Trans Inform Theor, 2010, 56: 4539-4551 CrossRef Google Scholar

[7] Li S Y R, Yeung R W, Cai N. Linear network coding. IEEE Trans Inform Theor, 2003, 49: 371-381 CrossRef Google Scholar

[8] Yang B, Tang X, Li J. A Systematic Piggybacking Design for Minimum Storage Regenerating Codes. IEEE Trans Inform Theor, 2015, 61: 5779-5786 CrossRef Google Scholar

[9] Goparaju S, Fazeli A, Vardy A. Minimum Storage Regenerating Codes for All Parameters. IEEE Trans Inform Theor, 2017, 63: 6318-6328 CrossRef Google Scholar

[10] Rashmi K V, Shah N B, Kumar P V. Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction. IEEE Trans Inform Theor, 2011, 57: 5227-5239 CrossRef Google Scholar

[11] Rashmi K V, Shah N B, Ramchandran K. A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes. IEEE Trans Inform Theor, 2017, : 1-1 CrossRef Google Scholar

[12] Tang X, Yang B, Li J. A New Repair Strategy for the Hadamard Minimum Storage Regenerating Codes for Distributed Storage Systems. IEEE Trans Inform Theor, 2015, 61: 5271-5279 CrossRef Google Scholar

[13] Ford D, Labelle F, Popovici F I, et al. Availability in globally distributed storage systems. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, Vancouver, 2010. 61--74. Google Scholar

[14] Kubiatowicz J, Bindel D, Chen Y. OceanStore. SIGPLAN Not, 2000, 35: 190-201 CrossRef Google Scholar

[15] Ahmad F, Chakradhar S T, Raghunathan A, et al. Shufflewatcher: Shuffle-aware scheduling in multi-tenant mapreduce clusters. In: Proceedings of 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, 2014. 1--13. Google Scholar

[16] Prakash N, Abdrashitov V, Medard M. The Storage Versus Repair-Bandwidth Trade-off for Clustered Storage Systems. IEEE Trans Inform Theor, 2018, 64: 5783-5805 CrossRef Google Scholar

[17] Sohn J Y, Choi B, Yoon S W. Capacity of Clustered Distributed Storage. IEEE Trans Inform Theor, 2019, 65: 81-107 CrossRef Google Scholar

[18] Hou H X, Lee P P C, Shum K W. Rack-Aware Regenerating Codes for Data Centers. IEEE Trans Inform Theor, 2019, 65: 4730-4745 CrossRef Google Scholar

[19] Hu Y C, Lee P P C, Zhang X. Double regenerating codes for hierarchical data centers. In: Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT), 2016. 245--249. Google Scholar

[20] Abdrashitov V, Prakash N, and Médard M. The storage vs repair bandwidth trade-off for multiple failures in clustered storage networks. In: Proceedings of 2017 IEEE Information Theory Workshop (ITW), 2017. 46--50. Google Scholar

[21] Akhlaghi S, Kiani A, Ghanavati M R. Cost-bandwidth tradeoff in distributed storage systems. Comput Commun, 2010, 33: 2105-2115 CrossRef Google Scholar

[22] Wang J Z, Wang T H, Luo Y. Storage and repair bandwidth tradeoff for distributed storage systems with clusters and separate nodes. Sci China Inf Sci, 2018, 61: 100303 CrossRef Google Scholar

[23] Pernas J, Yuen C, Gastón B, et al. Non-homogeneous two-rack model for distributed storage systems. In: Proceedings of 2013 IEEE International Symposium on Information Theory (ISIT), 2013. 1237--1241. Google Scholar

[24] Sohn J Y, Choi B, Yoon S W, et al. Capacity of clustered distributed storage. In: Proceedings of 2017 IEEE International Conference on Communications (ICC), Paris, 2017. Google Scholar

[25] Ernvall T, El Rouayheb S, Hollanti C. Capacity and Security of Heterogeneous Distributed Storage Systems. IEEE J Sel Areas Commun, 2013, 31: 2701-2709 CrossRef Google Scholar

[26] Golrezaei N, Dimakis A G, Molisch A F. Wireless device-to-device communications with distributed caching. In: Proceedings of IEEE International Symposium on Information Theory, Cambridge, 2012. Google Scholar

[27] Yu Q, Shum K W, Sung C W. Tradeoff between storage cost and repair cost in heterogeneous distributed storage systems. Trans Emerging Tel Tech, 2015, 26: 1201-1211 CrossRef Google Scholar

[28] Wang J Z, Wang T H, Luo Y, et al. Capacity of Distributed Storage Systems with Clusters and Separate Nodes. 2019,. arXiv Google Scholar

[29] Dimakis A G, Ramchandran K, Wu Y N. A Survey on Network Codes for Distributed Storage. Proc IEEE, 2011, 99: 476-489 CrossRef Google Scholar

[30] Ahlswede R, Cai N, Li S Y R. Network information flow. IEEE Trans Inform Theor, 2000, 46: 1204-1216 CrossRef Google Scholar

[31] Ahuja R K, Magnanti T L, Orlin J B. Network flows: theory, algorithms, and applications. Upper Saddle River: Prentice Hall, 1993. 77--78. Google Scholar

[32] Zorgui M, Wang Z Y. Code constructions for multi-node exact repair in distributed storage. Sci China Inf Sci, 2018, 61: 100304 CrossRef Google Scholar

[33] Wu Y N, and Dimakis A G. Reducing repair traffic for erasure coding-based storage via interference alignment. In: Proceedings of 2009 IEEE International Symposium on Information Theory (ISIT), 2009. 2276--2280. Google Scholar

• Figure 1

The HC-DSS model.

• Figure 2

(Color online) An information flow graph.

• Figure 3

(Color online) The part-cut values of cutting $\{x^{t_i}\}_{i=1}^k$ in the topological ordering.

• Figure 4

(Color online) As the numbered nodes show, $\boldsymbol{\pi}$ and $\boldsymbol{\pi}&apos;$ are generated by Algorithm 1 for the same ${\boldsymbol~s}=(2,4,2,0)$ and different separate location vectors. (a) $\boldsymbol{\pi}=(1,2,0,1,2,1,0,1)$; (b) $\boldsymbol{\pi}&apos;=(1,2,1,0,2,1,0,1)$.

• Figure 5

The cluster nodes of ${\boldsymbol~s}=(2,3,3,0)$ and $\boldsymbol{\pi}=(1,2,1,0,2,1,0,2)$ shown in (a) are represented with $\overline{{\boldsymbol~s}}=(0,3,3,0)$ and $\overline{\boldsymbol{\pi}}=(1,2,1,2,1,2)$ respectively in (b).

• Figure 6

In (a), there are two selected separate node and the cluster order is $\boldsymbol{\pi}^*(2)=(1,2,1,2,1,1,0,0)$. In (b), a selected separate node is added, and the cluster changes to $\boldsymbol{\pi}^*(3)=(1,2,1,1,1,0,0,0)$

• Figure 7

(Color online) Optimal tradeoff bounds between node storage $\alpha$ and repair bandwidth parameter $\beta_C$ for the $(n=14,k=8,L=3,R=4,E)$ HC-DSS model with the original file size $\mathcal{M}=32$. (a) shows the tradeoffs of various $d_C$ when the amount of separate nodes $E=2$ and the bandwidth parameter constraint $\tau=\beta_I/\beta_C=2$ are fixed. (b) illustrates the tradeoffs of various $E$ when $d_C=$ and $\tau=2$ are fixed. (c) compares the tradeoffs of various $\tau=\beta_I/\beta_C$ when $d_C=6$ and $E=2$ are fixed.

• Figure 8

(Color online) In the $(n=6,k=4,L=2,R=2,E=2)$ HC-DSS model with $\tau=\beta_I/\beta_C=2$, $d_I=1$, $d_C=4$ and $\mathcal{M}=8$, (a) illustrates the optimal tradeoff bound between $\alpha$ and $\beta_C$, (b) shows the data assignment in the MSR code construction example.

•

Algorithm 1 Heterogeneous vertical order algorithm

Require:Selected node distribution ${\boldsymbol~s}=(s_0,s_1,\ldots,s_L)$; separate location vector ${\boldsymbol~v}=(v_1,v_2,\ldots,v_k)$.

Output:Cluster order $\boldsymbol{\pi}^*=(\pi_1^*,\ldots,\pi_k^*)$.

Initial cluster label $j~\leftarrow~1$ and count variable $t\gets~1$;

for $i=1$ to $k$

if $i=v_t$ then

$\pi^*_i\gets~0$; $t\gets~t+1$; continue;

end if

if $s_j=0$ then

$j\gets~1;$

else

$\pi_i^*\gets~j$; $s_j\gets~s_j-1$; $j\gets~(j\mod~L)+1$;

end if

end for

•

Algorithm 2 Heterogeneous horizontal selection algorithm

Require:Node parameters $(n,k,L,R,E)$; the amount of selected separate nodes $N_{\rm~sep}$ ($1\leq~N_{\rm~sep}\leq~E$)

Output:Selected node distribution ${\boldsymbol~s}^*=(s_0^*,~s_1^*,\ldots,s_L^*)$.

$s_0^*~\gets~N_{\rm~sep}$;

for $i=1$ to $L$

if $i~\leq~\left\lfloor~\frac{k-s_0^*}{R}\right\rfloor$ then

$s_i^*~\gets~R$;

end if

if $i~=~\left\lfloor~\frac{k-s_0^*}{R}\right\rfloor+1$ then

$s_i^*~\gets~k-s_0^*-\left\lfloor~\frac{k-s_0^*}{R}\right\rfloor~R$;

end if

if $i~>~\left\lfloor~\frac{k-s_0^*}{R}\right\rfloor+1$ then

$s_i^*~\gets~0$;

end if

end for

Citations

• #### 0

Altmetric

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有