SCIENCE CHINA Information Sciences, Volume 60, Issue 6: 062402(2017) https://doi.org/10.1007/s11432-016-0306-y

HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip

  • Received: Aug 11, 2016
  • Accepted: Nov 6, 2016
  • Published: Feb 9, 2017

Abstract

Realizing efficient barrier synchronization in multi-/many-core processors becomes increasingly challenging as the number of cores integrated on a single chip keeps growing. Quite a few barrier solutions have been proposed, but they either provide limited improvement when synchronizing large numbers of cores or impose unfavorable restrictions on concurrent barriers. This paper presents HyBar, a hardware barrier based on a hybrid-switching NoC whose two sub-networks use packet switching and circuit switching, respectively. Dedicated channels in the circuit-switching sub-network are dynamically built and removed as barrier requests traverse the packet-switching sub-network according to a modified dimension-order routing algorithm. The efficiency of inter-core communication for concurrent barriers is improved by merging barrier arrival requests and broadcasting release requests along the circuit channels. The execution times of synthetic cases, benchmark kernels, and parallel applications using various barrier solutions are evaluated on an RTL-based simulation platform. Experimental results show that our proposal provides about 15% to 50% performance improvement over previous solutions, while the hardware overhead is marginal under SMIC 40 nm technology. Moreover, HyBar incurs only a minor efficiency loss for concurrent barriers and places no limitation on the layout of their participating cores in the on-chip network.
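To make the benefit of merging arrival requests concrete, the following is a minimal sketch, not HyBar's actual design. It assumes a 2D mesh, plain XY dimension-order routing (the paper's modified algorithm is not described here), and a single hypothetical master node that collects arrivals. The function names `xy_route` and `arrival_link_traversals` are illustrative. The sketch counts link traversals when every core sends its own arrival message versus when routers forward one merged request per link.

```python
def xy_route(src, dst):
    """Nodes visited from src to dst under plain XY routing (assumed, not HyBar's modified variant)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def arrival_link_traversals(participants, master):
    """Return (unmerged, merged) link-traversal counts for barrier arrivals.

    unmerged: every participant sends its own message all the way to master.
    merged:   each link on the union of routes carries a single merged request.
    """
    unmerged = 0
    tree_links = set()
    for core in participants:
        path = xy_route(core, master)
        hops = list(zip(path, path[1:]))   # directed links along the route
        unmerged += len(hops)
        tree_links.update(hops)            # shared links are counted once
    return unmerged, len(tree_links)


if __name__ == "__main__":
    cores = [(x, y) for x in range(4) for y in range(4)]
    print(arrival_link_traversals(cores, (0, 0)))
```

Because deterministic routing toward one destination gives every node a unique next hop, the union of routes forms a tree. Under these assumptions, a release broadcast can reuse the same tree links in the reverse direction, each exactly once.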


Acknowledgments

This work was partially supported by the Equipment Pre-Research Foundation of China (Grant No. 9140A08010414JW03025).


Copyright 2019 Science China Press Co., Ltd. All rights reserved.
