
SCIENCE CHINA Information Sciences, Volume 60, Issue 9: 092102 (2017). https://doi.org/10.1007/s11432-015-0328-y

Stealth-ACK: stealth transmissions of NoC acknowledgements

  • Received Feb 29, 2016
  • Accepted Apr 7, 2016
  • Published Jan 23, 2017

Abstract

Network-on-Chip (NoC) is a promising replacement for bus architectures because of its better scalability. In state-of-the-art NoCs, each packet consists of several fixed-length flits, which simplifies the allocation of network resources but leaves many bits unused. In this paper, we propose a novel technique called Stealth-ACK to address this problem. Stealth-ACK uses the unused bits in the head flits of non-ACK packets to carry ACK information and transmit it stealthily. These stealth transmissions reduce both the number of dedicated ACK packets on the NoC and the number of unused bits in the head flits of non-ACK packets, which significantly reduces wasted NoC bandwidth. Experimental results on SPLASH-2 application traces show that Stealth-ACK increases the throughput of a $16\times16$ 2-D mesh NoC by 11.9% on average and reduces NoC latency by 34.8% on average. Moreover, Stealth-ACK requires only minor modifications to a basic router architecture, incurring negligible power and area overhead.
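To make the idea of carrying ACK information in spare head-flit bits concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the 128-bit flit width, the header field layout (8-bit destination, 8-bit source, 4-bit packet type), the 8-bit ACK field (one valid bit plus a 7-bit ACK identifier), the per-destination queue of pending ACKs, and the names `HeadFlit`, `pack_head_flit`, `unpack_head_flit`, and `attach_pending_ack` are all hypothetical. The abstract does not specify the actual head-flit format or ACK encoding.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# HYPOTHETICAL head-flit layout (assumption, not taken from the paper):
#   bits [0, 8)   destination node id (covers a 16x16 = 256-node mesh)
#   bits [8, 16)  source node id
#   bits [16, 20) packet type
#   bits [20, 28) piggybacked ACK field: 1 valid bit + 7-bit ACK id
#   all remaining bits of the flit would otherwise be unused
FLIT_WIDTH = 128
DEST_SHIFT, SRC_SHIFT, TYPE_SHIFT, ACK_SHIFT = 0, 8, 16, 20
ID_MASK = 0xFF
TYPE_MASK = 0xF
ACK_VALID = 1 << 7
ACK_ID_MASK = 0x7F


@dataclass
class HeadFlit:
    dest: int
    src: int
    ptype: int
    ack_id: Optional[int]  # None when no ACK is carried in the spare bits


def pack_head_flit(f: HeadFlit) -> int:
    """Pack header fields, plus an optional ACK id, into one integer flit."""
    word = (f.dest & ID_MASK) << DEST_SHIFT
    word |= (f.src & ID_MASK) << SRC_SHIFT
    word |= (f.ptype & TYPE_MASK) << TYPE_SHIFT
    if f.ack_id is not None:
        if not 0 <= f.ack_id <= ACK_ID_MASK:
            raise ValueError("ACK id does not fit in the spare bits")
        word |= (ACK_VALID | f.ack_id) << ACK_SHIFT
    assert word < (1 << FLIT_WIDTH)  # sanity check: all fields fit in the flit
    return word


def unpack_head_flit(word: int) -> HeadFlit:
    """Recover header fields; the valid bit tells whether an ACK is present."""
    ack_field = (word >> ACK_SHIFT) & 0xFF
    ack_id = ack_field & ACK_ID_MASK if ack_field & ACK_VALID else None
    return HeadFlit(
        dest=(word >> DEST_SHIFT) & ID_MASK,
        src=(word >> SRC_SHIFT) & ID_MASK,
        ptype=(word >> TYPE_SHIFT) & TYPE_MASK,
        ack_id=ack_id,
    )


def attach_pending_ack(flit: HeadFlit, pending: Dict[int, List[int]]) -> HeadFlit:
    """If an ACK is waiting for flit.dest, carry it in the spare bits
    instead of sending a dedicated ACK packet (hypothetical policy)."""
    queue = pending.get(flit.dest)
    if flit.ack_id is None and queue:
        return replace(flit, ack_id=queue.pop(0))
    return flit
```

For example, with `pending = {5: [3]}`, a non-ACK packet headed to node 5 picks up ACK id 3, and `unpack_head_flit` recovers it at the receiver. The separate valid bit lets the receiver distinguish "no ACK carried" from an ACK whose identifier happens to be 0.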



Copyright 2019 Science China Press Co., Ltd. All rights reserved.