The routing algorithm of a Network-on-Chip (NoC) is essential to its performance and power consumption. This paper presents a novel shortest path routing algorithm for TriBA-Net. First, the algorithm designs a coding scheme based on the topological features of TriBA-Net. The set of words 1, 3, and 2, used in the coding scheme, has the same meaning as the well-known group ${S_3}$ on 3-letters. Second, a communication model, which contains 6 types of flow modes, has been proposed for reflecting the status of the path within two hops. Finally, the algorithm is simplified by the cyclic permutation characteristic of the ${S_3}$ group. Whats more, the implementation of the SPR4T router is completed under the XC6VLX550TL chip. Experimental results show that under the uniform traffic pattern in the 27-node TriBA-Net performance test, SPR4T routing algorithm has a 7.5% higher saturation injection rate and a 7.7% higher throughput rate, with the obvious savings of hardware overhead and lower power consumption when compared to the SPORT routing algorithm.
国家自然科学基金(61300011)
国家自然科学基金(61300010)
[1] Borkar S, Chien A A. The future of microprocessors. Commun ACM, 2011, 54: 67--77. Google Scholar
[2] Wulf W A, McKee S A. Hitting the memory wall: implications of the obvious. Comput Archit New, 1995, 23: 20--24. Google Scholar
[3] Das R, Mutlu O, Moscibroda T, et al. Aergia: a network-on-chip exploiting packet latency slack. IEEE Micro, 2011, 31: 29--41. Google Scholar
[4] Bhattacharjee A, Contreras G, Martonosi M. Parallelization libraries: characterizing and reducing overheads. ACM Trans Archit Code Opt, 2011, 8: 1--29. Google Scholar
[5] Shi F, Ji W X, Qiao B J, et al. A triplet-based computer architecture supporting parallel object computing. In: Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), Montreal, 2007. 192--197. Google Scholar
[6] Vecchia G D, Sanges C. A recursively scalable network VLSI implementation. Future Gener Comput Syst, 1988, 4: 235--243. Google Scholar
[7] Hu S S, Shi F, Ji W X, et al. Exploring grouped coherence for clustered hierarchical cache. J Supercomput, 2017, 73: 4137--4157. Google Scholar
[8] Sterling T L, Zima H P. Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing. In: Proceedings of ACM/IEEE Conference on Supercomputing, Baltimore, 2002. 1--23. Google Scholar
[9] Mubeen S, Kumar S. Designing efficient source routing for mesh topology network on chip platforms. In: Proceedings of the 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Lille, 2010. 181--188. Google Scholar
[10] Xiang D, Luo W. An efficient adaptive deadlock-free routing algorithm for torus networks. IEEE Trans Parall Distrib Syst, 2012, 23: 800--808. Google Scholar
[11] Xiang D, Zhang Y, Shan S, et al. A fault-tolerant routing algorithm design for on-chip optical networks. In: Proceedings of the 32nd International Symposium on Reliable Distributed Systems, Braga, 2013. 1--9. Google Scholar
[12] Hu J, Marculescu R. Energy-aware mapping for tile-based NoC architectures under performance constraints. In: Proceedings of Asia and South Pacific Design Automation Conference, Kitakyushu, 2003. 233--239. Google Scholar
[13] Marvasti M B, Daneshtalab M, Afzali-Kusha A, et al. PAMPR: power-aware and minimum path routing algorithm for NoCs. In: Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, Kitakyushu, 2008. 418--421. Google Scholar
[14] Hu J, Marculescu R. Exploiting the routing flexibility for energy/performance-aware mapping of regular NoC architectures. In: Proceedings of Europe Conference and Exhibition on Design, Automation and Test, Munich, 2003. 688--693. Google Scholar
[15] Qiao B j, Shi F, Ji W X. A new hierarchical interconnection network for multi-core processor. In: Proceedings of the 2nd IEEE Conference on Industrial Electronics and Applications, Harbin, 2007. 246--250. Google Scholar
[16] Wang Z, Shi F. A shortest path routing algorithm in triplet-based network. Trans Beijing Inst Technol, 2009, 29: 410--414. Google Scholar
[17] Zhang Y, Shi F. Design and evaluation of low-latency and shortest-path routing algorithm for triplet-based hierarchical interconnection network. J Test Eval, 2013, 41: 541--550. Google Scholar
[18] Catania V, Mineo A, Monteleone S, et al. Cycle-accurate network on chip simulation with noxim. ACM Trans Model Comput Simul, 2016, 27: 1--25. Google Scholar
[19] Weerasinghe H D, Tackett R, Fu H R. Verifying position and velocity for vehicular ad-hoc networks. Secur Commun Netw, 2011, 4: 785--791. Google Scholar
[20] Wu L F, Meng Q H, Liang H W, et al. Accurate localization in combination with wireless sensor networks and laser localization. In: Proceedings of IEEE International Conference on Automation and Logistics, Shenyang, 2009. 146--151. Google Scholar
[21]
Synopsys Inc. Data sheet: primePower full-chip dynamic power analysis for multimillion-gate design. 2004.
Figure 1
(Color online) Plan of ${\rm{}}{{\rm~TG}^L}$
Figure 2
(Color online) Diagram of using ${s_3}=~({123})$ to transform ${\rm~TG}^2$
Figure 3
(Color online) The shortest path routing approach in ${\rm~FM}_0$ mode
Figure 4
(Color online) Distance evaluation from vertex to tip
Figure 5
(Color online) Flow modes of communication data in TriBA-Net
Figure 6
(Color online) Port naming of nodes in TriBA-Net
Figure 7
(Color online) Optimized route comparison circuit
Figure 8
(Color online) Curve of average latency versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle
Figure 9
(Color online) Curve of throughput versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle
Current mode (uvw) | 000 | 001 | 010 | 011 | 100 | 101 |
Current code (ab) | Transformed code ($x′′x′$) | |||||
01 | 01 | 10 | 01 | 10 | 11 | 11 |
10 | 10 | 01 | 11 | 11 | 10 | 01 |
11 | 11 | 11 | 10 | 01 | 01 | 10 |
ending point identifier; $l$,size of ${\rm~FRG}^l$; |
LenPA $ \leftarrow s′_{l - 1} \cdots s′_1 + 1 + t′′_{l - 1} \cdots t′′_1$; |
LenPB $ \leftarrow ( {\bar s′′_{l - 1} \cdots \bar s′′_1 + \bar s′_{l - 1} \cdots \bar s′_1} ) + ( {\bar t′′_{l - 1} \cdots \bar t′′_1 + \bar t′_{l - 1} \cdots \bar t′_1} ) + {2^{l - 1}} + 1$; |
|
|
ending point identifier; $l$,size of TriBA-Net; |
|
|
|
$~{\rm~break}$; |
|
|
$i \leftarrow {\rm FM}( {s′′_ls′_l \cdots s′′_1s′_1, t′′_lt′_l \cdots t′′_1t′_1} )$; |
${\bar u_{\rm EQ}} \leftarrow {{\rm Convert}_i}( {s′′_ls′_l \cdots s′′_1s′_1} )$; |
${\bar v_{\rm EQ}} \leftarrow {{\rm Convert}_i}( {t′′_lt′_l \cdots t′′_1t′_1} )$; |
|
Configuration | Topology | Network size | Switching | Flit size | Buffer | Packet size | Virtual channel |
Parameter | TriBA-Net | 27 nodes | Wormhole | 32 bits | 4 flits | 4 flits | 4 |
Slices | LUTs | Flips-Flops | |
SPORT router | 139 | 269 | 114 |
SPR4T router | 118 | 229 | 103 |
Uniform (nW) | Bitreversal (nW) | |
SPORT router | 32.28 | 30.42 |
SPR4T router | 29.54 | 27.78 |