SCIENTIA SINICA Informationis, Volume 48, Issue 1: 100-114(2018) https://doi.org/10.1360/N112017-00065

A shortest path routing mechanism based on ${\boldsymbol~S_3}$ for TriBA-Net

More info
  • ReceivedMar 30, 2017
  • AcceptedMay 27, 2017
  • PublishedDec 28, 2017


The routing algorithm of a Network-on-Chip (NoC) is essential to its performance and power consumption. This paper presents a novel shortest path routing algorithm for TriBA-Net. First, the algorithm designs a coding scheme based on the topological features of TriBA-Net. The set of words 1, 3, and 2, used in the coding scheme, has the same meaning as the well-known group ${S_3}$ on 3-letters. Second, a communication model, which contains 6 types of flow modes, has been proposed for reflecting the status of the path within two hops. Finally, the algorithm is simplified by the cyclic permutation characteristic of the ${S_3}$ group. Whats more, the implementation of the SPR4T router is completed under the XC6VLX550TL chip. Experimental results show that under the uniform traffic pattern in the 27-node TriBA-Net performance test, SPR4T routing algorithm has a 7.5% higher saturation injection rate and a 7.7% higher throughput rate, with the obvious savings of hardware overhead and lower power consumption when compared to the SPORT routing algorithm.

Funded by




[1] Borkar S, Chien A A. The future of microprocessors. Commun ACM, 2011, 54: 67--77. Google Scholar

[2] Wulf W A, McKee S A. Hitting the memory wall: implications of the obvious. Comput Archit New, 1995, 23: 20--24. Google Scholar

[3] Das R, Mutlu O, Moscibroda T, et al. Aergia: a network-on-chip exploiting packet latency slack. IEEE Micro, 2011, 31: 29--41. Google Scholar

[4] Bhattacharjee A, Contreras G, Martonosi M. Parallelization libraries: characterizing and reducing overheads. ACM Trans Archit Code Opt, 2011, 8: 1--29. Google Scholar

[5] Shi F, Ji W X, Qiao B J, et al. A triplet-based computer architecture supporting parallel object computing. In: Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), Montreal, 2007. 192--197. Google Scholar

[6] Vecchia G D, Sanges C. A recursively scalable network VLSI implementation. Future Gener Comput Syst, 1988, 4: 235--243. Google Scholar

[7] Hu S S, Shi F, Ji W X, et al. Exploring grouped coherence for clustered hierarchical cache. J Supercomput, 2017, 73: 4137--4157. Google Scholar

[8] Sterling T L, Zima H P. Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing. In: Proceedings of ACM/IEEE Conference on Supercomputing, Baltimore, 2002. 1--23. Google Scholar

[9] Mubeen S, Kumar S. Designing efficient source routing for mesh topology network on chip platforms. In: Proceedings of the 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Lille, 2010. 181--188. Google Scholar

[10] Xiang D, Luo W. An efficient adaptive deadlock-free routing algorithm for torus networks. IEEE Trans Parall Distrib Syst, 2012, 23: 800--808. Google Scholar

[11] Xiang D, Zhang Y, Shan S, et al. A fault-tolerant routing algorithm design for on-chip optical networks. In: Proceedings of the 32nd International Symposium on Reliable Distributed Systems, Braga, 2013. 1--9. Google Scholar

[12] Hu J, Marculescu R. Energy-aware mapping for tile-based NoC architectures under performance constraints. In: Proceedings of Asia and South Pacific Design Automation Conference, Kitakyushu, 2003. 233--239. Google Scholar

[13] Marvasti M B, Daneshtalab M, Afzali-Kusha A, et al. PAMPR: power-aware and minimum path routing algorithm for NoCs. In: Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, Kitakyushu, 2008. 418--421. Google Scholar

[14] Hu J, Marculescu R. Exploiting the routing flexibility for energy/performance-aware mapping of regular NoC architectures. In: Proceedings of Europe Conference and Exhibition on Design, Automation and Test, Munich, 2003. 688--693. Google Scholar

[15] Qiao B j, Shi F, Ji W X. A new hierarchical interconnection network for multi-core processor. In: Proceedings of the 2nd IEEE Conference on Industrial Electronics and Applications, Harbin, 2007. 246--250. Google Scholar

[16] Wang Z, Shi F. A shortest path routing algorithm in triplet-based network. Trans Beijing Inst Technol, 2009, 29: 410--414. Google Scholar

[17] Zhang Y, Shi F. Design and evaluation of low-latency and shortest-path routing algorithm for triplet-based hierarchical interconnection network. J Test Eval, 2013, 41: 541--550. Google Scholar

[18] Catania V, Mineo A, Monteleone S, et al. Cycle-accurate network on chip simulation with noxim. ACM Trans Model Comput Simul, 2016, 27: 1--25. Google Scholar

[19] Weerasinghe H D, Tackett R, Fu H R. Verifying position and velocity for vehicular ad-hoc networks. Secur Commun Netw, 2011, 4: 785--791. Google Scholar

[20] Wu L F, Meng Q H, Liang H W, et al. Accurate localization in combination with wireless sensor networks and laser localization. In: Proceedings of IEEE International Conference on Automation and Logistics, Shenyang, 2009. 146--151. Google Scholar

[21] Synopsys Inc. Data sheet: primePower full-chip dynamic power analysis for multimillion-gate design. 2004. https://www.synopsys.com/. Google Scholar

  • Figure 1

    (Color online) Plan of ${\rm{}}{{\rm~TG}^L}$

  • Figure 2

    (Color online) Diagram of using ${s_3}=~({123})$ to transform ${\rm~TG}^2$

  • Figure 3

    (Color online) The shortest path routing approach in ${\rm~FM}_0$ mode

  • Figure 4

    (Color online) Distance evaluation from vertex to tip

  • Figure 5

    (Color online) Flow modes of communication data in TriBA-Net

  • Figure 6

    (Color online) Port naming of nodes in TriBA-Net

  • Figure 7

    (Color online) Optimized route comparison circuit

  • Figure 8

    (Color online) Curve of average latency versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle

  • Figure 9

    (Color online) Curve of throughput versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle

  • 1   Table 1Truth table of equivalent transformation of layer codes
    Current mode (uvw) 000 001 010 011 100 101
    Current code (ab) Transformed code ($xx$)
    01 01 10 01 10 11 11
    10 10 01 11 11 10 01
    11 11 11 10 01 01 10

    Algorithm 1 SPR0

    Require:$s_Ls_L~\cdots~s_1s_1$, starting point identifier;


    ending point identifier;


    size of ${\rm~FRG}^l$;

    Output:Direction of forwarding at $s_Ls_L~\cdots~s_1s_1$;

    LenPA $~\leftarrow~s_{l~-~1}~\cdots~s_1~+~1~+~t_{l~-~1}~\cdots~t_1$;

    LenPB $~\leftarrow~(~{\bar~s_{l~-~1}~\cdots~\bar~s_1~+~\bar~s_{l~-~1}~\cdots~\bar~s_1}~)~+~(~{\bar~t_{l~-~1}~\cdots~\bar~t_1~+~\bar~t_{l~-~1}~\cdots~\bar~t_1}~)~+~{2^{l~-~1}}~+~1$;

    if LenPA LenPB then

    return $~2;~~//~{\rm~Port}~2$;


    return $~3;~//~{\rm~Port}~3$;

    end if


    Algorithm 2 SPR4T

    Require:$s_Ls_L~\cdots~s_1s_1$, starting point identifier;


    ending point identifier;


    size of TriBA-Net;

    Output:Direction of forwarding at $s_Ls_L~\cdots~s_1s_1$;

    if $l~=~0$ then

    return $0;~~//~{\rm~Port}~0$;


    for $l~=~L~\to~1$

    if $s_ls_l~\ne~t_lt_l$ then


    end if

    end for




    return $~s_i^{~-~1}(~{{\rm~SPR0}(~{{{\bar~u}_{\rm~EQ}},{{\bar~v}_{\rm~EQ}},l}~)}~)$;

    end if

  • 2   Table 2Configuration of the parameters of the simulator
    Configuration Topology Network size Switching Flit size Buffer Packet size Virtual channel
    Parameter TriBA-Net 27 nodes Wormhole 32 bits 4 flits 4 flits 4
  • 3   Table 3Router hardware overhead list
    Slices LUTs Flips-Flops
    SPORT router 139 269 114
    SPR4T router 118 229 103
  • 4   Table 4Comparison of power consumption between SPR4T and SPORT routers
    Uniform (nW) Bitreversal (nW)
    SPORT router 32.28 30.42
    SPR4T router 29.54 27.78

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有