logo

SCIENTIA SINICA Informationis, Volume 48 , Issue 1 : 100-114(2018) https://doi.org/10.1360/N112017-00065

A shortest path routing mechanism based on ${S_3}$ for TriBA-Net

More info
  • ReceivedMar 30, 2017
  • AcceptedMay 27, 2017
  • PublishedDec 28, 2017

Abstract

The routing algorithm of a Network-on-Chip (NoC) is essential to its performance and power consumption. This paper presents a novel shortest path routing algorithm for TriBA-Net. First, the algorithm designs a coding scheme based on the topological features of TriBA-Net. The set of words 1, 3, and 2, used in the coding scheme, has the same meaning as the well-known group ${S_3}$ on 3-letters. Second, a communication model, which contains 6 types of flow modes, has been proposed for reflecting the status of the path within two hops. Finally, the algorithm is simplified by the cyclic permutation characteristic of the ${S_3}$ group. Whats more, the implementation of the SPR4T router is completed under the XC6VLX550TL chip. Experimental results show that under the uniform traffic pattern in the 27-node TriBA-Net performance test, SPR4T routing algorithm has a 7.5% higher saturation injection rate and a 7.7% higher throughput rate, with the obvious savings of hardware overhead and lower power consumption when compared to the SPORT routing algorithm.


Funded by

国家自然科学基金(61300011)

国家自然科学基金(61300010)


References

[1] Borkar S, Chien A A. The future of microprocessors. Commun ACM, 2011, 54: 67--77. Google Scholar

[2] Wulf W A, McKee S A. Hitting the memory wall: implications of the obvious. Comput Archit New, 1995, 23: 20--24. Google Scholar

[3] Das R, Mutlu O, Moscibroda T, et al. Aergia: a network-on-chip exploiting packet latency slack. IEEE Micro, 2011, 31: 29--41. Google Scholar

[4] Bhattacharjee A, Contreras G, Martonosi M. Parallelization libraries: characterizing and reducing overheads. ACM Trans Archit Code Opt, 2011, 8: 1--29. Google Scholar

[5] Shi F, Ji W X, Qiao B J, et al. A triplet-based computer architecture supporting parallel object computing. In: Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), Montreal, 2007. 192--197. Google Scholar

[6] Vecchia G D, Sanges C. A recursively scalable network VLSI implementation. Future Gener Comput Syst, 1988, 4: 235--243. Google Scholar

[7] Hu S S, Shi F, Ji W X, et al. Exploring grouped coherence for clustered hierarchical cache. J Supercomput, 2017, 73: 4137--4157. Google Scholar

[8] Sterling T L, Zima H P. Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing. In: Proceedings of ACM/IEEE Conference on Supercomputing, Baltimore, 2002. 1--23. Google Scholar

[9] Mubeen S, Kumar S. Designing efficient source routing for mesh topology network on chip platforms. In: Proceedings of the 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Lille, 2010. 181--188. Google Scholar

[10] Xiang D, Luo W. An efficient adaptive deadlock-free routing algorithm for torus networks. IEEE Trans Parall Distrib Syst, 2012, 23: 800--808. Google Scholar

[11] Xiang D, Zhang Y, Shan S, et al. A fault-tolerant routing algorithm design for on-chip optical networks. In: Proceedings of the 32nd International Symposium on Reliable Distributed Systems, Braga, 2013. 1--9. Google Scholar

[12] Hu J, Marculescu R. Energy-aware mapping for tile-based NoC architectures under performance constraints. In: Proceedings of Asia and South Pacific Design Automation Conference, Kitakyushu, 2003. 233--239. Google Scholar

[13] Marvasti M B, Daneshtalab M, Afzali-Kusha A, et al. PAMPR: power-aware and minimum path routing algorithm for NoCs. In: Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, Kitakyushu, 2008. 418--421. Google Scholar

[14] Hu J, Marculescu R. Exploiting the routing flexibility for energy/performance-aware mapping of regular NoC architectures. In: Proceedings of Europe Conference and Exhibition on Design, Automation and Test, Munich, 2003. 688--693. Google Scholar

[15] Qiao B j, Shi F, Ji W X. A new hierarchical interconnection network for multi-core processor. In: Proceedings of the 2nd IEEE Conference on Industrial Electronics and Applications, Harbin, 2007. 246--250. Google Scholar

[16] Wang Z, Shi F. A shortest path routing algorithm in triplet-based network. Trans Beijing Inst Technol, 2009, 29: 410--414. Google Scholar

[17] Zhang Y, Shi F. Design and evaluation of low-latency and shortest-path routing algorithm for triplet-based hierarchical interconnection network. J Test Eval, 2013, 41: 541--550. Google Scholar

[18] Catania V, Mineo A, Monteleone S, et al. Cycle-accurate network on chip simulation with noxim. ACM Trans Model Comput Simul, 2016, 27: 1--25. Google Scholar

[19] Weerasinghe H D, Tackett R, Fu H R. Verifying position and velocity for vehicular ad-hoc networks. Secur Commun Netw, 2011, 4: 785--791. Google Scholar

[20] Wu L F, Meng Q H, Liang H W, et al. Accurate localization in combination with wireless sensor networks and laser localization. In: Proceedings of IEEE International Conference on Automation and Logistics, Shenyang, 2009. 146--151. Google Scholar

[21] Synopsys Inc. Data sheet: primePower full-chip dynamic power analysis for multimillion-gate design. 2004. https://www.synopsys.com/. Google Scholar

  • Figure 1

    (Color online) Plan of ${\rm{}}{{\rm~TG}^L}$

  • Figure 2

    (Color online) Diagram of using ${s_3}=~({123})$ to transform ${\rm~TG}^2$

  • Figure 3

    (Color online) The shortest path routing approach in ${\rm~FM}_0$ mode

  • Figure 4

    (Color online) Distance evaluation from vertex to tip

  • Figure 5

    (Color online) Flow modes of communication data in TriBA-Net

  • Figure 6

    (Color online) Port naming of nodes in TriBA-Net

  • Figure 7

    (Color online) Optimized route comparison circuit

  • Figure 8

    (Color online) Curve of average latency versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle

  • Figure 9

    (Color online) Curve of throughput versus injection rate. (a) Uniform; (b) Bitreversal; (c) Shuffle

  • Table 1   Truth table of equivalent transformation of layer codes
    Current mode (uvw) 000 001 010 011 100 101
    Current code (ab) Transformed code ($x′′x′$)
    01 01 10 01 10 11 11
    10 10 01 11 11 10 01
    11 11 11 10 01 01 10
  •   

    Algorithm 1 SPR0

    Require:$s′′_Ls′_L \cdots s′′_1s′_1$, starting point identifier;

    $t′′_Lt′_L \cdots t′′_1t′_1$,

    ending point identifier;

    $l$,

    size of ${\rm~FRG}^l$;

    Output:Direction of forwarding at $s′′_Ls′_L \cdots s′′_1s′_1$;

    LenPA $ \leftarrow s′_{l - 1} \cdots s′_1 + 1 + t′′_{l - 1} \cdots t′′_1$;

    LenPB $ \leftarrow ( {\bar s′′_{l - 1} \cdots \bar s′′_1 + \bar s′_{l - 1} \cdots \bar s′_1} ) + ( {\bar t′′_{l - 1} \cdots \bar t′′_1 + \bar t′_{l - 1} \cdots \bar t′_1} ) + {2^{l - 1}} + 1$;

    if LenPA LenPB then

    return $~2;~~//~{\rm~Port}~2$;

    else

    return $~3;~//~{\rm~Port}~3$;

    end if

  •   

    Algorithm 2 SPR4T

    Require:$s′′_Ls′_L \cdots s′′_1s′_1$, starting point identifier;

    $t′′_Lt′_L \cdots t′′_1t′_1$,

    ending point identifier;

    $l$,

    size of TriBA-Net;

    Output:Direction of forwarding at $s′′_Ls′_L \cdots s′′_1s′_1$;

    if $l~=~0$ then

    return $0;~~//~{\rm~Port}~0$;

    else

    for $l~=~L~\to~1$

    if$s′′_ls′_l \ne t′′_lt′_l$ then

    $~{\rm~break}$;

    end if

    end for

    $i \leftarrow {\rm FM}( {s′′_ls′_l \cdots s′′_1s′_1, t′′_lt′_l \cdots t′′_1t′_1} )$;

    ${\bar u_{\rm EQ}} \leftarrow {{\rm Convert}_i}( {s′′_ls′_l \cdots s′′_1s′_1} )$;

    ${\bar v_{\rm EQ}} \leftarrow {{\rm Convert}_i}( {t′′_lt′_l \cdots t′′_1t′_1} )$;

    return $~s_i^{~-~1}(~{{\rm~SPR0}(~{{{\bar~u}_{\rm~EQ}},{{\bar~v}_{\rm~EQ}},l}~)}~)$;

    end if

  • Table 2   Configuration of the parameters of the simulator
    Configuration Topology Network size Switching Flit size Buffer Packet size Virtual channel
    Parameter TriBA-Net 27 nodes Wormhole 32 bits 4 flits 4 flits 4
  • Table 3   Router hardware overhead list
    Slices LUTs Flips-Flops
    SPORT router 139 269 114
    SPR4T router 118 229 103
  • Table 4   Comparison of power consumption between SPR4T and SPORT routers
    Uniform (nW) Bitreversal (nW)
    SPORT router 32.28 30.42
    SPR4T router 29.54 27.78