SCIENTIA SINICA Informationis, Volume 48, Issue 7: 932-946(2018) https://doi.org/10.1360/N112017-00302

Design and implementation of large-scale network propagation simulation method inspired by Pregel mechanism

More info
  • ReceivedMar 27, 2018
  • AcceptedApr 8, 2018
  • PublishedJul 18, 2018


With the rapid development of the Internet and online social media, the law of information dissemination in social networks needs much experimentation on network propagation calculation. The current network propagation experiments based on the SIR model are widely used in disease research and information dissemination. However, because of the limitations of hardware and software, it is still difficult to conduct ultra-large-scale network propagation calculation. However, the current Internet information dissemination shows the characteristics of large-scale users, large amounts of information, and fast propagation. The shortcomings of small-scale network dissemination experiments based on abstract and simplified methods have been revealed. In this study, the Spark platform is used to implement an experimental algorithm for large-scale network propagation calculation. The performances of the algorithm and Nepidemix stand-alone computing components are compared, and the algorithm's advantages and disadvantages are demonstrated. An orthogonal experimental design method was used to design the performance test experiment to find out the factors influencing the algorithm. When there are enough cluster computing resources, the algorithm can break the limitation of network node size and is difficult to develop, which lays the foundation for calculation experiments about very large-scale network propagation.

Funded by








[1] Watts D J, Strogatz S H. Collective dynamics of `small-world' networks. Nature, 1998, 393: 440-442 CrossRef PubMed ADS Google Scholar

[2] Barabási A L, Albert R, Jeong H. Mean field theory for scale-free random networks. Physica A-Statistical Mech its Appl, 1999, 272: 173-187 CrossRef ADS Google Scholar

[3] Pastor-Satorras R, Vespignani A. Epidemic Spreading in Scale-Free Networks. Phys Rev Lett, 2001, 86: 3200-3203 CrossRef PubMed ADS Google Scholar

[4] Lloyd A L. EPIDEMIOLOGY: How Viruses Spread Among Computers and People. Science, 2001, 292: 1316-1317 CrossRef Google Scholar

[5] May R M, Lloyd A L. Infection dynamics on scale-free networks. Phys Rev E, 2001, 64: 066112 CrossRef PubMed ADS Google Scholar

[6] Granovetter M. Threshold Models of Collective Behavior. Am J Sociology, 1978, 83: 1420-1443 CrossRef Google Scholar

[7] Li Q, Braunstein L A, Wang H. Non-consensus Opinion Models on Complex Networks. J Stat Phys, 2013, 151: 92-112 CrossRef ADS arXiv Google Scholar

[8] Qu B, Li Q, Havlin S. Nonconsensus opinion model on directed networks. Phys Rev E, 2014, 90: 052811 CrossRef PubMed ADS arXiv Google Scholar

[9] Goldenberg J, Libai B. Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad Market Sci Rev, 2001, 9: 1--18. Google Scholar

[10] Goldenberg J, Libai B, Muller E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Lett, 2001, 12: 211-223 CrossRef Google Scholar

[11] Hethcote H W. The Mathematics of Infectious Diseases. SIAM Rev, 2000, 42: 599-653 CrossRef ADS Google Scholar

[12] Newman M E J. Spread of epidemic disease on networks. Phys Rev E, 2002, 66: 016128 CrossRef PubMed ADS Google Scholar

[13] Chopard B, Droz M. Cellular Automata Modeling of Physical Systems. Cambridge: Cambridge University Press, 1998. Google Scholar

[14] Zhang M X, Seck M, Verbraeck A. A DEVS-based M&S method for large-scale multi-agent systems. In: Proceedings of the 2013 Summer Computer Simulation Conference, Toronto, 2013. 3. Google Scholar

[15] Collier N, Ozik J, Macal C M. Large-scale agent-based modeling with repast HPC: a case study in parallelizing an agent-based model. In: Euro-Par 2015: Parallel Processing Workshops. Berlin: Springer, 2015. 454--465. Google Scholar

[16] Seck M, Verbraeck A. Devs in dsol: adding devs operational semantics to a generic event-scheduling simulation environment. In: Proceedings of the Summer Computer Simulation Conference, Istanbul, 2009. 261--266. Google Scholar

[17] Staudt C L, Sazonovs A, Meyerhenke H. Networkit: an interactive tool suite for high-performance network analysis,. arXiv Google Scholar

[18] Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the 3rd International AAAI Conference On Weblogs And Social Media, San Jose, 2009. Google Scholar

[19] Batagelj V, Mrvar A. Pajek---analysis and visualization of large networks. In: Graph Drawing Software. Berlin: Springer, 2004. 2265: 77--103. Google Scholar

[20] Leskovec J, Sosic R. Snap: a general-purpose network analysis and graph-mining library. ACM Trans Intell Syst Tech, 2016, 8: 1. Google Scholar

[21] Lugowski A, Alber D, Buluç A, et al. A flexible open-source toolbox for scalable complex graph analysis. In: Proceedings of the 2012 SIAM International Conference on Data Mining, Anaheim, 2012. Google Scholar

[22] Ediger D, Jiang K, Riedy E J. GraphCT: Multithreaded Algorithms for Massive Graph Analysis. IEEE Trans Parallel Distrib Syst, 2013, 24: 2220-2229 CrossRef Google Scholar

[23] Ediger D, Mccoll R, Riedy J, et al. Stinger: high performance data structure for streaming graphs. In: Proceedings of 2012 IEEE Conference on High Performance Extreme Computing, Waltham, 2013. 1--5. Google Scholar

[24] Shun J L, Blelloch G E. Ligra: a lightweight graph processing framework for shared memory. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Shenzhen, 2013. 135--146. Google Scholar

[25] Rutherford A R, Ramadanović B, Ahrenberg L, et al. Control of an HIV epidemic among injection drug users: simulation modeling on complex networks. In: Proceedings of Winter Simulation Conference, Arlington, 2016. 23--37. Google Scholar

[26] Capasso V, Serio G. A generalization of the Kermack-McKendrick deterministic epidemic model. Math Biosci, 1978, 42: 43-61 CrossRef Google Scholar

[27] Liu L, Qu B, Chen B, et al. Modeling of information diffusion on social networks with applications to wechat,. arXiv Google Scholar

[28] Goudreau M W, Lang K, Rao S B. Portable and efficient parallel computing using the BSP model. IEEE Trans Comput, 1999, 48: 670-689 CrossRef Google Scholar

[29] Hill J M D, McColl B, Stefanescu D C. BSPlib: The BSP programming library. Parallel Computing, 1998, 24: 1947-1980 CrossRef Google Scholar

[30] Skillicorn D B, Hill J M D, Mccoll W F. Questions and answers about BSP. Sci Programm, 1997, 6: 249--274. Google Scholar

[31] Malewicz G, Austern M H, Bik A J C, et al. Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Indianapolis, 2010. 135--146. Google Scholar

[32] Gonzalez J E, Low Y C, Gu H J, et al. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation, Hollywood, 2012. 17--30. Google Scholar

[33] Gonzalez J E, Xin R S, Dave A, et al. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation, Broomfield, 2014. 599--613. Google Scholar

[34] Zaharia M, Chowdhury M, Franklin M J, et al. Spark: cluster computing with working sets. In: Proceedings of the 2nd Usenix Conference on Hot Topics in Cloud Computing, Berkeley, 2010. 10--10. Google Scholar

[35] Valiant L G. A bridging model for parallel computation. Commun ACM, 1990, 33: 103--111. Google Scholar

  • Figure 1

    (Color online) State transition of SIR model

  • Figure 2

    (Color online) State transition of SVFR model

  • Figure 3

    (Color online) Graph partition of GraphX

  • Figure 4

    (Color online) Data structure of GraphX

  • Figure 5

    (Color online) Find max value by Pregel, shadow node is inactive

  • Figure 6

    (Color online) State transition

  • Figure 7

    (Color online) Node attribute definition

  • Figure 8

    (Color online) Message definition

  • Figure 9

    (Color online) The impact ofdifferent nodes size on calculation time (minimumdegree 6, infection factor 0.4). (a) Network generating time; (b) propagate calculation time

  • Figure 10

    (Color online) The impact of different minimum degree on calculation time (node size is 100000, infection factor is 0.4). (a) Network generating time; (b) propagate calculation time

  • Figure 11

    (Color online) The impact of different infection factor on calculation time (node size is 100000, minimum degree is 6). (a) Network generating time; (b) propagate calculation time


    Algorithm 1 Pregel API运行流程

    for all $i~\in~{\rm~nodes}~$

    vprogFunc($i$, initialMsg);

    end for


    for all edge $\in$ edges


    end for

    for all $i~\in~{\rm~nodes}~$

    mergeMsgFunc(Msg $a$, Msg $b$);

    end for

    for all $i~\in$ nodes

    vprogFunc(value, Msg);

    end for

    until No more Msg produced;


    Algorithm 2 sendMsgFunc

    Require:dstId, srcId, dstAttr, srcAttr;


    if srcAttr(0) = 0 dstAttr(0)= 2 then

    return Iterator(dstId, Array(4, 0, 0));


    if srcAttr(0) = 2, dstAttr(0)= 0 then

    return Iterator(dstId, Array(2, srcId, srcAttr(2)+1));


    return Iterator.empty;

    end if

    end if


    Algorithm 3 vprogFunc

    Require:nodeId, Value, Msg, $\beta$, $\gamma$;


    if Msg(0) = 0 then

    if nodeId = Value(1) then

    return Array(2, 0, 0);


    return Array(0, 0, 0);

    end if


    choose a random $p\in(0,1)$;

    if $p<\beta$ then

    choose a random $q\in(0,1)$;

    if $q<\gamma$ then

    return Array(2, Msg(1), Msg(2));


    return Array(3, Msg(1), Msg(2));

    end if


    return Array(0, 0, 0);

    end if

    end if


    Algorithm 4 配置网络生成



    index = 0;

    for all $i\in(1,n)$

    for all $j~\in~(1,{\rm~degree})$



    end for

    end for

    while nodelist is not empty do

    choose two random $m,n~\in~(1$, size of nodelist$),~m~\neq~n$;

    edge=(nodelist$m$, nodelist$n$);



    delete nodelistlast$-$1;

    delete nodelistlast;

    push edge in EDGE;

    end while

    return EDGE;


    Algorithm 5 度序列生成



    sum = 0;

    for all $i~\in~({\rm~kmin},{\rm~kmax})$



    end for

    for all $j~\in~({\rm~kmin},{\rm~kmax})$


    end for

    for all $k~\in~(1,n)~$

    choose a new random $~p\in~(0,1)$;


    for all $l~\in~({\rm~kmin},{\rm~kmax})$

    if ${\rm~cumpro}[l]~<~p$ then


    end if

    end for

    end for

    return degree;

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有