SCIENCE CHINA Information Sciences, Volume 61 , Issue 9 : 092101(2018) https://doi.org/10.1007/s11432-016-9122-x

## Inferring diffusion networks with life stage heterogeneity

• AcceptedMay 10, 2017
• PublishedNov 2, 2017
Share
Rating

### Abstract

A network inference problem focuses on discovering the structure of a diffusion network from observed cascades. This problem is significantly more challenging inseveral settings in which this type of an inference is desirable or necessarybecause of heterogeneity in the diffusion process. The heterogeneity of the diffusion process in different life stages results in the inaccuracy of acommon assumption of constant influence strength. In this study, a Life Stage HeuristicLSH)method is proposed to model life stage heterogeneity by decoupling the popularity level of anitem under propagation from a true strength of social ties to improve inference accuracy. The proposed LSH is incorporated into almost allexisting state-of-the-art network inference algorithms to improve estimation accuracy with only minimal changes in the implementationand maintaining the same running time. Additionally, NetRate NetInf and ConNIeare used as three examples to demonstrate the power of the proposed method. Furthermore, clustering of cascades prior to the LSH is proposed to eliminate noise, and the optimized method is termed as Clustered Life Stage HeuristicCLSH). Extensive experiments on synthetic and real world datasets indicate that both LSH and CLSH methods significantly improve the accuracy of network inference.

### Acknowledgment

This work was supported by National Natural Science Foundation of China (Grant No. 61572041), Beijing Natural Science Foundation (Grant No. 4152023), and National High Technology Research and Development Program of China (863 Program) (Grant No. 2014AA015103).

### References

[1] Domingos P, Richardson M. Mining the network value of customers. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2001. 57--66. Google Scholar

[2] Kempe D, Kleinberg J, Tardos E. Maximizing the spread of influence through a social network. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, 2003. 137--146. Google Scholar

[3] Leskovec J, Adamic L A, Huberman B A. The dynamics of viral marketing. In: Proceedings of ACM Transactions on the Web, New York, 2007. 1--5. Google Scholar

[4] Adar E, Zhang L, Adamic L, et al. Implicit structure and the dynamics of blogspace. In: Proceedings of Workshop on the Weblogging Ecosystem, New York, 2004. 13: 16989--16995. Google Scholar

[5] Gruhl D, Guha R, Liben-Nowell D, et al. Information diffusion through blogspace. In: Proceedings of the International Conference on World Wide Web, New York, 2004. 491--501. Google Scholar

[6] Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, 2005. 177--187. Google Scholar

[7] Dong X L, Berti-Equille L, Srivastava D. Truth discovery and copying detection in a dynamic world. Proc VLDB Endow, 2009, 2: 562-573 CrossRef Google Scholar

[8] Daneshmand H, Rodriguez M G, Song L, et al. Estimating diffusion network structures: Recovery conditions, sample complexity and soft-thresholding algorithm. In: Proceedings of International Conference on Machine Learning, Beijing, 2014. 793--801. Google Scholar

[9] Du N, Song L, Woo H, et al. Uncover topic-sensitive information diffusion networks. In: Proceedings of International Conference on Artificial Intelligence and Statistics, Scottsdale, 2013. 229--237. Google Scholar

[10] Du N, Song L, Yuan S, et al. Learning networks of heterogeneous influence. In: Proceedings of Neural Information Processing Systems Conference, Lake Tahoe, 2012. 2780--2788. Google Scholar

[11] Rodriguez M G, Balduzzi D, Sch$\ddot{\rm~~o}$lkopf B. Uncovering the temporal dynamics of diffusion networks. In: Proceedings of International Conference on Machine Learning, Bellevue, 2011. 561--568. Google Scholar

[12] Rodriguez M G, Leskovec J, Krause A. Inferring networks of diffusion and influence. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 5: 21. Google Scholar

[13] Wang S, Hu X, Yu P S, et al. Mmrate: inferring multi-aspect diffusion networks with multi-pattern cascades. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2014. 1246--1255. Google Scholar

[14] Yang S H, Zha H. Mixture of mutually exciting processes for viral diffusion. In: Proceedings of International Conference on Machine Learning, Atlanta, 2013. 28: 1--9. Google Scholar

[15] Zhou K, Zha H, Song L. Learning social infectivity in sparse low-rank network using multi-dimensional hawkes processes. In: Proceedings of International Conference on Machine Learning, Atlanta, 2013. 31: 641--649. Google Scholar

[16] Dou P, Du S Z, Song G J. Inferring diffusion network on incomplete cascade data. In: Proceedings of International Conference on Web-Age Information Management, Nanchang, 2016. 325--337. Google Scholar

[17] Pasumarthi R K, Karthik S, Choure A, et al. Online network inference under dynamic cascade updates: A node-centric approach. In: Proceedings of International Conference on Communication Systems and Networks, Bangalore, 2014. 1--6. Google Scholar

[18] Rodriguez M G, Scholkopf B. Submodular inference of diffusion networks from multiple trees. In: Proceedings of International Conference on Machine Learning, Edinburgh, 2012. 1019--1028. Google Scholar

[19] Myers S A, Leskovec J. On the convexity of latent social network inference. In: Proceedings of Neural Information Processing Systems Conference, Whistler, 2010. 1741--1749. Google Scholar

[20] Yang J ,Leskovec J. Patterns of temporal variation in online media. In: Proceedings of International Conference on Web Search and Data Mining, New York, 2011. 177--186. Google Scholar

[21] Matsubara Y, Sakurai Y, Prakash B A, et al. Rise and fall patterns of information diffusion: Model and implications. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 6--14. Google Scholar

[22] Evans J D. Straightforward Statistics for the Behavioral Sciences. California: Brooks/Cole Publishing, 1996. Google Scholar

[23] Leskovec J, Chakrabarti D, Kleinberg J, et al. Kronecker graphs: an approach to modeling networks. J Mach Learn Res, 2010, 11: 985--1042. Google Scholar

[24] Leskovec J, Backstrom L, Kleinberg J. Meme-tracking and the dynamics of the news cycle. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 2009. 497--506. Google Scholar

• Figure 1

(Color online) The diffusion speed in different stages of diffusion processes in the Sina microblog dataset and US Patent dataset. (a) Cascade A in Sina dataset; (b) cascade B in Sina dataset; (c) cascade A in Patent dataset; protectłinebreak (d) cascade B in Patent dataset.

• Figure 2

(Color online) Performance of (C)LSH-NetRate (a), (C)LSH-ConNIe (b), (C)LSH-NetInf (c) with respect to various network structures. The $x$-axis denotes different synthetic network structures.

• Figure 3

(Color online) Performance of (C)LSH-NetRate (a), (C)LSH-ConNIe (b), (C)LSH-NetInf (c) with respect to different numbers of cascades. The $x$-axis denotes a different number of cascades.

• Figure 4

(Color online) Performance of (C)LSH-NetRate (a), (C)LSH-ConNIe (b), (C)LSH-NetInf (c) with respect to different heterogeneity levels. The $x$-axis denotes different levels of heterogeneity.

• Figure 5

(Color online) Performance on the likelihood function maximum. (a) (C)LSH-NetRate; (b) (C)LSH-ConNIe.

• Figure 6

(Color online) Performance with respect to different parameter values $L$ and $K$. (a) Max F1 score on various $L$; (b) AUC on various $L$; (c) max F1 score on various $K$; (d) AUC on various $K$.

• Figure 7

(Color online) Precision-recall curve with respect to MemeTracker dataset. (a) (C)LSH-NetRate; (b) (C)LSH-ConNIe; (c) (C)LSH-NetInf.

• Figure 8

(Color online) Precision-recall curve with respect to Patent dataset. (a) (C)LSH-NetRate; (b) (C)LSH-ConNIe; (c) (C)LSH-NetInf.

• Figure 9

(Color online) Performance on MemeTracker dataset vs. the number of cascades. (a) (C)LSH-NetRate; protectłinebreak (b) (C)LSH-ConNIe; (c) (C)LSH-NetInf.

• Figure 10

(Color online) Performance on Patent dataset vs. the number of cascades. (a) (C)LSH-NetRate; (b) (C)LSH-ConNIe; (c) (C)LSH-NetInf.

• Figure 11

(Color online) The popularity level estimated by CLSH in MemeTracker and Patent datasets. (a) Cluster A in MemeTracker; (b) cluster B in MemeTracker; (c) cluster A in Patent dataset; (d) cluster B in Patent dataset.

• Table 1   Correlation coefficients between ground truth diffusion speed and the number of activated nodes in a time unit
 Coefficient Strength Percentage in Patent Percentage in Sina 0–0.19 Very weak 0.050 0.02 0.20–0.39 Weak 0.125 0.02 0.40–0.59 Moderate 0.175 0.06 0.60–0.79 Strong 0.300 0.08 0.80–1 Very strong 0.350 0.82
• Table 2   Popularity patterns in synthetic data
 Id $P^c$ function 1 $[0.2695;~~~0.6742;~~~0.0319;~~~0.0183;~~~0.0060]$ 2 $[0.6441;~~~0.1494;~~~0.0942;~~~0.0648;~~~0.0474]$ 3 $[0.0746;~~~0.1091;~~~0.1408;~~~0.2045;~~~0.4710]$ 4 $[0.1122;~~~0.0735;~~~0.1206;~~~0.2080;~~~0.4857]$
• Table 3   Popularity patterns with respect to different standard deviations
 Id $P^c$ function Level 1 $[0.2;~~0.2;~~~~0.2;~~0.2;~~~~0.2]$ 0 2 $[0.2;~~0.4;~~~~0.3;~~0.05;~~~0.05]$ 0.1541 3 $[0.1;~~0.5998;~0.3;~~0.0001;~0.0001]$ 0.2549 4 $[0.03;~0.8998;~0.07;~0.0001;~0.0001]$ 0.3924

Citations

• #### 0

Altmetric

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有