SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1558-1574 (2018) https://doi.org/10.1360/N112018-00134

## Rumor detection in social media based on a hierarchical attention network

• Accepted Sep 14, 2018
• Published Nov 14, 2018

### Abstract

For rumor detection in social media, most feature-representation-based studies capture textual features or global user features from a partitioned sequence of microblogs. However, these studies ignore the time-series information across the microblogs within a single time interval. Moreover, latent textual information and local user information, which have proven effective in traditional machine learning methods, are overlooked when the features of the time intervals are captured, resulting in low performance. We therefore propose a rumor detection method for social media based on a hierarchical attention network. First, the microblogs are partitioned into several time intervals. Then, the variation of information across the microblogs over time is learned using a bidirectional gated recurrent unit (GRU) neural network with an attention mechanism. Next, these learned features are combined with hand-crafted features to incorporate latent textual information and local user information into the features of each time interval. Finally, the variation of features across the time intervals is captured using a bidirectional GRU neural network with an attention mechanism, and the microblog events are classified. Experimental results on two public datasets, Sina Weibo and Twitter, show that the proposed method outperforms state-of-the-art methods in accuracy by 1.5% and 1.4%, respectively, and that it is effective for rumor detection.
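The two-level pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It assumes equal-width time intervals and the additive attention pooling commonly used in hierarchical attention networks; the abstract does not specify either detail. The function names `partition_by_time` and `attention_pool` and the parameters `W`, `b`, and `u` are hypothetical, and the bidirectional GRU encoder that would produce the hidden states `H` is omitted.

```python
import numpy as np


def partition_by_time(timestamps, num_intervals):
    """Map each post timestamp to an interval index in [0, num_intervals).

    Assumption: intervals have equal width between the first and last post.
    """
    ts = np.asarray(timestamps, dtype=float)
    start, end = ts.min(), ts.max()
    width = (end - start) / num_intervals
    if width == 0:  # all posts share one timestamp
        return np.zeros(len(ts), dtype=int)
    idx = ((ts - start) / width).astype(int)
    # The last post falls exactly on the right edge; clamp it into the last bin.
    return np.minimum(idx, num_intervals - 1)


def attention_pool(H, W, b, u):
    """Additive attention pooling over hidden states H of shape (T, d).

    Returns the attention-weighted sum of the rows of H and the weights.
    """
    scores = np.tanh(H @ W + b) @ u      # one score per time step, shape (T,)
    scores = scores - scores.max()       # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha
```

In a full model, `H` would hold the bidirectional GRU states of the microblogs in one interval at the first level, and of the interval feature vectors of one event at the second level.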

### References

[1] Liu Z Y, Zhang L, Tu C C, et al. Statistical and semantic analysis of rumors in Chinese social media. Sci Sin Inform, 2015, 45: 1536--1546.

[2] Vosoughi S, Roy D, Aral S. The spread of true and false news online. Science, 2018, 359: 1146--1151.

[3] Waldrop M M. News feature: the genuine problem of fake news. Proc Natl Acad Sci USA, 2017, 114: 12631--12634.

[4] Tan Z H, Shi Y C, Shi N X, et al. Rumor propagation analysis model inspired by gravity theory for online social networks. J Comput Res Dev, 2017, 54: 2586--2599.

[5] Liu Y H, Jin X L, Shen H W, et al. A survey on rumor identification over social media. Chinese J Comput, 2018, 41: 1536--1558.

[6] Chen Y F, Li Z Y, Liang X, et al. Review on rumor detection of online social networks. Chinese J Comput, 2018, 41: 1648--1677.

[7] Ma J, Gao W, Mitra P, et al. Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, 2016. 3818--3824.

[8] Yu F, Liu Q, Wu S, et al. A convolutional approach for misinformation identification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, 2017. 3901--3907.

[9] Ma J, Gao W, Wong K F. Detect rumor and stance jointly by neural multi-task learning. In: Proceedings of the Web Conference Companion, Lyon, 2018. 585--593.

[10] Zhang Q, Zhang S Y, Dong J, et al. Automatic detection of rumor on social network. In: Proceedings of the 4th CCF Conference on Natural Language Processing and Chinese Computing, Nanchang, 2015. 113--122.

[11] Castillo C, Mendoza M, Poblete B. Information credibility on twitter. In: Proceedings of International Conference on World Wide Web, Hyderabad, 2011. 675--684.

[12] Yang F, Liu Y, Yu X H, et al. Automatic detection of rumor on Sina Weibo. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, Beijing, 2012. 13--20.

[13] Zhao Z, Resnick P, Mei Q Z. Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, Florence, 2015. 1395--1405.

[14] Liang G, He W, Xu C. Rumor identification in microblogging systems based on users' behavior. IEEE Trans Comput Soc Syst, 2015, 2: 99--108.

[15] Ma J, Gao W, Wong K F. Detect rumors in microblog posts using propagation structure via kernel learning. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 708--717.

[16] Sun S Y, Liu H Y, He J, et al. Detecting event rumors on Sina Weibo automatically. In: Proceedings of the Web Technologies and Applications, Sydney, 2013. 120--131.

[17] Ma J, Gao W, Wei Z Y, et al. Detect rumors using time series of social context information on microblogging websites. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, 2015. 1751--1754.

[18] Ruchansky N, Seo S, Liu Y. CSI: a hybrid deep model for fake news detection. In: Proceedings of ACM on Conference on Information and Knowledge Management, Singapore, 2017. 797--806.

[19] Qazvinian V, Rosengren E, Radev D R, et al. Rumor has it: identifying misinformation in microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, 2011. 1589--1599.

[20] Hamidian S, Diab M. Rumor identification and belief investigation on Twitter. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, San Diego, 2016. 3--8.

[21] Liu Y H, Jin X L, Shen H W, et al. Do rumors diffuse differently from non-rumors? A systematically empirical analysis in Sina Weibo for rumor identification. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, 2017. 407--420.

[22] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. Comput Sci, 2012, 3: 212--223.

[23] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vancouver, 2013. 6645--6649.

[24] LeCun Y, Boser B, Denker J S. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1: 541--551.

[25] Elman J L. Finding structure in time. Cognitive Sci, 1990, 14: 179--211.

[26] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9: 1735--1780.

[27] Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1746--1751.

[28] Chen H M, Sun M S, Tu C C, et al. Neural sentiment classification with user and product attention. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Austin, 2016. 1650--1659.

[29] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification. In: Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, 2016. 1480--1489.

[30] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1724--1734.

[31] Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the International Conference on Machine Learning, Beijing, 2014. 1188--1196.

[32] Kingma D P, Ba J. Adam: a method for stochastic optimization. arXiv preprint, 2014.

• Figure 1

(Color online) The model of rumor detection in social media based on hierarchical attention networks

• Figure 2

(Color online) Results of early rumor detection. (a) Sina Weibo dataset; (b) Twitter dataset

• Figure 3

(Color online) The effect of the number of epochs on the experimental results. (a) Sina Weibo dataset; (b) Twitter dataset

• Figure 4

(Color online) The features changing over time on the Sina Weibo dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) users' reputation

• Figure 5

(Color online) The features changing over time on the Twitter dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) users' reputation

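• Algorithm 1   Training algorithm of the rumor detection model for social media based on a hierarchical attention network

Require: The set of events in the training dataset $E=\{e_{1},e_{2},\ldots\}$,

where $e_{i}=\{(m_{i,j},t_{i,j})\}_{j=1}^{n_{i}}$; the set of ground-truth labels of the events $Y^{*}=\{y^{*}_{1},y^{*}_{2},\ldots\}$.

Initialize the model parameter set $\theta$, the maximum number of epochs MAXEPOCH, and the current epoch $\mathrm{epoch} \Leftarrow 1$;

while $\mathrm{epoch} \leq \mathrm{MAXEPOCH}$ do

For each event $e_{i}$, compute the corresponding predicted label $L_{i}$;

Compute the loss value Loss according to Eq. (16);

Update the parameter set $\theta$ with the Adam optimization algorithm according to the value of Loss;

$\mathrm{epoch} \Leftarrow \mathrm{epoch}+1$;

end while

Output: The optimal model parameter set $\theta$ obtained after training.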
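The loop structure of Algorithm 1 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: Eq. (16) is not reproduced on this page, so binary cross-entropy is assumed as the loss; a logistic-regression model stands in for the hierarchical attention network; and the Adam update is written out by hand with commonly used default hyperparameters. The name `train_with_adam` and all of its parameters are hypothetical.

```python
import numpy as np


def train_with_adam(X, y, max_epoch=200, lr=0.05,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """Train a logistic-regression stand-in with the loop of Algorithm 1.

    X: (n, d) event feature matrix; y: (n,) ground-truth labels in {0, 1}.
    Returns the learned parameters theta (weights then bias) and the loss history.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    theta = np.zeros(Xb.shape[1])               # initialize the parameter set
    m = np.zeros_like(theta)                    # Adam first-moment estimate
    v = np.zeros_like(theta)                    # Adam second-moment estimate
    losses = []
    epoch = 1
    while epoch <= max_epoch:
        # Compute the predicted probability for every event.
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        # Assumed loss: binary cross-entropy (stand-in for Eq. (16)).
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(loss)
        # Gradient of the loss, followed by the bias-corrected Adam update.
        grad = Xb.T @ (p - y) / len(y)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** epoch)
        v_hat = v / (1 - beta2 ** epoch)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        epoch += 1
    return theta, losses
```

In practice the model and its gradients would come from a deep learning framework; the hand-written update is used here only to keep the sketch self-contained.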

• Table 1   The features of the time intervals

| Category | Feature | Description |
| --- | --- | --- |
| Content-based features | Text vector | Calculated by utilizing doc2vec |
| Content-based features | Enquiries and corrections | % of microblogs with enquiries and corrections |
| Content-based features | Length of microblogs | Average length of microblogs |
| User-based features | Personal description | % of users that provide personal description |
| User-based features | Verified users | % of verified users |
| User-based features | Users' reputation | Followers/followees ratio |
| User-based features | Users' activeness | Followees/followers ratio |

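As a rough illustration of the user-based features in Table 1, the sketch below aggregates them over the users who post in one time interval. The record layout (keys `description`, `verified`, `followers`, `followees`) is hypothetical, percentages are returned as fractions in [0, 1], and two details are assumptions not stated in the table: ratios are averaged per user, and a zero denominator is replaced by 1.

```python
def user_features(users):
    """Aggregate the user-based features of one time interval.

    `users` is a list of dicts with hypothetical keys:
    'description' (str), 'verified' (bool), 'followers' (int), 'followees' (int).
    """
    n = len(users)
    if n == 0:
        return {"personal_description": 0.0, "verified_users": 0.0,
                "reputation": 0.0, "activeness": 0.0}
    described = sum(1 for u in users if u["description"].strip())
    verified = sum(1 for u in users if u["verified"])
    # Assumption: average the per-user ratios; max(., 1) guards against
    # division by zero for users with no followees or no followers.
    reputation = sum(u["followers"] / max(u["followees"], 1) for u in users) / n
    activeness = sum(u["followees"] / max(u["followers"], 1) for u in users) / n
    return {
        "personal_description": described / n,  # fraction with a description
        "verified_users": verified / n,          # fraction of verified users
        "reputation": reputation,                # followers/followees ratio
        "activeness": activeness,                # followees/followers ratio
    }
```

The resulting values would be concatenated with the content-based features to form the hand-crafted part of an interval's feature vector.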
• Table 2   The regular expression list of enquiries and corrections

1. Chinese: `(?:这|那|它)是真的吗`; English: `is\s(?:that|this|it)\strue`
2. Chinese: `什么[?!][?1]*`; English: `wh[a]*t[?!][?1]*`
3. Chinese: `真的?|真的?|求证|真的假的|真的吗|未经证实`; English: `real?|really?|unconfirmed`
4. Chinese: `谣言|揭穿`; English: `rumor|debunk`
5. Chinese: `(?:那|这|它)不是真的|假的`; English: `(?:that|this|it)\sis\snot\strue`

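To show how the "enquiries and corrections" feature could be computed from the English patterns of Table 2, here is a minimal sketch using Python's `re` module. Two readings of the extracted table are assumptions: spaces that follow `\s` are treated as typesetting artifacts, and the `?` in `real?|really?` is treated as a literal question mark. The names `ENQUIRY_PATTERNS` and `enquiry_fraction` are hypothetical, and case-insensitive matching is an added choice.

```python
import re

# English patterns from Table 2, under the assumptions stated above
# (no literal space after "\s"; "?" in "real?|really?" is a literal "?").
ENQUIRY_PATTERNS = [
    r"is\s(?:that|this|it)\strue",
    r"wh[a]*t[?!][?1]*",
    r"real\?|really\?|unconfirmed",
    r"rumor|debunk",
    r"(?:that|this|it)\sis\snot\strue",
]

# One combined, case-insensitive expression; each pattern is wrapped in a
# non-capturing group so its internal alternation stays self-contained.
ENQUIRY_RE = re.compile(
    "|".join(f"(?:{p})" for p in ENQUIRY_PATTERNS), re.IGNORECASE
)


def enquiry_fraction(posts):
    """Return the fraction of posts that match any enquiry/correction pattern."""
    if not posts:
        return 0.0
    hits = sum(1 for text in posts if ENQUIRY_RE.search(text))
    return hits / len(posts)
```

Applied to the microblogs of one time interval, the returned fraction corresponds to the "% of microblogs with enquiries and corrections" feature.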
• Table 3   Statistics of the datasets

| Statistic | Sina Weibo | Twitter |
| --- | --- | --- |
| # of events | 4664 | 992 |
| # of rumors | 2313 | 498 |
| # of non-rumors | 2351 | 494 |
| # of microblogs | 3805656 | 1101985 |
| # of users | 2746818 | 491229 |
| Average time length/event (h) | 2460.7 | 1582.6 |
| Average # of posts/event | 816 | 1111 |
| Max # of posts/event | 59318 | 62827 |
| Min # of posts/event | 10 | 10 |

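• Table 4   Rumor detection results (R: rumor, N: non-rumor)$^{\rm a)}$

| Method | Class | Sina Weibo Accuracy | Sina Weibo Precision | Sina Weibo Recall | Sina Weibo $F1$ | Twitter Accuracy | Twitter Precision | Twitter Recall | Twitter $F1$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DT-Rank | R | 0.732 | 0.738 | 0.715 | 0.726 | 0.614 | 0.618 | 0.604 | 0.611 |
| | N | | 0.726 | 0.749 | 0.737 | | 0.609 | 0.623 | 0.616 |
| DTC | R | 0.831 | 0.847 | 0.815 | 0.831 | 0.709 | 0.690 | 0.772 | 0.729 |
| | N | | 0.815 | 0.847 | 0.830 | | 0.733 | 0.643 | 0.685 |
| SVM-TS | R | 0.857 | 0.839 | 0.885 | 0.861 | 0.716 | 0.689 | 0.793 | 0.738 |
| | N | | 0.878 | 0.830 | 0.857 | | 0.754 | 0.639 | 0.692 |
| GRU-2 | R | 0.910 | 0.876 | 0.956 | 0.914 | 0.723 | 0.712 | 0.743 | 0.727 |
| | N | | 0.952 | 0.864 | 0.906 | | 0.735 | 0.704 | 0.719 |
| CAMI | R | 0.933 | 0.921 | 0.945 | 0.933 | 0.752 | 0.722 | **0.814** | 0.765 |
| | N | | 0.945 | 0.921 | 0.932 | | 0.790 | 0.690 | 0.737 |
| CSI | R | 0.953 | 0.930 | **0.976** | 0.954 | 0.773 | **0.806** | 0.714 | 0.758 |
| | N | | **0.977** | 0.931 | 0.953 | | 0.746 | **0.831** | **0.787** |
| HAN-FC | R | **0.968** | **0.966** | 0.974 | **0.970** | **0.787** | 0.778 | 0.800 | **0.789** |
| | N | | 0.971 | **0.962** | **0.967** | | **0.797** | 0.775 | 0.786 |
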

a) Values in bold represent the best result in each category among all methods.
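The reported figures can be cross-checked with a few lines of Python. The sketch below recomputes $F1$ as the harmonic mean of precision and recall for the HAN-FC rumor class, and the accuracy margin of HAN-FC over CSI, using only values from Table 4; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# HAN-FC, rumor class (precision, recall) taken from Table 4.
weibo_f1 = round(f1_score(0.966, 0.974), 3)
twitter_f1 = round(f1_score(0.778, 0.800), 3)

# Accuracy margin of HAN-FC over CSI, the strongest baseline by accuracy.
weibo_gain = round(0.968 - 0.953, 3)
twitter_gain = round(0.787 - 0.773, 3)
```

The recomputed $F1$ values agree with the table entries (0.970 and 0.789), and the accuracy margins (0.015 and 0.014) match the 1.5% and 1.4% improvements stated in the abstract.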


Copyright 2020 Science China Press Co., Ltd. All rights reserved.