
SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1558-1574(2018) https://doi.org/10.1360/N112018-00134

Rumor detection in social media based on a hierarchical attention network

More info
  • Received May 25, 2018
  • Accepted Sep 14, 2018
  • Published Nov 14, 2018

Abstract

Most feature-representation-based studies of rumor detection in social media capture textual features or global user features from a sequence of microblogs partitioned into time intervals. However, these studies ignore the time-series information across the microblogs within a single time interval. Moreover, latent textual information and local user information, which have proven effective in traditional machine learning methods, are overlooked when the features of the time intervals are captured, which lowers performance. We therefore propose a rumor detection method for social media based on a hierarchical attention network. First, the microblogs are partitioned into several time intervals. Then, the variation of information across the microblogs within each interval is learned by a bidirectional gated recurrent unit (GRU) neural network with an attention mechanism. Next, the learned features are combined with hand-crafted features so that latent textual information and local user information are incorporated into the features of the time intervals. Finally, the variation of features across the time intervals is captured, again by a bidirectional GRU network with an attention mechanism, and the microblog events are classified. Experimental results on two public datasets, Sina Weibo and Twitter, show that the proposed method improves accuracy over state-of-the-art methods by 1.5 and 1.4 percentage points, respectively, and that it is effective for rumor detection.
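As a rough illustration of the first step, the sketch below partitions the microblogs of one event into equal-width time intervals. The abstract does not say how the interval boundaries are chosen, so the equal-width rule, the function name `partition_into_intervals`, and the `(text, timestamp)` tuple layout are assumptions made only for this example.

```python
from typing import List, Tuple

Microblog = Tuple[str, float]  # (text, timestamp in seconds); this layout is an assumption


def partition_into_intervals(posts: List[Microblog], num_intervals: int) -> List[List[Microblog]]:
    """Split posts into `num_intervals` equal-width time buckets (assumed rule)."""
    if num_intervals < 1:
        raise ValueError("num_intervals must be at least 1")
    buckets: List[List[Microblog]] = [[] for _ in range(num_intervals)]
    if not posts:
        return buckets
    ordered = sorted(posts, key=lambda p: p[1])
    start, end = ordered[0][1], ordered[-1][1]
    width = (end - start) / num_intervals
    for post in ordered:
        # If every post has the same timestamp, width is 0 and all go to the first bucket.
        index = 0 if width == 0 else int((post[1] - start) / width)
        # Clamp so the latest post falls into the last bucket instead of overflowing.
        buckets[min(index, num_intervals - 1)].append(post)
    return buckets


posts = [("a", 0.0), ("b", 10.0), ("c", 35.0), ("d", 60.0), ("e", 100.0)]
intervals = partition_into_intervals(posts, 4)
bucket_texts = [[text for text, _ in bucket] for bucket in intervals]
```

With the toy timestamps above, the first two posts share the first interval and each later post gets its own. Intervals may also come out empty, which a real pipeline would need to handle.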


Funded by

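National Natural Science Foundation of China (61772135, U1605251)

Open Fund of the Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences (CASNDST201708, CASNDST201606)

Director's Fund of the Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education (2017KF01)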


References

[1] Liu Z Y, Zhang L, Tu C C, et al. Statistical and semantic analysis of rumors in Chinese social media. Sci Sin Inform, 2015, 45: 1536-1546.

[2] Vosoughi S, Roy D, Aral S. The spread of true and false news online. Science, 2018, 359: 1146-1151.

[3] Waldrop M M. News feature: the genuine problem of fake news. Proc Natl Acad Sci USA, 2017, 114: 12631-12634.

[4] Tan Z H, Shi Y C, Shi N X, et al. Rumor propagation analysis model inspired by gravity theory for online social networks. J Comput Res Dev, 2017, 54: 2586-2599.

[5] Liu Y H, Jin X L, Shen H W, et al. A survey on rumor identification over social media. Chinese J Comput, 2018, 41: 1536-1558.

[6] Chen Y F, Li Z Y, Liang X, et al. Review on rumor detection of online social networks. Chinese J Comput, 2018, 41: 1648-1677.

[7] Ma J, Gao W, Mitra P, et al. Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, 2016. 3818-3824.

[8] Yu F, Liu Q, Wu S, et al. A convolutional approach for misinformation identification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, 2017. 3901-3907.

[9] Ma J, Gao W, Wong K F. Detect rumor and stance jointly by neural multi-task learning. In: Proceedings of the Web Conference Companion, Lyon, 2018. 585-593.

[10] Zhang Q, Zhang S Y, Dong J, et al. Automatic detection of rumor on social network. In: Proceedings of the 4th CCF Conference on Natural Language Processing and Chinese Computing, Nanchang, 2015. 113-122.

[11] Castillo C, Mendoza M, Poblete B. Information credibility on Twitter. In: Proceedings of International Conference on World Wide Web, Hyderabad, 2011. 675-684.

[12] Yang F, Liu Y, Yu X H, et al. Automatic detection of rumor on Sina Weibo. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, Beijing, 2012. 13-20.

[13] Zhao Z, Resnick P, Mei Q Z. Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, Florence, 2015. 1395-1405.

[14] Liang G, He W, Xu C. Rumor identification in microblogging systems based on users' behavior. IEEE Trans Comput Soc Syst, 2015, 2: 99-108.

[15] Ma J, Gao W, Wong K F. Detect rumors in microblog posts using propagation structure via kernel learning. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 708-717.

[16] Sun S Y, Liu H Y, He J, et al. Detecting event rumors on Sina Weibo automatically. In: Proceedings of the Web Technologies and Applications, Sydney, 2013. 120-131.

[17] Ma J, Gao W, Wei Z Y, et al. Detect rumors using time series of social context information on microblogging websites. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, 2015. 1751-1754.

[18] Ruchansky N, Seo S, Liu Y. CSI: a hybrid deep model for fake news detection. In: Proceedings of ACM on Conference on Information and Knowledge Management, Singapore, 2017. 797-806.

[19] Qazvinian V, Rosengren E, Radev D R, et al. Rumor has it: identifying misinformation in microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, 2011. 1589-1599.

[20] Hamidian S, Diab M. Rumor identification and belief investigation on Twitter. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, San Diego, 2016. 3-8.

[21] Liu Y H, Jin X L, Shen H W, et al. Do rumors diffuse differently from non-rumors? A systematically empirical analysis in Sina Weibo for rumor identification. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, 2017. 407-420.

[22] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. Comput Sci, 2012, 3: 212-223.

[23] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vancouver, 2013. 6645-6649.

[24] LeCun Y, Boser B, Denker J S. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1: 541-551.

[25] Elman J L. Finding structure in time. Cognitive Sci, 1990, 14: 179-211.

[26] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9: 1735-1780.

[27] Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1746-1751.

[28] Chen H M, Sun M S, Tu C C, et al. Neural sentiment classification with user and product attention. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Austin, 2016. 1650-1659.

[29] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification. In: Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, 2016. 1480-1489.

[30] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1724-1734.

[31] Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the International Conference on Machine Learning, Beijing, 2014. 1188-1196.

[32] Kingma D P, Ba J. Adam: a method for stochastic optimization. arXiv, 2014.

  • Figure 1

    (Color online) The model of rumor detection in social media based on hierarchical attention networks
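The attention step used at both levels of the model can be sketched as below. The exact equations are not reproduced on this page, so the sketch assumes the formulation of hierarchical attention networks for document classification [29]: each hidden state is projected through a tanh layer, scored against a context vector, and the softmax-normalized scores weight a sum of the hidden states. The bidirectional GRU that would produce the hidden states is omitted, and all names and toy values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], weight: List[List[float]],
                   bias: List[float], context: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, attention-weighted sum of the hidden states).

    Assumed form: u_t = tanh(W h_t + b), alpha_t = softmax(u_t . c), s = sum_t alpha_t h_t
    """
    scores = []
    for h in hidden:
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(weight, bias)]
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return alphas, pooled


# Toy hidden states for three time steps (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, chosen only for readability
b = [0.0, 0.0]
c = [1.0, 1.0]                # context vector; learned in a real model
alphas, pooled = attention_pool(hidden_states, W, b, c)
```

In this toy setup the third hidden state aligns best with the context vector, so it receives the largest attention weight.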

  • Figure 2

    (Color online) Results of early rumor detection. (a) Sina Weibo dataset; (b) Twitter dataset

  • Figure 3

    (Color online) The effect of the number of epochs on the experimental results. (a) Sina Weibo dataset; (b) Twitter dataset

  • Figure 4

    (Color online) The features changing over time on the Sina Weibo dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) users reputation

  • Figure 5

    (Color online) The features changing over time on the Twitter dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) users reputation

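  • Algorithm 1

    Training algorithm for the rumor detection model in social media based on a hierarchical attention network

    Require: The set of events in the training dataset $E=\{e_{1},e_{2},\ldots\}$, where $e_{i}=\{(m_{i,j},t_{i,j})\}_{j=1}^{n_{i}}$; the set of ground-truth labels of the events $Y^{*}=\{y^{*}_{1},y^{*}_{2},\ldots\}$.

    Initialize the model parameter set $\theta$, the maximum number of epochs MAXEPOCH, and the current epoch ${\rm epoch} \Leftarrow 1$;

    while ${\rm epoch} \leq {\rm MAXEPOCH}$ do

        For each event $e_{i}$, compute the corresponding predicted label $L_{i}$;

        Compute the loss value Loss according to Eq. (16);

        Update the parameter set $\theta$ with the Adam optimization algorithm according to the value of Loss;

        ${\rm epoch} \Leftarrow {\rm epoch}+1$;

    end while

    Ensure: The optimal model parameter set $\theta$ obtained after training.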
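The loop structure of Algorithm 1 can be sketched in plain Python as follows. This is not the authors' model: the hierarchical attention network is replaced by a one-layer logistic classifier over hypothetical 2-dimensional event vectors, and Eq. (16), which is not reproduced on this page, is assumed to be a binary cross-entropy loss. The Adam update uses the standard bias-corrected form. The learning rate of 0.1 is chosen only for this toy problem.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(events: List[List[float]], labels: List[int], max_epoch: int = 200,
          lr: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
          eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Mirror Algorithm 1 with a logistic stand-in model; returns (theta, loss history)."""
    n, dim = len(events), len(events[0])
    theta = [0.0] * (dim + 1)   # weights followed by a bias: the parameter set
    m = [0.0] * len(theta)      # Adam first-moment estimates
    v = [0.0] * len(theta)      # Adam second-moment estimates
    history: List[float] = []
    epoch = 1
    while epoch <= max_epoch:
        # Predict a rumor probability for every event (zip stops before the bias).
        preds = [sigmoid(sum(w * x for w, x in zip(theta, e)) + theta[-1]) for e in events]
        # Loss: binary cross-entropy, assumed here in place of Eq. (16).
        loss = -sum(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
                    for p, y in zip(preds, labels)) / n
        history.append(loss)
        # Gradient of the loss with respect to theta.
        grad = [0.0] * len(theta)
        for e, p, y in zip(events, preds, labels):
            for i, x in enumerate(e):
                grad[i] += (p - y) * x / n
            grad[-1] += (p - y) / n
        # One Adam step with bias-corrected moment estimates.
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** epoch)
            v_hat = v[i] / (1 - beta2 ** epoch)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        epoch += 1
    return theta, history


# Toy, linearly separable event vectors (made-up numbers): 1 = rumor, 0 = non-rumor.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, 0, 0]
theta, losses = train(X, y)
predicted = [int(sigmoid(sum(w * x for w, x in zip(theta, e)) + theta[-1]) > 0.5) for e in X]
```

In practice the per-event forward pass and the gradient would come from a deep learning framework. The sketch only shows how the predict, loss, and update steps fit inside the epoch loop.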

  • Table 1

    The features of the time intervals

    Category                Feature                    Description
    Content-based features  Text vector                Calculated by utilizing doc2vec
                            Enquiries and corrections  % of microblogs with enquiries and corrections
                            Length of microblogs       Average length of microblogs
    User-based features     Personal description       % of users that provide personal description
                            Verified users             % of verified users
                            Users reputation           Followers/followees ratio
                            Users activeness           Followees/followers ratio
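The scalar features in Table 1 could be computed for one time interval roughly as follows. The `Post` record and its field names are hypothetical, each post is treated as coming from a distinct user, the ratios are averaged over posts, and the doc2vec text vector is left out. None of these choices is confirmed by the table itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Post:
    """One microblog plus its author's profile (field names are hypothetical)."""
    text: str
    is_enquiry_or_correction: bool
    has_description: bool
    is_verified: bool
    followers: int
    followees: int


def interval_features(posts: List[Post]) -> Dict[str, float]:
    """Compute the scalar features of Table 1 for one non-empty time interval."""
    n = len(posts)
    if n == 0:
        raise ValueError("interval must contain at least one post")
    return {
        "enquiries_and_corrections": sum(p.is_enquiry_or_correction for p in posts) / n,
        "length_of_microblogs": sum(len(p.text) for p in posts) / n,
        "personal_description": sum(p.has_description for p in posts) / n,
        "verified_users": sum(p.is_verified for p in posts) / n,
        # max(..., 1) guards against division by zero; this guard is an assumption.
        "users_reputation": sum(p.followers / max(p.followees, 1) for p in posts) / n,
        "users_activeness": sum(p.followees / max(p.followers, 1) for p in posts) / n,
    }


example_posts = [
    Post("is this true?", True, True, False, followers=100, followees=50),
    Post("wow", False, False, True, followers=10, followees=20),
]
features = interval_features(example_posts)
```

The fractions are returned in the range 0 to 1 rather than as percentages. The resulting values would be concatenated with the learned interval representation, as the abstract describes.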
  • Table 2

    The regular expression list of enquiries and corrections

    Chinese                                       English
    (?:这|那|它)是真的吗                          is\s(?:that|this|it)\strue
    什么[?!][?1]*                                 wh[a]*t[?!][?1]*
    真的?|真的? |求证|真的假的|真的吗|未经证实    real?|really?|unconfirmed
    谣言|揭穿                                     rumor|debunk
    (?:那|这|它)不是真的|假的                     (?:that|this|it)\sis\snot\strue
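A minimal sketch of applying the patterns in Table 2 with Python's `re` module follows. The patterns are transcribed from the table. Treating the spaces after `\s` as extraction noise, matching case-insensitively, and dropping the duplicated alternative in the third Chinese pattern are all assumptions. The function names are hypothetical.

```python
import re
from typing import List

ENGLISH_PATTERNS = [
    r"is\s(?:that|this|it)\strue",
    r"wh[a]*t[?!][?1]*",
    r"real?|really?|unconfirmed",
    r"rumor|debunk",
    r"(?:that|this|it)\sis\snot\strue",
]
CHINESE_PATTERNS = [
    r"(?:这|那|它)是真的吗",
    r"什么[?!][?1]*",
    r"真的?|求证|真的假的|真的吗|未经证实",  # duplicated alternative from the table omitted
    r"谣言|揭穿",
    r"(?:那|这|它)不是真的|假的",
]

# Combine every pattern into one alternation; each is wrapped in a non-capturing group.
ENQUIRY_RE = re.compile(
    "|".join(f"(?:{p})" for p in ENGLISH_PATTERNS + CHINESE_PATTERNS),
    re.IGNORECASE,
)


def is_enquiry_or_correction(text: str) -> bool:
    """True if any Table 2 pattern occurs anywhere in the text."""
    return ENQUIRY_RE.search(text) is not None


def enquiry_ratio(texts: List[str]) -> float:
    """Fraction of microblogs in an interval that match (0.0 for an empty interval)."""
    if not texts:
        return 0.0
    return sum(is_enquiry_or_correction(t) for t in texts) / len(texts)


ratio = enquiry_ratio(["Is this true?", "lovely weather today"])
```

Note that, read as regular expressions, `real?` and `真的?` treat `?` as a quantifier, so they also match the shorter prefixes "rea" and "真". If a literal question mark was intended, it would need to be escaped as `\?`.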
  • Table 3

    Statistics of the dataset

    Statistic                       Sina Weibo   Twitter
    # of events                     4664         992
    # of rumors                     2313         498
    # of non-rumors                 2351         494
    # of microblogs                 3805656      1101985
    # of users                      2746818      491229
    Average time length/event (h)   2460.7       1582.6
    Average # of posts/event        816          1111
    Max # of posts/event            59318        62827
    Min # of posts/event            10           10
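The figures in Table 3 are internally consistent, which a few lines of arithmetic can confirm: rumors plus non-rumors equal the number of events, and microblogs divided by events reproduces the reported average number of posts per event. The dictionary keys below are hypothetical names for the table rows.

```python
# Values copied from Table 3; key names are illustrative only.
stats = {
    "Sina Weibo": {"events": 4664, "rumors": 2313, "non_rumors": 2351, "microblogs": 3805656},
    "Twitter": {"events": 992, "rumors": 498, "non_rumors": 494, "microblogs": 1101985},
}

# Class counts should add up to the number of events.
class_totals_match = {
    name: s["rumors"] + s["non_rumors"] == s["events"] for name, s in stats.items()
}

# Average posts per event, rounded to the nearest integer as in the table.
avg_posts = {name: round(s["microblogs"] / s["events"]) for name, s in stats.items()}

# Share of events labeled as rumors, showing that both datasets are roughly balanced.
rumor_share = {name: s["rumors"] / s["events"] for name, s in stats.items()}
```

Both datasets are close to balanced, with rumors making up about half of the events in each.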
  • Table 4

    Rumor detection results (R: rumor, N: non-rumor)$^{\rm a)}$

                    Sina Weibo                           Twitter
    Method   Class  Accuracy  Precision  Recall  F1      Accuracy  Precision  Recall  F1
    DT-Rank  R      0.732     0.738      0.715   0.726   0.614     0.618      0.604   0.611
             N                0.726      0.749   0.737             0.609      0.623   0.616
    DTC      R      0.831     0.847      0.815   0.831   0.709     0.690      0.772   0.729
             N                0.815      0.847   0.830             0.733      0.643   0.685
    SVM-TS   R      0.857     0.839      0.885   0.861   0.716     0.689      0.793   0.738
             N                0.878      0.830   0.857             0.754      0.639   0.692
    GRU-2    R      0.910     0.876      0.956   0.914   0.723     0.712      0.743   0.727
             N                0.952      0.864   0.906             0.735      0.704   0.719
    CAMI     R      0.933     0.921      0.945   0.933   0.752     0.722      0.814   0.765
             N                0.945      0.921   0.932             0.790      0.690   0.737
    CSI      R      0.953     0.930      0.976   0.954   0.773     0.806      0.714   0.758
             N                0.977      0.931   0.953             0.746      0.831   0.787
    HAN-FC   R      0.968     0.966      0.974   0.970   0.787     0.778      0.800   0.789
             N                0.971      0.962   0.967             0.797      0.775   0.786

    a) Values in bold represent the best result in each category among all methods.
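The F1 column and the accuracy gains quoted in the abstract can be checked against Table 4. The sketch below assumes that HAN-FC is the proposed method and that CSI, the baseline with the highest accuracy, is the state-of-the-art reference. The page does not state either explicitly, although the resulting gains match the abstract. F1 values recomputed from the rounded precision and recall agree with the reported ones only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for HAN-FC, copied from Table 4.
han_fc_rows = {
    ("Sina Weibo", "R"): (0.966, 0.974, 0.970),
    ("Sina Weibo", "N"): (0.971, 0.962, 0.967),
    ("Twitter", "R"): (0.778, 0.800, 0.789),
    ("Twitter", "N"): (0.797, 0.775, 0.786),
}
# Absolute difference between the recomputed and the reported F1.
f1_gaps = {key: abs(f1_score(p, r) - f) for key, (p, r, f) in han_fc_rows.items()}

# Accuracy of the assumed reference baseline (CSI) and of HAN-FC, from Table 4.
accuracy = {
    "Sina Weibo": {"CSI": 0.953, "HAN-FC": 0.968},
    "Twitter": {"CSI": 0.773, "HAN-FC": 0.787},
}
# Absolute accuracy gain, as a fraction (0.015 means 1.5 percentage points).
accuracy_gain = {d: round(a["HAN-FC"] - a["CSI"], 3) for d, a in accuracy.items()}
```

The gains are absolute differences in accuracy, that is, percentage points rather than relative improvements.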

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
