SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1533-1545 (2018) https://doi.org/10.1360/N112018-00157

Combining entity co-occurrence information and sentence semantic features for relation extraction

  • Received Oct 5, 2018
  • Accepted Oct 30, 2018
  • Published Nov 14, 2018


Relation extraction is one of the most important tasks in information extraction and a key step in knowledge graph construction. Most existing relation extraction approaches capture semantic features for entity pairs only at the sentence level and may therefore miss the global context of the entities across the entire corpus. In this paper, we propose a novel neural network model for relation extraction, named CNSSNN, which combines entity co-occurrence information with sentence-level semantic features. We first build an entity co-occurrence network from the corpus. We then introduce a network-level attention mechanism that selectively captures information from each entity's network environment and generates corpus-level global context features for the entities. In parallel, we employ a bi-directional gated recurrent unit (bi-GRU) network to extract sentence-level semantic features for entity pairs. Finally, we combine the corpus-level and sentence-level features to classify relations. Experimental results on a manually labeled dataset show that our approach consistently outperforms existing approaches in terms of both precision and recall.
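
The corpus-level part of the pipeline described above can be illustrated with a small sketch. It is not the CNSSNN implementation: the function names, the toy corpus, the two-dimensional embeddings, the dot-product attention score, and the use of concatenation to combine features are all assumptions made for illustration, and the bi-GRU sentence encoder is replaced by a placeholder vector.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(sentences):
    """Build an undirected network: entities are nodes, and an edge weight
    counts the sentences in which two entities appear together."""
    weights = Counter()
    for entities in sentences:
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    return neighbors


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_neighbors(entity, neighbors, embeddings):
    """Attention-weighted sum of the neighbors' embeddings.
    Assumption: the score is the dot product between the entity's own
    embedding and each neighbor's embedding; edge weights are not used."""
    query = embeddings[entity]
    names = sorted(neighbors.get(entity, {}))
    if not names:
        return [0.0] * len(query)  # isolated entity: no network context
    scores = [sum(q * k for q, k in zip(query, embeddings[n])) for n in names]
    alphas = softmax(scores)
    return [
        sum(a * embeddings[n][i] for a, n in zip(alphas, names))
        for i in range(len(query))
    ]


def combine(f_c_head, f_c_tail, f_s):
    """Assumption: corpus-level and sentence-level features are concatenated."""
    return f_c_head + f_c_tail + f_s


# Toy corpus: each inner list holds the entities mentioned in one sentence.
corpus = [["Alice", "MIT"], ["Alice", "Bob"], ["Bob", "MIT"], ["Alice", "MIT"]]
net = build_cooccurrence_network(corpus)
emb = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0], "MIT": [1.0, 1.0]}

f_c_alice = attend_over_neighbors("Alice", net, emb)
f_c_mit = attend_over_neighbors("MIT", net, emb)
f_s = [0.5, 0.5]  # placeholder for the bi-GRU sentence-level feature
f = combine(f_c_alice, f_c_mit, f_s)
```

The combined vector `f` is what a relation classifier would take as input. In this sketch the co-occurrence counts are stored on the edges but do not influence the attention scores; a fuller model could fold them into the scoring function.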

  • Figure 1

    (Color online) Overall framework of CNSSNN

  • Figure 2

    (Color online) P-R curves of different approaches. (a) SemEval; (b) CnNews

  • Table 1. Symbols and their description
    Symbol              Description
    $C$                 The corpus
    $s$                 A sentence in corpus $C$
    $\boldsymbol{S}$    The matrix representation of a sentence $s$
    $w$                 A word in a sentence
    $\boldsymbol{w}$    The vector representation of a word $w$
    $e$                 An entity
    $\boldsymbol{f}^c$  Corpus-level features
    $\boldsymbol{f}^s$  Sentence-level features
    $\boldsymbol{f}$    Features of an entity pair after feature combination
  • Table 2. Number of samples for each label in the labeled dataset
    Label       Number of samples
    "hold"      1031
    "study at"  923
    "work at"   3033
    "others"    4053
    Total       9040
  • Table 3. Number of samples for each label in the SemEval dataset
    Label                 Number of samples
    "others"              1864
    "cause-effect"        1331
    "instrument-agency"   660
    "product-producer"    948
    "content-container"   732
    "entity-origin"       974
    "entity-destination"  1137
    "component-whole"     1253
    "member-collection"   923
    "message-topic"       895
    Total                 10717
  • Table 4. Hyper-parameter settings of CNSSNN
    Hyper-parameter  u    layer_num  $q$  batchsize  learning_rate  $d$  $d_p$
    Value            100  1          64   250        $10^{-4}$      400  1
  • Table 5. Performance comparison of different relation extraction approaches on all labels
    Model          $F1$ on SemEval (%)  $F1$ on CnNews (%)
    CNN            80.43                85.32
    CR-CNN         81.09                86.47
    GRU            81.52                86.83
    ATT-GRU        83.69                88.15
    CNSSNN (ours)  85.99                90.34
  • Table 6. Performance comparison of different relation extraction approaches without the "others" label
    Model          SemEval                              CnNews
                   Precision (%)  Recall (%)  $F1$ (%)  Precision (%)  Recall (%)  $F1$ (%)
    CNN            84.00          79.82       81.76     86.79          82.87       84.83
    CR-CNN         84.20          80.82       82.40     86.85          85.99       85.86
    GRU            82.85          81.07       81.94     85.91          86.16       86.21
    ATT-GRU        85.19          83.07       84.06     87.21          87.93       87.58
    CNSSNN (ours)  87.51          85.66       86.56     92.83          86.15       89.72
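
As a reading aid for Tables 5 and 6, $F1$ is conventionally the harmonic mean of precision and recall. The sketch below uses a hypothetical helper name, `f1_score`. How the reported scores were averaged across labels is not stated here, so the tabulated values need not match this formula to the second decimal.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# CNSSNN on SemEval (Table 6): precision 87.51 %, recall 85.66 %
cnssnn_semeval_f1 = f1_score(87.51, 85.66)
```

For this row the formula gives roughly 86.58, close to the reported 86.56.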
