
SCIENCE CHINA Information Sciences, Volume 63, Issue 10: 202102 (2020) https://doi.org/10.1007/s11432-020-2982-y

Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition

  • Received: Apr 22, 2020
  • Accepted: May 14, 2020
  • Published: Sep 16, 2020

Abstract

Chinese named entity recognition (CNER) aims to identify entity names, such as person and organization names, in raw Chinese text, and thus enables quick extraction of the entity information that users care about from large-scale texts. Recent studies attempt to improve performance by integrating lexicon words into char-based CNER models. These studies, however, usually focus on leveraging the context-free words in the lexicon without considering the contextual information of words and subwords in the sentences. To address this issue, in addition to utilizing lexicon words, we propose to construct a hierarchical tree-structure representation, composed of characters, subwords, and context-aware words predicted by a segmentor, to represent each sentence for CNER. Based on this tree-structure representation, we propose a hierarchical long short-term memory (HiLSTM) framework, which consists of a hierarchical encoding layer, a fusion layer, and a CRF layer, to capture linguistic knowledge at different levels. On the one hand, the interactions within each level help to obtain contextual information. On the other hand, the propagation from lower levels to upper levels provides additional semantic knowledge for CNER. Experimental results on three widely used CNER datasets show that our proposed HiLSTM model achieves significant improvements over several strong benchmark methods.
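
The architecture described above can be illustrated with a small amount of code. Below is a minimal PyTorch sketch of the three-level idea, not the authors' implementation: the `HiLSTMSketch` class, all dimensions, and the mean-pooling used to propagate lower levels upward are illustrative assumptions, and the paper's CRF layer is replaced here by a plain linear layer that only emits per-character tag scores.

    import torch
    import torch.nn as nn

    class HiLSTMSketch(nn.Module):
        """Char-subword-word hierarchy with bottom-up propagation and fusion (sketch)."""
        def __init__(self, n_chars, n_subs, n_words, n_tags, dim=50):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, dim)
            self.sub_emb = nn.Embedding(n_subs, dim)
            self.word_emb = nn.Embedding(n_words, dim)
            # One BiLSTM per level of the hierarchy (the hierarchical encoding layer).
            self.char_lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
            self.sub_lstm = nn.LSTM(3 * dim, dim, bidirectional=True, batch_first=True)
            self.word_lstm = nn.LSTM(3 * dim, dim, bidirectional=True, batch_first=True)
            # Fusion layer: concatenate all three levels at each character position.
            # The paper puts a CRF on top; here a linear layer just emits tag scores.
            self.fuse = nn.Linear(6 * dim, n_tags)

        @staticmethod
        def pool(states, spans):
            # Propagate a lower level upward by mean-pooling each (start, end) span.
            return torch.stack([states[0, s:e].mean(dim=0) for s, e in spans]).unsqueeze(0)

        def forward(self, chars, subs, sub_spans, words, word_spans,
                    sub_of_char, word_of_char):
            h_c, _ = self.char_lstm(self.char_emb(chars))
            # Subword input = subword embedding + pooled states of its characters.
            h_s, _ = self.sub_lstm(torch.cat([self.sub_emb(subs),
                                              self.pool(h_c, sub_spans)], dim=-1))
            # Word input = word embedding + pooled states of its subwords.
            h_w, _ = self.word_lstm(torch.cat([self.word_emb(words),
                                               self.pool(h_s, word_spans)], dim=-1))
            # Fusion: each character sees its own, its subword's, and its word's states.
            fused = torch.cat([h_c, h_s[:, sub_of_char], h_w[:, word_of_char]], dim=-1)
            return self.fuse(fused)  # (1, n_char, n_tags) emission scores

    # Toy usage: the span 中国进出口银行 (7 chars, 5 subwords, 1 segmentor word).
    model = HiLSTMSketch(n_chars=10, n_subs=10, n_words=10, n_tags=5)
    chars = torch.tensor([[0, 1, 2, 3, 4, 5, 6]])
    subs = torch.tensor([[0, 1, 2, 3, 4]])                # 中国 / 进 / 出 / 口 / 银行
    sub_spans = [(0, 2), (2, 3), (3, 4), (4, 5), (5, 7)]  # char span of each subword
    words, word_spans = torch.tensor([[0]]), [(0, 5)]     # one word over all 5 subwords
    sub_of_char = torch.tensor([0, 0, 1, 2, 3, 4, 4])     # owning subword of each char
    word_of_char = torch.tensor([0, 0, 0, 0, 0, 0, 0])    # owning word of each char
    print(model(chars, subs, sub_spans, words, word_spans,
                sub_of_char, word_of_char).shape)          # torch.Size([1, 7, 5])

The toy input encodes the seven-character span 中国进出口银行 from the case study in Table 5, with five subwords and one segmentor word; the output is one tag-score vector per character, over which a CRF would normally decode the label sequence.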


Acknowledgment

This work was supported by National Science Fund for Distinguished Young Scholars (Grant No. 61525205), National Natural Science Foundation of China (Grant No. 61876116), and Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.


References

[1] Chen Y B, Xu L H, Liu K, et al. Event extraction via dynamic multi-pooling convolutional neural networks. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2015. 167--176.

[2] Miwa M, Bansal M. End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2016. 1105--1116.

[3] Diefenbach D, Lopez V, Singh K, et al. Core techniques of question answering systems over knowledge bases: a survey. Knowl Inf Syst, 2018. https://hal.archives-ouvertes.fr/hal-01637143/document.

[4] Yang J F, Guan Y, He B, et al. Corpus construction for named entities and entity relations on Chinese electronic medical records. J Softw, 2016, 27: 2725--2746.

[5] Song L F, Zhang Y, Gildea D, et al. Leveraging dependency forest for neural medical relation extraction. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2019. 208--218.

[6] Tian Y H, Ma W C, Xia F, et al. ChiMed: a Chinese medical corpus for question answering. In: Proceedings of the 18th BioNLP Workshop and Shared Task, 2019. 250--260.

[7] Saito K, Nagata M. Multi-language named-entity recognition system based on HMM. In: Proceedings of the ACL Workshop on Multilingual and Mixed-language Named Entity Recognition, 2003.

[8] Yu H K, Zhang H P, Liu Q, et al. Chinese named entity identification using cascaded hidden Markov model. J Commun, 2006, 27: 87--94.

[9] Solorio T, López A. Learning named entity classifiers using support vector machines. In: Proceedings of Conference on Computational Linguistics and Natural Language Processing (CICLing), 2004. 158--167.

[10] McCallum A, Li W. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In: Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2003. 188--191.

[11] Chiu J P C, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. Trans Assoc Comput Linguist, 2016, 4: 357--370.

[12] Lample G, Ballesteros M, Subramanian S, et al. Neural architectures for named entity recognition. In: Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2016. 260--270.

[13] Liu L Y, Shang J B, Xu F, et al. Empower sequence labeling with task-aware neural language model. In: Proceedings of Association for the Advancement of Artificial Intelligence (AAAI), 2018. 5253--5260.

[14] Dong C H, Zhang J J, Zong C Q, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: Proceedings of International Conference on Computer Processing of Oriental Languages (ICCPOL), 2016. 239--250.

[15] Zhang Y, Yang J. Chinese NER using lattice LSTM. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2018. 1554--1564.

[16] Gui T, Zou Y C, Zhang Q, et al. A lexicon-based graph neural network for Chinese NER. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2019. 1040--1050.

[17] Liu W, Xu T G, Xu Q H, et al. An encoding strategy based word-character LSTM for Chinese NER. In: Proceedings of North American Chapter of the Association for Computational Linguistics (NAACL), 2019. 2379--2389.

[18] Sui D B, Chen Y B, Liu K, et al. Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2019. 3828--3838.

[19] Gong C, Li Z H, Zhang M, et al. Multi-grained Chinese word segmentation. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2017. 703--714.

[20] Heinzerling B, Strube M. BPEmb: tokenization-free pre-trained subword embeddings in 275 languages. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2018. 2989--2993.

[21] Collobert R, Weston J, Bottou L, et al. Natural language processing (almost) from scratch. J Mach Learn Res, 2011, 12: 2493--2537.

[22] He H F, Sun X. A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media. In: Proceedings of Association for the Advancement of Artificial Intelligence (AAAI), 2017. 3216--3222.

[23] Zhao H, Kit C Y. Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition. In: Proceedings of SIGHAN Workshop on Chinese Language Processing, 2008.

[24] Peng N Y, Dredze M. Named entity recognition for Chinese social media with jointly trained embeddings. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2015. 548--554.

[25] He H F, Sun X. F-score driven max margin neural network for named entity recognition in Chinese social media. In: Proceedings of European Chapter of the Association for Computational Linguistics (EACL), 2017. 713--718.

[26] Peng N Y, Dredze M. Improving named entity recognition for Chinese social media with word segmentation representation learning. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2016. 149--155.

[27] Cao P F, Chen Y B, Liu K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2018. 182--192.

[28] Sennrich R, Haddow B, Birch A. Neural machine translation of rare words with subword units. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2016. 1715--1725.

[29] Weischedel R, Palmer M, Marcus M, et al. OntoNotes Release 4.0. Philadelphia: Linguistic Data Consortium, 2011.

[30] Levow G A. The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of SIGHAN Workshop on Chinese Language Processing, 2006. 108--117.

[31] Noreen E. Computer-Intensive Methods for Testing Hypotheses. Biometrics, 1990, 46: 540--541.

[32] Zhang S X, Qin Y, Wen J, et al. Word segmentation and named entity recognition for SIGHAN bakeoff3. In: Proceedings of SIGHAN Workshop on Chinese Language Processing, 2006. 158--161.

[33] Zhou J S, Qu W G, Zhang F. Chinese named entity recognition via joint identification and categorization. Chinese J Electron, 2013, 22: 225--230.

[34] Li S, Zhao Z, Hu R F, et al. Analogical reasoning on Chinese morphological and semantic relations. In: Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL), 2018. 138--143.

  • Figure 1

    (Color online) The lexicon words (bottom) and the char-subword-word tree-structure representation (top) of an example sentence. A toy construction of the corresponding spans is sketched after Figure 2.

  • Figure 2

    (Color online) The framework of our proposed HiLSTM model.
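
A hedged sketch of how the spans feeding such a model can be derived from the two tokenizations that Figure 1 combines. The subword list and the segmentor output are hand-written here for illustration (the paper obtains subwords via BPE [20] and words from a segmentor), and word boundaries are assumed to align with subword boundaries.

    def spans_and_map(pieces):
        """Char (start, end) span of each piece, plus a char -> piece index map."""
        spans, owner, pos = [], [], 0
        for i, p in enumerate(pieces):
            spans.append((pos, pos + len(p)))
            owner.extend([i] * len(p))
            pos += len(p)
        return spans, owner

    subwords = ["中国", "进", "出", "口", "银行"]   # e.g., from BPE
    words = ["中国进出口银行"]                      # e.g., from a segmentor
    sub_spans, sub_of_char = spans_and_map(subwords)
    word_char_spans, word_of_char = spans_and_map(words)
    # Re-express word spans over subword indices (what the word-level LSTM pools),
    # assuming every word boundary coincides with a subword boundary.
    word_spans = [(sub_of_char[s], sub_of_char[e - 1] + 1) for s, e in word_char_spans]
    print(sub_spans)    # [(0, 2), (2, 3), (3, 4), (4, 5), (5, 7)]
    print(sub_of_char)  # [0, 0, 1, 2, 3, 4, 4]
    print(word_spans)   # [(0, 5)]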

  • Table 1

    Table 1  Data statistics

    Dataset Train (#Sent) Dev (#Sent) Test (#Sent)
    OntoNotes 15724 4301 4346
    MSRA 41728 4636 4365
    Weibo NER 1350 270 270
  • Table 2

    Table 2  Development results on OntoNotes. For the “with Lex words only” results of Lattice LSTM and WC-LSTM, we re-run the code released by Zhang and Yang [15] and Liu et al. [17]. We also modify their models to encode the char-subword-word hybrid and report the corresponding results under “with char-subword-word hybrid”$^{a)}$

    Models $P$ (%) $R$ (%) $F1$ (%)
    Char-based LSTM 67.12 58.42 62.47
    Lattice LSTM
    with Lex words only 74.64 68.83 71.62
    with char-subword-word hybrid 76.20 72.26 74.17
    WC-LSTM
    with Lex words only 73.08 68.62 70.31
    with char-subword-word hybrid 79.73 69.31 74.15
    HiLSTM
    with Lex words only 74.12 67.41 70.60
    with char-subword-word hybrid 76.84 73.06 74.90

    a) Bold font marks the best $F1$ score in each major row.

  • Table 3

    Table 3  Final results on OntoNotes, MSRA, and Weibo NER test data$^{a)}$

    Model OntoNotes (%) MSRA (%) Weibo NER (%)
    $P$ $R$ $F1$ $P$ $R$ $F1$ $F1$ (NE) $F1$ (NM) $F1$ (All)
    Maximum entropy [32] -- -- -- 92.20 90.18 91.18 -- -- --
    Global linear [33] -- -- -- 91.86 88.75 90.28 -- -- --
    Radical-level LSTM [14] -- -- -- 91.28 90.62 90.95 -- -- --
    Unified model [22] -- -- -- -- -- -- 54.50 62.17 58.23
    MTL [26] -- -- -- -- -- -- 55.28 62.97 58.99
    Adversarial MTL [27] -- -- -- 91.73 89.58 90.64 54.34 57.35 58.70
    GNN$\dagger$ [16] 76.13 73.68 74.89 94.19 92.73 93.46 55.34 64.98 60.21
    Collaborative GNN$\star$ [18] 75.06 74.52 74.79 94.01 92.93 93.47 56.45 68.43 63.09
    Collaborative GNN$\dagger$ [18] 74.42 72.60 73.50 92.52 90.29 91.39 50.57 64.50 56.33
    Lattice LSTM$\dagger$ [15] 76.35 71.56 73.88 93.57 92.79 93.18 53.04 62.25 58.79
    WC-LSTM$\dagger$ [17] 76.09 72.85 74.43 94.58 92.91 93.74 53.19 67.41 59.84
    Char-based LSTM 68.79 60.35 64.30 90.74 86.96 88.81 46.11 55.29 52.77
    Our HiLSTM 77.77 76.32 77.04 94.83 93.61 94.22 60.94 68.89 63.79

    a) Bold font marks the best result in each column; “--” indicates results not reported on that dataset.

  • Table 4

    Table 4  Ablation study. “Lex words” and “Seg words” represent lexicon words and segmentor words, respectively$^{a)}$

    Model OntoNotes $F1$ (%) MSRA $F1$ (%) Weibo $F1$ (%)
    Complete HiLSTM 74.90 94.87 69.71
    w/o Seg words 72.91 94.81 67.28
    w/o Subwords 73.55 94.38 66.67
    w/o Lex words 73.78 94.31 67.72
    w/o Seg words & Subwords 70.60 94.39 65.86
    w/o Seg words & Lex words 71.17 92.09 67.81
    w/o Subwords & Lex words 70.46 93.93 65.47
    Char-based LSTM 62.47 90.59 59.43

    a) Bold font marks the best result.

  • Table 5  

    Table 5  Case study$^{a)}$

    Id Cases
    1 Sentence $\ldots$对中国进出口银行有较深的了解 (have a relatively deep understanding of the Export-Import Bank of China)$\ldots$
    Lex words $\ldots$中国 (China), 进出 (in and out), 进出口 (imports and exports), 出口 (exit), 银行 (bank), 了解 (understand)$\ldots$
    Seg words $\ldots$对 (to), 中国 (China), 中国进出口银行 (the export-import bank of China), 进出口 (imports and exports), 银行 (bank), 有 (have), 较深 (relatively deep), 较 (relatively), 深 (deep), 的 (de), 了解 (understand)$\ldots$
    Subwords $\ldots$对 (to), 中国 (China), 进 (in), 出 (out), 口 (entrance), 银行 (bank), 有 (have), 较 (relatively), 深 (deep), 的 (de), 了解 (understand)$\ldots$
    Gold result $\ldots$对 (to) 中国进出口银行 (the export-import bank of China) [ORG] 有了较深的了解 (have a relatively deep understanding)$\ldots$
    w/o Seg words predicted result $\ldots$对 (to) _中国 (China) [GPE] 进出口银行 (the export-import bank) 有了较深的了解 (have a relatively deep understanding)$\ldots$
    with Seg words predicted result $\ldots$对 (to) 中国进出口银行 (the export-import bank of China) [ORG] 有了较深的了解 (have a relatively deep understanding)$\ldots$
    2 Sentence $\ldots$充分利用毗邻港澳的优势 (make full use of the advantages of adjoining Hong Kong and Macao)$\ldots$
    Lex words $\ldots$充分 (full), 利用 (use), 毗邻 (adjoin), 港澳 (Hong Kong and Macao), 优势 (advantage)$\ldots$
    Seg words $\ldots$充分 (full), 利用 (use), 毗邻 (adjoin), 港澳 (Hong Kong and Macao), 的 (de), 优势 (advantage)$\ldots$
    Subwords $\ldots$充分 (full), 利用 (use), 毗 (connect), 邻 (neighbour), 港 (Hong Kong), 澳 (Macao), 的 (de), 优势 (advantage)$\ldots$
    Gold result $\ldots$充分利用毗邻 (make full use of the adjoining) 港 (Hong Kong) [GPE] 澳 (Macao) [GPE] 的优势 (advantage)$\ldots$
    w/o Subwords predicted result $\ldots$充分利用毗邻 (make full use of the adjoining) _港澳 (Hong Kong and Macao) [LOC] 的优势 (advantage)$\ldots$
    with Subwords predicted result $\ldots$充分利用毗邻 (make full use of the adjoining) 港 (Hong Kong) [GPE] 澳 (Macao) [GPE] 的优势 (advantage)$\ldots$
