
SCIENTIA SINICA Informationis, Volume 47, Issue 8: 1036 (2017). https://doi.org/10.1360/N112016-00281

Learning dependency edge transfer rule representation using encoder-decoder

  • Received Mar 8, 2016
  • Accepted Apr 2, 2016
  • Published Jun 20, 2017

Abstract

In existing statistical machine translation models, especially syntax-based models, there has always been a trade-off between the amount of information a translation unit preserves and its ability to generalize to new sentences. Neural networks have been successfully applied to reordering and to end-to-end machine translation. In this paper, we propose a novel neural encoder-decoder for syntactic translation rules: a dependency edge transfer rule encoder-decoder (DETED), which takes the source side of a transfer rule together with its local context as input and outputs the corresponding target side, thereby learning the source-to-target matching of dependency edge transfer rules. The model retains the benefit of the dependency edge, the most relaxed syntactic constraint, which ensures its generalization ability, while exploiting local context as additional information to improve its matching ability. The structure of the encoder-decoder is concise: given the source side of a translation rule, it decodes the corresponding target side and makes the positional relation of the dependency edge explicit. During decoding, the model is used to re-score the transfer rules. Experiments on three NIST test sets show a significant improvement, with an average gain of 1.39 BLEU points over the baseline.
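The rule encoder-decoder itself is a standard sequence-to-sequence model applied to rules rather than sentences. As a rough illustration only (not the authors' implementation), the following PyTorch sketch encodes the source side of a dependency edge transfer rule together with its local context using a GRU, decodes the target side from the resulting summary vector, and exposes a log-probability score that could serve as an extra feature for re-scoring transfer rules during decoding; all names and dimensions are illustrative assumptions.

    # Minimal sketch of a rule encoder-decoder (hypothetical, PyTorch).
    import torch
    import torch.nn as nn

    class RuleEncoderDecoder(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb_dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # Encode the source side of the rule (head, dependent, local
            # context) into a summary vector, then decode the target side.
            _, h = self.encoder(self.src_emb(src_ids))
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
            return self.out(dec_out)  # per-step logits over target tokens

        def score(self, src_ids, tgt_ids):
            # Log-probability of a candidate target side given the source
            # side; usable to re-score transfer rules when decoding.
            logits = self.forward(src_ids, tgt_ids[:, :-1])
            logp = torch.log_softmax(logits, dim=-1)
            gold = tgt_ids[:, 1:].unsqueeze(-1)
            return logp.gather(-1, gold).squeeze(-1).sum(dim=1)

A higher score marks a candidate target side that the model considers a better match for the source edge, which is how such a score can be folded into the log-linear model as one more feature.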


Funded by

National Natural Science Foundation of China (Grant No. 61379086)



  • Figure 1

    Dependency edge transfer rules. (a) Dependency tree and word alignment; (b) extracted dependency edge transfer rules; (c) generalization of transfer rules; (d) ambiguities of dependency edge transfer rules

  • Figure 2

    Dependency edge transfer translation process. (a), (b) Analysis; (c) transfer; (d)–(f) generation

  • Figure 3

    Dependency edge transfer rule encoder-decoder

  • Figure 4

    Source dependency edge encoder

  • Algorithm 1

    Computing the probabilities of candidate target-side dependency edges with the dependency edge transfer rule encoder-decoder during translation decoding

    Require: node $n$ of the source-side dependency tree; the set $R$ of dependency edge transfer rules; the dependency edge transfer rule encoder-decoder $D$;

    Output: the set $P$ of probabilities of the candidate target-side dependency edges corresponding to all source-side dependency edges with $n$ as head node;

    if $n$ is not a leaf node then

    extract the set $E$ of source-side dependency edges between $n$ and all of its dependents;

    for $e \in E$ do

    use $R$ to project $e$ onto the set $F$ of candidate target-side dependency edges;

    feed $e$ into $D$;

    at the output layer of $D$, compute the probability $p$ of each candidate target-side dependency edge in $F$;

    add each $p$ to the set $P$;

    end for

    return $P$

    end if

    A Python transcription of this algorithm is sketched after the tables below.

  • Table 1   BLEU-4 scores (%) on NIST MT03–05 $^{\rm a)b)}$
    System MT03 MT04 MT05 Average
    Moses 32.30 33.43 31.44 32.39
    DEBT 32.57 35.06 31.36 32.99
    +DETED 33.80* 36.58* 32.76* 34.38
  • Table 2   BLEU-4 scores (%) of different components
    System MT03 MT04 MT05 Average
    DEBT 32.57 35.06 31.36 32.99
    +${\rm head}_{\rm tgt}$ 33.52 36.35 31.67 33.85
    +${\rm dep}_{\rm tgt}$ 33.43 35.81 31.40 33.55
    +${\rm lr}_{\rm tgt}$ 33.30 36.32 32.10 33.91
    +${\rm cd}_{\rm tgt}$ 33.41 36.50 32.39 34.10
    +DETED 33.80 36.58 32.76 34.38
  • Table 3   BLEU-4 scores (%) on the NIST MT03–05 test sets, with different context sizes as input $^{\rm c)d)}$
    System MT03 MT04 MT05 Average
    DEBT 32.57 35.06 31.36 32.99
    +nocon 33.56 36.06 32.37 33.99
    +con1 33.80 36.58 32.76 34.38
    +con2 33.73 36.52 32.45 34.23
    +con3 33.94 36.24 32.49 34.22
  • Table 4   Decoding time cost on NIST MT03
    System Decoding time cost (s) Diff vs. DEBT
    Moses 1081.71
    DEBT 1090.89
    +DETED 1297.83 +18.97%
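
For concreteness, Algorithm 1 translates almost line for line into Python. The sketch below assumes hypothetical interfaces (a Node type with a children list, a rule table R mapping a source edge to its candidate target edges, and a callable D standing in for the encoder-decoder's output-layer probability); none of these names come from the paper.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        word: str
        children: list = field(default_factory=list)  # dependent nodes

    def extract_edges(n):
        # Source-side dependency edges between head n and its dependents.
        return [(n.word, c.word) for c in n.children]

    def target_edge_probabilities(n, R, D):
        """Algorithm 1: probabilities of the candidate target-side edges
        for all source-side dependency edges headed by node n."""
        P = []
        if n.children:               # n is not a leaf node
            for e in extract_edges(n):
                F = R.get(e, [])     # project e onto candidate target edges
                for f in F:          # feed e into D and read the probability
                    p = D(e, f)      # of each candidate f at the output layer
                    P.append((e, f, p))
        return P

    # Toy usage with a one-rule table and a dummy scorer.
    root = Node("喜欢", [Node("我")])
    R = {("喜欢", "我"): [("likes", "I"), ("like", "I")]}
    D = lambda e, f: 0.5             # stand-in for the encoder-decoder score
    print(target_edge_probabilities(root, R, D))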
