SCIENTIA SINICA Informationis, Volume 48, Issue 5: 574-588(2018) https://doi.org/10.1360/N112017-00222

## Neural machine translation with constraints

• Accepted Apr 16, 2018
• Published May 11, 2018

### Abstract

Neural machine translation (NMT), powered by deep learning, is a machine translation paradigm that has advanced rapidly in recent years and has become the mainstream technology in both academic research and industry. This paper provides an overview of our research on NMT. It focuses on a series of NMT models designed to incorporate useful information and knowledge as constraints: variational NMT constrained by latent variables, NMT advised by statistical machine translation (SMT), and NMT with syntactic constraints from the source language. The paper also presents an outlook on future trends in NMT.

• Figure 1

Graphical model for latent variable constrained variational neural machine translation. We use the symbols $\boldsymbol{z}$, $\boldsymbol{x}$, and $\boldsymbol{y}$ to denote the latent variable, source sentence, and target sentence, respectively

• Figure 2

Overall architecture for latent variable constrained variational neural machine translation

• Figure 3

(Color online) Convergence of VNMT

• Figure 4

The model that integrates word-level SMT constraints into NMT

• Figure 5

The model that integrates phrase-level SMT constraints into NMT

• Figure 6

Examples of NMT translations: (a) discontinuous translation; (b) over-translation

• Figure 7

An example of a source sentence. (a) Word sequence; (b) its syntactic parse tree; (c) its syntactic label sequence

• Figure 8

Parallel RNN encoder

• Figure 9

Hierarchical RNN encoder

• Figure 10

Mixed RNN encoder

• Table 1   Numbers of content words in the NIST08 test set generated by the baseline system and the proposed system

| System | ALL | REMOVE |
|---|---|---|
| Reference | 20481 | 19489 |
| RNNSearch | 13230 | 11007 |
| +SMTrec/gate | 12665 | 11172 |

• Table 2   Percentages (%) of syntactic phrases in our test sets that are translated continuously, translated discontinuously, or left untranslated. PP denotes prepositional phrase, NP noun phrase, CP clause headed by a complementizer, and QP quantifier phrase

| System | XP | Continuously | Discontinuously | Untranslated |
|---|---|---|---|---|
| RNNSearch | PP | 57.3 | 33.6 | 9.1 |
| RNNSearch | NP | 59.8 | 25.5 | 14.7 |
| RNNSearch | CP | 47.3 | 44.6 | 8.1 |
| RNNSearch | QP | 54.0 | 22.2 | 23.8 |
| RNNSearch | ALL | 58.1 | 27.1 | 14.8 |
| Mixed RNN | PP | 63.3 | 27.5 | 9.2 |
| Mixed RNN | NP | 63.1 | 23.1 | 13.8 |
| Mixed RNN | CP | 54.5 | 36.6 | 8.9 |
| Mixed RNN | QP | 56.2 | 19.7 | 24.1 |
| Mixed RNN | ALL | 60.4 | 25.0 | 14.6 |

Copyright 2019 Science China Press Co., Ltd. All rights reserved.