
SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1467-1486 (2018) https://doi.org/10.1360/N112018-00163

A survey of quantum language models

  • Received: Jun 22, 2018
  • Accepted: Sep 10, 2018
  • Published: Nov 9, 2018

Abstract

Language modeling is a fundamental research topic in natural language processing and related areas. In recent years, researchers have proposed quantum language models based on the probability theory of quantum mechanics. This paper reviews the motivation for and the current progress in constructing quantum language models. First, it reviews the open research problems of classical language models. Then, it introduces quantum language models in information retrieval and speech processing, as well as an end-to-end quantum language model built on a neural network architecture. Finally, by analyzing the advantages and disadvantages of each quantum language model considered here, and taking into account the essential connection between quantum mechanics and neural networks, we outline promising directions for future research.


Funded by

National Key Research and Development Program of China (2017YFE0111900)

National Natural Science Foundation of China (U1636203, 61772363)



  • Figure 1

    (Color online) Two-dimensional geometric representation of projective measurement

  • Figure 2

    (Color online) Two-dimensional geometric illustration of unitary evolution

  • Figure 3

    (Color online) Vector-space representation of the dependency $|K_{1,2,3}\rangle$

  • Figure 4

    (Color online) Density-matrix representation of a single sentence (a minimal code sketch of this construction follows the figure list)

  • Figure 5

    (Color online) The first three layers produce the single-sentence representation, the fourth layer produces the joint representation of a question-answer (QA) pair, and the softmax layer matches the QA pair.

  • Figure 6

    (Color online) The lower layers produce the single-sentence representation and the joint representation; the remaining layers match the QA pair via the similarity patterns learned by a two-dimensional CNN

  • Figure 7

    (Color online) Equivalence-class diagram of language modeling, neural networks, and quantum mechanics
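
The density-matrix sentence representation sketched in Figure 4 is simple to prototype. The following is a minimal sketch rather than the authors' implementation: the function name sentence_density_matrix, the uniform mixture weights, and the random example embeddings are illustrative assumptions. Each word embedding is normalized to a unit state $|w_i\rangle$, and the sentence is represented by the mixture $\rho=\sum_i p_i|w_i\rangle\langle w_i|$, which is symmetric, positive semidefinite, and has unit trace.

    import numpy as np

    def sentence_density_matrix(word_vectors, weights=None):
        # Normalize each word embedding to a unit state |w_i>.
        vecs = np.asarray(word_vectors, dtype=float)
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        n, d = vecs.shape
        # Mixture weights p_i (uniform by default); they must sum to 1.
        p = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
        # Accumulate rho = sum_i p_i |w_i><w_i|.
        rho = np.zeros((d, d))
        for p_i, v in zip(p, vecs):
            rho += p_i * np.outer(v, v)
        return rho

    # Illustrative 3-word "sentence" with random 4-dimensional embeddings.
    rng = np.random.default_rng(0)
    rho = sentence_density_matrix(rng.normal(size=(3, 4)))
    assert np.isclose(np.trace(rho), 1.0)  # unit trace, as required of a density matrix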

  • Algorithm 1 Language modeling based on quantum measurement

    Input: density matrix $\rho_0$ and unitary evolution matrix $U$;

    Output: joint probability $P(s|\rho_0,U)$ of the sentence sequence;

    Initialization: projective measurement probability: $P(w_1;\rho_0,U)=\mathrm{tr}(\rho_0\Pi_{w_1})$; post-projection state: $\rho'_{1}=\frac{\Pi_{w_1}\rho_0\Pi_{w_1}}{\mathrm{tr}(\Pi_{w_1}\rho_0\Pi_{w_1})}$; post-evolution state: $\rho_1 = U \rho'_{1} U^{\rm T}$;

    Measurement loop ($i=2,\ldots,n$): projective measurement probability: $P(w_i|w_1,\ldots,w_{i-1};\rho_0,U)=\mathrm{tr}(\rho_{i-1}\Pi_{w_i})$; post-projection state: $\rho'_{i}=\frac{\Pi_{w_i}\rho_{i-1}\Pi_{w_i}}{\mathrm{tr}(\Pi_{w_i}\rho_{i-1}\Pi_{w_i})}$; post-evolution state: $\rho_i = U \rho'_{i} U^{\rm T}$;

    End: $P(s|\rho_0,U)=P(w_1;\rho_0,U)\prod_{i=2}^{n}P(w_i|w_1,\ldots,w_{i-1};\rho_0,U)$;
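
As a sanity check, Algorithm 1 can be simulated directly with NumPy. The sketch below is illustrative, not the paper's code: the function name sentence_probability, the rank-one word projectors, and the two-dimensional rotation example are assumptions. It multiplies the conditional measurement probabilities as it goes, which is algebraically equivalent to the factorized product in the last line of the algorithm, and it evolves with $U^{\dagger}$, which reduces to $U^{\rm T}$ when $U$ is real.

    import numpy as np

    def sentence_probability(rho0, U, projectors):
        # Sequentially measure, collapse, and evolve, following Algorithm 1.
        rho, prob = rho0, 1.0
        for Pi in projectors:  # Pi_{w_1}, ..., Pi_{w_n}, one projector per word
            p = float(np.trace(rho @ Pi).real)  # P(w_i | history) = tr(rho_{i-1} Pi_{w_i})
            if p == 0.0:
                return 0.0                      # the word state is orthogonal to rho
            prob *= p
            rho = (Pi @ rho @ Pi) / p           # post-projection state rho'_i
            rho = U @ rho @ U.conj().T          # post-evolution state rho_i
        return prob

    # Illustrative two-word example in a 2-dimensional space.
    e0, e1 = np.eye(2)
    projectors = [np.outer(e0, e0), np.outer(e1, e1)]  # |w_1><w_1|, |w_2><w_2|
    rho0 = np.eye(2) / 2                               # maximally mixed initial state
    theta = 0.3
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])    # real unitary (a rotation)
    print(sentence_probability(rho0, U, projectors))   # P(w_1 w_2 | rho_0, U)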
