
SCIENTIA SINICA Informationis, Volume 47, Issue 8: 1066-1(2017) https://doi.org/10.1360/N112016-00303

Acquiring Chinese paraphrases based on random walk of $N$ steps

  • Received Apr 5, 2017
  • Accepted May 26, 2017
  • Published Aug 14, 2017

Abstract

The conventional “pivot” approach to acquiring paraphrases from a bilingual corpus is limited in that it considers only candidate paraphrases within two steps. In this paper, we propose a graph-based model for acquiring paraphrases from a phrase translation table. First, we describe a graph-based model that represents Chinese-English phrase translation relations, a random walk algorithm of $N$ steps, and a confidence metric for the acquired paraphrases. Furthermore, to discover more potential Chinese paraphrases, we extend the model to integrate phrase translation relations from other language pairs, such as English-Japanese. We performed experiments on the NTCIR Chinese-English and English-Japanese bilingual corpora and compared the results with those of conventional methods. The experimental results show that the proposed approach acquires more paraphrases than the conventional methods, and that performance improves further when English-Japanese phrase translations are added to the graph-based model.
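To make the two-step limitation concrete, the following minimal Python sketch builds a bipartite Chinese-English graph from the four phrase pairs in Table 1 and lists the Chinese phrases reachable from 充当 阳极 within a given number of edges. It uses plain breadth-first reachability, not the paper's random walk or confidence metric, and the helper names (`build_graph`, `chinese_candidates`) are hypothetical.

```python
from collections import deque

# Phrase pairs from Table 1: (Chinese phrase, English phrase).
# Each pair becomes an undirected edge in a bipartite Chinese-English graph.
PHRASE_PAIRS = [
    ("充当 阳极", "acts as an anode"),
    ("用作 阳极", "acts as an anode"),
    ("用作 阳极", "serving as an anode"),
    ("作为 阳极", "serving as an anode"),
]


def build_graph(pairs):
    """Build an undirected adjacency map; nodes are tagged with their language."""
    graph = {}
    for zh, en in pairs:
        a, b = ("zh", zh), ("en", en)
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def chinese_candidates(graph, source, max_steps):
    """Return the Chinese phrases reachable from `source` within `max_steps`
    edges, mapped to their shortest distance (breadth-first search)."""
    start = ("zh", source)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {phrase: d for (lang, phrase), d in dist.items()
            if lang == "zh" and phrase != source}


graph = build_graph(PHRASE_PAIRS)
print(chinese_candidates(graph, "充当 阳极", 2))  # pivot setting: {'用作 阳极': 2}
print(chinese_candidates(graph, "充当 阳极", 4))  # {'用作 阳极': 2, '作为 阳极': 4}
```

With two steps (the pivot setting) only 用作 阳极 is found; allowing four steps also reaches 作为 阳极 through the path 充当 阳极 → acts as an anode → 用作 阳极 → serving as an anode → 作为 阳极.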


Funded by

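Beijing Jiaotong University Talent Fund (KKRC11001532)

National Natural Science Foundation of China (61370130, 61473294)

Fundamental Research Funds for the Central Universities (2015JBM033)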



  • Figure 3

    Translation relation graph of the phrase translation table

  • Figure 4

    The random walk of an ant. (a) The ant at the start node; (b) the ant after one step of the random walk

  • Figure 5

    Translation relation graph of the Chinese-English phrase translation table integrated with the English-Japanese phrase translation table

  • Figure 6

    An example paraphrase acquired by adding the English-Japanese phrase table

  • Table 1   Examples from the Chinese-English phrase translation table
    Example | Chinese phrase | English phrase      | Chinese-English translation probability | English-Chinese translation probability
    1       | 充当 阳极      | acts as an anode    | 0.333333                                | 0.25
    2       | 用作 阳极      | acts as an anode    | 0.333333                                | 1
    3       | 用作 阳极      | serving as an anode | 0.5                                     | 0.0416
    4       | 作为 阳极      | serving as an anode | 1                                       | 0.0313
  • Algorithm 1   Random walk algorithm based on $N$ steps

    Require:
    Output:
    while $t < N$ do
        $j \Leftarrow 1$;
        while $j < M$ do
            RandomDetermineTheNextPath($j$, Graph);
            GoForwardOneStep;
            Record(AntState);
            $j \Leftarrow j + 1$;
        end while
        $t \Leftarrow t + 1$;
    end while

  • Table 2   Some of the given phrases used in the experiments
    Number | Given phrase | Number | Given phrase
    1      | 传送 给 远程 | 6      | 不断 提高
    2      | 船体         | 7      | 家庭 用品
    3      | 微不足道     | 8      | 充当 阳极
    4      | 发送 图像    | 9      | 我们 期望
    5      | 医药 材料    | 10     | 专属
  • Table 3   Experimental results
    Experiment | Average number of paraphrases | Average number of paraphrases | Average number of paraphrases | Average number of paraphrases
    1          | 3.79                          | 2.88                          | 24.01                         | 8.34
    2          | 11.56                         | 7.18                          | 37.89                         | 5.28
    3          | 15.32                         | 9.04                          | 40.99                         | 4.53
  • Table 4   Examples of acquired paraphrases and the corresponding hitting times
    Number | Given phrase | Acquired paraphrase | Hitting time
    1      | 传送到远程   | 发送 给 远端        | 9.44202
           |              | 传送 到 远处        | 10.6529
           |              | 传输 到 远端        | 10.8738
           |              | 传送 到 远端        | 11.3424
           |              | 传输 给 远端        | 11.452
           |              | 朝向该远程          | 11.8415
           |              | 传输 到 远程        | 11.8629
           |              | 传递 到 远端        | 11.9333
           |              | 传递 给 远程        | 11.9999
           |              | 运输 至 遥远        | 11.9999
           |              | _命令发送到         | 12
           |              | _距离该远端         | 12
           |              | _向 远端延伸        | 12
           |              | _溶液传送至         | 12
           |              | _给远端位置         | 12
    2      | 船体         | 船壳                | 11.8173
           |              | 船身                | 11.9981
           |              | 船侧                | 11.9999
           |              | 轮船                | 11.9999
    3      | 微不足道     | 无足轻重            | 11.9998
           |              | 无关紧要            | 11.9998
           |              | 忽略不计            | 11.9999
    4      | 不断 提高    | 继续 增大            | 11.2896
           |              | 连续 升高            | 11.3157
           |              | 日益 提高            | 11.8252
           |              | 然后 增加            | 11.8338
           |              | 连续 增加            | 11.8834
           |              | 以及 提高            | 11.9046
           |              | 继续 增加            | 11.9507
           |              | 相继 增加            | 11.9507
           |              | 日益 增多            | 11.9636
           |              | 然后 增大            | 11.9724
           |              | 连续 增大            | 11.9935
           |              | 持续 上升            | 11.9943
           |              | 持续 增长            | 11.9947
           |              | 然后 升高            | 11.9964
           |              | 不断 增大            | 11.9970
    5      | 医药 材料    | 医疗 器具            | 9.59756
           |              | 医学 材料            | 9.62103
           |              | 医学 物质            | 9.87545
           |              | 医疗 材料            | 11.5707
    6      | 发送 图像    | 传输 图像            | 11.3969
           |              | 图像 传输            | 11.9899
           |              | 传送 图像            | 11.9999
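Algorithm 1 does not spell out its transition rule or confidence metric on this page. The minimal Python sketch below follows its loop structure (an outer loop over $N$ steps, an inner loop over $M$ ants, recording each ant's state) and, as an assumption, scores each reached phrase by a Monte Carlo truncated hitting time: the average over ants of the first step at which an ant reaches the phrase, counting $N$ for ants that never reach it. Table 4 ranks paraphrases by ascending hitting time and never reports a value above 12, which is consistent with this reading, but the uniform neighbour choice, the toy graph built from Table 1, and the function name `truncated_hitting_times` are hypothetical.

```python
import random

# Toy bipartite graph from the phrase pairs in Table 1 (hypothetical input).
PHRASE_PAIRS = [
    ("充当 阳极", "acts as an anode"),
    ("用作 阳极", "acts as an anode"),
    ("用作 阳极", "serving as an anode"),
    ("作为 阳极", "serving as an anode"),
]

GRAPH = {}
for zh, en in PHRASE_PAIRS:
    GRAPH.setdefault(("zh", zh), set()).add(("en", en))
    GRAPH.setdefault(("en", en), set()).add(("zh", zh))


def truncated_hitting_times(graph, source, n_steps, n_ants, seed=0):
    """Estimate truncated hitting times from `source` with n_ants random walks.

    Loop structure follows Algorithm 1: for each step t = 1..N, every ant j
    moves one step and its state is recorded. ASSUMPTION: the next node is
    chosen uniformly among neighbours; a node an ant never reaches within
    N steps contributes N. Only nodes reached by at least one ant are returned.
    """
    rng = random.Random(seed)
    positions = [source] * n_ants
    first_hit = [{} for _ in range(n_ants)]  # ant j: node -> first step reached
    for t in range(1, n_steps + 1):
        for j in range(n_ants):
            # Sorting the neighbours makes the walk reproducible for a fixed seed.
            positions[j] = rng.choice(sorted(graph[positions[j]]))
            first_hit[j].setdefault(positions[j], t)  # Record(AntState)
    reached = set().union(*first_hit) - {source}
    return {
        node: sum(hits.get(node, n_steps) for hits in first_hit) / n_ants
        for node in reached
    }


times = truncated_hitting_times(GRAPH, ("zh", "充当 阳极"), n_steps=12, n_ants=2000)
# Keep Chinese candidates only, ranked by ascending hitting time as in Table 4.
for (lang, phrase), h in sorted(times.items(), key=lambda kv: kv[1]):
    if lang == "zh":
        print(phrase, round(h, 4))
```

On this toy graph, 用作 阳极 (two edges from the source) should receive a lower hitting time than 作为 阳极 (four edges away). Weighting the neighbour choice by the translation probabilities in Table 1 would be a natural variant, but this page does not say which rule the authors use.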

Copyright 2019 Science China Press Co., Ltd. All rights reserved.
