SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1510-1520 (2018) https://doi.org/10.1360/N112018-00151

Heterographic pun identification model based on multi-dimensional semantic relationships

  • Received Jun 13, 2018
  • Accepted Sep 10, 2018
  • Published Nov 14, 2018


Identifying heterographic puns is an important branch of humor research that has gradually developed into a research area of its own. This paper presents a heterographic pun identification method based on feature sets in four dimensions: semantic transparency, semantic relevance, phonetic expansibility, and syntax. The semantic transparency feature set consists of lexical item statistics and character length; the syntax feature set includes names, capitalization, tense, part of speech, and location. Nine features from these four dimensions are fed into a binary decision tree, which generates a threshold with the help of K-means clustering and completes the pun identification. On the SemEval-2017 Task 7 corpus, the proposed method achieves an F1 value of 84.62%, exceeding the 84.39% of the top-ranked participating team. The experiment shows that the binary decision tree classification approach based on the four dimensions is effective in identifying heterographic puns. The phonetic expansibility and syntax feature sets are the most effective of the four dimensions, which is consistent with our presumption that phonetic features play a larger role in identifying heterographic puns.
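
As a rough illustration of the thresholding step mentioned above, the following Python sketch shows one way K-means clustering could yield a threshold for a single binary split. It is an assumption-laden reading of the abstract, not the paper's implementation: the function names `two_means_threshold` and `is_pun`, the use of the centroid midpoint as the threshold, the restriction to one feature, and the example scores are all hypothetical.

```python
from statistics import mean


def two_means_threshold(scores, max_iter=50):
    """Cluster 1-D scores with K-means (k=2) and return the midpoint
    between the two centroids as a decision threshold (an assumption)."""
    low, high = min(scores), max(scores)
    if low == high:
        return low  # all scores identical: no meaningful split
    for _ in range(max_iter):
        midpoint = (low + high) / 2
        # Assign each score to the nearer centroid.
        low_cluster = [s for s in scores if s <= midpoint]
        high_cluster = [s for s in scores if s > midpoint]
        new_low, new_high = mean(low_cluster), mean(high_cluster)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return (low + high) / 2


def is_pun(score, threshold):
    """Single binary split: scores above the threshold are labeled puns."""
    return score > threshold


# Hypothetical phonetic-expansibility scores for six sentences (made-up data).
phonetic_scores = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
threshold = two_means_threshold(phonetic_scores)
labels = [is_pun(s, threshold) for s in phonetic_scores]
```

In this toy setting the three low scores fall below the learned threshold and the three high scores above it. A full system along the lines of the abstract would combine splits over all nine features rather than a single one.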

[1] Tristan M. Towards the automatic detection and identification of English puns. Eur J Humour Res, 2016, 1: 59--75.

[2] Taylor J M. Ontology-based view of natural language meaning: the case of humor detection. J Ambient Intell Human Comput, 2010, 1: 221--234.

[3] Taylor J M. Computational detection of humor: a dream or a nightmare? The ontological semantics approach. In: Proceedings of International Joint Conferences on Web Intelligence and Intelligent Agent Technology, Milano, 2009. 429--432.

[4] Reyes A, Rosso P, Buscaldi D. From humor recognition to irony detection: the figurative language of social media. Data Knowledge Eng, 2012, 74: 1--12.

[5] Yishay R. Automatic humor classification on Twitter. In: Proceedings of the NAACL HLT, Montréal, 2012. 66--70.

[6] Dario B, Pascale F. Deep learning of audio and language features for humor prediction. In: Proceedings of International Conference on Language Resources and Evaluation, Portoroz, 2016. 496--501.

[7] Dario B, Pascale F. Predicting humor response in dialogues from TV sitcoms. In: Proceedings of ICASSP, Shanghai, 2016. 5780--5784.

[8] Dario B, Pascale F. A long short-term memory framework for predicting humor in dialogues. In: Proceedings of NAACL HLT, San Diego, 2016. 130--135.

[9] Zhang R X, Liu N S. Recognizing humor on Twitter. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, 2014. 889--898.

[10] Reyes A, Buscaldi D, Rosso P. An analysis of the impact of ambiguity on automatic humour recognition. Lecture Notes Comput Sci, 2009, 5729: 162--169.

[11] Buscaldi D, Rosso P. Some experiments in humour recognition using the Italian Wikiquote collection. In: Proceedings of International Workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory. Berlin: Springer, 2007. 464--468.

[12] Igor L, Hod L. Humor as circuits in semantic networks. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Baltimore, 2012. 150--155.

[13] Alessandro V, Hannu T. “Let everything turn well in your wife”: generation of adult humor using lexical constraints. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, 2013. 243--248.

[14] Zhang D Y, Yang L, Zheng P Q, et al. Construction and application of affective metaphor corpus. Sci Sin Inform, 2015, 45: 1574--1587.

[15] Lin H F, Zhang D Y, Yang L, et al. Computational humor researches and applications. J Shandong Univ, 2016, 7: 1--10.

[16] Ritchie G. Computational mechanisms for pun generation. In: Proceedings of the 10th European Workshop on Natural Language Generation, 2005. 8--10.

[17] Hempelmann C F. Computational humor: beyond the pun? The primer of humor research. Humor Res, 2008, 8: 333--360.

[18] Hong B A, Ong E. Automatically extracting word relationships as templates for pun generation. In: Proceedings of the NAACL HLT, Boulder, 2009. 24--31.

[19] Chloe K, Yuriy B. That's what she said: double entendre identification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, 2011. 89--94.

[20] Yang D Y, Lavie A, Dyer C, et al. Humor recognition and humor anchor extraction. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Lisbon, 2015. 2367--2376.

[21] Tristan M, Iryna G. Automatic disambiguation of English puns. In: Proceedings of Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, 2015. 719--729.

[22] Arnold Z, Elizabeth Z. Imperfect puns, markedness, and phonological similarity: with fronds like these, who needs anemones. Folia Linguist, 1986, 20: 493--503.

[23] Wlodzimierz S. Metaphonology of English Paronomasic Puns. Bern: Peter Lang Pub Inc, 1991. 1--325.

[24] Kim B. Machine humour: an implemented model of puns. Dissertation for Ph.D. Degree. Edinburgh: University of Edinburgh, 1996.

[25] Robinson T. The British English example pronunciation dictionary. Cambridge: Cambridge University Press, 1996. 1--8.

[26] Hempelmann C F. Paronomasic puns: target recoverability towards automatic generation. Dissertation for Ph.D. Degree. West Lafayette: Purdue University, 2003.

[27] Kao J T, Roger L, Goodman N D. A computational model of linguistic humor in puns. Cogn Sci, 2016, 5: 1270--1285.

[28] Jaech A, Koncel-Kedziorski R, Ostendorf M. Phonological pun understanding. In: Proceedings of NAACL HLT, San Diego, 2016. 654--663.

[29] Samuel D, Aniruddha G, Hanyang C. Idiom Savant at SemEval-2017 Task 7: detection and interpretation of English puns. In: Proceedings of the 11th International Workshop on Semantic Evaluations, Vancouver, 2017. 103--108.

[30] Dipankar D, Aniket P. JUCSE NLP at SemEval-2017 Task 7: employing rules to detect and interpret English puns. In: Proceedings of the 11th International Workshop on Semantic Evaluations, Vancouver, 2017. 432--435.

[31] Diao Y F, Lin H F, Wu D. WECA: a WordNet-encoded collocation attention network for homographic pun. In: Proceedings of EMNLP, Brussels, 2018. 432--435.

[32] Wang C M, Peng D L. The roles of surface frequencies, cumulative morpheme frequencies, and semantic transparencies in the processing of compound words. Acta Psychol Sin, 1999, 3: 266--273.

[33] Cruse D A. Lexical Semantics. Cambridge: Cambridge University Press, 1991.

[34] Mok L W. Word-superiority effect as a function of semantic transparency of Chinese bimorphemic compound words. Language Cognitive Processes, 2009, 24: 1039--1081.

[35] Pollatsek A, Hyönä J. The role of semantic transparency in the processing of Finnish compound words. Language Cognitive Processes, 2005, 20: 261--290.

[36] Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 1998. 1--423.

[37] Kenneth H, Ivan P, Jonathan H, et al. Scalable modified Kneser-Ney language model estimation. In: Proceedings of Annual Meeting of the Association for Computational Linguistics, Sofia, 2013. 690--696.

[38] Henry K, Nelson F. Computational Analysis of Present-day American English. Providence: Brown University Press, 1967. 1--424.

[39] Castro S, Cubero M, Garat D, et al. Is this a joke? Detecting humor in Spanish tweets. In: Proceedings of IBERAMIA, San José, 2016. 139--150.

  • Figure 1

    (Color online) Pun recognition algorithm based on a binary decision tree (BDT)

  • Table 1   Pun recognition results as feature sets are added cumulatively
    Features                                                                               Precision (%)  Recall (%)  Accuracy (%)  F1 (%)
    Semantic transparency                                                                  46.57          28.80       88.83         43.49
    Semantic transparency + semantic relevance                                             48.88          33.02       89.85         47.22
    Semantic transparency + semantic relevance + phonetic expansibility                    62.87          57.12       86.22         68.72
    Semantic transparency + semantic relevance + phonetic expansibility + syntax features  78.48          82.93       86.39         84.62
    Top-ranked system at SemEval-2017 Task 7 [30]                                          78.37          81.90       87.04         84.39
  • Table 2   Recognition results for each feature dimension on its own
    Features                Precision (%)  Recall (%)  Accuracy (%)  F1 (%)
    Semantic relevance      32.02          4.88        98.41         9.30
    Semantic transparency   46.57          28.80       88.83         43.49
    Phonetic expansibility  52.58          39.89       86.37         54.57
    Syntax features         69.61          63.49       91.29         74.90
  • Table 3   Recognition results for three syntactic features
    Features        Precision (%)  Recall (%)  Accuracy (%)  F1 (%)
    Names           42.75          20.61       96.32         33.96
    Capitalization  30.56          3.93        76.92         7.49
    Tense           67.08          58.22       93.03         71.64

