SCIENCE CHINA Information Sciences, Volume 60, Issue 11: 110102(2017) https://doi.org/10.1007/s11432-016-9197-0

Convolutional neural networks for expert recommendation in community question answering

  • Received Jun 5, 2017
  • Accepted Jul 17, 2017
  • Published Oct 13, 2017


Community question answering (CQA) is an increasingly important web service through which people seek expertise and share their own. As large numbers of questions are resolved, CQA services have built a massive, freely accessible knowledge repository that provides valuable information to society at large, not only to the original question askers. To maximize the benefit of this process, it is critical for CQA services to obtain high-quality answers. However, people are experts only in their own specialized areas. This paper addresses the problem of recommending experts for a newly posed question, which can reduce the questioner's waiting time and improve answer quality, and thereby improve the satisfaction of the whole community. We propose an approach based on convolutional neural networks (CNN) to address this problem. Experimental analysis on a large real-world dataset from Stack Overflow demonstrates that our approach achieves a significant improvement over several baseline methods.
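This page does not reproduce the network's details, so the following is only a minimal sketch of how a CNN might score candidate experts for a question. It assumes a sentence-classification design in the style of Kim [15]: word embeddings, convolution filters of several widths, max-over-time pooling, and a softmax over candidate experts. All names, sizes, and the random weights (`score_experts`, `VOCAB`, `EMB_DIM`, `NUM_EXPERTS`, the filter widths) are hypothetical and are not taken from the paper; no training is shown.

```python
import math
import random


def conv_max_pool(x, filt, bias):
    """Slide one filter (width x emb_dim) over x, apply ReLU, take the max over time."""
    width = len(filt)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid floor for the max
    for start in range(len(x) - width + 1):
        s = bias
        for k in range(width):
            s += sum(a * b for a, b in zip(x[start + k], filt[k]))
        best = max(best, s)
    return best


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_experts(token_ids, embeddings, filters, biases, w_out, b_out):
    """Return one probability per candidate expert for a tokenized question."""
    x = [embeddings[t] for t in token_ids]  # embedding lookup
    features = [conv_max_pool(x, f, b) for f, b in zip(filters, biases)]
    logits = [sum(f * w for f, w in zip(features, row)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)


# Hypothetical sizes and random, untrained weights, for illustration only.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


VOCAB, EMB_DIM, NUM_EXPERTS = 50, 8, 6
embeddings = rand_matrix(VOCAB, EMB_DIM)
# Four filters for each assumed window width (3, 4 and 5 tokens).
filters = [rand_matrix(width, EMB_DIM) for width in (3, 4, 5) for _ in range(4)]
biases = [0.0] * len(filters)
w_out = rand_matrix(NUM_EXPERTS, len(filters))
b_out = [0.0] * NUM_EXPERTS

question = [rng.randrange(VOCAB) for _ in range(12)]  # a 12-token question
probs = score_experts(question, embeddings, filters, biases, w_out, b_out)
ranking = sorted(range(NUM_EXPERTS), key=lambda e: -probs[e])  # best candidate first
```

Under these assumptions, `ranking` orders the candidate experts by predicted probability, and its first few entries would be the recommended answerers for the question.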


This work was supported by the National Natural Science Foundation of China (Grant Nos. 61572098, 61632011, 61562080), the National Key Research and Development Program of China (Grant No. 2016YFB1001103), and the Major Projects of Science and Technology Innovation in Liaoning Province (Grant No. 20151060-21).


[1] Balog K. Expertise retrieval. FNT Inf Retrieval, 2012, 6: 127--256.

[2] Liu X Y, Croft W B, Koll M. Finding experts in community-based question-answering services. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, 2005. 315--316.

[3] Li B, King I, Lyu M R. Question routing in community question answering: putting category in its place. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, 2011. 2041--2044.

[4] Zhou G, Liu K, Zhao J. Joint relevance and answer quality learning for question routing in community QA. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, 2012. 1492--1496.

[5] Riahi F, Zolaktaf Z, Shafiei M, et al. Finding expert users in community question answering. In: Proceedings of the 21st International Conference on World Wide Web, Lyon, 2012. 791--798.

[6] Mandal D P, Kundu D, Maiti S. Finding experts in community question answering services: a theme based query likelihood language approach. In: Proceedings of IEEE International Conference on Advances in Computer Engineering and Applications, Ghaziabad, 2015. 423--427.

[7] Yang L, Qiu M, Gottipati S, et al. CQARank: jointly model topics and expertise in community question answering. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, San Francisco, 2013. 99--108.

[8] Yang B, Manandhar S. Tag-based expert recommendation in community question answering. In: Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Beijing, 2014. 960--963.

[9] Zhao Z, Zhang L, He X. Expert finding for question answering via graph regularized matrix completion. IEEE Trans Knowl Data Eng, 2015, 27: 993--1004.

[10] Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. J Mach Learn Res, 2003, 3: 1137--1155.

[11] Collobert R, Weston J, Bottou L, et al. Natural language processing (almost) from scratch. J Mach Learn Res, 2011, 12: 2493--2537.

[12] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of International Conference on Neural Information Processing Systems, Lake Tahoe, 2013. 3111--3119.

[13] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436--444.

[14] LeCun Y, Bottou L, Bengio Y. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278--2324.

[15] Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1746--1751.

[16] Pal A, Chang S, Konstan J A. Evolution of experts in question answering communities. In: Proceedings of the 6th International AAAI Conference on Weblogs and Social Media, Dublin, 2012. 274--281.

[17] Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Sci China Inf Sci, 2016, 59: 072104.

[18] Dong H L, Wang J, Lin H F, et al. Predicting best answerers for new questions: an approach leveraging distributed representations of words in community question answering. In: Proceedings of the 9th International Conference on Frontier of Computer Science and Technology, Dalian, 2015. 13--18.

[19] Pedregosa F, Michel V G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res, 2012, 12: 2825--2830.

[20] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation. J Mach Learn Res, 2003, 3: 993--1022.

[21] Du L, Buntine W, Jin H D. A segmented topic model based on the two-parameter Poisson-Dirichlet process. Mach Learn, 2010, 8: 5--19.

  • Figure 1

    (Color online) The architecture of CNN used for expert recommendation.

  • Figure 2

    (Color online) Distribution of the most frequent tags in Stack Overflow.

  • Figure 3

    (Color online) Distribution of the most frequent co-occurring tags in Stack Overflow.

  • Figure 4

    (Color online) $S@1$ for prediction of the best answerer using the CNN model with different profile configurations.

  • Figure 5

    (Color online) $S@N$ for prediction of the best answerer using the CNN model based on titles and bodies.

  • Figure 6

    (Color online) Results of the prediction of the best answerer based on D40: the Y axis shows $S@1$ values and the X axis shows three models of different sentence lengths.

  • Table 1   Tags selected for the training set
    Frequently co-occur Partially co-occur Rarely co-occur
    C# Python Django
    SQL SQL-Server CSS
    Linux Delphi Ruby
    Windows .NET Ruby-on-Rails
    Java JavaScript WPF
    C iPhone
  • Table 2   Data statistics
    Data set ID   Questions   Best answerers
    All           479531      56055
    D20           311857      4390
    D40           248300      2064
  • Table 3   Performance comparison of the proposed model and traditional methods based on D40
    Method   TF-IDF   Language model   LR       LDA [5]   SSRM [18]   STM [5]   CNN-non-static
    $S@1$    0.0320   0.0310           0.0349   0.0578    0.0578      0.1034    0.2734
    $S@2$    0.0442   0.0372           0.0513   0.0765    0.0765      0.1051    0.2830
    $S@3$    0.0560   0.0442           0.0625   0.0810    0.0810      0.1192    0.2884
    $S@4$    0.0636   0.0478           0.0709   0.0836    0.0836      0.1200    0.2928
    $S@5$    0.0714   0.0524           0.0778   0.0856    0.0856      0.1267    0.2966
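
The $S@N$ rows in Table 3 are not defined on this page. A minimal sketch of how such a metric could be computed follows, assuming $S@N$ denotes success at N: the fraction of test questions whose actual best answerer appears among the top N recommended users. The function name `success_at_n` and the toy user IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def success_at_n(ranked_candidates: Sequence[Sequence[str]],
                 best_answerers: Sequence[str],
                 n: int) -> float:
    """Fraction of questions whose true best answerer is in the top-n ranking."""
    if len(ranked_candidates) != len(best_answerers):
        raise ValueError("need exactly one ranking per question")
    if not best_answerers:
        return 0.0
    hits = sum(1 for ranking, truth in zip(ranked_candidates, best_answerers)
               if truth in ranking[:n])
    return hits / len(best_answerers)


# Toy data: hypothetical user IDs, not from the Stack Overflow dataset.
rankings = [
    ["u1", "u2", "u3"],  # best answerer u1 is ranked 1st
    ["u2", "u3", "u1"],  # best answerer u3 is ranked 2nd
    ["u3", "u1", "u2"],  # best answerer u2 is ranked 3rd
    ["u1", "u3", "u2"],  # best answerer u2 is ranked 3rd
]
truth = ["u1", "u3", "u2", "u2"]

s_at_1 = success_at_n(rankings, truth, 1)  # 1 of 4 questions hit -> 0.25
s_at_2 = success_at_n(rankings, truth, 2)  # 2 of 4 questions hit -> 0.5
s_at_3 = success_at_n(rankings, truth, 3)  # 4 of 4 questions hit -> 1.0
```

Under this definition the value can only stay the same or grow as N increases, which is consistent with every column of Table 3 rising from $S@1$ to $S@5$.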
