
SCIENCE CHINA Information Sciences, Volume 62, Issue 11: 212103 (2019) https://doi.org/10.1007/s11432-018-9848-2

IEA: an answerer recommendation approach on stack overflow

  • Received Sep 10, 2018
  • Accepted Feb 28, 2019
  • Published Sep 18, 2019

Abstract

Stack Overflow is a web-based service where users seek information by asking questions and share knowledge by answering questions about software development. Ideally, new questions are assigned to experts and answered shortly after they are submitted. However, so many new questions arrive on Stack Overflow that answerers find it hard to discover suitable questions in time. An answerer recommendation approach is therefore needed to assign appropriate questions to answerers. In this paper, we conduct an empirical study of developers' activities. The results show that for 66.24% of users, comments account for more than 30% of their activities. Furthermore, users who are active on one day are likely to be active on the next day. We propose IEA, an approach that combines users' topical interest, topical expertise, and activeness to recommend answerers for new questions. We first model users' topical interest and expertise from their historical questions and answers. We then design a method for calculating users' activeness from their historical questions, answers, and comments. We evaluate IEA on 3428 users with 41950 questions, 64894 answers, and 96960 comments. Compared with the state-of-the-art approaches TEM, TTEA, and TTEA-ACT, IEA improves nDCG by 2.48%, 3.45%, and 3.79%, the Pearson rank correlation coefficient by 236.20%, 84.91%, and 224.12%, and the Kendall rank correlation coefficient by 424.18%, 1845.30%, and 772.60%, respectively.
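The abstract states that IEA combines topical interest, topical expertise, and activeness, but it does not give the combination formula. The following is only a minimal sketch of how such a ranking could be wired together, under stated assumptions: interest and expertise are per-topic vectors, activeness is an activity count weighted toward recent days (loosely motivated by the observation that users active on one day tend to be active on the next), and the factors are simply multiplied. The names `UserProfile`, `activeness`, `score`, `recommend`, and the `decay` parameter are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical per-user inputs; field names are illustrative, not from the paper."""
    user_id: str
    interest: list[float]      # topical interest, e.g. P(topic k | user)
    expertise: list[float]     # topical expertise score per topic k
    daily_activity: list[int]  # questions + answers + comments per day, oldest day first


def activeness(daily_activity: list[int], decay: float = 0.5) -> float:
    """Assumed activeness: each day's count weighted by decay ** (days before the latest day)."""
    n = len(daily_activity)
    return sum(count * decay ** (n - 1 - i) for i, count in enumerate(daily_activity))


def score(user: UserProfile, question_topics: list[float]) -> float:
    """Assumed combination: sum over topics of question weight x interest x expertise, times activeness."""
    topical = sum(q * i * e for q, i, e in zip(question_topics, user.interest, user.expertise))
    return topical * activeness(user.daily_activity)


def recommend(users: list[UserProfile], question_topics: list[float], top_n: int = 5) -> list[str]:
    """Return the IDs of the top_n users sorted by descending score."""
    ranked = sorted(users, key=lambda u: score(u, question_topics), reverse=True)
    return [u.user_id for u in ranked[:top_n]]


users = [
    UserProfile("a", [0.8, 0.2], [0.9, 0.1], [0, 1, 3]),
    UserProfile("b", [0.1, 0.9], [0.2, 0.8], [2, 0, 0]),
    UserProfile("c", [0.7, 0.3], [0.6, 0.5], [0, 0, 0]),
]
print(recommend(users, question_topics=[1.0, 0.0]))  # ['a', 'b', 'c']
```

Because the factors are multiplied in this sketch, a user with no recent activity (user `c`) scores zero regardless of topical fit; a weighted sum would be a softer alternative. Which behavior IEA actually adopts is not stated here.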


Acknowledgment

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1004202), the National Natural Science Foundation of China (Grant No. 61672078), and the State Key Laboratory of Software Development Environment of China (Grant No. SKLSDE-2018ZX-12).



  • Figure 1

    (Color online) An example of a question on Stack Overflow.

  • Figure 2

    (Color online) Percentage of comment activities.

  • Figure 3

    (Color online) Active users in successive days.

  • Figure 4

    (Color online) Overall framework of our method IEA.

  • Figure 5

    The graphical model of TEM.

  • Table 1   Number of answers per question in training data
    Number of answers per question | Number of questions
    0 | 3053
    1 | 20423
    2 | 10074
    3 | 4066
    4 | 1640
    5 | 646
    6 | 274
    7 | 125
    8 | 67
    9 | 31
    $\geqslant 10$ | 56
  • Table 2   An example of user activity in April 2014 on Stack Overflow
    User ID | Activity | Creation time | Question ID
    3523446 | Answer | 2014-04-11 11:20 | 23011187
    3523446 | Answer | 2014-04-13 04:31 | 23039131
    3523446 | Answer | 2014-04-13 04:36 | 23039155
    3523446 | Comment | 2014-04-13 05:47 | 23039155
    3523446 | Answer | 2014-04-13 06:16 | 23039802
    3523446 | Answer | 2014-04-24 12:57 | 23269620
    3523446 | Answer | 2014-04-24 13:23 | 23270226
    3523446 | Comment | 2014-04-24 13:29 | 23269620
    3523446 | Answer | 2014-04-25 04:38 | 23284281
    3523446 | Answer | 2014-04-25 09:47 | 23289638
    3523446 | Answer | 2014-04-25 12:02 | 23292561
    3523446 | Comment | 2014-04-25 12:32 | 23284281
    3523446 | Comment | 2014-04-25 15:16 | 23284281
    3523446 | Comment | 2014-04-25 15:25 | 23292561
    3523446 | Comment | 2014-04-25 16:04 | 23284281
    3523446 | Comment | 2014-04-26 02:24 | 23305872
  • Table 3   Symbols associated with TEM
    Notation | Type | Description
    $U$ | Scalar | The total number of users
    $N_{u}$ | Scalar | The total number of questions and answers for user $u$
    $M_{u,n}$ | Scalar | The total number of words in $u$'s $n$-th question or answer
    $L_{u,n}$ | Scalar | The total number of tags in $u$'s $n$-th question or answer
    $K$ | Scalar | The total number of topics
    $E$ | Scalar | The total number of expertise levels
    $\alpha$ | Scalar | Hyperparameter of the Dirichlet prior for the user topic distribution
    $\beta$ | Scalar | Hyperparameter of the Dirichlet prior for the user topical expertise distribution
    $\eta$ | Scalar | Hyperparameter of the Dirichlet prior for the topic-word distribution
    $\gamma$ | Scalar | Hyperparameter of the Dirichlet prior for the topic-tag distribution
    $\alpha_{0}$, $\beta_{0}$, $\mu_{0}$, $k_{0}$ | Scalar | Normal-Gamma parameters
    $\theta_{u}$ | Vector | Topic distribution for user $u$
    $\phi_{k}$ | Vector | Word distribution for topic $k$
    $\varphi_{k}$ | Vector | Tag distribution for topic $k$
    $\theta_{k,u}$ | Vector | Expertise distribution for user $u$ under topic $k$
    $G(\mu_{e}, \Sigma_{e})$ | Vector | Expertise-specific vote distribution
  • Table 4   nDCG, Pearson and Kendall of approaches TEM, TTEA, TTEA-ACT, and IEA
    Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall
    IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649
    TEM | 0.6006 | 0.8131 | 0.8802 | 0.0559 | 0.0315
    TTEA | 0.5784 | 0.8048 | 0.8719 | 0.1017 | 0.0085
    TTEA-ACT | 0.5752 | 0.8020 | 0.8690 | 0.0580 | 0.0189
  • Table 5   nDCG gain, Pearson gain and Kendall gain of approaches TEM, TTEA, TTEA-ACT, and IEA
    Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%)
    IEA vs. TEM | 10.29*** | 2.68** | 2.48** | 236.20*** | 424.18**
    IEA vs. TTEA | 14.53* | 3.74* | 3.45* | 84.91** | 1845.30**
    IEA vs. TTEA-ACT | 15.17** | 4.11** | 3.79** | 224.12** | 772.60**

  • Table 6   nDCG ratio, Pearson ratio and Kendall ratio of approaches TEM, TTEA, TTEA-ACT, and IEA
    Comparison | nDCG@1 ratio (%) | nDCG@5 ratio (%) | nDCG ratio (%) | Pearson ratio (%) | Kendall ratio (%)
    IEA vs. TEM | 89.38 | 87.19 | 87.19 | 82.51 | 88.05
    IEA vs. TTEA | 85.63 | 83.44 | 83.44 | 79.88 | 85.13
    IEA vs. TTEA-ACT | 86.88 | 85.31 | 85.31 | 79.30 | 86.59
  • Table 7   nDCG, Pearson and Kendall of approaches IEA-no-comment and IEA
    Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall
    IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649
    IEA-no-comment | 0.6555 | 0.8328 | 0.8998 | 0.1303 | 0.1602
  • Table 8   nDCG gain, Pearson gain and Kendall gain of approaches IEA-no-comment and IEA
    Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%)
    IEA vs. IEA-no-comment | 1.05 | 0.26 | 0.2378 | 44.30 | 2.91
  • Table 9   nDCG ratio, Pearson ratio and Kendall ratio of approaches IEA-no-comment and IEA
    Comparison | nDCG@1 ratio (%) | nDCG@5 ratio (%) | nDCG ratio (%) | Pearson ratio (%) | Kendall ratio (%)
    IEA vs. IEA-no-comment | 95.31 | 94.06 | 94.06 | 93.29 | 94.46
  • Table 10   nDCG, Pearson and Kendall of approaches TEM, TA, EA, INT, EXP, ACT, and IEA
    Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall
    IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649
    TEM | 0.6006 | 0.8131 | 0.8802 | 0.0559 | 0.0315
    TA | 0.6333 | 0.8237 | 0.8908 | 0.1180 | 0.1029
    EA | 0.6480 | 0.8297 | 0.8968 | 0.1216 | 0.1497
    INT | 0.5204 | 0.7797 | 0.8467 | -0.0685 | -0.0908
    EXP | 0.5586 | 0.7988 | 0.8659 | -0.0557 | -0.0122
    ACT | 0.6250 | 0.8205 | 0.8876 | 0.0930 | 0.1063
  • Table 11   nDCG gain, Pearson gain and Kendall gain of approaches TEM, TA, EA, INT, EXP, ACT, and IEA
    Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%)
    IEA vs. TEM | 10.29 | 2.68 | 2.48 | 236.20 | 424.18
    IEA vs. TA | 4.60 | 1.36 | 1.25 | 59.40 | 60.19
    IEA vs. EA | 2.22 | 0.63 | 0.58 | 54.68 | 10.13
    IEA vs. INT | 27.29 | 7.09 | 6.53 | -374.47 | -281.53
    IEA vs. EXP | 18.58 | 4.52 | 4.17 | -437.70 | -1456.60
    IEA vs. ACT | 5.99 | 1.75 | 1.62 | 102.19 | 55.07
  • Table 12   Performance of IEA by varying the number of topics ($T$)
    Setting | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall
    $T = 1$ | 0.6417 | 0.8274 | 0.8944 | 0.1691 | 0.1310
    $T = 2$ | 0.6458 | 0.8288 | 0.8958 | 0.1394 | 0.1310
    $T = 3$ | 0.6432 | 0.8271 | 0.8942 | 0.1155 | 0.1111
    $T = 4$ | 0.6283 | 0.8218 | 0.8888 | 0.1274 | 0.1012
    $T = 5$ | 0.6420 | 0.8270 | 0.8941 | 0.1586 | 0.1127
    $T = 6$ | 0.6500 | 0.8309 | 0.8979 | 0.1843 | 0.1385
    $T = 7$ | 0.6464 | 0.8289 | 0.8960 | 0.1061 | 0.1277
    $T = 8$ | 0.6343 | 0.8247 | 0.8918 | 0.0886 | 0.1114
    $T = 9$ | 0.6633 | 0.8347 | 0.9018 | 0.1598 | 0.1583
    $T = 10$ | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649
    $T = 11$ | 0.6425 | 0.8271 | 0.8942 | 0.1437 | 0.1332
    $T = 12$ | 0.6231 | 0.8190 | 0.8861 | 0.0901 | 0.0856
    $T = 13$ | 0.6502 | 0.8303 | 0.8973 | 0.1309 | 0.1435
    $T = 14$ | 0.6246 | 0.8213 | 0.8883 | 0.1102 | 0.0869
    $T = 15$ | 0.6242 | 0.8208 | 0.8878 | 0.1277 | 0.1304
  • Table 13   Performance of IEA by varying the number of expertise levels ($E$)
    Setting | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall
    $E = 1$ | 0.6250 | 0.8205 | 0.8876 | 0.0988 | 0.1063
    $E = 2$ | 0.6262 | 0.8202 | 0.8873 | 0.1061 | 0.0760
    $E = 3$ | 0.6257 | 0.8203 | 0.8873 | 0.1009 | 0.0950
    $E = 4$ | 0.6410 | 0.8266 | 0.8937 | 0.1341 | 0.1162
    $E = 5$ | 0.6437 | 0.8282 | 0.8953 | 0.1344 | 0.1087
    $E = 6$ | 0.6309 | 0.8236 | 0.8906 | 0.1146 | 0.1176
    $E = 7$ | 0.6187 | 0.8185 | 0.8855 | 0.0644 | 0.0784
    $E = 8$ | 0.6262 | 0.8218 | 0.8888 | 0.1493 | 0.1161
    $E = 9$ | 0.6070 | 0.8151 | 0.8821 | 0.1204 | 0.0712
    $E = 10$ | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649
    $E = 11$ | 0.6300 | 0.8224 | 0.8895 | 0.0825 | 0.0898
    $E = 12$ | 0.6469 | 0.8301 | 0.8972 | 0.1397 | 0.1511
    $E = 13$ | 0.6328 | 0.8244 | 0.8915 | 0.1111 | 0.1201
    $E = 14$ | 0.6287 | 0.8229 | 0.8900 | 0.1090 | 0.1013
    $E = 15$ | 0.6377 | 0.8246 | 0.8916 | 0.1286 | 0.1210
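Tables 4, 5, 10, and 11 report nDCG (at cutoffs 1 and 5 and over the full list), Pearson and Kendall correlations, and relative gains. The sketch below implements the standard textbook forms of these quantities so the reported numbers can be sanity-checked. The paper's exact variants are not given on this page, so several choices are assumptions: relevance is graded (for example, answer votes), DCG uses the log2(rank + 1) discount, and Kendall's coefficient is tau-a. The gain formula (IEA minus baseline, divided by baseline) does reproduce the Table 5 entries up to rounding of the published values, for example 10.29% for IEA vs. TEM on nDCG@1. It also explains the negative entries in Table 11: when a baseline's correlation is negative (INT and EXP in Table 10), dividing by it flips the sign even though IEA is better.

```python
import math
from itertools import combinations
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount; ranks start at 1."""
    rels = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the predicted order divided by the DCG of the ideal (descending) order."""
    ideal = sorted(relevances, reverse=True)
    best = dcg(ideal, k)
    return dcg(relevances, k) / best if best > 0 else 0.0


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def relative_gain(new: float, baseline: float) -> float:
    """Relative gain in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100


# IEA vs. TEM on nDCG@1, using the rounded values of Table 4.
print(round(relative_gain(0.6624, 0.6006), 2))  # 10.29
# IEA vs. INT on Pearson: the baseline is negative, so the "gain" is negative too.
print(round(relative_gain(0.1880, -0.0685), 1))  # -374.5
```

If SciPy is available, `scipy.stats.kendalltau` is an alternative to the hand-written function; note that it computes tau-b (which adjusts for ties) by default, so it can differ from tau-a when rankings contain ties.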

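The abstract reports two empirical observations: that 66.24% of users have more than 30% comment activities, and that users active on one day are likely to be active on the next. The sketch below computes both quantities for the single user listed in Table 2. It assumes that "comment share" means the fraction of a user's activities that are comments, and that a user is "active" on any calendar day with at least one activity; the paper's exact definitions are not given here, and one user is an illustration, not evidence for the aggregate figures.

```python
from datetime import date, datetime, timedelta

# (activity, creation time) pairs for user 3523446, copied from Table 2.
ACTIVITIES = [
    ("Answer", "2014-04-11 11:20"),
    ("Answer", "2014-04-13 04:31"),
    ("Answer", "2014-04-13 04:36"),
    ("Comment", "2014-04-13 05:47"),
    ("Answer", "2014-04-13 06:16"),
    ("Answer", "2014-04-24 12:57"),
    ("Answer", "2014-04-24 13:23"),
    ("Comment", "2014-04-24 13:29"),
    ("Answer", "2014-04-25 04:38"),
    ("Answer", "2014-04-25 09:47"),
    ("Answer", "2014-04-25 12:02"),
    ("Comment", "2014-04-25 12:32"),
    ("Comment", "2014-04-25 15:16"),
    ("Comment", "2014-04-25 15:25"),
    ("Comment", "2014-04-25 16:04"),
    ("Comment", "2014-04-26 02:24"),
]


def comment_share(activities: list[tuple[str, str]]) -> float:
    """Fraction of activities that are comments (assumed definition)."""
    comments = sum(1 for kind, _ in activities if kind == "Comment")
    return comments / len(activities)


def active_days(activities: list[tuple[str, str]]) -> list[date]:
    """Sorted distinct calendar days with at least one activity."""
    return sorted({datetime.strptime(ts, "%Y-%m-%d %H:%M").date() for _, ts in activities})


def successive_active_days(days: list[date]) -> list[tuple[date, date]]:
    """Pairs (d, d + 1 day) on which the user is active on both days."""
    day_set = set(days)
    one_day = timedelta(days=1)
    return [(d, d + one_day) for d in days if d + one_day in day_set]


days = active_days(ACTIVITIES)
print(comment_share(ACTIVITIES))    # 0.4375 (7 of 16 activities)
print(len(days))                    # 5
print(successive_active_days(days)) # pairs for April 24 -> 25 and April 25 -> 26
```

Under these assumptions, this user's comment share of 43.75% lies above the 30% threshold, and two of the five active days (April 24 and 25) are immediately followed by another active day.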
Copyright 2020 Science China Press Co., Ltd. All rights reserved.
