SCIENCE CHINA Information Sciences, Volume 61, Issue 5: 050106 (2018) https://doi.org/10.1007/s11432-017-9419-x

Personalized project recommendation on GitHub

More info
  • Received Nov 6, 2017
  • Accepted Apr 13, 2018
  • Published Apr 20, 2018


GitHub is a software development platform that facilitates collaboration and participation in project development. Typically, developers search for relevant projects in order to reuse functions and identify useful features. Recommending suitable projects to developers can save them time. However, finding suitable projects among the many projects on GitHub is difficult. In addition, different users may have different requirements. A recommendation system would help developers by reducing the time required to find suitable projects. In this paper, we propose an approach to recommending projects that considers developer behaviors and project features. The proposed approach automatically recommends the top-$N$ most relevant software projects. We also integrate user feedback to improve recommendation accuracy. The results of an empirical study using data crawled from GitHub demonstrate that the proposed approach can efficiently recommend relevant software projects with relatively high precision.


  • Figure 1

    (Color online) Overview of the architecture of our approach.

  • Figure 2

    (Color online) Procedure to extract features and calculate similarity.

  • Figure 3

    (Color online) Modeling user behaviors.

  • Figure 4

    (Color online) Example project recommendation.

  • Table 1   Statistics of four groups of GitHub data
    Group name   Users   Projects   Development areas
    vim-jp       22      562        Vimscript
    Formidable   16      185        Web
    harvesthq    43      540        Android
    Large        1621    20367      /
  • Table 2   Evaluation metrics
    Metric      Formula
    Accuracy    $\left|\{u \mid u \in U,\ R(u) \cap T(u) \ne \emptyset\}\right| / |U|$
    Recall      $\left|R(u) \cap T(u)\right| / \left|T(u)\right|$
    Precision   $\left|R(u) \cap T(u)\right| / \left|R(u)\right|$
    $F1$        $2 \cdot \mathrm{precision} \cdot \mathrm{recall} / (\mathrm{precision} + \mathrm{recall})$
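
The metrics in Table 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' evaluation code. It assumes that $R(u)$ is the set of projects recommended to user $u$ and $T(u)$ is the set of projects actually relevant to $u$ (the table does not define these symbols on this page). The function names, the toy project identifiers, and the choice to return 0 when a denominator is empty are all assumptions made for the example.

```python
from typing import Dict, Set


def precision_recall_f1(recommended: Set[str], relevant: Set[str]) -> Dict[str, float]:
    """Per-user metrics, assuming recommended = R(u) and relevant = T(u)."""
    hits = len(recommended & relevant)  # |R(u) ∩ T(u)|
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(recommended: Dict[str, Set[str]], relevant: Dict[str, Set[str]]) -> float:
    """Fraction of users in U whose recommendations contain at least one relevant project."""
    users = list(recommended)  # U is taken to be the users who received recommendations
    if not users:
        return 0.0
    hit_users = sum(1 for u in users if recommended[u] & relevant.get(u, set()))
    return hit_users / len(users)


# Toy data (hypothetical users and project identifiers).
R = {"alice": {"p1", "p2", "p3", "p4"}, "bob": {"p5", "p6"}}
T = {"alice": {"p1", "p2", "p9"}, "bob": {"p7"}}

alice = precision_recall_f1(R["alice"], T["alice"])
overall_accuracy = accuracy(R, T)
```

For the toy data, two of the four projects recommended to the first user are relevant, so her precision is 0.5 and her recall is 2/3. Only one of the two users receives any relevant project, so the accuracy is 0.5.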
