
[11-03] Group meeting presentation: Issue-PR link prediction via knowledge-aware heterogeneous graph learning #305

Open
andyhuang18 opened this issue Nov 3, 2024 · 1 comment


andyhuang18 commented Nov 3, 2024

Description

Presenter: Huang Wenrui (黄温瑞)

In this meeting I will present a paper published in IEEE Transactions on Software Engineering (CCF-A).

Paper link:

Improving_Issue-PR_Link_Prediction_via_Knowledge-Aware_Heterogeneous_Graph_Learning.pdf

Abstract:

Links between issues and pull requests (PRs) assist GitHub developers in tackling technical challenges, gaining development inspiration, and improving repository maintenance. In realistic repositories, these links are still insufficiently established. Aiming at this situation, existing works focus on issues and PRs themselves and employ text similarity with additional information like issue size to predict issue-PR links, yet their effectiveness is unsatisfactory. The limitation is that issues and PRs are not isolated on GitHub. Rather, they are related to multiple GitHub sources, including repositories and submitters, which, through their diverse relationships, can supply potential and crucial knowledge about technical domains, developmental insights, and cross-repository technical details. To this end, we propose Auto IP Linker (AIPL), which introduces the heterogeneous graph to model multiple GitHub sources with their relationships. Further, it leverages the metapath-based technique to reveal and incorporate the potential information for a more comprehensive understanding of issues and PRs. Firstly, we identify 4 types of GitHub sources related to issues and PRs (repositories, users, issues, PRs) as well as their relationships, and model them into task-specific heterogeneous graphs. Next, we analyze information transmitted among issues or PRs to reveal which knowledge is crucial for them. Based on our analysis, we formulate a series of metapaths and employ the metapath-based technique to incorporate various information for learning the knowledge-aware embedding of issues and PRs. Finally, we can infer whether an issue and a PR can be linked based on their embedding. We evaluate the performance of AIPL on real-world data sets collected from GitHub. The results show that, compared to the baselines, AIPL can achieve average improvements of 15.94%, 15.19%, 20.52%, and 18.50% in terms of Accuracy, Precision, Recall, and F1-score.
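To make the pipeline in the abstract more concrete (a typed graph over repositories, users, issues and PRs, then metapath-based neighbor aggregation, then link inference from embeddings), here is a toy sketch. It is not AIPL's implementation: the node IDs, 2-d feature vectors, metapaths such as `("issue", "user", "pr")`, the uniform mean pooling (metapath-based GNNs usually learn attention weights instead) and the sigmoid-of-dot-product scorer are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


# Toy typed graph over the four GitHub source types named in the abstract.
# Edge semantics (repo contains issue, user submits PR, ...) are illustrative.
class HeteroGraph:
    def __init__(self):
        # adj[(src_type, dst_type)][src_id] -> set of dst_ids
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src_type, src, dst_type, dst):
        # Store both directions so a metapath can be walked either way.
        self.adj[(src_type, dst_type)][src].add(dst)
        self.adj[(dst_type, src_type)][dst].add(src)

    def metapath_neighbors(self, start, metapath):
        """End nodes reachable from `start` along a node-type sequence
        such as ("issue", "user", "pr")."""
        frontier = {start}
        for src_type, dst_type in zip(metapath, metapath[1:]):
            step = self.adj[(src_type, dst_type)]
            frontier = set().union(*(step.get(n, set()) for n in frontier))
        return frontier


def mean(vectors, dim):
    """Element-wise mean; a zero vector when there are no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed(graph, node, metapaths, features):
    """Average the node's own features with one mean-pooled view per metapath.
    Uniform weighting is an assumption; attention over metapaths is common."""
    dim = len(features[node])
    views = [mean([features[n] for n in graph.metapath_neighbors(node, mp)], dim)
             for mp in metapaths]
    return mean([features[node]] + views, dim)


def link_score(issue_emb, pr_emb):
    """Sigmoid of the dot product: a probability-like issue-PR link score."""
    dot = sum(a * b for a, b in zip(issue_emb, pr_emb))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical toy data: one repo, one user, two issues, one PR.
g = HeteroGraph()
g.add_edge("repo", "repo:A", "issue", "issue:1")
g.add_edge("repo", "repo:A", "issue", "issue:2")
g.add_edge("repo", "repo:A", "pr", "pr:7")
g.add_edge("user", "user:u", "issue", "issue:1")
g.add_edge("user", "user:u", "pr", "pr:7")

features = {"issue:1": [1.0, 0.0], "issue:2": [0.5, 0.5], "pr:7": [0.8, 0.2]}
issue_emb = embed(g, "issue:1",
                  [("issue", "repo", "issue"), ("issue", "user", "pr")], features)
pr_emb = embed(g, "pr:7",
               [("pr", "repo", "pr"), ("pr", "user", "issue")], features)
print(round(link_score(issue_emb, pr_emb), 3))
```

In a trained model the features would come from issue/PR text and the aggregation weights would be learned; the point here is only how a metapath such as issue → user → PR pulls in knowledge from the shared submitter.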

Overview of the AIPL framework:

[Figure: AIPL framework overview]

AIPL performance results:

[Figure: AIPL performance results]

Related papers:


birdflyi commented Nov 4, 2024

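Related work: https://github.com/birdflyi/GitHub_Collaboration_Relation_Extraction

Features:

  • Richer node types and edge types
  • Recognition of abbreviated references to Issues, PullRequests, and commit SHAs
  • Merging of duplicate nodes for a PullRequest whose issue_id is not unique
  • Extension of entity IDs from different projects to guarantee ID uniqueness across the whole dataset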
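A rough sketch of what abbreviation recognition, PR node merging and cross-project ID extension might look like, assuming simple regular expressions and a hypothetical `kind:owner/repo#number` ID scheme. The patterns, the helper names `extract_refs`, `global_id` and `merge_pr_nodes`, and the record fields `repo`/`number`/`issue_id` are illustrative only and are not taken from the GitHub_Collaboration_Relation_Extraction repository.

```python
import re

# Hypothetical patterns for a few common GitHub shorthand references.
ISSUE_OR_PR = re.compile(
    # "#123" or "owner/repo#123". Issues and PRs share one number space per
    # repo, so the shorthand alone cannot tell them apart.
    r"(?:(?P<repo>[\w.-]+/[\w.-]+))?#(?P<num>\d+)\b"
)
COMMIT = re.compile(
    # "owner/repo@<sha>" or a bare 7-40 char hex SHA. The lookahead requires at
    # least one a-f letter so plain numbers (e.g. dates) are not taken as SHAs.
    r"(?:(?P<repo>[\w.-]+/[\w.-]+)@)?(?<![#\w])(?=[0-9]*[a-f])"
    r"(?P<sha>[0-9a-f]{7,40})(?!\w)"
)


def global_id(kind, repo, local):
    """Qualify a per-repo identifier with the repo name so it is unique
    across projects (hypothetical scheme, e.g. "issue_or_pr:owner/repo#12")."""
    sep = "@" if kind == "commit" else "#"
    return f"{kind}:{repo}{sep}{local}"


def extract_refs(text, current_repo):
    """Find shorthand references; unqualified ones default to `current_repo`."""
    refs = []
    for m in ISSUE_OR_PR.finditer(text):
        refs.append(global_id("issue_or_pr", m.group("repo") or current_repo, m.group("num")))
    for m in COMMIT.finditer(text):
        refs.append(global_id("commit", m.group("repo") or current_repo, m.group("sha")))
    return refs


def merge_pr_nodes(records):
    """Collapse records describing the same PR into one node.
    Assumption: a PR is identified by (repo, number) while its raw issue_id may differ."""
    merged = {}
    for rec in records:
        key = global_id("issue_or_pr", rec["repo"], rec["number"])
        merged.setdefault(key, set()).add(rec["issue_id"])
    return merged


refs = extract_refs("Fixes #12, see also octo/lib#7 and commit a1b2c3d.", "octo/app")
nodes = merge_pr_nodes([
    {"repo": "octo/app", "number": 5, "issue_id": 101},
    {"repo": "octo/app", "number": 5, "issue_id": 202},  # same PR, different raw id
    {"repo": "octo/lib", "number": 5, "issue_id": 303},  # same number, other repo
])
print(refs)
print(nodes)
```

Because `#5` in two different repositories maps to two different global IDs, local numbers no longer collide across projects. The bare-SHA pattern is only a heuristic and will also match ordinary words made entirely of hex letters.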
