1 change: 1 addition & 0 deletions README.md
@@ -7,6 +7,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com
### Depth Triage & Analysis Workflows

- [🏷️ Issue Triage](docs/issue-triage.md) - Triage issues and pull requests
- [🔍 Issue Duplication Detector](docs/issue-duplication-detector.md) - Detect duplicate issues and suggest next steps
- [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically
- [🔍 Repo Ask](docs/repo-ask.md) - Intelligent research assistant for repository questions and analysis
- [🔍 Daily Accessibility Review](docs/daily-accessibility-review.md) - Review application accessibility by automatically running and using the application
100 changes: 100 additions & 0 deletions docs/issue-duplication-detector.md
@@ -0,0 +1,100 @@
# 🔍 Issue Duplication Detector

> For an overview of all available workflows, see the [main README](../README.md).

The [issue duplication detector workflow](../workflows/issue-duplication-detector.md?plain=1) runs every 5 minutes to detect duplicate issues in the repository and suggest next steps.

## Installation

```bash
# Install the 'gh aw' extension
gh extension install github/gh-aw

# Add the Issue Duplication Detector workflow to your repository
gh aw add githubnext/agentics/issue-duplication-detector
```

The `gh aw add` command walks you through adding the workflow to your repository.

You must also [choose a coding agent](https://github.github.com/gh-aw/reference/engines/) and add the agent's API key as a secret in your repository.

You can manually trigger this workflow using `gh aw run issue-duplication-detector` or wait for it to run automatically on its 5-minute schedule.
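
For a first manual run, the steps might look like the sketch below. The secret name depends on the engine you choose (`ANTHROPIC_API_KEY` is only an illustration), and the compiled workflow file name is an assumption.

```bash
# Store an API key for your chosen coding agent
# (secret name depends on the engine; ANTHROPIC_API_KEY is only an example)
gh secret set ANTHROPIC_API_KEY

# Trigger the workflow manually instead of waiting for the schedule
gh aw run issue-duplication-detector

# Check recent runs with the standard GitHub CLI
# (assumes the compiled workflow file is named issue-duplication-detector.lock.yml)
gh run list --workflow issue-duplication-detector.lock.yml --limit 5
```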

**Mandatory Checklist**

* [ ] If in a fork, enable GitHub Actions and Issues in the fork settings

## Configuration

This workflow requires no configuration and works out of the box. It relies on semantic analysis of issue titles, descriptions, and content to detect duplicates, rather than exact keyword matching.

### How It Works

The workflow operates on a 5-minute batch schedule:

1. **Searches for recent issues**: Queries for issues created or updated in the last 10 minutes (see the sketch after this list)
2. **Analyzes each issue**: Extracts key information from the issue title and body
3. **Searches for duplicates**: Uses GitHub search with keywords to find similar existing issues
4. **Compares semantically**: Analyzes whether issues describe the same underlying problem or request
5. **Posts helpful comments**: If duplicates are found, adds a polite comment with:
- Links to potential duplicate issues
- Explanation of why they appear to be duplicates
- Suggested next steps for the issue author
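
As a rough illustration of step 1, the recent-issues query corresponds to something like the following GitHub CLI call. This is a sketch only; the agent uses its GitHub tools rather than this exact command, and the repository and timestamp are placeholders.

```bash
# Issues created or updated since an example ISO 8601 timestamp
# (replace OWNER/REPO with the repository being monitored)
gh api -X GET search/issues \
  -f q='repo:OWNER/REPO is:issue updated:>=2024-02-04T23:08:00Z' \
  --jq '.items[] | "#\(.number) \(.title)"'
```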

### Batch Processing & Cost Control

- Runs every 5 minutes to batch-process multiple issues in a single workflow run
- Only comments when high-confidence duplicates are found
- Maximum 10 comments per run to prevent excessive API usage
- 15-minute timeout ensures predictable runtime costs

After editing, run `gh aw compile` to update the compiled workflow, then commit all changes to the default branch.
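
A typical edit-and-recompile cycle might look like the following sketch; the paths are assumptions and depend on where the workflow files live in your repository.

```bash
# Regenerate the compiled workflow after editing the markdown source
gh aw compile

# Commit the updated source and compiled files to the default branch
# (paths are assumptions; adjust to your repository layout)
git add .github/workflows/
git commit -m "Update issue duplication detector workflow"
git push
```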

## What it reads from GitHub

- Recently created or updated issues (last 10 minutes)
- Full issue details including title, body, and metadata
- Repository issue history for duplicate detection
- Both open and closed issues for comprehensive analysis
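
These reads roughly correspond to ordinary GitHub CLI calls such as the ones below. This is illustrative only; the agent performs them through its GitHub tools, and the issue number, keywords, and repository are placeholders.

```bash
# Full details for a single issue
gh issue view 123 --repo OWNER/REPO --json number,title,body,state,labels,createdAt

# Keyword search across existing issues (open and closed by default) for candidate duplicates
gh search issues "custom templates" --repo OWNER/REPO --limit 10
```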

## What it creates

- Adds comments to issues that appear to be duplicates
- Comments include links to potential duplicates and suggested next steps
- Requires `issues: write` permission

## What web searches it performs

- Does not perform web searches; operates entirely within GitHub data

## Human in the loop

- Review duplicate detection comments for accuracy
- Verify that flagged issues are truly duplicates
- Close duplicate issues or provide clarification if the detection was incorrect
- Add any missing context to the original issue if the duplicate has valuable additional information
- Monitor false positives and disable the workflow if accuracy is not acceptable

## Activity duration

- By default, this workflow triggers for at most 30 days, after which it stops triggering.
- This allows you to experiment with the workflow for a limited time before deciding whether to keep it active.

## Example Output

When a duplicate is detected, the workflow posts a comment like:

```markdown
👋 Hi! It looks like this issue might be a duplicate of existing issue(s):

- #123 - Add support for custom templates

Both issues describe the need for customizable templates in the project configuration.

**Suggested next steps:**
- Review issue #123 to see if it addresses your concern
- If this issue has additional context not covered in #123, consider adding it there
- If they are indeed the same, this issue can be closed as a duplicate

Let us know if you think this assessment is incorrect!
```
103 changes: 103 additions & 0 deletions workflows/issue-duplication-detector.md
@@ -0,0 +1,103 @@
---
description: Detect duplicate issues and suggest next steps (batched every 5 minutes)
on:
  schedule:
    - cron: "*/5 * * * *" # Every 5 minutes

> **Review comment (Collaborator):** @copilot change this to every 6 hours

> **Reply (Contributor, Author):** Changed to every 6 hours in commit f2c04d0. Updated the cron schedule to `0 */6 * * *` and adjusted the time window for checking issues from the last 10 minutes to the last 6 hours throughout both the workflow and documentation files.

  workflow_dispatch:

permissions: read-all

tools:
  github:
    toolsets: [default]
  bash:
    - "*"

safe-outputs:
  add-comment:
    max: 10 # Allow multiple comments in batch mode

timeout-minutes: 15
---

# Issue Duplication Detector

You are an AI agent that detects duplicate issues in the repository `${{ github.repository }}`.

## Your Task

Analyze recently created or updated issues to determine if they are duplicates of existing issues. This workflow runs every 5 minutes to batch-process issues, providing cost control and natural request batching.

## Instructions

1. **Find recent issues to check** (see the sketch after these steps):
- Use GitHub tools to search for issues in this repository that were created or updated in the last 10 minutes
- Construct a query like: `repo:${{ github.repository }} is:issue updated:>=<timestamp-10-minutes-ago>`
- Where the timestamp should be in ISO 8601 format (e.g., 2024-02-04T23:08:00Z)
- This captures any issues that might have been created or edited since the last run
- If no recent issues are found, exit successfully without further action

2. **For each recent issue found**:
- Fetch the full issue details using GitHub tools
- Note the issue number, title, and body content

3. **Search for duplicate issues**:
- For each recent issue, use GitHub tools to search for similar existing issues
- Search using keywords from the issue's title and body
- Look for issues that describe the same problem, feature request, or topic
- Consider both open and closed issues (closed issues might have been resolved)
- Focus on semantic similarity, not just exact keyword matches
- Exclude the current issue itself from the duplicate search

4. **Analyze and compare**:
- Review the content of potentially duplicate issues
- Determine if they are truly duplicates or just similar topics
- A duplicate means the same underlying problem, request, or discussion
- Consider that different wording might describe the same issue

5. **For issues with duplicates found**:
- Use the `output.add-comment` safe output to post a comment on the issue
- In your comment:
- Politely inform that this appears to be a duplicate
- List the duplicate issue(s) with their numbers and titles using markdown links (e.g., "This appears to be a duplicate of #123")
- Provide a brief explanation of why they are duplicates
- Suggest next steps, such as:
- Reviewing the existing issue(s) to see if they already address the concern
- Adding any new information to the existing issue if this one has additional context
- Closing this issue as a duplicate if appropriate
- Keep the tone helpful and constructive

6. **For issues with no duplicates**:
- Do not add any comment
- The issue is unique and can proceed normally
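
For reference, the searches in steps 1 and 3 could be expressed with the GitHub CLI roughly as follows. This is an illustrative sketch only; prefer your GitHub tools, it assumes the `gh` CLI is available and authenticated in the runner, and the keywords and excluded issue number are placeholders.

```bash
# Step 1: issues created or updated since an example ISO 8601 timestamp
gh api -X GET search/issues \
  -f q='repo:${{ github.repository }} is:issue updated:>=2024-02-04T23:08:00Z' \
  --jq '.items[] | "#\(.number) \(.title)"'

# Step 3: keyword search for candidate duplicates, excluding the issue under review (#456 as a placeholder)
gh api -X GET search/issues \
  -f q='repo:${{ github.repository }} is:issue custom templates in:title,body' \
  --jq '[.items[] | select(.number != 456)][] | "#\(.number) \(.title) (\(.state))"'
```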

## Important Guidelines

- **Batch processing**: Process multiple issues in a single run when available
- **Read-only analysis**: You are only analyzing and commenting, not modifying issues
- **Be thorough**: Search comprehensively to avoid false negatives
- **Be accurate**: Only flag clear duplicates to avoid false positives
- **Be helpful**: Provide clear reasoning and actionable suggestions
- **Use safe-outputs**: Always use `output.add-comment` for commenting, never try to use GitHub write APIs directly
- **Cost control**: The 5-minute batching window, the 10-comment cap, and the 15-minute timeout together bound the cost of each run

## Example Comment Format

When you find duplicates, structure your comment like this:

```markdown
👋 Hi! It looks like this issue might be a duplicate of existing issue(s):

- #123 - [Title of duplicate issue]

Both issues describe [brief explanation of the common problem/request].

**Suggested next steps:**
- Review issue #123 to see if it addresses your concern
- If this issue has additional context not covered in #123, consider adding it there
- If they are indeed the same, this issue can be closed as a duplicate

Let us know if you think this assessment is incorrect!
```

Remember: Only comment if you have high confidence that duplicates exist.