Add generate PR description workflow #3042

Closed
wants to merge 12 commits into from
Changes from 3 commits
70 changes: 70 additions & 0 deletions .github/workflows/generate-pr-description.yml
@@ -0,0 +1,70 @@
name: Auto PR Description

on:
  pull_request:
    types: [opened, edited]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  auto-describe:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
Contributor:

If I open the PR as a draft and then later change it to "Ready for Review", this code will not run again.
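The gap described here comes from the trigger list: `types: [opened, edited]` does not include `ready_for_review`, the `pull_request` activity GitHub emits when a draft is marked ready, so the job never gets a second chance. The gating the workflow needs can be sketched in Python against the webhook payload (a sketch only; `should_describe` and `TRIGGER_ACTIONS` are hypothetical names, and only the payload's `action` and `pull_request.draft` fields are assumed):

```python
from typing import Any, Mapping

# Activity types that should (re)generate the description. "ready_for_review"
# is the pull_request activity GitHub sends when a draft is marked ready.
TRIGGER_ACTIONS = frozenset({"opened", "edited", "ready_for_review"})


def should_describe(event: Mapping[str, Any]) -> bool:
    """Decide from a pull_request webhook payload whether the job should run."""
    pull_request = event.get("pull_request") or {}
    is_draft = bool(pull_request.get("draft", False))
    return event.get("action") in TRIGGER_ACTIONS and not is_draft
```

In the workflow itself, the equivalent change would be adding `ready_for_review` to `types:` while keeping the job-level draft check.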

    permissions:
      contents: read
      pull-requests: write
      issues: write
    steps:
      - name: Checkout code
        uses: actions/[email protected]

      - name: Set up Python
        uses: actions/[email protected]
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          source $HOME/.cargo/env
          uv pip install --system requests openai
htahir1 marked this conversation as resolved.

      - name: Check for previous successful run
Contributor:

This workflow runs only when the PR is opened, so it runs only once. That's why this step doesn't make much sense: on that first and only run, the comment isn't there yet, so the code always executes.
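The check only pays off if the workflow can fire more than once for the same PR. For reference, the jq filter in this step is equivalent to the following predicate over the list returned by the issue-comments endpoint (a sketch; `already_described` is a hypothetical name, and each comment is assumed to be a dict with a `body` key, as in the REST response):

```python
from typing import Any, Iterable, Mapping

# Marker text posted by the "Add success comment" step.
SUCCESS_MARKER = "Auto PR description generated successfully"


def already_described(
    comments: Iterable[Mapping[str, Any]], marker: str = SUCCESS_MARKER
) -> bool:
    """Mirror of the jq filter: true if any comment body contains the marker."""
    # `or ""` treats a missing or null body as empty text.
    return any(marker in (comment.get("body") or "") for comment in comments)
```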

        id: check_comment
        run: |
          PR_NUMBER="${{ github.event.pull_request.number }}"
          COMMENT=$(gh api -X GET "/repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" | jq '.[] | select(.body | contains("Auto PR description generated successfully")) | .id')
          if [ -n "$COMMENT" ]; then
            echo "Workflow has already run successfully for this PR."
            echo "skip=true" >> $GITHUB_OUTPUT
          else
            echo "skip=false" >> $GITHUB_OUTPUT
          fi
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Wait for potential edits
        if: steps.check_comment.outputs.skip == 'false'
        run: sleep 300 # Wait for 5 minutes

      - name: Generate PR description
        if: steps.check_comment.outputs.skip == 'false'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python scripts/generate_pr_description.py
Contributor:
💡 Codebase verification

⚠️ Potential issue

Issues Identified in generate_pr_description.py:

  1. Incorrect OpenAI Model Name:

    • The script specifies the model as "gpt-4o-mini", which is likely a typo. It should be updated to a valid model name, such as "gpt-4".
  2. Lack of Error Handling:

    • The OpenAI API calls do not include error handling. This can lead to unhandled exceptions if the API request fails or returns an error.
    • Consider adding try-except blocks to manage potential errors gracefully.
  3. Rate Limiting and Retry Mechanism:

    • The script does not account for API rate limits or implement a retry mechanism. Implementing exponential backoff strategies can improve reliability.
  4. Excessive Truncation Limit:

    • MAX_CHARS is set to 400000, which may be unnecessarily high for PR descriptions. A more reasonable limit can ensure concise and relevant descriptions.
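Points 2 and 3 can be addressed together with a small retry wrapper. The sketch below is a hypothetical helper, not part of the PR: it retries a zero-argument call with exponential backoff plus a little jitter, and takes `sleep` as a parameter so it can be tested without waiting. In practice `retry_on` would be narrowed to transient failures (for example the client's rate-limit error) rather than every `Exception`. As far as I know, the v1 `openai` client already retries some errors itself and exposes a `max_retries` option, so the wrapper is most useful around the GitHub calls.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delays grow base_delay * 1, 2, 4, ... with up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")
```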


Good workflow logic, but some improvements are needed.

The steps for checking previous runs, waiting, and generating the PR description are well thought out. However, there are a few points to address:

  1. In the "Check for previous successful run" step, address the shellcheck warnings by using double quotes:

    run: |
      PR_NUMBER="${{ github.event.pull_request.number }}"
      COMMENT=$(gh api -X GET "/repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" | jq '.[] | select(.body | contains("Auto PR description generated successfully")) | .id')
      if [ -n "$COMMENT" ]; then
        echo "Workflow has already run successfully for this PR."
        echo "skip=true" >> "$GITHUB_OUTPUT"
      else
        echo "skip=false" >> "$GITHUB_OUTPUT"
      fi
  2. The use of OpenAI's API for generating PR descriptions is intriguing. Could you provide more information about how this is implemented? Here's a script to check the content of the Python script (see "Scripts executed" below).

  3. Consider adding error handling and rate limiting when using external APIs like OpenAI.

Would you like me to propose some error handling improvements for the Python script or open a GitHub issue to track this task?

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the content of the generate_pr_description.py script
cat scripts/generate_pr_description.py

Length of output: 3663

Tools
actionlint

36-36: shellcheck reported issue in this script: SC2086:info:5:23: Double quote to prevent globbing and word splitting (shellcheck)
36-36: shellcheck reported issue in this script: SC2086:info:7:24: Double quote to prevent globbing and word splitting (shellcheck)


      - name: Add success comment
        if: steps.check_comment.outputs.skip == 'false'
        run: |
          PR_NUMBER="${{ github.event.pull_request.number }}"
          gh issue comment ${PR_NUMBER} --body "Auto PR description generated successfully"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Check for errors
        if: failure()
        run: |
          echo "The PR description generation failed. Please check the logs for more information."
htahir1 marked this conversation as resolved.
91 changes: 91 additions & 0 deletions scripts/generate_pr_description.py
@@ -0,0 +1,91 @@
import os
import requests
import openai

MAX_CHARS = 400000 # Maximum characters for changes summary

def truncate_changes(changes_summary):
    """Truncates the changes summary to fit within MAX_CHARS."""
    total_chars = 0
    truncated_summary = []
    for change in changes_summary:
        change_chars = len(change)
        if total_chars + change_chars > MAX_CHARS:
            remaining_chars = MAX_CHARS - total_chars
            if remaining_chars > 50: # Ensure we're not adding just a few characters
                truncated_change = change[:remaining_chars]
                truncated_summary.append(truncated_change + "...")
            break
        total_chars += change_chars
        truncated_summary.append(change)
    return truncated_summary

def generate_pr_description():
    # GitHub API setup
htahir1 marked this conversation as resolved.
    token = os.environ['GITHUB_TOKEN']
    repo = os.environ['GITHUB_REPOSITORY']
    pr_number = os.environ['GITHUB_EVENT_NUMBER']
    headers = {'Authorization': f'token {token}'}
    api_url = f'https://api.github.com/repos/{repo}/pulls/{pr_number}'
Contributor:

⚠️ Potential issue

Improve security and error handling for API setup.

The current implementation directly uses environment variables without any validation. This could lead to runtime errors if the variables are not set. Additionally, there's no error handling for the API calls.

Consider adding validation for environment variables and error handling:

import os
from typing import Dict

def get_github_headers() -> Dict[str, str]:
    token = os.environ.get('GITHUB_TOKEN')
    if not token:
        raise ValueError("GITHUB_TOKEN environment variable is not set")
    return {'Authorization': f'token {token}'}

def get_pr_api_url() -> str:
    repo = os.environ.get('GITHUB_REPOSITORY')
    pr_number = os.environ.get('GITHUB_EVENT_NUMBER')
    if not repo or not pr_number:
        raise ValueError("GITHUB_REPOSITORY or GITHUB_EVENT_NUMBER environment variable is not set")
    return f'https://api.github.com/repos/{repo}/pulls/{pr_number}'

# Use these functions in generate_pr_description()
headers = get_github_headers()
api_url = get_pr_api_url()
Tools
Ruff

25-29: Single quotes found but double quotes preferred (Q000), reported at lines 25, 26, 27, 28 (twice) and 29. Replace single quotes with double quotes.
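Validation alone won't make `GITHUB_EVENT_NUMBER` appear: as far as I can tell it is not one of the default variables GitHub Actions sets, and the "Generate PR description" step doesn't set it in `env:`, so the script would fail on that lookup. One option is to read the PR number from the event payload that Actions writes to the file named by `GITHUB_EVENT_PATH`; another is to pass `PR_NUMBER: ${{ github.event.pull_request.number }}` through the step's `env:`. A sketch of the first option (`get_pr_number` is a hypothetical helper):

```python
import json
import os
from typing import Mapping


def get_pr_number(env: Mapping[str, str] = os.environ) -> int:
    """Read the PR number from the event payload file (hypothetical helper)."""
    event_path = env.get("GITHUB_EVENT_PATH")
    if not event_path:
        raise ValueError("GITHUB_EVENT_PATH is not set; is this running in GitHub Actions?")
    with open(event_path, encoding="utf-8") as fh:
        event = json.load(fh)
    try:
        # pull_request events carry the PR object under "pull_request".
        return int(event["pull_request"]["number"])
    except (KeyError, TypeError) as exc:
        raise ValueError("event payload has no pull_request.number") from exc
```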


    # Get current PR description
    pr_info = requests.get(api_url, headers=headers).json()
    current_description = pr_info['body'] or ''
htahir1 marked this conversation as resolved.

    # Check if description matches the default template
    default_template_indicator = "I implemented/fixed _ to achieve _."

    if default_template_indicator in current_description:
        # Get PR files
        files_url = f'{api_url}/files'
        files = requests.get(files_url, headers=headers).json()

        # Process files
        changes_summary = []
        for file in files:
            filename = file['filename']
            status = file['status']

            if status == 'added':
Contributor:

Usually, a new file contains meaningful code, so it might be nice to include the patch for new files as well.
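A sketch of the suggestion: emit the patch for added files too, and treat a missing `patch` as the signal for binary or very large diffs. As far as I know, entries from the PR files endpoint have no `binary` key, so `file['binary']` in the current script would raise `KeyError` on modified files. `summarize_file` is a hypothetical replacement for the loop body, returning the lines to append to `changes_summary`:

```python
from typing import Any, List, Mapping


def summarize_file(file: Mapping[str, Any]) -> List[str]:
    """Turn one PR-files API entry into prompt lines (hypothetical helper)."""
    filename = file["filename"]
    status = file["status"]
    # Assumption: "patch" is simply absent for binary or oversized diffs.
    patch = file.get("patch")

    if status == "removed":
        # A deleted file's patch is all "-" lines; the filename is enough.
        return [f"Removed file: {filename}"]
    if status == "added":
        lines = [f"Added new file: {filename}"]
    elif status == "renamed":
        lines = [f"Renamed file from {file['previous_filename']} to {filename}"]
    elif status == "modified":
        if patch:
            lines = [f"Modified {filename}:"]
        else:
            lines = [f"Modified file (no textual patch): {filename}"]
    else:
        # Other statuses (e.g. "copied", "changed") get a generic line.
        lines = [f"{status.capitalize()} file: {filename}"]

    if patch:
        # Include the diff for new files too, per the review suggestion.
        lines.append(patch)
    return lines
```

In the loop, `changes_summary.extend(summarize_file(file))` would then replace the `if`/`elif` chain.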

                changes_summary.append(f"Added new file: {filename}")
            elif status == 'removed':
                changes_summary.append(f"Removed file: {filename}")
            elif status == 'modified':
                if file['binary']:
                    changes_summary.append(f"Modified binary file: {filename}")
                else:
                    patch = file.get('patch', '')
                    if patch:
                        changes_summary.append(f"Modified {filename}:")
                        changes_summary.append(patch)
            elif status == 'renamed':
                changes_summary.append(f"Renamed file from {file['previous_filename']} to {filename}")

        # Truncate changes summary if it's too long
        truncated_changes = truncate_changes(changes_summary)
        changes_text = "\n".join(truncated_changes)

        # Generate description using OpenAI
        openai.api_key = os.environ['OPENAI_API_KEY']
        response = openai.OpenAI().chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that generates concise pull request descriptions based on changes to files."},
                {"role": "user", "content": f"Generate a brief, informative pull request description based on these changes:\n\n{changes_text}"}
            ],
            max_tokens=1000
        )
htahir1 marked this conversation as resolved.

        generated_description = response.choices[0].message['content'].strip()

        # Update PR description
        data = {'body': generated_description}
        requests.patch(api_url, json=data, headers=headers)
        print(f"Updated PR description with generated content")
        return True
    else:
        print("PR already has a non-default description. No action taken.")
        return False

if __name__ == "__main__":
    generate_pr_description()
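The last step sends the PATCH but never looks at the response, so a rejected update still prints success and the workflow then posts the success comment. A stdlib-only sketch of a checked update follows (hypothetical names; with `requests`, calling `response.raise_for_status()` on each response has the same effect):

```python
import json
import urllib.request


def build_update_request(api_url: str, token: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that replaces the PR body."""
    return urllib.request.Request(
        api_url,
        data=json.dumps({"body": body}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def update_pr_body(api_url: str, token: str, body: str) -> None:
    """Send the PATCH; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    request = build_update_request(api_url, token, body)
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()  # drain the body; reaching here means a 2xx status
```

Separately, with the v1 `openai` client the returned message is an object rather than a dict, so the generated text is normally read as `response.choices[0].message.content`; the subscript form `message['content']` is likely to raise `TypeError`.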