Merged
81 changes: 58 additions & 23 deletions .github/workflows/leaderboard.yml
@@ -98,50 +98,85 @@ jobs:
source .venv/bin/activate
agentready validate-report /tmp/submission.json

- name: Verify repository exists and is public
- name: Detect repository host
id: detect_host
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_URL: ${{ steps.extract.outputs.repo_url }}
run: |
# SAFE: REPO_URL comes from workflow output, not direct user input
ORG_REPO=$(echo "$REPO_URL" | sed 's|git@github.com:||' | sed 's|https://github.com/||' | sed 's|\.git$||')

IS_PRIVATE=$(gh repo view "$ORG_REPO" --json isPrivate -q '.isPrivate')

if [ "$IS_PRIVATE" == "true" ]; then
echo "::error::Repository $ORG_REPO is private."
# Determine if this is a GitHub or GitLab repository
if echo "$REPO_URL" | grep -q "github\.com"; then
echo "host=github" >> "$GITHUB_OUTPUT"
elif echo "$REPO_URL" | grep -q "gitlab\.com"; then
echo "host=gitlab" >> "$GITHUB_OUTPUT"
else
echo "::error::Unsupported repository host in URL: $REPO_URL"
exit 1
fi
Comment on lines +107 to 114

⚠️ Potential issue | 🟡 Minor

Grep patterns may match unintended hosts.

The patterns grep -q "github\.com" and grep -q "gitlab\.com" match those strings anywhere in the URL, not only in the host position. A malicious URL such as https://evil.com/fake-github.com/org/repo would be classified as GitHub even though its host is evil.com. Anchor the patterns to the scheme and host instead.

Proposed fix using anchored patterns
-          if echo "$REPO_URL" | grep -q "github\.com"; then
+          if echo "$REPO_URL" | grep -qE '^(git@|https?://)github\.com[:/]'; then
             echo "host=github" >> "$GITHUB_OUTPUT"
-          elif echo "$REPO_URL" | grep -q "gitlab\.com"; then
+          elif echo "$REPO_URL" | grep -qE '^(git@|https?://)gitlab\.com[:/]'; then
             echo "host=gitlab" >> "$GITHUB_OUTPUT"
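The effect of anchoring can be checked outside the workflow. The following is a minimal Python sketch, not part of this PR: `detect_host` mirrors the proposed `grep -qE` patterns and `detect_host_unanchored` mirrors the original substring checks. Both function names and the sample URLs are assumptions made for illustration.

```python
from __future__ import annotations

import re

# Hypothetical Python mirror of the proposed grep -E patterns (illustration only).
_HOST_PATTERNS = {
    "github": re.compile(r"^(git@|https?://)github\.com[:/]"),
    "gitlab": re.compile(r"^(git@|https?://)gitlab\.com[:/]"),
}


def detect_host(repo_url: str) -> str | None:
    """Return "github" or "gitlab" when the anchored pattern matches, else None."""
    for host, pattern in _HOST_PATTERNS.items():
        if pattern.search(repo_url):
            return host
    return None


def detect_host_unanchored(repo_url: str) -> str | None:
    """Mimic the original substring checks, for comparison only."""
    if re.search(r"github\.com", repo_url):
        return "github"
    if re.search(r"gitlab\.com", repo_url):
        return "gitlab"
    return None


if __name__ == "__main__":
    samples = [
        "https://github.com/org/repo.git",
        "git@gitlab.com:group/sub/project.git",
        "https://evil.com/fake-github.com/org/repo",
        "https://github.com.evil.com/org/repo",
    ]
    for url in samples:
        print(url, "| unanchored:", detect_host_unanchored(url), "| anchored:", detect_host(url))
```

One caveat with the anchored form as written: because `[:/]` is allowed after the host for both schemes, a URL such as https://github.com:x@evil.com/org/repo still matches, since `github.com:x` is parsed as userinfo and the real host is evil.com. Handling the SSH form (`git@host:`) and the HTTPS form (`https://host/`) as separate patterns would likely close that gap.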
Suggested change
-          if echo "$REPO_URL" | grep -q "github\.com"; then
-            echo "host=github" >> "$GITHUB_OUTPUT"
-          elif echo "$REPO_URL" | grep -q "gitlab\.com"; then
-            echo "host=gitlab" >> "$GITHUB_OUTPUT"
-          else
-            echo "::error::Unsupported repository host in URL: $REPO_URL"
-            exit 1
-          fi
+          if echo "$REPO_URL" | grep -qE '^(git@|https?://)github\.com[:/]'; then
+            echo "host=github" >> "$GITHUB_OUTPUT"
+          elif echo "$REPO_URL" | grep -qE '^(git@|https?://)gitlab\.com[:/]'; then
+            echo "host=gitlab" >> "$GITHUB_OUTPUT"
+          else
+            echo "::error::Unsupported repository host in URL: $REPO_URL"
+            exit 1
+          fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/leaderboard.yml around lines 107 - 114, The current grep
checks (grep -q "github\.com" / "gitlab\.com") can match those strings anywhere
in the URL; update the checks to match the actual host portion of REPO_URL (so
malicious paths won't match). Replace the two grep lines with anchored/URL-aware
regex checks such as using grep -E (or grep -Eq) against a pattern that ensures
the hostname is github.com or gitlab.com (for example matching
://...github\.com(/|$) or allowing optional userinfo/www), i.e. change the grep
commands that reference REPO_URL to stricter regexes and leave the echo
"host=github"/"host=gitlab" writes to GITHUB_OUTPUT unchanged.


echo "✅ Repository $ORG_REPO is public"
# Convert SSH/HTTPS URL to an HTTPS clone URL
# git@<host>:<path>.git -> https://<host>/<path>.git
CLONE_URL=$(echo "$REPO_URL" | sed -E 's|^git@([^:]+):|https://\1/|' | sed 's|\.git$||')
echo "clone_url=${CLONE_URL}.git" >> "$GITHUB_OUTPUT"
echo "browse_url=$CLONE_URL" >> "$GITHUB_OUTPUT"

- name: Verify repository exists and is public
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_URL: ${{ steps.extract.outputs.repo_url }}
HOST: ${{ steps.detect_host.outputs.host }}
CLONE_URL: ${{ steps.detect_host.outputs.clone_url }}
run: |
if [ "$HOST" = "github" ]; then
# GitHub: use gh CLI for verification
ORG_REPO=$(echo "$REPO_URL" | sed 's|git@github.com:||' | sed 's|https://github.com/||' | sed 's|\.git$||')
IS_PRIVATE=$(gh repo view "$ORG_REPO" --json isPrivate -q '.isPrivate')
if [ "$IS_PRIVATE" == "true" ]; then
echo "::error::Repository $ORG_REPO is private."
exit 1
fi
echo "✅ Repository $ORG_REPO is public"
else
# GitLab/other: verify repo is publicly accessible via git ls-remote
if git ls-remote --exit-code "$CLONE_URL" HEAD > /dev/null 2>&1; then
echo "✅ Repository is publicly accessible: $CLONE_URL"
else
echo "::error::Repository is not publicly accessible: $CLONE_URL"
exit 1
fi
fi

- name: Verify submitter has access
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_URL: ${{ steps.extract.outputs.repo_url }}
SUBMITTER: ${{ github.event.pull_request.user.login }}
HOST: ${{ steps.detect_host.outputs.host }}
run: |
# SAFE: All values in environment variables
ORG_REPO=$(echo "$REPO_URL" | sed 's|git@github.com:||' | sed 's|https://github.com/||' | sed 's|\.git$||')

if gh api "/repos/$ORG_REPO/collaborators/$SUBMITTER" 2>/dev/null; then
echo "✅ $SUBMITTER is a collaborator on $ORG_REPO"
elif [ "$(gh api "/repos/$ORG_REPO" -q '.owner.login')" == "$SUBMITTER" ]; then
echo "✅ $SUBMITTER is the owner of $ORG_REPO"
if [ "$HOST" = "github" ]; then
# GitHub: verify via API
ORG_REPO=$(echo "$REPO_URL" | sed 's|git@github.com:||' | sed 's|https://github.com/||' | sed 's|\.git$||')
if gh api "/repos/$ORG_REPO/collaborators/$SUBMITTER" 2>/dev/null; then
echo "✅ $SUBMITTER is a collaborator on $ORG_REPO"
elif [ "$(gh api "/repos/$ORG_REPO" -q '.owner.login')" == "$SUBMITTER" ]; then
echo "✅ $SUBMITTER is the owner of $ORG_REPO"
else
echo "::error::$SUBMITTER does not have commit access to $ORG_REPO"
exit 1
fi
else
echo "::error::$SUBMITTER does not have commit access to $ORG_REPO"
exit 1
# Non-GitHub: cannot verify cross-platform access automatically
echo "::warning::Cannot verify submitter access for non-GitHub repos. Manual review required."
echo "⚠️ Submitter access for non-GitHub repos must be verified manually by maintainers."
fi

- name: Re-run assessment
env:
REPO_URL: ${{ steps.extract.outputs.repo_url }}
CLONE_URL: ${{ steps.detect_host.outputs.clone_url }}
run: |
source .venv/bin/activate

# SAFE: REPO_URL in environment variable
echo "Cloning $REPO_URL..."
git clone "$REPO_URL" /tmp/repo-to-assess
echo "Cloning $CLONE_URL..."
git clone "$CLONE_URL" /tmp/repo-to-assess

echo "Running assessment..."
agentready assess /tmp/repo-to-assess --output-dir /tmp/validation
59 changes: 55 additions & 4 deletions scripts/generate-leaderboard-data.py
@@ -6,13 +6,51 @@
"""

import json
import re
import sys
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any


def git_url_to_https(url: str) -> str:
"""Convert a git remote URL (SSH or HTTPS) to an HTTPS browse URL.

Handles GitHub and GitLab SSH/HTTPS formats:
git@github.com:org/repo.git -> https://github.com/org/repo
git@gitlab.com:group/sub/project.git -> https://gitlab.com/group/sub/project
https://github.com/org/repo.git -> https://github.com/org/repo
"""
url = url.strip()
# SSH format: git@<host>:<path>.git
ssh_match = re.match(r"^git@([^:]+):(.+?)(?:\.git)?$", url)
if ssh_match:
host, path = ssh_match.groups()
return f"https://{host}/{path}"
# HTTPS format: strip trailing .git
if url.startswith("https://") or url.startswith("http://"):
return re.sub(r"\.git$", "", url)
return url


def repo_display_name_from_url(url: str) -> str | None:
"""Extract the full repository path from a git URL for display purposes.

Returns the path portion without the host, e.g.:
git@gitlab.com:redhat/rhel-ai/wheels/builder.git -> redhat/rhel-ai/wheels/builder
https://github.com/org/repo -> org/repo
"""
url = url.strip()
ssh_match = re.match(r"^git@[^:]+:(.+?)(?:\.git)?$", url)
if ssh_match:
return ssh_match.group(1)
https_match = re.match(r"^https?://[^/]+/(.+?)(?:\.git)?$", url)
if https_match:
return https_match.group(1)
return None
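The two helpers above can be exercised on the URL shapes named in their docstrings. The sketch below copies both functions from this diff so that it runs on its own. The trailing-slash and free-text inputs are extra cases added here for illustration and do not come from the PR.

```python
from __future__ import annotations

import re


def git_url_to_https(url: str) -> str:
    """Convert a git remote URL (SSH or HTTPS) to an HTTPS browse URL."""
    url = url.strip()
    # SSH format: git@<host>:<path>.git
    ssh_match = re.match(r"^git@([^:]+):(.+?)(?:\.git)?$", url)
    if ssh_match:
        host, path = ssh_match.groups()
        return f"https://{host}/{path}"
    # HTTPS format: strip trailing .git
    if url.startswith("https://") or url.startswith("http://"):
        return re.sub(r"\.git$", "", url)
    return url


def repo_display_name_from_url(url: str) -> str | None:
    """Extract the repository path (without the host) from a git URL."""
    url = url.strip()
    ssh_match = re.match(r"^git@[^:]+:(.+?)(?:\.git)?$", url)
    if ssh_match:
        return ssh_match.group(1)
    https_match = re.match(r"^https?://[^/]+/(.+?)(?:\.git)?$", url)
    if https_match:
        return https_match.group(1)
    return None


if __name__ == "__main__":
    for raw in (
        "git@github.com:org/repo.git",
        "git@gitlab.com:redhat/rhel-ai/wheels/builder.git",
        "https://github.com/org/repo.git",
        "https://github.com/org/repo/",  # added edge case: trailing slash
        "not a url",                     # added edge case: unparseable input
    ):
        print(raw, "->", git_url_to_https(raw), "|", repo_display_name_from_url(raw))
```

Note that an HTTPS URL with a trailing slash keeps that slash in the display name (`org/repo/`), so a later `rsplit("/", 1)[-1]` on it yields an empty string. Whether submissions can contain such URLs is not visible in this diff.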


def scan_submissions(submissions_dir: Path) -> dict[str, list[dict[str, Any]]]:
"""Scan submissions directory and group assessments by repository.

@@ -86,16 +124,29 @@ def generate_leaderboard_data(repos: dict[str, list[dict[str, Any]]]) -> dict[st
agentready_version = metadata.get("agentready_version", "unknown")
research_version = metadata.get("research_version", "unknown")

# Derive display name and URL from the assessment JSON's
# repository.url when available, falling back to the directory-
# derived repo_name for backwards compatibility with GitHub repos.
raw_url = latest["repository"].get("url", "")
display_name = repo_display_name_from_url(raw_url) if raw_url else None
browse_url = git_url_to_https(raw_url) if raw_url else None

# Fall back to directory-derived values (GitHub assumption)
if not display_name:
display_name = repo_name
if not browse_url:
browse_url = f"https://github.com/{repo_name}"
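The fallback above, together with the `org` and `name` split used for the leaderboard entry, can be checked in isolation for nested GitLab paths. This is a minimal sketch under stated assumptions: `resolve_display` and `org_and_name` are hypothetical names that do not exist in the script, and `repo_name` is assumed to have the `org/repo` form implied by the removed `repo_name.split("/")[1]` line.

```python
from __future__ import annotations


def resolve_display(
    display_name: str | None, browse_url: str | None, repo_name: str
) -> tuple[str, str]:
    """Apply the directory-derived (GitHub) fallback when URL parsing gave nothing."""
    if not display_name:
        display_name = repo_name
    if not browse_url:
        browse_url = f"https://github.com/{repo_name}"
    return display_name, browse_url


def org_and_name(display_name: str) -> tuple[str, str]:
    """Split a repository path into its first segment (org) and last segment (name)."""
    return display_name.split("/")[0], display_name.rsplit("/", 1)[-1]


if __name__ == "__main__":
    nested = "redhat/rhel-ai/wheels/builder"
    print(org_and_name(nested))       # first and last path segments
    print(nested.split("/")[1])       # what the removed split("/")[1] would pick
    print(resolve_display(None, None, "org/repo"))
```

For a two-segment GitHub path both approaches agree. For a nested GitLab path the removed `split("/")[1]` would select the first subgroup rather than the project, which appears to be why the entry now takes the last segment with `rsplit`.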

entry = {
"repo": repo_name,
"org": repo_name.split("/")[0],
"name": repo_name.split("/")[1],
"repo": display_name,
"org": display_name.split("/")[0],
"name": display_name.rsplit("/", 1)[-1],
"score": float(latest["overall_score"]),
"tier": latest["certification_level"],
"language": latest["repository"].get("primary_language", "Unknown"),
"size": latest["repository"].get("size_category", "Unknown"),
"last_updated": submissions[0]["timestamp"][:10], # YYYY-MM-DD
"url": f"https://github.com/{repo_name}",
"url": browse_url,
"agentready_version": agentready_version,
"research_version": research_version,
"history": [