
fix: separate repo-level and PR-level refresh #286

Open
swaparup36 wants to merge 11 commits into OWASP-BLT:main from swaparup36:fix-optimize-api-calls

Conversation

Contributor

@swaparup36 swaparup36 commented Mar 16, 2026

Resolves #83

Changes

  • Added a cleaner refresh flow so we only do heavy PR updates when needed.
  • Improved batch GraphQL refresh so check status and behind-by data stay accurate.
  • Added a new repo refresh endpoint /api/refresh-repo for repo-wide sync cases.
  • Updated the UI:
    • Renamed Refresh All Essential to Refresh All PRs
    • Added Refresh Repo Data button
    • Updated API call count labels to match the new logic.

Summary by CodeRabbit

  • New Features

    • Added "Refresh Repo Data" button and a matching API to run per-repository refreshes and report aggregated per-repo results.
    • Added "Refresh Visible PRs" control to refresh only PRs currently shown in the UI.
  • Improvements

    • More accurate API-call estimates scoped to the current view and unique repos.
    • Better per-repo failure reporting, cache invalidation, and data reload after refresh.
    • Two-phase refresh flow for more reliable discovery and updating of open PRs.


owasp-blt bot commented Mar 16, 2026

📊 Monthly Leaderboard

Hi @swaparup36! Here's how you rank for March 2026:

| Rank | User | Open PRs | PRs (merged) | PRs (closed) | Reviews | Comments | Total |
|------|------|----------|--------------|--------------|---------|----------|-------|
| 13 | DevalPrime @DevalPrime | 5 | 20 | 0 | 0 | 3 | 211 |
| 14 | swaparup36 @swaparup36 | 5 | 15 | 0 | 2 | 1 | 167 |
| 15 | azizrebhi @azizrebhi | 2 | 15 | 1 | 0 | 3 | 156 |

Scoring this month (across OWASP-BLT org): Open PRs (+1 each), Merged PRs (+10), Closed (not merged) (−2), Reviews (+5; first two per PR in-month), Comments (+2, excludes CodeRabbit). Run /leaderboard on any issue or PR to see your rank!


owasp-blt bot commented Mar 16, 2026

👋 Hi @swaparup36!

This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

  • The PR author
  • coderabbitai
  • copilot

Once a valid peer review is submitted, this check will pass automatically. Thank you!

⚠️ Peer review enforcement is active.

@github-actions github-actions bot added the files-changed: 4 label (PR changes 4 files) on Mar 16, 2026

🍃 PR Readiness Check

Check the readiness of this PR on Leaf:
👉 Open on Leaf

Leaf reviews pull requests for operational readiness, security risks, and production-impacting changes before they ship.


coderabbitai bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds frontend controls to refresh visible PRs and per-repository data, implements a new /api/refresh-repo endpoint, and converts PR batch fetching to a two-pass approach that populates statusCheckRollup-derived check counts and accurate behind_by values.

Changes

  • Frontend UI (public/index.html): Replaces "Refresh All Essential" with "Refresh Visible PRs" (#refreshPrsBtn), adds "Refresh Repo Data" (#refreshRepoDataBtn), scopes API-call estimates to visible PRs and scoped repos, and posts per-repo refreshes to /api/refresh-repo with aggregated UI state/errors.
  • GitHub client / GraphQL (src/github_api.py): Rewrites fetch_multiple_prs_batch into a two-pass flow: first fetch head/base refs and statusCheckRollup (single latest commit), then run batched compare queries to compute behind_by. Escapes GraphQL strings, normalizes refs, computes check counts from statusCheckRollup, removes placeholder fields, and adds repo-scoped helpers: fetch_repo_open_pr_numbers, fetch_repo_open_prs, fetch_repo_comparison_batch.
  • Handlers / Scheduling (src/handlers.py): Adds handle_refresh_repo(request, env) implementing per-repo discovery, compare batching, upsert/remove logic, cache invalidation, and an aggregated response. Refactors the scheduled refresh into Phase 1 (repo-level discovery and deletion) and Phase 2 (PR-level batch enrichment), and updates logging and Slack error reporting. Note: a duplicated/inserted stub appears before an existing handle_refresh_org definition; review for structural duplication.
  • API routing (src/index.py): Registers the new route POST /api/refresh-repo and imports handle_refresh_repo.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Frontend as Frontend (browser)
    participant API as API (/api/refresh-repo)
    participant DB as Database
    participant GitHub as GitHub API
    participant Cache as Cache

    Client->>Frontend: Click "Refresh Repo Data"
    Frontend->>API: POST /api/refresh-repo (owner/repo, token)
    API->>DB: load tracked PR rows for repo
    API->>GitHub: fetch_repo_open_prs() (paginated GraphQL)
    GitHub-->>API: open PR list (headRefOid, baseRefName)
    API->>GitHub: fetch_repo_comparison_batch() (batched compares)
    GitHub-->>API: behind_by map per PR
    API->>DB: upsert/update PR records (behind_by, checks, status)
    API->>Cache: invalidate readiness/timeline caches
    API-->>Frontend: summary (updated, discovered, removed, failures, rate_limit)
    Frontend-->>Client: display aggregated result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

quality: high

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 78.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix: separate repo-level and PR-level refresh' accurately summarizes the main change: refactoring refresh logic into distinct repo-level and PR-level operations to optimize API usage.
  • Linked Issues check: ✅ Passed. The PR fulfills issue #83 requirements: implements separate repo/PR refresh flows, adds multi-pass GraphQL batching (Pass 1 for checks, Pass 2 for behind_by), introduces the /api/refresh-repo endpoint, reduces API usage to ~2 GraphQL calls per 50 PRs, and updates UI controls.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the stated objectives: github_api.py adds batch helpers, handlers.py implements repo/PR refresh logic and the scheduled-refresh restructuring, index.py adds routing, and public/index.html updates UI controls and API estimate labels.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
public/index.html (1)

2116-2141: ⚠️ Potential issue | 🟠 Major

Chunk /api/refresh-batch requests before calling this action “All PRs”.

Line 2116 now advertises a scalable “all PRs” refresh, but the click handler still sends one /api/refresh-batch request and src/handlers.py rejects anything above 100 ids. With perPage allowed up to 1000, this action fails reliably on larger pages unless you slice the request into <=100-id chunks or narrow the label/scope.

Suggested fix
-                    const prIds = allPrs.map(pr => pr.id);
-                    if (prIds.length > 0) {
-                        const resp = await fetch('/api/refresh-batch', {
-                            method: 'POST',
-                            headers: { 'Content-Type': 'application/json' },
-                            body: JSON.stringify({ pr_ids: prIds })
-                        });
-                        if (!resp.ok) throw new Error(`Refresh failed (${resp.status})`);
-                        const data = await resp.json();
-                        if (data.error) throw new Error(data.error);
-                    }
+                    const prIds = allPrs.map(pr => pr.id);
+                    for (let i = 0; i < prIds.length; i += 100) {
+                        const chunk = prIds.slice(i, i + 100);
+                        const resp = await fetch('/api/refresh-batch', {
+                            method: 'POST',
+                            headers: { 'Content-Type': 'application/json' },
+                            body: JSON.stringify({ pr_ids: chunk })
+                        });
+                        if (!resp.ok) throw new Error(`Refresh failed (${resp.status})`);
+                        const data = await resp.json();
+                        if (data.error) throw new Error(data.error);
+                    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@public/index.html` around lines 2116 - 2141, The “Refresh All PRs” action
(button id refreshPrsBtn) currently sends a single POST to /api/refresh-batch
which fails for >100 PR ids because src/handlers.py enforces a 100-id limit;
change the click handler to chunk the allPrs id list into batches of at most 100
(respecting perPage and the server limit), send one /api/refresh-batch request
per chunk, await or aggregate their results, and update the UI only after all
batches complete; also update the displayed API call estimate (refreshApiCalls)
and tooltip/title text to indicate batched requests so the label matches the
implementation.
src/github_api.py (1)

519-675: ⚠️ Potential issue | 🔴 Critical

Batch transform currently persists only one PR per chunk.

pr_data, owner, repo, and pr_number used from Line 593 onward are outside the Line 521 loop, so only the last PR in the batch is transformed/stored. This drops other PR updates and breaks batch refresh correctness.

💡 Proposed fix
-            comparison_targets = {}
+            comparison_targets = {}
+            pass1_pr_data = {}

             for i, (owner, repo, pr_number) in enumerate(batch):
                 ...
                 comparison_targets[(owner, repo, pr_number)] = {
                     'base_ref': pr_data.get('baseRefName'),
                     'head_oid': pr_data.get('headRefOid')
                 }
+                pass1_pr_data[(owner, repo, pr_number)] = pr_data

             # Pass 2: Batch compare...
             behind_by_map = {}
             if comparison_query_parts:
                 ...
-                # Transform GraphQL response to match REST API format used by fetch_pr_data
-                author = pr_data.get('author', {})
-                base_repo = pr_data.get('baseRepository', {})
-                ...
-                all_results[(owner, repo, pr_number)] = transformed_data
+            # Transform every PR from pass 1 using behind_by_map from pass 2
+            for owner, repo, pr_number in batch:
+                pr_data = pass1_pr_data.get((owner, repo, pr_number))
+                if not pr_data:
+                    continue
+
+                author = pr_data.get('author') or {}
+                base_repo = pr_data.get('baseRepository') or {}
+                ...
+                all_results[(owner, repo, pr_number)] = transformed_data
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/github_api.py` around lines 519 - 675, The batch transform uses
pr_data/owner/repo/pr_number variables from the earlier loop so only the last PR
is transformed; move the per-PR transformation logic to iterate over the
original batch or comparison_targets keys instead of using the single pr_data
variable — e.g., for each key in comparison_targets (or re-loop over batch),
fetch the corresponding pr_data from data[alias]['pullRequest'], compute
open_conversations_count, review_status, checks counts, behind_by (using
behind_by_map[key]), build transformed_data and assign to all_results[(owner,
repo, pr_number)] so every PR in the chunk is transformed and stored; ensure you
reference comparison_targets, compare_keys/behind_by_map, and the per-PR alias
(pr{i}) to locate and fix the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@public/index.html`:
- Around line 3388-3420: The refresh button currently builds repoKeys from the
page-scoped allPrs (so only paginated PRs are refreshed); change the flow so
after iterating repoKeys and calling POST /api/refresh-repo you also refresh the
global repo list by reloading /api/repos (or calling whichever helper fetches
repos) so repo-level metadata (behind_by and added/removed PRs) is up-to-date;
update the handler that uses repoKeys, allPrs, loadPrs(true), and
invalidateApiCache() to trigger the repos reload (and handle errors similarly to
the existing failures array) before restoring the button UI.

In `@src/github_api.py`:
- Around line 949-953: The code assumes result['data']['repository'] exists and
will raise if repository is null; modify the comparison-batch handling to guard
against a null repository by reading result.get('data') first and treating
repository as an empty dict when missing (e.g., repo_data = (result.get('data')
or {}).get('repository') or {}), then continue using repo_data for lookups of
alias/ref_data/compare_data (variables referenced: result, repo_data,
index_to_pr_number, alias, ref_data, compare_data) so null repository responses
don't cause an exception and the batch can handle missing comparisons
gracefully.
- Around line 845-847: The code currently chains .get() through 'repository' and
will raise a None dereference if GitHub returns repository: null; update the
GraphQL open-PR handling to first extract repository = result.get('data',
{}).get('repository') and if repository is None raise a controlled API error
(with a clear message/context) before accessing pullRequests, then set pr_data =
repository.get('pullRequests', {}) and continue to derive nodes and page_info
from pr_data; reference the variables pr_data, nodes, page_info and the
'repository' field when implementing this check so the error is thrown
deterministically instead of causing a runtime exception.
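Both guards above reduce to the same defensive extraction of the GraphQL `repository` field. A minimal sketch, assuming the response shapes described in the comments (the helper name and the `required` flag are illustrative, not from the project):

```python
def extract_repo_data(result, required=False):
    """Return the `repository` payload from a GraphQL response dict.

    Comparison batches tolerate a null repository (empty dict), while the
    open-PR pagination path fails deterministically with a clear error
    instead of a later None dereference.
    """
    repository = (result.get('data') or {}).get('repository')
    if repository is None:
        if required:
            raise RuntimeError('GraphQL response is missing the repository field')
        return {}
    return repository


# repository: null degrades instead of raising later on .get():
assert extract_repo_data({'data': {'repository': None}}) == {}
assert extract_repo_data({'data': {'repository': {'cmp0': {}}}}) == {'cmp0': {}}
```

The comparison-batch call site would use the lenient default, and the open-PR pagination path would pass `required=True`.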

In `@src/handlers.py`:
- Around line 818-840: The discovered-PR branch is inserting misleading
placeholder CI/review fields into the prs table via upsert_pr; instead either
skip inserting here or immediately enrich the discovered PR data by calling
fetch_multiple_prs_batch (or the PR-level fetch) to backfill real fields before
calling upsert_pr. Concretely: collect the discovered PR identifiers
(owner/repo_name/pr_number or pr_url), call fetch_multiple_prs_batch/db PR fetch
to obtain the real
checks_passed/checks_failed/checks_skipped/review_status/mergeable_state/files_changed/commits_count/is_merged/etc.,
merge those results into pr_data, then call upsert_pr(db, pr_url, owner,
repo_name, pr_number, pr_data); alternatively remove the upsert_pr call here and
let the existing PR-level refresher handle new PR inserts.
- Around line 748-785: The code currently does behind_by =
behind_by_map.get(pr_number, 0) which coerces a missing comparison into 0;
instead, when fetch_repo_comparison_batch has no entry for pr_number you must
not overwrite an existing tracked row's behind_by nor mislabel a newly
discovered PR as up-to-date: look up behind_by_map.get(pr_number) without a
default, and if it returns None/absent then if pr_number exists in tracked_map
preserve the existing row['behind_by'] value, otherwise set behind_by to None
(so the DB column is stored as NULL or omitted) and only write/patch the
behind_by column when a real compare result exists; update code paths that
INSERT/UPDATE the prs row to accept None/NULL and avoid binding 0 for missing
comparisons.
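The three-way rule in the behind_by comment above can be sketched as a small resolver (function and argument names are illustrative; `tracked_map` mirrors the comment's naming):

```python
def resolve_behind_by(pr_number, behind_by_map, tracked_map):
    """Pick the behind_by value to store for a PR.

    - Real compare result present: use it.
    - Missing compare but PR already tracked: preserve the stored value.
    - Missing compare and newly discovered PR: return None (store NULL).
    """
    if pr_number in behind_by_map:
        return behind_by_map[pr_number]
    row = tracked_map.get(pr_number)
    if row is not None:
        return row.get('behind_by')
    return None


assert resolve_behind_by(1, {1: 3}, {}) == 3                      # real result wins
assert resolve_behind_by(2, {}, {2: {'behind_by': 5}}) == 5       # tracked value kept
assert resolve_behind_by(3, {}, {}) is None                        # new PR stays NULL
```

The INSERT/UPDATE paths would then bind None as SQL NULL rather than defaulting to 0.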

In `@src/index.py`:
- Around line 118-119: The /api/refresh-repo route is missing an authentication
gate before calling handle_refresh_repo; update the dispatch in src/index.py to
validate caller credentials (e.g., check Authorization header or session) or
require a caller-supplied token and reject unauthorized requests before calling
handle_refresh_repo(request, env). Use the same token/source logic that
handlers.py expects (avoid falling back to shared handler token), return a
401/403 for missing/invalid credentials, and only call handle_refresh_repo when
the auth check succeeds.
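One way the pre-dispatch gate could look, as a sketch only (the Bearer scheme and the 20-character threshold are assumptions borrowed from the related token checks discussed for handlers.py, not the project's actual scheme):

```python
def refresh_repo_auth_status(auth_header):
    """Return an HTTP status for the /api/refresh-repo auth check.

    401 when credentials are missing entirely, 403 when the supplied token
    is malformed (too short or contains whitespace), 200 when it looks valid.
    """
    if not auth_header or not auth_header.startswith('Bearer '):
        return 401
    token = auth_header[len('Bearer '):]
    if len(token.strip()) < 20 or any(ch.isspace() for ch in token):
        return 403
    return 200


assert refresh_repo_auth_status(None) == 401
assert refresh_repo_auth_status('Bearer short') == 403
assert refresh_repo_auth_status('Bearer ghp_' + 'a' * 36) == 200
```

The dispatcher would call handle_refresh_repo only when this returns 200 and otherwise build the 401/403 JSON response.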

---

Outside diff comments:
In `@public/index.html`:
- Around line 2116-2141: The “Refresh All PRs” action (button id refreshPrsBtn)
currently sends a single POST to /api/refresh-batch which fails for >100 PR ids
because src/handlers.py enforces a 100-id limit; change the click handler to
chunk the allPrs id list into batches of at most 100 (respecting perPage and the
server limit), send one /api/refresh-batch request per chunk, await or aggregate
their results, and update the UI only after all batches complete; also update
the displayed API call estimate (refreshApiCalls) and tooltip/title text to
indicate batched requests so the label matches the implementation.

In `@src/github_api.py`:
- Around line 519-675: The batch transform uses pr_data/owner/repo/pr_number
variables from the earlier loop so only the last PR is transformed; move the
per-PR transformation logic to iterate over the original batch or
comparison_targets keys instead of using the single pr_data variable — e.g., for
each key in comparison_targets (or re-loop over batch), fetch the corresponding
pr_data from data[alias]['pullRequest'], compute open_conversations_count,
review_status, checks counts, behind_by (using behind_by_map[key]), build
transformed_data and assign to all_results[(owner, repo, pr_number)] so every PR
in the chunk is transformed and stored; ensure you reference comparison_targets,
compare_keys/behind_by_map, and the per-PR alias (pr{i}) to locate and fix the
code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: fb5cee05-a606-484c-b585-aa73bebf8d37

📥 Commits

Reviewing files that changed from the base of the PR and between a22a666 and e251e3b.

📒 Files selected for processing (4)
  • public/index.html
  • src/github_api.py
  • src/handlers.py
  • src/index.py

…ld in open-pr graphql path; coerce a missing compare result to behind_by = 0; auto-inserting discovered PRs with fabricated CI/review data; Protect /api/refresh-repo from unauthenticated triggering

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/github_api.py (1)

519-534: ⚠️ Potential issue | 🔴 Critical

Critical: Batch transform currently writes data for only one PR context.

Line 593 onward uses pr_data, owner, repo, and pr_number outside the per-PR loop, so only the last bound context is transformed and saved. This can silently drop other PRs from all_results.

Proposed fix
-            comparison_targets = {}
+            comparison_targets = {}
+            pr_payloads = {}

             for i, (owner, repo, pr_number) in enumerate(batch):
                 alias = f"pr{i}"
                 repo_data = data.get(alias, {})
                 pr_data = repo_data.get('pullRequest') if repo_data else None
@@
                 if not pr_data:
                     print(f"Warning: No PR data in GraphQL response for {owner}/{repo}#{pr_number}")
                     all_results[(owner, repo, pr_number)] = None
                     continue

+                pr_payloads[(owner, repo, pr_number)] = pr_data
                 comparison_targets[(owner, repo, pr_number)] = {
                     'base_ref': pr_data.get('baseRefName'),
                     'head_oid': pr_data.get('headRefOid')
                 }

@@
-                # Transform GraphQL response to match REST API format used by fetch_pr_data
-                author = pr_data.get('author', {})
-                base_repo = pr_data.get('baseRepository', {})
+            # Transform each PR payload (not just last loop variable)
+            for owner, repo, pr_number in compare_keys:
+                pr_data = pr_payloads.get((owner, repo, pr_number))
+                if not pr_data:
+                    continue
+                author = pr_data.get('author', {})
+                base_repo = pr_data.get('baseRepository', {})
@@
-                all_results[(owner, repo, pr_number)] = transformed_data
+                all_results[(owner, repo, pr_number)] = transformed_data

Also applies to: 593-675

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/github_api.py` around lines 519 - 534, The loop that builds
comparison_targets and writes per-PR results is leaking per-PR variables
(pr_data, owner, repo, pr_number) out of the per-PR scope so only the last PR
gets transformed/saved; fix by performing all per-PR transformations and
assignments inside the for i, (owner, repo, pr_number) in enumerate(batch) loop
(use alias to fetch data for each PR), populate comparison_targets[(owner, repo,
pr_number)] and all_results[(owner, repo, pr_number)] within that loop, and do
not reference pr_data/owner/repo/pr_number after the loop—ensure any pass-2
GraphQL compare query preparation uses the per-PR entries from
comparison_targets rather than loop-scoped variables.
🧹 Nitpick comments (1)
src/github_api.py (1)

375-388: Recommended: Deduplicate GraphQL helper functions to prevent drift.

_escape_graphql_string and _normalize_ref_name are duplicated in two functions. Extracting them as module-private helpers will keep behavior consistent and reduce future regressions.

Also applies to: 877-888

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/github_api.py` around lines 375 - 388, Two small helper implementations
(_escape_graphql_string and _normalize_ref_name) are duplicated; consolidate
them into single module-private helpers and have all callers use those helpers
to avoid drift. Remove the duplicated copies and keep one canonical
implementation (module-level functions named _escape_graphql_string and
_normalize_ref_name), then update the functions that previously contained inline
copies to call these helpers (search for the other occurrence around lines
~877-888) so both places use the same logic; ensure signature and return
behavior remain identical to preserve callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@public/index.html`:
- Around line 2140-2142: The UI button label "Refresh All PRs" misrepresents the
action because the click handler currently posts only the visible PR ids using
allPrs.map(pr => pr.id); change the label and tooltip to "Refresh Visible PRs"
(and update the title attribute and any aria-labels) so wording matches the
behavior, or alternatively modify the client call that uses allPrs.map(pr =>
pr.id) and the backend endpoint to accept and refresh every tracked PR id if you
intend true global refresh; update the displayed API-call count text
(refreshApiCalls) and any related helper names to match the chosen behavior.
- Around line 2116-2118: The displayed repo refresh API call count is computed
from paginated allPrs (`uniqueRepoCount = new Set(allPrs.map(...)).size`) but
should use the actual scoped `repos` list used for refresh; update the
calculation of uniqueRepoCount to derive unique repo identifiers from the
`repos` array (the same scoped list around the `repos` variable referenced
later) so repoRefreshApiCalls = uniqueRepoCount * 2 reflects the correct number
of repo-scoped refresh API calls.

In `@src/github_api.py`:
- Around line 949-955: The code currently aborts the entire batch when result
contains 'errors' (the check at result with raising Exception), but GraphQL can
return partial data with field-level errors; instead of raising, log the errors
(e.g. logger.warning or similar) and continue processing: remove the immediate
raise for "GraphQL comparison batch errors: {result['errors']}", capture
result.get('errors') into a variable, emit a non-fatal log that includes those
errors and the owner/repo context, and let the subsequent repo_data lookup and
the loop over index_to_pr_number proceed so per-PR fallbacks (like defaulting
behind_by = 0) still run; only keep the existing exception for the truly missing
repository case (the current check for not repo_data).
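The log-and-continue behavior described above could be sketched like this (response shape and alias naming follow the comment; the helper itself is illustrative):

```python
def behind_by_from_batch(result, index_to_pr_number):
    """Build a behind_by map from a compare batch, tolerating partial errors.

    Field-level GraphQL errors are logged and processing continues; only a
    truly missing repository aborts. PRs without a compare result are left
    out of the map entirely (no coercion to 0).
    """
    if result.get('errors'):
        print(f"GraphQL comparison batch errors (non-fatal): {result['errors']}")
    repo_data = (result.get('data') or {}).get('repository')
    if not repo_data:
        raise RuntimeError('GraphQL comparison batch returned no repository')
    behind_by_map = {}
    for i, pr_number in index_to_pr_number.items():
        compare = (repo_data.get(f'cmp{i}') or {}).get('compare')
        if compare:
            behind_by_map[pr_number] = int(compare.get('behindBy') or 0)
    return behind_by_map


partial = {
    'errors': [{'message': 'ref not found'}],
    'data': {'repository': {'cmp0': {'compare': {'behindBy': 2}}, 'cmp1': None}},
}
# PR 101 gets a real value; PR 102 is simply absent from the map.
assert behind_by_from_batch(partial, {0: 101, 1: 102}) == {101: 2}
```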

In `@src/handlers.py`:
- Around line 724-740: The token validation is inverted: change the condition
that currently rejects long, whitespace-free caller_token to instead reject
tokens that are too short or contain whitespace (i.e., replace the "if
len(caller_token.strip()) >= 20 and (' ' not in ...) and ('\t' not in ...):
return 403" logic with the opposite: return 403 when len < 20 or token contains
whitespace). Also ensure user_token selection uses the provided caller_token
first and falls back to env.GITHUB_TOKEN (i.e., set user_token = caller_token or
getattr(env, 'GITHUB_TOKEN', None)) so the env fallback is reachable; keep
Response.new usage and JSON error messages for 401/403 as-is.
- Around line 895-898: The handler currently returns raw exception details to
the client; instead keep notifying Slack via notify_slack_exception (using
getattr(env, 'SLACK_ERROR_WEBHOOK', '')) and change the Response.new call in the
exception block (for handle_refresh_repo) to return a generic 500 payload such
as {'error': 'Internal server error'} with status 500 and Content-Type
application/json, removing type(e).__name__ and str(e) from the response body so
internal details are not exposed to clients.
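The inverted token validation called out in the first comment of this list, once corrected, might read as follows (a sketch; names follow the comment, the 20-character threshold comes from the original check):

```python
def select_user_token(caller_token, env_github_token):
    """Return (token, error_status) for a refresh request.

    Rejects malformed caller tokens (too short or containing whitespace)
    with 403, falls back to the shared env token only when no caller token
    was supplied, and returns 401 when neither is available.
    """
    if caller_token:
        malformed = (len(caller_token.strip()) < 20
                     or ' ' in caller_token or '\t' in caller_token)
        if malformed:
            return None, 403
        return caller_token, None
    if env_github_token:
        return env_github_token, None
    return None, 401


assert select_user_token('x' * 40, None) == ('x' * 40, None)   # valid caller token
assert select_user_token('bad token', 'envtok') == (None, 403)  # malformed: no fallback
assert select_user_token(None, 'env_' + 'y' * 30)[1] is None    # env fallback reachable
assert select_user_token(None, None) == (None, 401)
```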

---

Outside diff comments:
In `@src/github_api.py`:
- Around line 519-534: The loop that builds comparison_targets and writes per-PR
results is leaking per-PR variables (pr_data, owner, repo, pr_number) out of the
per-PR scope so only the last PR gets transformed/saved; fix by performing all
per-PR transformations and assignments inside the for i, (owner, repo,
pr_number) in enumerate(batch) loop (use alias to fetch data for each PR),
populate comparison_targets[(owner, repo, pr_number)] and all_results[(owner,
repo, pr_number)] within that loop, and do not reference
pr_data/owner/repo/pr_number after the loop—ensure any pass-2 GraphQL compare
query preparation uses the per-PR entries from comparison_targets rather than
loop-scoped variables.

---

Nitpick comments:
In `@src/github_api.py`:
- Around line 375-388: Two small helper implementations (_escape_graphql_string
and _normalize_ref_name) are duplicated; consolidate them into single
module-private helpers and have all callers use those helpers to avoid drift.
Remove the duplicated copies and keep one canonical implementation (module-level
functions named _escape_graphql_string and _normalize_ref_name), then update the
functions that previously contained inline copies to call these helpers (search
for the other occurrence around lines ~877-888) so both places use the same
logic; ensure signature and return behavior remain identical to preserve
callers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3f20bcc5-9762-4cdf-8501-788dff927d3f

📥 Commits

Reviewing files that changed from the base of the PR and between e251e3b and ee5f693.

📒 Files selected for processing (3)
  • public/index.html
  • src/github_api.py
  • src/handlers.py

…erstates refresh scope; aborting the whole repo comparison batch on partial GraphQL errors; inverted token validation; raw exception in api response text

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/github_api.py (1)

536-674: ⚠️ Potential issue | 🔴 Critical

Only one PR per batch is persisted after pass-2 compare.

The transform/write section uses pr_data, owner, repo, and pr_number after the collection loop, so only the last PR in the batch is written to all_results. The rest of the batch is dropped.

💡 Proposed fix
             behind_by_map = {}
             if comparison_query_parts:
                 ...
-                # Transform GraphQL response to match REST API format used by fetch_pr_data
-                author = pr_data.get('author', {})
-                base_repo = pr_data.get('baseRepository', {})
-                ...
-                transformed_data = {
-                    ...
-                }
-                all_results[(owner, repo, pr_number)] = transformed_data
+            # Transform each PR in the batch after behind_by_map is ready
+            for i, (owner, repo, pr_number) in enumerate(batch):
+                alias = f"pr{i}"
+                repo_data = data.get(alias, {})
+                pr_data = repo_data.get('pullRequest') if repo_data else None
+                if not pr_data:
+                    all_results[(owner, repo, pr_number)] = None
+                    continue
+
+                author = pr_data.get('author', {})
+                base_repo = pr_data.get('baseRepository', {})
+                ...
+                transformed_data = {
+                    ...
+                    'behind_by': behind_by_map.get((owner, repo, pr_number), 0),
+                    ...
+                }
+                all_results[(owner, repo, pr_number)] = transformed_data
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/github_api.py` around lines 536 - 674, The transform/write logic is
executed only once after building the batch comparison, so
pr_data/owner/repo/pr_number refer to the last item and only that PR is saved;
fix by moving the transform-and-store block to run per PR: iterate over
compare_keys (or over comparison_targets.items()), for each key retrieve pr_data
= comparison_targets[key] and compute author/base_repo/review
threads/reviews/rollup/etc, use behind_by = behind_by_map.get(key, 0) when
building transformed_data, then set all_results[(owner, repo, pr_number)] =
transformed_data for each PR; ensure references to reviewThreads pageInfo
warnings and review/reviewers processing use the per-PR variables and keep the
GraphQL-to-REST field mapping (transformed_data) intact.
♻️ Duplicate comments (1)
src/github_api.py (1)

950-968: ⚠️ Potential issue | 🟠 Major

Missing comparisons are still coerced to 0, and there is a dead duplicate loop.

This block still turns “no compare result” into behind_by = 0, which can overwrite valid stored values during repo refresh. The first loop at lines 954-958 also has no effect and should be removed.

💡 Proposed fix
        if 'errors' in result:
            print(f"GraphQL comparison batch errors: {result['errors']}")

         repo_data = (result.get('data') or {}).get('repository') or {}
-        for i, pr_number in index_to_pr_number.items():
-             alias = f"cmp{i}"
-             ref_data = repo_data.get(alias, {})
-             compare_data = ref_data.get('compare') if ref_data else None
         for i, pr_number in index_to_pr_number.items():
             alias = f"cmp{i}"
             ref_data = repo_data.get(alias, {})
             compare_data = ref_data.get('compare') if ref_data else None
             if compare_data:
                 behind_by = compare_data.get('behindBy')
                 if behind_by is None:
                     behind_by = compare_data.get('aheadBy')
                 behind_by_map[pr_number] = int(behind_by or 0)
-            else:
-                behind_by_map[pr_number] = 0
+            # Intentionally do not coerce missing compare to 0.
+            # Caller can preserve existing value when key is absent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/github_api.py` around lines 950 - 968, Remove the duplicate no-op loop
and stop coercing missing compare results to 0: delete the first for-loop that
iterates index_to_pr_number (the dead loop) and modify the remaining loop that
inspects repo_data aliases (alias = f"cmp{i}", ref_data, compare_data) so that
you only set behind_by_map[pr_number] when compare_data exists and contains a
numeric behindBy/aheadBy; if compare_data is missing, do not modify
behind_by_map (leave any existing value intact) instead of assigning 0.
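The preservation the prompt asks for happens on the caller side. A minimal sketch of that merge, assuming a hypothetical `merge_behind_by` helper (not part of the PR) that combines stored values with a partial comparison batch:

```python
def merge_behind_by(stored, fresh):
    """Merge fresh comparison results into stored behind_by values.

    PRs absent from `fresh` (their compare failed or was omitted)
    keep their previously stored value instead of being reset to 0.
    """
    merged = {pr: fresh.get(pr, old) for pr, old in stored.items()}
    for pr, value in fresh.items():
        if pr not in merged:
            merged[pr] = value
    return merged

stored = {101: 4, 102: 7}
fresh = {101: 0}  # PR 102 missing: its comparison failed
merged = merge_behind_by(stored, fresh)  # → {101: 0, 102: 7}
```

With this shape, a missing key in `behind_by_map` simply leaves the database row untouched rather than zeroing it.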
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@public/index.html`:
- Line 2116: The refresh calculation uses the full allPrs list so the "Visible
PRs" action refreshes hidden items; change the logic to use the currently
displayed/filtered PR array (e.g., filteredPrs, visiblePrs, or the UI's
displayedRows variable) instead of allPrs in the refreshVisibleApiCalls
computation and any related code paths (the assignment using
refreshVisibleApiCalls and the other block around lines 3372-3377).
Specifically, replace usages of allPrs with the filtered/visible list the UI
renders, keep the same batching logic (Math.ceil(length / 50) * 2), and ensure
any other references in the same function/method that iterate or count PRs use
that visible/filtered array so the button only refreshes actually visible rows.

In `@src/handlers.py`:
- Around line 739-740: The handler currently lets request.json() exceptions
bubble up causing a 500; wrap the await request.json() call in a try/except that
catches JSON parsing errors (e.g., json.JSONDecodeError and ValueError or the
framework's BadRequest) and return a 400 client error with a clear message
instead of proceeding; update the code around data = (await
request.json()).to_py() and repo_param = (data.get('repo') or '').strip() to
only run when parsing succeeds so malformed JSON produces a 400 like other JSON
handlers.
- Around line 2070-2072: The top-level except block in the scheduled refresh
code currently only prints the error and swallows the exception; update the
handler (the except Exception as e in the scheduled refresh flow in
src/handlers.py) to log the full error via the app logger/monitoring client and
then re-raise the exception (or raise an application-specific error) so failures
propagate to the scheduler/alerting system; ensure you include contextual info
(type(e).__name__ and the message/traceback) when logging before re-raising.
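Both handler fixes above can be sketched in one skeleton. `handle_refresh_repo` here is a synchronous stand-in with illustrative names and shapes; the real handler is async and uses the framework's request/response objects:

```python
import json
import logging

logger = logging.getLogger("handlers")

def run_refresh(repo_param):
    # Placeholder for the actual repo refresh work
    return {"repo": repo_param, "updated": True}

def handle_refresh_repo(raw_body):
    # Fix 1: malformed JSON becomes a 400 client error, not a 500
    try:
        data = json.loads(raw_body)  # stands in for `await request.json()`
    except (json.JSONDecodeError, ValueError):
        return 400, {"error": "Invalid JSON body"}
    repo_param = (data.get("repo") or "").strip()
    if not repo_param:
        return 400, {"error": "Missing 'repo' field"}
    # Fix 2: log unexpected failures with context, then re-raise so the
    # scheduler/alerting layer sees them instead of a swallowed print
    try:
        return 200, run_refresh(repo_param)
    except Exception as e:
        logger.error("Refresh failed for %s: %s: %s",
                     repo_param, type(e).__name__, e, exc_info=True)
        raise

status_bad, _ = handle_refresh_repo("{not json")                  # → 400
status_ok, body = handle_refresh_repo('{"repo": " owner/name "}')  # → 200
```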


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3eb142d7-f429-4bbd-94f9-58658109393d

📥 Commits

Reviewing files that changed from the base of the PR and between ee5f693 and 0b6e8ed.

📒 Files selected for processing (3)
  • public/index.html
  • src/github_api.py
  • src/handlers.py

… rows; malformed JSON in /api/refresh-repo; top-level scheduled refresh failures are swallowed

coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
public/index.html (1)

3387-3388: ⚠️ Potential issue | 🟡 Minor

Refresh flows should also update the rate-limit widget immediately.

Both bulk refresh actions consume API budget, but the rate-limit bar isn’t refreshed afterward, so the UI can show stale values.

Suggested patch
-                    await loadPrs(true);
+                    await loadPrs(true);
+                    await loadRateLimit(true);
@@
                     await loadRepos(true);
                     await loadPrs(true);
                     invalidateApiCache();
+                    await loadRateLimit(true);

Also applies to: 3434-3436

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@public/index.html` around lines 3387 - 3388, After calling loadPrs(true) in
the bulk refresh flow, also invoke the app's rate-limit refresh function so the
rate-limit widget updates immediately; find the function that updates the
rate-limit UI (e.g., refreshRateLimit, updateRateLimitDisplay, or loadRateLimit)
and call it (await if it returns a promise) after loadPrs(true) and in the
corresponding refresh flow at the other location (the block around lines
3434-3436), or put the call in a shared finally block so the widget is refreshed
on both success and error.
♻️ Duplicate comments (1)
src/handlers.py (1)

1988-2015: ⚠️ Potential issue | 🟠 Major

Prevent placeholder rows from persisting when phase-2 enrichment is missing.

From line 1988 onward, new PRs are inserted with placeholder CI/review fields, but the loop at line 2044 iterates only over batch_results.items(). If the batch helper omits a requested key, that PR is silently skipped and the placeholder data remains in prs.

💡 Proposed fix
-        removed_phase1 = 0
+        removed_phase1 = 0
+        discovered_phase1 = set()
@@
                         await upsert_pr(db, pr_url, owner, repo_name, item['number'], pr_data)
                         # Queue the new row for Phase 2 so it gets a full data update immediately
                         new_row_result = await db.prepare(
                             'SELECT id, pr_url, repo_owner, repo_name, pr_number FROM prs WHERE pr_url = ?'
                         ).bind(pr_url).first()
                         if new_row_result:
+                            discovered_phase1.add((owner, repo_name, item['number']))
                             confirmed_open_rows.append(
                                 new_row_result.to_py() if hasattr(new_row_result, 'to_py') else dict(new_row_result)
                             )
@@
-        for (owner, repo, pr_number), pr_data in batch_results.items():
-            if (owner, repo, pr_number) not in pr_lookup:
-                continue
-            pr_id, pr_url = pr_lookup[(owner, repo, pr_number)]
+        for owner, repo, pr_number in prs_to_fetch:
+            key = (owner, repo, pr_number)
+            pr_id, pr_url = pr_lookup[key]
+            pr_data = batch_results.get(key)
@@
             if not pr_data:
                 print(f"Scheduled refresh: failed to fetch {owner}/{repo}#{pr_number}")
+                if key in discovered_phase1:
+                    await db.prepare('DELETE FROM prs WHERE id = ?').bind(pr_id).run()
+                    print(f"Scheduled refresh: removed placeholder row for {owner}/{repo}#{pr_number}")
                 errors += 1
                 continue

Also applies to: 2044-2053

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers.py` around lines 1988 - 2015, New PR placeholder rows are being
left in the prs table when the Phase-2 batch enrichment (batch_results) doesn't
include that PR's key; modify the flow around upsert_pr/confirmed_open_rows to
avoid persistent placeholders by only inserting/committing the minimal
placeholder when you can guarantee Phase-2 will run, or by removing the
placeholder if Phase-2 data is absent: specifically, either (A) move the call to
upsert_pr until after you confirm batch_results contains the PR key in the
Phase-2 response (use the same PR identifier used as key in batch_results), or
(B) after creating the placeholder via upsert_pr check batch_results.items()
(the same loop around batch_results) and if the PR key is missing delete the
newly inserted row (use the db.prepare(...).bind(pr_url).first() result or the
returned upsert result to remove by id); adjust logic that builds
confirmed_open_rows and any usage of confirmed_open_rows to only include rows
that were successfully enriched or retained intentionally.
🧹 Nitpick comments (1)
public/index.html (1)

3418-3432: Consider bounded concurrency for repo refresh calls.

The sequential loop is safe, but it can become slow with many repos. A small concurrency cap (e.g., 3–5) would reduce wall-clock time without overloading the backend.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@public/index.html` around lines 3418 - 3432, The current sequential for-loop
over repoKeys that calls fetch('/api/refresh-repo') is slow for many repos;
change it to use a bounded concurrency/promise-pool (cap 3–5) so multiple
refreshes run in parallel without overwhelming the backend: implement a simple
worker pool or batch processing that pulls repoKey items, issues the same POST
fetch and error handling for each, and collects failures into the existing
failures array; reference the loop over repoKeys, the fetch('/api/refresh-repo')
call, and the failures.push(...) logic so you preserve identical
request/response/error behavior while adding the concurrency cap.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/handlers.py`:
- Around line 1916-1922: The description for the scheduled-refresh routine (the
"Phase 1 — Repository-level refresh" and "Phase 2 — PR-level refresh"
paragraphs) is misleading about call counts; update the wording to state that
Phase 1 issues REST requests in paginated batches per repository (so actual REST
requests may exceed “one per unique repo” depending on pagination) and Phase 2
issues GraphQL requests in chunked/batched calls (so actual GraphQL requests may
exceed “one per 50 PRs” depending on chunking/page size), making clear these are
per-page/per-chunk counts rather than hard upper bounds; edit the text in
src/handlers.py where the scheduled-refresh phases are documented to reflect
pagination/chunking behavior and that actual request counts depend on page/chunk
sizes and total items.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1371f2a1-eb0d-4dd0-bcc6-ff22a05b2606

📥 Commits

Reviewing files that changed from the base of the PR and between 0b6e8ed and 0838750.

📒 Files selected for processing (2)
  • public/index.html
  • src/handlers.py

@swaparup36
Contributor Author

@coderabbitai please review the new changes

@coderabbitai

coderabbitai bot commented Mar 16, 2026

@swaparup36 Sure! I'll kick off a full review of the new changes in PR #286 right away.

✅ Actions performed

Full review triggered.


coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/github_api.py`:
- Around line 955-969: The first for-loop that iterates "for i, pr_number in
index_to_pr_number.items()" is duplicated and dead code; remove the earlier
duplicate (the block that defines alias, ref_data, compare_data but does not
assign into behind_by_map) so only the second loop that computes and assigns
behind_by_map[pr_number] remains; keep references to index_to_pr_number,
repo_data, alias, compare_data and behind_by_map intact and ensure only the loop
that sets behind_by_map based on compare_data (using behindBy or aheadBy
fallback) is left.

In `@src/handlers.py`:
- Around line 940-945: There is a duplicate/incomplete function stub for async
def handle_refresh_org that contains an unterminated docstring and causes
SyntaxError; remove the incomplete stub (the earlier async def
handle_refresh_org and its unterminated triple-quoted docstring) so only the
complete handle_refresh_org implementation remains, and run a quick lint/parse
to confirm a single valid async def handle_refresh_org function exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3db6613f-b5f7-477d-b03c-42a98924eedd

📥 Commits

Reviewing files that changed from the base of the PR and between 0838750 and b3aeeb9.

📒 Files selected for processing (4)
  • public/index.html
  • src/github_api.py
  • src/handlers.py
  • src/index.py
✅ Files skipped from review due to trivial changes (1)
  • src/index.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • public/index.html

Contributor

Copilot AI left a comment


Pull request overview

This PR separates “repo-level” refresh work from “PR-level” refresh work to reduce unnecessary GitHub API usage while keeping behind_by and check/review status accurate. It introduces a new /api/refresh-repo endpoint and corresponding UI controls.

Changes:

  • Adds a new repo refresh API (POST /api/refresh-repo) to sync open/closed PRs and recompute behind_by per repository.
  • Refactors the hourly scheduled refresh into a two-phase flow (repo scan/discovery + PR batch refresh).
  • Enhances batch GraphQL refresh to include accurate behind_by and CI check counts; updates the UI to refresh only visible PRs and to trigger repo refreshes.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/index.py Wires the new /api/refresh-repo route to the worker router.
src/handlers.py Implements handle_refresh_repo and updates scheduled refresh to a two-phase repo/PR flow.
src/github_api.py Extends batch GraphQL refresh with compare + status rollups; adds repo open-PR and comparison batch helpers.
public/index.html Updates refresh UX: “Refresh Visible PRs”, adds “Refresh Repo Data”, and adjusts API call estimate labels.
Comments suppressed due to low confidence (1)

src/github_api.py:676

  • fetch_multiple_prs_batch() currently builds comparison_targets for all PRs in the batch, but then the "Transform GraphQL response" block uses pr_data/owner/repo/pr_number variables from the previous loop and writes a single all_results[(owner, repo, pr_number)]. This means most PRs in the batch never get transformed/returned (and if comparison_query_parts is empty, none are). Refactor so transformation happens in a loop over each PR response (and is not gated on comparison_query_parts).
            # Extract PR data from response
            data = result.get('data', {})
            # Save comparison targets for a pass-2 GraphQL compare query
            comparison_targets = {}
            for i, (owner, repo, pr_number) in enumerate(batch):
                alias = f"pr{i}"
                repo_data = data.get(alias, {})
                pr_data = repo_data.get('pullRequest') if repo_data else None
                
                if not pr_data:
                    print(f"Warning: No PR data in GraphQL response for {owner}/{repo}#{pr_number}")
                    all_results[(owner, repo, pr_number)] = None
                    continue

                comparison_targets[(owner, repo, pr_number)] = {
                    'base_ref': pr_data.get('baseRefName'),
                    'head_oid': pr_data.get('headRefOid')
                }

            # Pass 2: Batch compare all PRs from this chunk to compute accurate behind_by
            comparison_query_parts = []
            compare_keys = list(comparison_targets.keys())
            for i, key in enumerate(compare_keys):
                owner, repo, pr_number = key
                compare_alias = f"cmp{i}"
                base_ref = _normalize_ref_name(comparison_targets[key].get('base_ref'))
                head_oid = comparison_targets[key].get('head_oid')

                if not base_ref or not head_oid:
                    continue

                comparison_query_parts.append(f"""
                    {compare_alias}: repository(owner: "{_escape_graphql_string(owner)}", name: "{_escape_graphql_string(repo)}") {{
                        baseRef: ref(qualifiedName: "{_escape_graphql_string(base_ref)}") {{
                            compare(headRef: "{_escape_graphql_string(head_oid)}") {{
                                aheadBy
                                behindBy
                            }}
                        }}
                    }}
                """)

            behind_by_map = {}
            if comparison_query_parts:
                comparison_query = "query { " + " ".join(comparison_query_parts) + " }"
                comparison_options = to_js({
                    "method": "POST",
                    "headers": headers,
                    "body": json.dumps({"query": comparison_query})
                }, dict_converter=Object.fromEntries)

                comparison_response = await fetch(graphql_url, comparison_options)
                if comparison_response.status == 200:
                    comparison_result = (await comparison_response.json()).to_py()
                    if 'errors' in comparison_result:
                        print(f"GraphQL compare errors: {comparison_result['errors']}")
                    else:
                        comparison_data = comparison_result.get('data', {})
                        for i, key in enumerate(compare_keys):
                            alias = f"cmp{i}"
                            repo_compare = comparison_data.get(alias, {})
                            ref_data = repo_compare.get('baseRef') if repo_compare else None
                            compare_data = ref_data.get('compare') if ref_data else None
                            if compare_data:
                                # Prefer behindBy. Fallback to aheadBy if behindBy is missing.
                                behind_by = compare_data.get('behindBy')
                                if behind_by is None:
                                    behind_by = compare_data.get('aheadBy')
                                behind_by_map[key] = int(behind_by or 0)
                            else:
                                behind_by_map[key] = 0
                else:
                    print(f"Warning: GraphQL compare query returned status {comparison_response.status}")
                    for key in compare_keys:
                        behind_by_map[key] = 0
                
                # Transform GraphQL response to match REST API format used by fetch_pr_data
                author = pr_data.get('author', {})
                base_repo = pr_data.get('baseRepository', {})
                
                # Count unresolved conversations
                review_threads = pr_data.get('reviewThreads', {}).get('nodes', [])
                open_conversations_count = sum(1 for thread in review_threads if not thread.get('isResolved', False))
                
                # Note: If there are more than 100 review threads, this count will be incomplete
                # The pageInfo would indicate hasNextPage=true in that case
                page_info = pr_data.get('reviewThreads', {}).get('pageInfo', {})
                if page_info.get('hasNextPage'):
                    print(f"Warning: PR {owner}/{repo}#{pr_number} has >100 review threads, count may be incomplete")
                
                # Process reviews to get latest state per reviewer
                from utils import calculate_review_status
                reviews_data = pr_data.get('reviews', {}).get('nodes', [])
                review_status = calculate_review_status(reviews_data)
                
                # Build per-reviewer data
                latest_reviews = {}
                if reviews_data:
                    valid_reviews = [r for r in reviews_data if r.get('submittedAt') and r.get('author')]
                    sorted_reviews = sorted(valid_reviews, key=lambda x: x.get('submittedAt', ''))
                    for review in sorted_reviews:
                        author_data = review['author']
                        latest_reviews[author_data['login']] = {
                            'login': author_data['login'],
                            'avatar_url': author_data.get('avatarUrl', ''),
                            'state': review['state']
                        }
                reviewers_list = list(latest_reviews.values())

                # Count CI check results from statusCheckRollup on the latest commit
                checks_passed = 0
                checks_failed = 0
                checks_skipped = 0
                last_commit_nodes = pr_data.get('commits', {}).get('nodes', [])
                if last_commit_nodes:
                    rollup = (last_commit_nodes[0].get('commit') or {}).get('statusCheckRollup') or {}
                    for ctx in rollup.get('contexts', {}).get('nodes', []):
                        typename = ctx.get('__typename', '')
                        if typename == 'CheckRun':
                            conclusion = (ctx.get('conclusion') or '').upper()
                            if conclusion == 'SUCCESS':
                                checks_passed += 1
                            elif conclusion in ('FAILURE', 'TIMED_OUT', 'CANCELLED'):
                                checks_failed += 1
                            elif conclusion in ('SKIPPED', 'NEUTRAL'):
                                checks_skipped += 1
                        elif typename == 'StatusContext':
                            state = (ctx.get('state') or '').upper()
                            if state == 'SUCCESS':
                                checks_passed += 1
                            elif state in ('FAILURE', 'ERROR'):
                                checks_failed += 1
                
                # Build the pr_data dict matching REST API format
                transformed_data = {
                    'title': pr_data.get('title', ''),
                    'state': pr_data.get('state', '').lower(),
                    'is_merged': 1 if pr_data.get('merged', False) else 0,
                    'mergeable_state': pr_data.get('mergeStateStatus', 'unknown'),
                    'files_changed': pr_data.get('changedFiles', 0),
                    'author_login': author.get('login', ''),
                    'author_avatar': author.get('avatarUrl', ''),
                    'repo_owner_avatar': base_repo.get('owner', {}).get('avatarUrl', ''),
                    'checks_passed': checks_passed,
                    'checks_failed': checks_failed,
                    'checks_skipped': checks_skipped,
                    'commits_count': pr_data.get('commits', {}).get('totalCount', 0),
                    'behind_by': behind_by_map.get((owner, repo, pr_number), 0),
                    'review_status': review_status,
                    'last_updated_at': pr_data.get('updatedAt', ''),
                    'is_draft': 1 if pr_data.get('isDraft', False) else 0,
                    'open_conversations_count': open_conversations_count,
                    'reviewers_json': json.dumps(reviewers_list),
                    'etag': None,  # GraphQL doesn't provide ETags
                    '_batch_fetch': True
                }
                
                all_results[(owner, repo, pr_number)] = transformed_data
                

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ion on handle_refresh_repo, estimate api call displayed as a lower-bound
@Nachiket-Roy
Contributor

@coderabbitai check for the following issues in the PR:
Incomplete Data, Sequential Loops and Data Consistency

@coderabbitai

coderabbitai bot commented Mar 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR diff to examine the changes
gh pr diff 286 --repo OWASP-BLT/BLT-Leaf

Length of output: 48150


🏁 Script executed:

#!/bin/bash
# Read the key backend files
cat src/github_api.py
echo "===HANDLERS==="
cat src/handlers.py
echo "===INDEX==="
cat src/index.py

Length of output: 159075


@Nachiket-Roy Here is a detailed analysis of all three issue categories across the PR:


1. 🔴 Incomplete Data

Critical: Transform block misplaced inside if comparison_query_parts: — only last PR's data is saved

In fetch_multiple_prs_batch (src/github_api.py), the block that converts Pass 1 GraphQL data into transformed_data and writes to all_results is nested inside the if comparison_query_parts: block. This has two compounding problems:

Problem A — entire batch dropped when all PRs lack refs:
If every PR in a batch is missing base_ref or head_oid, comparison_query_parts stays empty, the if body is never entered, and no PR from that batch is stored in all_results even though Pass 1 succeeded.

Problem B — only the last PR is ever stored:
pr_data, owner, repo, and pr_number are set inside the for i, (owner, repo, pr_number) in enumerate(batch) loop, but the transform block runs after the loop ends, using whichever values the variables held at the last iteration. For a 50-PR batch, at most one result is written.

# ❌ Current (transform is inside `if comparison_query_parts:`, after the for-loop)
for i, (owner, repo, pr_number) in enumerate(batch):   # sets pr_data, owner, repo, pr_number
    ...
    comparison_targets[...] = { 'base_ref': ..., 'head_oid': ... }

if comparison_query_parts:          # skipped entirely if all PRs lack refs
    ...                             # Pass 2 compare query
    # Transform uses stale variables from last loop iteration only!
    author = pr_data.get('author', {})
    ...
    all_results[(owner, repo, pr_number)] = transformed_data   # only last PR stored

# ✅ Fix: move transform inside the for-loop, outside the if-block
for i, (owner, repo, pr_number) in enumerate(batch):
    ...
    # Transform here, per PR, using per-iteration pr_data
    all_results[(owner, repo, pr_number)] = {
        ...
        'behind_by': behind_by_map.get((owner, repo, pr_number), 0),
    }

statusCheckRollup contexts capped at 100 with no pagination

The Pass 1 query uses contexts(first: 100) but never reads pageInfo.hasNextPage. Repositories with more than 100 check contexts (e.g. large monorepos with many required checks) will silently under-count checks_passed/checks_failed/checks_skipped.
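A follow-up pagination loop can recover the remaining contexts. This is only a sketch: `run_graphql` is a hypothetical helper standing in for whatever re-issues the contexts query with an `after` cursor and returns the `contexts` connection as a dict; the real fix would live inside the Pass 1 result handling.

```python
def collect_all_contexts(run_graphql, owner, repo, pr_number, first_page):
    """Accumulate statusCheckRollup contexts across GraphQL pages.

    `run_graphql` is a stand-in (hypothetical) callable that re-queries the
    PR's contexts connection with an `after` cursor and returns a dict of
    the shape {'nodes': [...], 'pageInfo': {'hasNextPage': bool,
    'endCursor': str}} -- the shape GitHub's GraphQL API uses.
    """
    nodes = list(first_page.get('nodes') or [])
    page_info = first_page.get('pageInfo') or {}
    while page_info.get('hasNextPage'):
        # Fetch the next page starting at the previous page's end cursor
        page = run_graphql(owner, repo, pr_number, after=page_info.get('endCursor'))
        nodes.extend(page.get('nodes') or [])
        page_info = page.get('pageInfo') or {}
    return nodes
```

With all nodes collected, the checks_passed/checks_failed/checks_skipped tallies can be computed over the full set instead of the first 100.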

New PRs silently skipped in handle_refresh_repo when comparison fails

behind_by = behind_by_map.get(pr_number)
if behind_by is None:
    if pr_number in tracked_map:
        behind_by = tracked_map[pr_number]['behind_by']
    else:
        continue   # ← silently drops newly-open PRs whose comparison failed

A new PR that fetch_repo_comparison_batch could not compare (e.g. network error for that specific PR) is never added to discovered_candidates and is therefore never inserted.

Phase 1 of handle_scheduled_refresh inserts placeholder zeros

Newly discovered PRs in Phase 1 are inserted with checks_passed: 0, checks_failed: 0, behind_by: 0, commits_count: 0, and files_changed: 0. If Phase 2 subsequently fails for any of these PRs, those zeros persist indefinitely.


2. 🟡 Sequential Loops

handle_batch_refresh_prs: N sequential DB queries for N PR IDs

# ❌ Current: 1 DB round-trip per pr_id
for pr_id in pr_ids:
    result = await db.prepare(
        'SELECT ... FROM prs WHERE id = ?'
    ).bind(pr_id).first()

# ✅ Fix: single query with IN clause
placeholders = ','.join('?' * len(pr_ids))
results = await db.prepare(
    f'SELECT ... FROM prs WHERE id IN ({placeholders})'
).bind(*pr_ids).all()

For the allowed maximum of 100 PR IDs, this is 100 sequential DB calls that could be one.

handle_refresh_repo: sequential DELETE per closed PR

# ❌ Current
for pr_number in closed_numbers:
    await db.prepare('DELETE FROM prs WHERE id = ?').bind(row['id']).run()

# ✅ Fix: batch delete
ids_to_delete = [tracked_map[n]['id'] for n in closed_numbers]
placeholders = ','.join('?' * len(ids_to_delete))
await db.prepare(f'DELETE FROM prs WHERE id IN ({placeholders})').bind(*ids_to_delete).run()

handle_scheduled_refresh Phase 1: per-PR SELECT after each upsert

Inside the "discover new PRs" loop, a separate SELECT … WHERE pr_url = ? is issued per newly inserted row to retrieve the auto-assigned id. For repos with many new PRs this is N additional round-trips:

# ❌ Current (inside loop per new PR)
new_row_result = await db.prepare(
    'SELECT id, ... FROM prs WHERE pr_url = ?'
).bind(pr_url).first()
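One way to remove that follow-up round-trip is a RETURNING clause, which SQLite 3.35+ supports (and which, to my understanding, Cloudflare D1 also accepts). A minimal illustration with the standard-library sqlite3 module; the table shape here is simplified and hypothetical:

```python
import sqlite3

def upsert_pr_returning_id(conn, pr_url, behind_by):
    """Insert-or-update a PR row and get its id in the same statement.

    RETURNING folds the per-row SELECT ... WHERE pr_url = ? back into the
    upsert itself, so discovering N new PRs costs N statements, not 2N.
    Requires a UNIQUE constraint on pr_url for ON CONFLICT to target.
    """
    row = conn.execute(
        'INSERT INTO prs (pr_url, behind_by) VALUES (?, ?) '
        'ON CONFLICT(pr_url) DO UPDATE SET behind_by = excluded.behind_by '
        'RETURNING id',
        (pr_url, behind_by),
    ).fetchone()
    return row[0]
```

The same statement then serves both the insert and the update path, and the id comes back either way.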

handle_refresh_repo: sequential fallback fetch_pr_data per discovered PR

for item in discovered_candidates:
    pr_data = batch_results.get(key)
    if not pr_data:
        pr_data = await fetch_pr_data(...)  # sequential REST call per missed PR

If fetch_multiple_prs_batch misses several PRs (e.g. due to the transform bug above), each fallback call is sequential, adding one full REST round-trip per PR.
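The fallback fetches can run concurrently instead. A sketch, assuming `fetch_pr_data` keeps its existing (owner, repo, pr_number) signature; `fetch_missing_prs` is a hypothetical helper name:

```python
import asyncio

async def fetch_missing_prs(fetch_pr_data, missing):
    """Run fallback REST fetches concurrently instead of one-by-one.

    `missing` is a list of (owner, repo, pr_number) keys the batch
    GraphQL pass did not return. return_exceptions=True keeps one
    failed PR from cancelling the rest of the batch.
    """
    tasks = [fetch_pr_data(owner, repo, number) for owner, repo, number in missing]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    fetched = {}
    for key, result in zip(missing, results):
        if isinstance(result, Exception):
            continue  # leave the PR unfetched; the caller decides how to report it
        fetched[key] = result
    return fetched
```

In practice the fan-out should probably be bounded (e.g. with an asyncio.Semaphore) to stay under GitHub's secondary rate limits.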


3. 🔴 Data Consistency

aheadBy used as fallback for behindBy — semantically incorrect

This pattern appears in both fetch_multiple_prs_batch and fetch_repo_comparison_batch:

behind_by = compare_data.get('behindBy')
if behind_by is None:
    behind_by = compare_data.get('aheadBy')   # ← wrong direction

behindBy is how many commits the base has that the PR head does not. aheadBy is the opposite: how many commits the PR head has that the base does not. Substituting one for the other will report the PR as "N commits behind" when it is actually N commits ahead, causing the UI to show incorrect staleness indicators.
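The safer behavior is to report "unknown" rather than a number in the wrong direction. A minimal helper sketching this (the function name is illustrative, not from the PR):

```python
def extract_behind_by(compare_data):
    """Read behindBy from a GraphQL compare payload.

    Returns None when the comparison is missing, so callers can keep
    the previously stored value instead of storing a wrong number.
    Deliberately never falls back to aheadBy: that counts commits in
    the opposite direction and would invert the staleness indicator.
    """
    if not compare_data:
        return None
    behind_by = compare_data.get('behindBy')
    return behind_by if isinstance(behind_by, int) else None
```

Callers then treat a None result as "comparison unavailable" and preserve the last known behind_by.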

_normalize_ref_name unconditionally prepends refs/heads/

def _normalize_ref_name(ref_name):
    if ref_name.startswith('refs/'):
        return ref_name
    return f"refs/heads/{ref_name}"   # ← assumes branch, not tag or SHA

baseRefName in GitHub's GraphQL can name a ref that does not live under refs/heads/ (for example a tag such as v1.0 used as a maintenance base). Prepending refs/heads/ to such a name produces a ref that does not exist; the GraphQL ref() query returns null, compare is null, and behind_by silently defaults to 0.
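A defensive variant is to derive a list of candidate qualified names and try each until the ref() query resolves. A sketch (the helper name is illustrative):

```python
def candidate_qualified_refs(ref_name):
    """Possible fully-qualified names for a PR base ref.

    A baseRefName is usually a branch, so refs/heads/ is tried first,
    with refs/tags/ as a fallback; an already-qualified name passes
    through unchanged.
    """
    if ref_name.startswith('refs/'):
        return [ref_name]
    return [f"refs/heads/{ref_name}", f"refs/tags/{ref_name}"]
```

The compare query would then iterate the candidates and use the first one for which ref() returns a non-null object, only defaulting behind_by when every candidate misses.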

No database transaction wraps delete + insert in handle_refresh_repo

The handler deletes closed PRs and then inserts/updates open ones as separate, non-transactional operations. Concurrent requests (e.g. a UI "Refresh Repo Data" click arriving while a scheduled refresh is mid-flight) can observe a window where deleted rows are gone but new rows are not yet present, temporarily breaking the PR list for that repo.
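The invariant to preserve is that a reader never sees the deleted rows gone before the replacements land. If the backing store is Cloudflare D1, submitting the delete and the upserts as one db.batch([...]) call should give this atomically; the same idea is sketched below with the standard-library sqlite3 module purely for illustration (the table shape is hypothetical):

```python
import sqlite3

def replace_repo_prs(conn, ids_to_delete, rows_to_insert):
    """Delete closed PRs and insert open ones in a single transaction,
    so concurrent readers observe either the old repo state or the new
    one, never the half-applied window in between."""
    with conn:  # commits on success, rolls back on any exception
        if ids_to_delete:
            placeholders = ','.join('?' * len(ids_to_delete))
            conn.execute(f'DELETE FROM prs WHERE id IN ({placeholders})', ids_to_delete)
        conn.executemany(
            'INSERT INTO prs (id, pr_url, behind_by) VALUES (?, ?, ?)',
            rows_to_insert,
        )
```

Whether via a real transaction or a batched statement list, the key point is one atomic unit for the delete-plus-insert pair.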

fetch_pr_data (REST) vs GraphQL: different comparison directions

  • REST path (fetch_pr_data): compares {head}...{base} and reads ahead_by (i.e. how many commits base is ahead of head — which equals behind_by for the PR).
  • GraphQL batch path: queries ref(qualifiedName: "refs/heads/{base}").compare(headRef: "{headRefOid}") and reads behindBy.

These two approaches can diverge after a force-push: the REST path always uses the current live branch tip, while the GraphQL path uses the PR's headRefOid at the time the GraphQL query was made. A force-pushed branch will cause the two paths to report different behind_by values for the same PR depending on which refresh code path was last executed.

Partial GraphQL errors silently produce zeros in fetch_repo_comparison_batch

if 'errors' in result:
    print(f"GraphQL comparison batch errors: {result['errors']}")
# ← no early return; falls through to read partial data
repo_data = (result.get('data') or {}).get('repository') or {}

When GitHub returns a partial response (errors for some aliases, data for others), the code logs the errors but continues. PRs whose aliases errored will have ref_data = {}, compare_data = None, and will be assigned behind_by = 0 with no indication that the value is stale.
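One way to surface this is to map each errored alias to None ("unknown") instead of letting it decay to 0. A sketch under assumptions: alias names like ref0/ref1 and the error-path shape mirror the snippet above but are not taken verbatim from the PR.

```python
def behind_by_or_unknown(result, aliases):
    """Map each PR key to its behindBy, or None when that alias errored.

    `aliases` maps a GraphQL alias (e.g. 'ref0') to a PR key. A None
    value tells the caller the comparison is unknown and the stored
    behind_by should be kept, instead of silently writing 0.
    """
    errored = set()
    for err in result.get('errors') or []:
        # GraphQL error paths are lists like ['repository', 'ref1', ...];
        # collect every string segment so the alias is found either way.
        for segment in err.get('path') or []:
            if isinstance(segment, str):
                errored.add(segment)
    repo_data = (result.get('data') or {}).get('repository') or {}
    out = {}
    for alias, key in aliases.items():
        if alias in errored:
            out[key] = None
            continue
        compare = (repo_data.get(alias) or {}).get('compare') or {}
        out[key] = compare.get('behindBy')
    return out
```

The caller can then distinguish "GitHub said 0 commits behind" from "GitHub never answered for this PR".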


Summary table

| Issue | Location | Severity |
| --- | --- | --- |
| Transform block only runs for last PR in batch | fetch_multiple_prs_batch | 🔴 Critical |
| aheadBy fallback for behindBy | Both batch compare helpers | 🔴 High |
| Sequential DB queries per PR ID | handle_batch_refresh_prs | 🟡 Medium |
| Sequential DELETEs per closed PR | handle_refresh_repo, Phase 1 | 🟡 Medium |
| _normalize_ref_name breaks tag-based PRs | Both helpers | 🟡 Medium |
| No transaction on delete + insert | handle_refresh_repo | 🟡 Medium |
| contexts(first: 100) without pagination | fetch_multiple_prs_batch | 🟡 Medium |
| REST vs GraphQL comparison divergence | fetch_pr_data vs batch | 🟡 Medium |
| Partial errors silently yield zeros | fetch_repo_comparison_batch | 🟡 Medium |
| New PR silently skipped on compare failure | handle_refresh_repo | 🟡 Medium |
| Per-PR SELECT after upsert in phase 1 | handle_scheduled_refresh | 🟢 Low |

@swaparup36
Contributor Author

Hi @Nachiket-Roy, I made the required changes. Can you re-review it please?

@Nachiket-Roy
Contributor

@coderabbitai were all previously flagged issues resolved?

@coderabbitai

coderabbitai bot commented Mar 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me read the current state of the files on the PR branch to verify which issues have been addressed.


🏁 Script executed:

#!/bin/bash
cat src/github_api.py
echo "===HANDLERS==="
cat src/handlers.py

Length of output: 155396


@Nachiket-Roy Here is a detailed status of all 11 previously flagged issues after reviewing the current branch:


Issue Resolution Status

| # | Issue | Location | Status |
| --- | --- | --- | --- |
| 1 | Transform block only saves last PR per batch | fetch_multiple_prs_batch | ✅ Fixed |
| 2 | aheadBy used as fallback for behindBy | Both batch compare helpers | ✅ Fixed |
| 3 | Sequential DB queries per PR ID | handle_batch_refresh_prs | ✅ Fixed |
| 4 | Sequential DELETEs per closed PR | handle_refresh_repo | ✅ Fixed |
| 5 | _normalize_ref_name breaks tag-based PRs | Both helpers | ✅ Fixed |
| 6 | No transaction on delete + insert | handle_refresh_repo | ✅ Fixed |
| 7 | contexts(first: 100) without pagination | fetch_multiple_prs_batch | ✅ Fixed |
| 8 | REST vs GraphQL comparison divergence | fetch_pr_data vs batch | ⚠️ Still present |
| 9 | Partial GraphQL errors silently yield zeros | fetch_repo_comparison_batch | ✅ Fixed |
| 10 | New PR silently skipped on compare failure | handle_refresh_repo | ✅ Fixed |
| 11 | Per-PR SELECT after upsert in Phase 1 | handle_scheduled_refresh | ✅ Fixed (redesigned) |

Details on resolved issues


Remaining concern — Issue #8: REST vs GraphQL comparison divergence

This is the one issue that has not been addressed. fetch_pr_data (REST) computes behind_by using the live branch comparison endpoint:

# REST path: live branch tip at query time
compare_url = f".../compare/{head_full_ref}...{base_branch}"
behind_by = compare_data.get('ahead_by') or 0

The GraphQL batch path uses the PR's headRefOid (a fixed commit SHA captured at query time):

# GraphQL path: headRefOid frozen at query time
ref(qualifiedName: "refs/heads/{base}").compare(headRef: "{headRefOid}") { behindBy }

After a force-push, these can report different values for the same PR depending on which refresh path was last used. Since the two paths are intentionally separate by design (REST for single-PR refresh, GraphQL for batch), this is acceptable to leave as a documented trade-off — but worth noting in a follow-up issue if behind_by accuracy after force-pushes becomes a user-reported problem.

@swaparup36
Contributor Author

@coderabbitai is there anything to change on this PR?

@owasp-blt

owasp-blt bot commented Apr 1, 2026

🎉 @swaparup36 You referred a new contributor! Total referrals: 1. Current rank: #3.

Monthly Referral Leaderboard (your neighborhood)

| Rank | Contributor | Referrals |
| --- | --- | --- |
| 1 | @DonnieBLT | 1 |
| 2 | @Jayant2908 | 1 |
| 3 | @swaparup36 ← you | 1 |

@coderabbitai

coderabbitai bot commented Apr 1, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current PR review comments and status
gh pr view 286 --repo OWASP-BLT/BLT-Leaf --json reviews,reviewDecision,statusCheckRollupUrl,statusCheckRollup
echo "=== PR COMMENTS ==="
gh pr view 286 --repo OWASP-BLT/BLT-Leaf --json comments --jq '.comments[-5:] | .[] | {author: .author.login, body: .body[:200]}'
echo "=== REVIEW THREADS ==="
gh api repos/OWASP-BLT/BLT-Leaf/pulls/286/reviews --jq '.[] | {state: .state, user: .user.login, body: .body[:300]}'

Length of output: 4448


🏁 Script executed:

#!/bin/bash
# Get all open/unresolved review comments on this PR
gh api repos/OWASP-BLT/BLT-Leaf/pulls/286/comments --jq '.[] | {id: .id, path: .path, line: .line, body: .body, user: .user.login}' | head -200

echo "=== REVIEW COMMENT THREADS (resolved status) ==="
gh api graphql -f query='
{
  repository(owner: "OWASP-BLT", name: "BLT-Leaf") {
    pullRequest(number: 286) {
      reviewThreads(first: 50) {
        nodes {
          isResolved
          comments(first: 1) {
            nodes {
              author { login }
              body
              path
              line
            }
          }
        }
      }
    }
  }
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false) | {path: .comments.nodes[0].path, line: .comments.nodes[0].line, author: .comments.nodes[0].author.login, body: .comments.nodes[0].body[:400]}'

Length of output: 43689


🏁 Script executed:

#!/bin/bash
# Check current state of token validation in handlers.py
echo "=== Token validation area ==="
grep -n "caller_token\|x-github-token\|user_token\|len.*caller\|Forbidden\|Unauthorized" src/handlers.py | head -30

echo "=== Exact lines around token validation ==="
grep -n "handle_refresh_repo" src/handlers.py

Length of output: 1861


🏁 Script executed:

#!/bin/bash
echo "=== handle_refresh_repo token validation (lines 760-800) ==="
sed -n '760,800p' src/handlers.py

echo ""
echo "=== Copilot flagged indentation issue around line 875 ==="
sed -n '860,900p' src/handlers.py

echo ""
echo "=== Check for duplicate loop in github_api.py ==="
sed -n '948,975p' src/github_api.py

Length of output: 5019


🏁 Script executed:

#!/bin/bash
echo "=== fetch_repo_comparison_batch in github_api.py ==="
grep -n "def fetch_repo_comparison_batch" src/github_api.py
# Check around the function for duplicate loop
grep -n "for i, pr_number in index_to_pr_number" src/github_api.py

echo ""
echo "=== Duplicate handle_refresh_org check ==="
grep -n "async def handle_refresh_org" src/handlers.py

echo ""
echo "=== Private repo restriction check in handle_refresh_repo ==="
sed -n '800,840p' src/handlers.py

echo ""
echo "=== isPrivate in fetch_repo_open_prs query ==="
grep -n "isPrivate\|is_private\|private" src/github_api.py | head -20

Length of output: 2691


@swaparup36 The PR is in excellent shape — almost everything has been addressed. Here is the full status:


✅ All CodeRabbit inline comments — resolved

All review threads I raised (transform block bug, inverted token validation, duplicate loop, duplicate handle_refresh_org stub, aheadBy/behindBy swap, sequential DB queries, transaction wrapping, _normalize_ref_name, check context pagination, partial-errors handling, etc.) are confirmed fixed and all threads are marked resolved.

✅ All Copilot structural comments — resolved

  • Indentation inside the fallback dict (discovered_candidates.append(...)) — correctly nested now.
  • Missing private-repo restriction in handle_refresh_repo — the _is_caller_scoped_token + repo_is_private check is in place.

One minor open item (Copilot, cosmetic)

The UI label for "Refresh Repo Data" shows ${uniqueRepoCount * 2} API calls, but the backend can require more due to GraphQL pagination. The Copilot suggestion was to display it as a lower bound:

- const repoRefreshApiCalls = uniqueRepoCount * 2;
+ // Lower-bound: at least 2 API calls per repo; pagination/chunking may require more.
+ const repoRefreshApiCalls = `${uniqueRepoCount * 2}+`;

This is minor and won't block the PR, but it avoids misleading users about the exact call count.


Pre-existing trade-off (no action needed)

The REST path (fetch_pr_data) vs GraphQL batch path can diverge on behind_by after a force-push because REST uses the live branch tip while GraphQL uses the headRefOid frozen at query time. This is an accepted design trade-off, not a bug introduced by this PR.

Overall, this PR is ready to merge once you decide on the cosmetic label item above.

@swaparup36
Contributor Author

Regarding uniqueRepoCount: it's already implemented that way in the code, as const repoRefreshApiCalls = `${uniqueRepoCount * 2}+`; so the lower-bound label is in place.

@swaparup36
Contributor Author

@Nachiket-Roy Can you please approve it if it looks good to you?

@owasp-blt owasp-blt bot added has-peer-review PR has received peer review unresolved-conversations: 0 and removed needs-peer-review PR needs peer review unresolved-conversations: 0 labels Apr 3, 2026
