fix: implement robust PR URL validation and sanitization (#262)
ojaswa072 wants to merge 9 commits into OWASP-BLT:main from
Conversation
🍃 PR Readiness Check: check the readiness of this PR on Leaf. Leaf reviews pull requests for operational readiness, security risks, and production-impacting changes before they ship.
👋 Hi @ojaswa072! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:
Once a valid peer review is submitted, this check will pass automatically. Thank you!
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough: Adds canonical GitHub URL parsing/validation in the frontend and server, sends canonical PR URLs from the UI, migrates the DB to a composite identity (repo_owner, repo_name, pr_number) with canonical_url persistence and deduplication, updates upsert/lookup logic, and adds a URL-validation test script and migration.
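The walkthrough above centers on canonical URL parsing. A minimal Python sketch of the server-side idea, assuming a regex shape and `parse_pr_url`/`_GITHUB_PR_RE` naming like the review discusses (the real src/utils.py may differ), could look like:

```python
import re

# Hypothetical sketch: anchored pattern for GitHub PR URLs, tolerant of
# trailing path/query/fragment suffixes, producing a canonical form.
_GITHUB_PR_RE = re.compile(
    r'^https?://(?:www\.)?github\.com/'
    r'([^/?#\s]+)/([^/?#\s]+)/pull/(\d+)(?:[/?#].*)?$',
    re.IGNORECASE,
)

def parse_pr_url(pr_url: str):
    """Return (owner, repo, pr_number, canonical_url), or None if invalid."""
    match = _GITHUB_PR_RE.match((pr_url or '').strip())
    if not match:
        return None
    owner, repo = match.group(1), match.group(2)
    number = int(match.group(3))
    # Canonical form drops any /files, ?diff=..., #discussion_... suffix.
    canonical = f'https://github.com/{owner}/{repo}/pull/{number}'
    return owner, repo, number, canonical
```

With this shape, `.../pull/262?diff=split` and `.../pull/262` both map to the same `(owner, repo, 262)` identity plus one canonical URL, which is what the composite-key migration relies on.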
Sequence Diagram(s):

sequenceDiagram
    participant User
    participant Browser
    participant FrontendJS as Frontend JS
    participant Backend as /add_pr Handler
    participant ServerUtils as Server URL Parser
    participant DB as Database
    User->>Browser: enter URL, click Add
    Browser->>FrontendJS: submit URL
    FrontendJS->>FrontendJS: parseGitHubTrackingUrl(rawUrl)
    alt client validation fails
        FrontendJS-->>Browser: setUrlError(message)
    else client validation succeeds
        FrontendJS->>Backend: POST /add_pr { pr_url: canonicalUrl, add_all }
        Backend->>ServerUtils: parse_pr_url / parse_repo_url as needed
        alt server validation fails
            Backend-->>FrontendJS: error response
        else server validation succeeds
            Backend->>DB: upsert_pr(repo_owner, repo_name, pr_number, canonical_url, pr_url, ...)
            DB-->>Backend: upsert result
            Backend-->>FrontendJS: success { pr_url: canonicalUrl, details }
            FrontendJS-->>Browser: highlight newly added PR (by identity)
        end
    end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/utils.py (1)
16-22: Remove instructional migration comments from production code. The numbered "place this regex / replace old function" comments read like patch notes and can be dropped to keep the source focused on behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils.py` around lines 16 - 22: remove the instructional migration comments and keep only the actual declarations. Delete the two numbered comment lines that say "1. Place this regex..." and "2. Replace the old parse_pr_url..." so the module contains just the compiled regex _GITHUB_PR_RE = re.compile(...) at top level and the parse_pr_url(pr_url: str) function definition below it; ensure no leftover patch-note-style text remains in src/utils.py, and run a quick linter/formatter to confirm only functional comments (if any) remain.

src/handlers.py (1)
246-247: Use one consistent error response shape in handle_add_pr. This branch returns {"ok": false, ...} while other branches in the same handler return {"error": ...} or {"success": ...}. A single envelope avoids extra client-side branching.
♻️ Suggested consistency fix:
- json.dumps({'ok': False, 'error': str(e)}),
+ json.dumps({'success': False, 'error': str(e)}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers.py` around lines 246 - 247, In handle_add_pr, make the error response shape consistent with the other branches by replacing the envelope {"ok": False, "error": str(e)} with the same shape used elsewhere (e.g., {"error": str(e)}); update the json.dumps call that constructs the response and ensure the status/headers remain unchanged so clients always receive the same error envelope from handle_add_pr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@public/index.html`:
- Around line 653-660: The current validation uses PR_PATH_RE and rejects any
url.pathname that doesn't match /owner/repo/pull/NUMBER, which incorrectly
blocks repository/org import URLs used by the bulk-import checkbox flow; update
the validation so it accepts the other valid patterns (e.g. /owner/repo and
/orgs/owner or /owner) instead of returning invalid for all non-pull paths:
either add additional regexes (e.g. REPO_PATH_RE and ORG_PATH_RE) and test
url.pathname against them before returning the invalid object, or broaden
PR_PATH_RE to recognize the repo/org forms, and keep the existing return {
valid: false, reason: ... } only when none of the allowed patterns match so
repo/org import flows are not rejected.
- Around line 672-686: Move the GitHub URL validation into the existing addPr()
flow (instead of adding a separate click handler) so the early return actually
prevents submission from all entry points including the Enter key; inside
addPr() read prUrl and addAll state, compute effectiveAddAll (consider org/repo
URLs as "addAll"), and if not effectiveAddAll call parseGitHubPRUrl(prUrl) and
on invalid use setUrlError(result.reason) (or set the 'url-error' element) and
input.focus() then return; clear the error before continuing and remove the
extra document.getElementById('addPrBtn').addEventListener('click', ...) handler
and any direct checkPR(raw) calls so submission only proceeds from addPr() after
validation.
- Line 571: The span with id "url-error" is hard-coded to display: none so the
new handler that only updates textContent never makes errors visible; change the
UI logic to not permanently hide that element and instead toggle its visibility
when setting/clearing the error (e.g., remove the inline display: none, or set
element.style.display = ''/'block' or toggle a CSS class from the URL validation
handler that sets textContent). Ensure the code that clears the message also
hides the element again so the span only appears when the validation handler
populates url-error.
In `@src/utils.py`:
- Around line 17-19: The regex _GITHUB_PR_RE currently requires the optional
suffix to start with '/' so query or fragment suffixes like '?diff=split' or
'#discussion...' are rejected; update the pattern for _GITHUB_PR_RE (and the
other similar regex at lines referenced) to allow optional suffixes that begin
with '/', '?' or '#' (e.g., replace the '(/.*)?$' portion with a non-capturing
alternative that accepts '/', '?' or '#' such as '(?:[/?#].*)?$'), keeping the
existing capture groups for owner, repo, and PR number unchanged.
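The suggested `(?:[/?#].*)?$` change can be sanity-checked in isolation. The two patterns below are illustrative, not copied from src/utils.py: the first keeps the old '/'-only optional suffix, the second uses the proposed non-capturing alternative.

```python
import re

# Old shape: suffix must start with '/', so '?diff=split' or '#...' fails.
strict = re.compile(
    r'^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)(/.*)?$')
# Proposed shape: suffix may start with '/', '?' or '#'.
relaxed = re.compile(
    r'^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)(?:[/?#].*)?$')

url = 'https://github.com/OWASP-BLT/BLT-Leaf/pull/262?diff=split'
print(strict.match(url) is None)       # True: query suffix rejected
print(relaxed.match(url) is not None)  # True: query suffix accepted
```

The owner/repo/number capture groups are untouched by the change, so callers downstream see the same three groups either way.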
---
Nitpick comments:
In `@src/handlers.py`:
- Around line 246-247: In handle_add_pr, make the error response shape
consistent with the other branches by replacing the envelope {"ok": False,
"error": str(e)} with the same shape used elsewhere (e.g., {"error": str(e)});
update the json.dumps call that constructs the response and ensure the
status/headers remain unchanged so clients always receive the same error
envelope from handle_add_pr.
In `@src/utils.py`:
- Around line 16-22: Remove the instructional migration comments and keep only
the actual declarations: delete the two numbered comment lines that say "1.
Place this regex..." and "2. Replace the old parse_pr_url..." so the module
contains just the compiled regex _GITHUB_PR_RE = re.compile(...) at top-level
and the parse_pr_url(pr_url: str) function definition below it; ensure no
leftover patch-note style text remains in src/utils.py and run a quick
linter/formatter to confirm only functional comments (if any) remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: fdb1b398-e936-4b7b-9a65-12abf738853c
📒 Files selected for processing (3)
public/index.html, src/handlers.py, src/utils.py
@coderabbitai isn't this done in any other PRs before?
Tip: For best results, initiate chat on the files or code changes.
Per the git history, these two PRs are complementary, not duplicates:
Actionable comments posted: 3
♻️ Duplicate comments (1)
public/index.html (1)
672-686: ⚠️ Potential issue | 🔴 Critical: Remove the redundant click handler that calls the undefined checkPR function. The button has duplicate click handlers: one at lines 673–686 (reviewed snippet) and another at line 3518. The first handler calls the undefined checkPR(raw) on line 685, which will throw a ReferenceError at runtime. Additionally, parseGitHubPRUrl is only used within this redundant handler and is not called elsewhere. The correct handler at line 3518 uses addPr(), which properly validates via parseGitHubTrackingUrl and handles all submission logic. Remove lines 673–686 entirely to fix the duplicate execution and the undefined function call.
🐛 Proposed fix: remove redundant click handler
- // Attach to button
- document.getElementById('addPrBtn').addEventListener('click', () => {
-   const raw = document.getElementById('prUrlInput').value;
-   const errEl = document.getElementById('url-error');
-   errEl.textContent = '';
-
-   const result = parseGitHubPRUrl(raw);
-   if (!result.valid) {
-     errEl.textContent = result.reason; // inline error, no alert()
-     document.getElementById('prUrlInput').focus();
-     return;
-   }
-
-   checkPR(raw); // passes only after validation
- });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@public/index.html` around lines 672 - 686, The snippet registers a redundant click handler on addPrBtn that uses parseGitHubPRUrl and calls an undefined checkPR(raw), causing a runtime ReferenceError and duplicate behavior; delete the entire handler that references parseGitHubPRUrl, checkPR, prUrlInput and url-error (the block attaching the click listener to getElementById('addPrBtn') shown in the diff) so only the correct handler that calls addPr() remains (the handler around addPrBtn at line ~3518), ensuring validation and submission flow use parseGitHubTrackingUrl/addPr.
🧹 Nitpick comments (1)
public/index.html (1)
641-670: parseGitHubPRUrl is now redundant and causes duplicate definitions. This function and its local PR_PATH_RE constant (line 654) appear to be superseded by parseGitHubTrackingUrl at lines 1346-1431. Having two validation functions with different regex patterns creates:
- Inconsistent validation (line 654 allows /pull/123/files, line 1323 does not; line 1323 is case-insensitive, line 654 is not)
- Test extraction failures (the test finds the wrong const PR_PATH_RE = occurrence)
- Dead code, since the caller checkPR doesn't exist
Consider removing parseGitHubPRUrl entirely since addPr() now uses parseGitHubTrackingUrl.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@public/index.html` around lines 641 - 670, Remove the redundant parseGitHubPRUrl function and its local PR_PATH_RE constant since addPr() and other logic now use parseGitHubTrackingUrl; delete the entire parseGitHubPRUrl block (including the PR_PATH_RE regex) and remove any leftover references (e.g., the defunct checkPR) so there are no duplicate URL-parsing definitions or conflicting regexes, and ensure tests/reference points target parseGitHubTrackingUrl instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers.py`:
- Around line 270-278: You are writing canonical_url into the DB but still using
pr_url as the unique lookup, which can create duplicate rows; instead backfill
existing rows or change dedupe to the stable composite key and keep
canonical_url as metadata. Update upsert_pr to perform uniqueness/upsert by
(owner/repo/pr_number) rather than pr_url, add a migration/backfill that
populates canonical_url for existing rows, and change lookups like the
db.prepare('SELECT * FROM prs WHERE pr_url = ?') call and any other exact pr_url
queries to use the composite key (parsed['owner'], parsed['repo'],
parsed['pr_number']) or a normalized URL function; leave canonical_pr_url stored
but do not rely on it as the primary dedupe key until after migration.
In `@src/utils.py`:
- Around line 14-25: The owner/repo capture classes in _GITHUB_PR_RE and
_GITHUB_REPO_RE are too permissive and can include query/hash/space chars;
update those two patterns to use [^/?#\s]+ instead of [^/]+ so owner and repo
segments stop matching ?, # or whitespace. Do not change _GITHUB_ORG_RE to the
broad [^/?#\s]+—leave its restricted class (or explicitly keep/replace with the
safe [A-Za-z0-9_.-]+ form) so organization names remain limited to valid
characters.
In `@test-url-validation.js`:
- Around line 26-30: The extractBlock call is matching the first occurrence of
"const PR_PATH_RE =" (inside parseGitHubPRUrl) and extracting partial function
bodies which lead to top-level return errors; update the extraction to use a
unique, module-scope start marker (for example the declaration that includes the
/i flag) so extractBlock(html, '<unique module-scope marker for PR_PATH_RE with
/i>', 'function showSectionMessage(') targets the module-level PR_PATH_RE
instead of the function-local one; locate and modify the extractBlock invocation
and its start marker in test-url-validation.js (symbols: extractBlock,
PR_PATH_RE, parseGitHubPRUrl, showSectionMessage) so the extraction begins at
the module-scope regex declaration.
---
Duplicate comments:
In `@public/index.html`:
- Around line 672-686: The snippet registers a redundant click handler on
addPrBtn that uses parseGitHubPRUrl and calls an undefined checkPR(raw), causing
a runtime ReferenceError and duplicate behavior; delete the entire handler that
references parseGitHubPRUrl, checkPR, prUrlInput and url-error (the block
attaching the click listener to getElementById('addPrBtn') shown in the diff) so
only the correct handler that calls addPr() remains (the handler around addPrBtn
at line ~3518), ensuring validation and submission flow use
parseGitHubTrackingUrl/addPr.
---
Nitpick comments:
In `@public/index.html`:
- Around line 641-670: Remove the redundant parseGitHubPRUrl function and its
local PR_PATH_RE constant since addPr() and other logic now use
parseGitHubTrackingUrl; delete the entire parseGitHubPRUrl block (including the
PR_PATH_RE regex) and remove any leftover references (e.g., the defunct checkPR)
so there are no duplicate URL-parsing definitions or conflicting regexes, and
ensure tests/reference points target parseGitHubTrackingUrl instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 4d58b63e-67b5-4be5-b19a-bee2e76301bc
📒 Files selected for processing (5)
package.json, public/index.html, src/handlers.py, src/utils.py, test-url-validation.js
✅ Files skipped from review due to trivial changes (1)
- package.json
… Windows test fixes
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@migrations/0004_add_canonical_url_and_composite_uniqueness.sql`:
- Around line 15-20: The DELETE currently keeps the row with MAX(id) which can
remove the freshest metadata; instead, change the logic to retain the row per
(repo_owner, repo_name, pr_number) chosen by latest readiness_computed_at (use
NULLS LAST), then latest updated_at as a secondary key, and only use id as the
final tie-breaker — implement this by computing a ROW_NUMBER() over (PARTITION
BY repo_owner, repo_name, pr_number ORDER BY readiness_computed_at DESC NULLS
LAST, updated_at DESC NULLS LAST, id DESC) and delete rows where row_number > 1
so the freshest row (by readiness_computed_at/updated_at, id tie-breaker)
survives.
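A quick in-memory SQLite sketch of the ROW_NUMBER() dedupe described above. Column names follow the migration comment but are assumptions, and `x IS NULL, x DESC` stands in for `ORDER BY x DESC NULLS LAST` so the statement also runs on SQLite builds without NULLS LAST support:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE prs (
  id INTEGER PRIMARY KEY,
  repo_owner TEXT, repo_name TEXT, pr_number INTEGER,
  readiness_computed_at TEXT, updated_at TEXT
);
INSERT INTO prs VALUES
  (1, 'OWASP-BLT', 'BLT-Leaf', 262, '2026-03-25', '2026-03-25'),
  (2, 'OWASP-BLT', 'BLT-Leaf', 262, NULL, '2026-03-20');
""")
# Keep one row per (repo_owner, repo_name, pr_number): freshest
# readiness_computed_at first, then updated_at, with id as tie-breaker.
conn.execute("""
DELETE FROM prs WHERE id IN (
  SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (
      PARTITION BY repo_owner, repo_name, pr_number
      ORDER BY readiness_computed_at IS NULL, readiness_computed_at DESC,
               updated_at IS NULL, updated_at DESC, id DESC
    ) AS rn FROM prs
  ) WHERE rn > 1
)
""")
conn.commit()
rows = conn.execute('SELECT id FROM prs').fetchall()
print(rows)  # [(1,)] — the row with fresher readiness survives, not MAX(id)
```

Note how the naive MAX(id) rule would have kept row 2 here and discarded the fresher readiness data, which is exactly the failure mode the comment describes.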
- Around line 15-23: The deduplication DELETE and unique index in
migrations/0004_add_canonical_url_and_composite_uniqueness.sql are
case-sensitive because repo_owner and repo_name lack COLLATE NOCASE, so
mixed-case variants slip through; fix by either updating the schema to enforce
case-insensitive identity (add COLLATE NOCASE to repo_owner and repo_name
columns and recreate the GROUP BY and idx_prs_identity_unique to use NOCASE) or
normalize values to lowercase before persisting (in parse_pr_url() and/or
upsert_pr() ensure repo_owner and repo_name are normalized to lowercase), then
re-run the dedupe SQL using LOWER(...) or COLLATE NOCASE to consolidate existing
duplicates.
In `@public/index.html`:
- Around line 1274-1279: The repo-path branch currently special-cases the second
path segment and misclassifies GitHub pages; update the validation to check only
the first segment against RESERVED_OWNER_NAMES (the RESERVED_OWNER_NAMES Set)
and treat any non-empty second segment as a valid repo name, i.e. accept
two-segment paths when the first segment is not in RESERVED_OWNER_NAMES rather
than blacklisting specific second-segment values—modify the repo-path branch
logic to use RESERVED_OWNER_NAMES.has(firstSegment) and ensure secondSegment is
present/non-empty.
- Line 571: Make the `#url-error` element a proper live region and associate it
with the input so screen-readers announce validation failures: give `#url-error`
an aria-live (e.g., "polite") and role (e.g., "status"), ensure setUrlError()
removes the "hidden" class and sets
`#prUrlInput.setAttribute`('aria-invalid','true') and
`#prUrlInput.setAttribute`('aria-describedby','url-error') when showing an error,
and when clearing the error hide `#url-error` and reset `#prUrlInput` by removing
aria-describedby and setting aria-invalid to "false" (apply the same changes to
the other error block at lines ~1281-1290).
- Around line 3298-3300: The code canonicalizes the input into canonicalInputUrl
but later comparisons still use the raw query-string URL, causing legacy ?pr=
links to miss matches; update the duplicate-check and highlighted-row compare
logic to use parsedUrl.canonicalUrl (canonicalInputUrl) instead of the raw
query-string URL variable so tracked rows are compared against the canonical
form (apply this change to the compare/duplicate-check locations referenced
around the existing comparison blocks at ~3353-3354 and ~3807-3815); ensure any
assignment, lookup, or equality check that currently uses the raw input URL now
uses canonicalInputUrl.
- Around line 1295-1297: The validation message for empty input is misleading —
update the return reason when value is falsy in the validation block that
currently returns { valid: false, reason: 'Please enter a PR URL' } so it
reflects accepted formats (PR, repository, or organization URL); change the
reason to a more generic prompt like "Please enter a PR, repository, or
organization URL" in that same validation branch where value is checked.
In `@src/utils.py`:
- Around line 63-65: The inline guard in parse_repo_url currently uses a
one-line if with a return, causing Ruff E701; change the guard to a multi-line
if statement by splitting it into two lines (use "if not url:" on its own line
followed by an indented "return None") inside the parse_repo_url function so the
single-line multiple-statement form is removed.
- Around line 69-72: The extra special-case guard that checks repo.lower() ==
'pull' should be removed because _GITHUB_REPO_RE already excludes PR-style URLs
and this guard wrongly rejects real repos named "pull"; modify the function that
extracts owner and repo (the code using match.group(1) and match.group(2) and
referencing repo) to simply return the owner and repo without the if
repo.lower() == 'pull' early return, ensuring _GITHUB_REPO_RE remains the
validation point.
In `@test-data-display.js`:
- Around line 715-719: The current logic treats SKIP_RUNTIME_AUTH_TEST as a
pass; update the branch that checks String(error.message ||
'').includes('SKIP_RUNTIME_AUTH_TEST:') so it does NOT call testResult with a
true pass flag — instead report it as skipped (e.g., call testResult with a
distinct skipped status or false plus a skipped marker in the message) so the
runtime auth setup is not counted as passed; change the call to testResult in
that branch to use a skipped status/message (retain `Skipped: ${error.message}`)
and ensure the rest of the code that aggregates results recognizes the skipped
status.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: e6a1a3a9-4007-4073-8cd6-b25470905f4e
📒 Files selected for processing (7)
migrations/0004_add_canonical_url_and_composite_uniqueness.sql, public/index.html, src/database.py, src/handlers.py, src/utils.py, test-data-display.js, test-url-validation.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test-url-validation.js
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/handlers.py (1)
277-278: ⚠️ Potential issue | 🟠 Major: Use NOCASE for the identity lookups too. The new identity key is case-insensitive, but these reads are still plain = comparisons. If a PR is stored once as OWASP-BLT/BLT-Leaf#262 and later comes in as owasp-blt/blt-leaf#262, the write path can still treat it as the same PR while these follow-up queries miss the tracked row, which breaks auto-analysis and webhook refresh handling.
🛠️ Apply the same predicate to every identity lookup:
- 'SELECT * FROM prs WHERE repo_owner = ? AND repo_name = ? AND pr_number = ?'
+ 'SELECT * FROM prs WHERE repo_owner COLLATE NOCASE = ? AND repo_name COLLATE NOCASE = ? AND pr_number = ?'
Also applies to: 1122-1123, 1148-1149, 1307-1308
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers.py` around lines 277 - 278: the SELECT identity lookups use case-sensitive '=' comparisons. Change the WHERE clauses to perform case-insensitive matching by adding COLLATE NOCASE (e.g. update the query string 'SELECT * FROM prs WHERE repo_owner = ? AND repo_name = ? AND pr_number = ?' to use 'repo_owner = ? COLLATE NOCASE AND repo_name = ? COLLATE NOCASE AND pr_number = ?') and keep the same bind parameters (parsed['owner'], parsed['repo'], parsed['pr_number']); apply the same change to the other occurrences referenced (the similar queries around the other locations) so every identity lookup uses NOCASE.

test-data-display.js (1)
103-110: ⚠️ Potential issue | 🟠 Major: child.killed does not mean the child has exited. In Node, child.killed flips to true once kill() successfully sends SIGTERM; it does not mean the process is gone. That makes the SIGKILL fallback unreachable on Unix, so a stuck wrangler can still leak and make later runs flaky.
🛠️ Minimal fix:
  await sleep(800);
- if (!child.killed) {
+ if (child.exitCode === null && child.signalCode === null) {
    try {
      child.kill('SIGKILL');
    } catch (_) {
      // Ignore cleanup failures while shutting down test runtime processes.
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-data-display.js` around lines 103 - 110, The current logic assumes child.killed means the process exited, but it only indicates a kill signal was sent; instead, send a graceful signal (SIGTERM), then wait for the child to actually exit by awaiting an event or checking exitCode within a short timeout (e.g., listen for the child's 'exit' or 'close' event or poll child.exitCode for up to ~800ms), and only if the process hasn't exited after that timeout send SIGKILL (preserving the existing try/catch around kill calls). Refer to the child variable and the current sleep(800)/kill('SIGKILL') logic and replace it with: send SIGTERM, await exit with a Promise/timeout, then send SIGKILL as the fallback.
🧹 Nitpick comments (1)
public/index.html (1)
1281-1300: Error state management looks good. The function properly handles both displaying and clearing validation errors with appropriate ARIA attributes.
Minor nit: Line 1296 sets aria-invalid to "false" rather than removing the attribute. While valid, removing the attribute when the input is valid is slightly cleaner since aria-invalid="false" is the implicit default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@public/index.html` around lines 1281 - 1300, In setUrlError, when clearing the error (message falsy) remove the aria-invalid attribute from the input instead of setting it to "false": keep the existing behavior of setting aria-invalid="true" and aria-describedby when message is truthy, but in the else branch call input.removeAttribute('aria-invalid') (and continue removing aria-describedby) so the attribute is absent when the input is valid; reference function name setUrlError and elements with ids 'url-error' and 'prUrlInput'.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@public/index.html`:
- Around line 3319-3329: The first validation uses an unreachable condition
because effectiveAddAll is computed as addAll || parsedUrl.type !== 'pr';
replace that logic by removing effectiveAddAll and directly checking the
intended cases: if (!addAll && parsedUrl.type !== 'pr') { setUrlError('Use a
pull request URL when bulk import is not enabled'); input.focus(); return; } and
keep the existing addAll && parsedUrl.type === 'pr' branch as-is; update
references to effectiveAddAll (if any) and ensure you use addAll and
parsedUrl.type in the conditions and call setUrlError and input.focus in both
branches.
In `@src/utils.py`:
- Around line 18-25: The repo vs org URL regexes are overlapping so
parse_repo_url() accepts "/orgs/<owner>" as a repo; update the patterns used by
_GITHUB_REPO_RE and _GITHUB_ORG_RE (and the other duplicated regex blocks around
the other ranges) so that repo regex does not match paths that begin with
"orgs/" and org regex accepts both bare owner URLs and "/orgs/<owner>" forms;
then ensure parse_repo_url() and parse_org_url() use the corrected
_GITHUB_REPO_RE and _GITHUB_ORG_RE (and adjust handle_add_pr() routing if
needed) so organization bulk URLs are routed to parse_org_url() rather than to
the repo path.
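One hypothetical way to make the repo and org patterns non-overlapping, as the comment asks. These regexes are illustrative sketches, not the actual `_GITHUB_REPO_RE`/`_GITHUB_ORG_RE` from src/utils.py:

```python
import re

# Repo URLs: exactly two path segments, and the first must not be 'orgs/'.
REPO_RE = re.compile(
    r'^https://github\.com/(?!orgs/)([^/?#\s]+)/([^/?#\s]+)/?$',
    re.IGNORECASE,
)
# Org URLs: a bare owner, or the explicit /orgs/<owner> form.
ORG_RE = re.compile(
    r'^https://github\.com/(?:orgs/)?([A-Za-z0-9_.-]+)/?$',
    re.IGNORECASE,
)

# /orgs/<owner> no longer matches the repo pattern...
print(REPO_RE.match('https://github.com/orgs/OWASP-BLT') is None)    # True
# ...and is routed to the org pattern instead.
print(ORG_RE.match('https://github.com/orgs/OWASP-BLT').group(1))    # OWASP-BLT
# Ordinary repo URLs still parse as owner/repo.
print(REPO_RE.match('https://github.com/OWASP-BLT/BLT-Leaf').group(2))  # BLT-Leaf
```

The negative lookahead `(?!orgs/)` is what removes the overlap: a handler routing `parse_repo_url` before `parse_org_url` can no longer capture owner='orgs'.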
---
Duplicate comments:
In `@src/handlers.py`:
- Around line 277-278: The SELECT identity lookups use case-sensitive '='
comparisons; change the WHERE clauses to perform case-insensitive matching by
adding COLLATE NOCASE (e.g. update the query string 'SELECT * FROM prs WHERE
repo_owner = ? AND repo_name = ? AND pr_number = ?' to use 'repo_owner = ?
COLLATE NOCASE AND repo_name = ? COLLATE NOCASE AND pr_number = ? COLLATE
NOCASE') and keep the same bind parameters (parsed['owner'], parsed['repo'],
parsed['pr_number']); apply the same change to the other occurrences referenced
(the similar queries around the other locations) so every identity lookup uses
NOCASE.
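The COLLATE NOCASE predicate above can be demonstrated against an in-memory SQLite table. The schema is reduced to the three identity columns and is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE prs (repo_owner TEXT, repo_name TEXT, pr_number INTEGER)')
conn.execute("INSERT INTO prs VALUES ('OWASP-BLT', 'BLT-Leaf', 262)")

# Plain '=' uses BINARY collation, so the lowercase lookup misses the row.
plain = conn.execute(
    'SELECT * FROM prs WHERE repo_owner = ? AND repo_name = ? AND pr_number = ?',
    ('owasp-blt', 'blt-leaf', 262)).fetchone()
# An explicit COLLATE NOCASE on the comparison makes it case-insensitive.
nocase = conn.execute(
    'SELECT * FROM prs WHERE repo_owner = ? COLLATE NOCASE '
    'AND repo_name = ? COLLATE NOCASE AND pr_number = ?',
    ('owasp-blt', 'blt-leaf', 262)).fetchone()

print(plain)   # None
print(nocase)  # ('OWASP-BLT', 'BLT-Leaf', 262)
```

This is the exact miss the review warns about: the write path dedupes case-insensitively while case-sensitive follow-up reads silently return nothing.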
In `@test-data-display.js`:
- Around line 103-110: The current logic assumes child.killed means the process
exited, but it only indicates a kill signal was sent; instead, send a graceful
signal (SIGTERM), then wait for the child to actually exit by awaiting an event
or checking exitCode within a short timeout (e.g., listen for the child's 'exit'
or 'close' event or poll child.exitCode for up to ~800ms), and only if the
process hasn't exited after that timeout send SIGKILL (preserving the existing
try/catch around kill calls). Refer to the child variable and the current
sleep(800)/kill('SIGKILL') logic and replace it with: send SIGTERM, await exit
with a Promise/timeout, then send SIGKILL as the fallback.
---
Nitpick comments:
In `@public/index.html`:
- Around line 1281-1300: In setUrlError, when clearing the error (message falsy)
remove the aria-invalid attribute from the input instead of setting it to
"false": keep the existing behavior of setting aria-invalid="true" and
aria-describedby when message is truthy, but in the else branch call
input.removeAttribute('aria-invalid') (and continue removing aria-describedby)
so the attribute is absent when the input is valid; reference function name
setUrlError and elements with ids 'url-error' and 'prUrlInput'.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d9e1cc32-5468-4232-aaeb-107299418ddc
📒 Files selected for processing (8)
migrations/0004_add_canonical_url_and_composite_uniqueness.sql, package.json, public/index.html, src/database.py, src/handlers.py, src/utils.py, test-data-display.js, test-url-validation.js
CI checks: ✅ All passing. Review status: ❌ Changes Requested (most recent review: 2026-03-25T17:14:31Z). There are now 18 unresolved inline comments; the re-review after the author's "all fixed" claim actually surfaced additional issues. Summary:
The author should work through all open inline comments on the Files changed tab before requesting another review.
…ing, and process cleanup
@coderabbitai What are the blocking issues in the PR?
🎉 @ojaswa072 You referred a new contributor! Total referrals: 1. Current rank: #3.
✅ Already resolved (14/18 comments addressed): good progress; the vast majority of issues have been fixed across commits.
🔓 Still open (4 comments):
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | 🟠 Major | public/index.html L686 | Validation bypassed via keyboard (Enter key) |
| 2 | 🟠 Major | src/handlers.py L278 | Duplicate PR rows due to wrong upsert conflict key |
| 3 | 🟠 Major | src/utils.py L25 | /orgs/<owner> bulk URLs misrouted to repo parser |
| 4 | 🟡 Minor | public/index.html L3329 | Unreachable dead code in validation logic |
Address these 4 items and the PR should be ready to merge. The 3 Major ones are the true blockers.
…e canonical DB upserts
Resolved
@coderabbitai Please assess the PR's completeness and merge readiness.
🔍 PR Status Summary
| Severity | File | Line | Issue |
|---|---|---|---|
| 🔴 Critical | test-url-validation.js | L30 | Test sandbox correctness: duplicate PR_PATH_RE definitions may cause the sandbox to test the wrong regex |
| 🟠 Major | public/index.html | L1279 | parseGitHubTrackingUrl misclassifies GitHub Pages/reserved paths (e.g. /settings/profile) as repo URLs |
| 🟠 Major | test-data-display.js | L163 | Windows wrapper spawn issue in process handling |
| 🟠 Major | src/utils.py | L25 | /orgs/<owner> bulk URLs still misrouted to parse_repo_url() as owner=orgs, repo=<owner> |
| 🟡 Minor | src/utils.py | L74 | Dead repo == "pull" special-case guard |
| 🟡 Minor | public/index.html | L1330 | Empty-input error copy still says "Please enter a PR URL" despite repo/org URLs now being valid |
| 🟡 Minor | public/index.html | L3334 | Legacy ?pr= flow doesn't canonicalize the URL before comparing tracked rows |
| 🟡 Minor | test-data-display.js | L787 | Skipped runtime auth test still recorded as a passed test |
The 10 comments anchored at line: null are on outdated diff positions, suggesting those areas of code were modified; those are likely already addressed.
📋 Recommendation
The PR can technically be merged (GitHub shows CLEAN + APPROVED), but before doing so I'd recommend:
- Verify the Critical issue at test-url-validation.js L30: a broken test sandbox means the URL validation tests may pass even if the logic is wrong.
- Verify the src/utils.py L25 org URL misrouting: this is a functional bug where bulk-import of org URLs would silently be treated as repo imports.
- Confirm public/index.html L1279: reserved GitHub paths being treated as valid repo URLs could mislead users.
The author's last reply was a one-word "Resolved" with no accompanying commit reference. If those 4 issues were genuinely fixed in prior commits, the maintainer can proceed to merge. If not, the author should address them first.
SatishKumar620 left a comment:
Resolve the issues flagged by CodeRabbit.
@SatishKumar620 can you please state what exactly needs to be resolved?
🎉 @ojaswa072 You referred a new contributor! Total referrals: 2. Current rank: #1.
@ojaswa072 Test sandbox correctness: duplicate PR_PATH_RE definitions may cause the sandbox to test the wrong regex.
🎉 @SatishKumar620 You referred a new contributor! Total referrals: 2. Current rank: #2.
@SatishKumar620 Can you confirm it really exists?
@coderabbitai Please assess the PR's completeness and merge readiness.
🔍 PR Status Summary
✅ Progress: 14 of 18 comments addressed. Good progress across commits.
🔓 Still open (4 comments): the PR is NOT ready to merge.
The 3 Major issues are the true blockers.
Let me inspect the current state of the file directly.
The test now extracts the parser block with:

  const parserBlock = extractBlock(
    html,
    'const PR_PATH_RE = /^\\/([^/]+)\\/([^/]+)\\/pull\\/(\\d+)\\/?$/i;',
    'function showSectionMessage('
  );

This is precise enough to uniquely locate the parser block. The sandbox also guards itself:

  if (typeof parseGitHubTrackingUrl !== 'function') {
    throw new Error('parseGitHubTrackingUrl function not found in page script');
  }

So if the extraction were wrong, this guard would fail loudly. The 4 remaining blockers for this PR are separate issues; none of them are the duplicate-PR_PATH_RE sandbox concern.
@ojaswa072 It has been resolved and no longer exists.
Summary
This PR implements comprehensive input validation for GitHub Pull Request URLs to improve security (mitigating SSRF risks) and enhance the user experience.
Key Changes:
- public/index.html: client-side validation to provide immediate feedback to users, preventing invalid network requests.
- src/utils.py: an anchored regex and length constraints to block malformed or malicious URLs.
- src/handlers.py: catch ValueError exceptions and return a 400 Bad Request instead of allowing a server-side crash.
Testing & Verification:
- Confirmed with curl.exe that the API correctly rejects direct malformed POST requests.
Before:
After:
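The flow the description outlines (anchored regex plus length constraints in utils, ValueError mapped to a 400 in the handler) can be sketched as follows. The function names, the 500-character cap, and the response shape are illustrative assumptions, not the actual project code:

```python
import re

_MAX_URL_LEN = 500  # assumed cap; the real constraint may differ
_PR_RE = re.compile(
    r'^https://github\.com/([^/?#\s]+)/([^/?#\s]+)/pull/(\d+)(?:[/?#].*)?$')

def validate_pr_url(pr_url: str) -> str:
    """Raise ValueError on bad input; return the canonical PR URL."""
    if not pr_url or len(pr_url) > _MAX_URL_LEN:
        raise ValueError('PR URL missing or too long')
    match = _PR_RE.match(pr_url.strip())
    if not match:
        raise ValueError('Not a valid GitHub pull request URL')
    owner, repo, number = match.group(1), match.group(2), int(match.group(3))
    return f'https://github.com/{owner}/{repo}/pull/{number}'

def handle_add_pr(pr_url: str):
    """Hypothetical handler: map validation failures to 400, never crash."""
    try:
        canonical = validate_pr_url(pr_url)
    except ValueError as e:
        return 400, {'error': str(e)}   # 400 Bad Request instead of a 500
    return 200, {'pr_url': canonical}
```

An anchored `^...$` pattern is what blocks SSRF-style inputs such as `javascript:` schemes or lookalike hosts, since nothing short of a full https://github.com PR path can match.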
Summary by CodeRabbit
New Features
Bug Fixes
Tests