Conversation
📊 Monthly Leaderboard

Hi @Rudra-rps! Here's how you rank for March 2026:

Scoring this month (across the OWASP-BLT org): Open PRs (+1 each), Merged PRs (+10), Closed (not merged) (−2), Reviews (+5; first two per PR in-month), Comments (+2, excludes CodeRabbit).
👋 Hi @Rudra-rps! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:
Once a valid peer review is submitted, this check will pass automatically. Thank you!
🍃 PR Readiness Check

Check the readiness of this PR on Leaf: Leaf reviews pull requests for operational readiness, security risks, and production-impacting changes before they ship.
Walkthrough

Introduces a complete flakiness detection pipeline consisting of a GitHub Actions workflow that orchestrates CI result collection, failure analysis, job retries, and flakiness reporting. Includes database schema for storing run history and computed flakiness scores, Python scripts implementing the pipeline stages, comprehensive tests, and integration with the existing PR readiness system to factor flaky failures into CI confidence scoring.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@coderabbitai are all changes in scope or required?
Tip: For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed (output length: 662)
🏁 Script executed (output length: 36)
🏁 Script executed (output length: 12151)
No file appears to introduce unrelated concerns or unnecessary scope creep.
@coderabbitai full review
✅ Actions performed

Full review triggered.
Actionable comments posted: 5
🧹 Nitpick comments (5)
scripts/flakiness/tests/test_report_flakiness.py (1)
77-77: Minor typo in test method name.

The method name `test_contains_nextg_steps_section` appears to have a typo; it should likely be `test_contains_next_steps_section`.

💅 Suggested fix

```diff
- def test_contains_nextg_steps_section(self):
+ def test_contains_next_steps_section(self):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/tests/test_report_flakiness.py` at line 77, Rename the test function named test_contains_nextg_steps_section to test_contains_next_steps_section in the test_report_flakiness.py file; update the function declaration and any internal or external references (e.g., other tests, fixtures, or test-ids) that refer to test_contains_nextg_steps_section so test discovery and references remain consistent, ensuring no other code relies on the old name.

scripts/flakiness/collect_ci_results.py (1)
146-154: Simplify the single-element tuple check.

Line 148 uses `if conclusion_category in ('pass',):`, which is valid but unnecessarily verbose for a single value. Consider simplifying to a direct equality check.

♻️ Suggested simplification

```diff
- if conclusion_category in ('pass',):
+ if conclusion_category == 'pass':
      status = 'pass'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/collect_ci_results.py` around lines 146 - 154, The check using a single-element tuple is overly verbose: in the loop that calls classify_conclusion (the block iterating "for job in jobs"), replace the membership test "if conclusion_category in ('pass',):" with a direct equality comparison "if conclusion_category == 'pass'" to simplify the logic and improve readability; keep the subsequent elif (conclusion_category == 'skip') and else branches unchanged.

src/utils.py (1)
439-450: Clarify the reduced penalty logic for edge cases.

The condition `checks_failed <= known_flaky_count` applies the reduced penalty even when some failures might be from non-flaky checks. For example, if there are 2 known flaky checks in the repo and 2 current failures, the reduced penalty applies even if both failures are from completely different checks.

This is acknowledged in the comment at Lines 479-481 ("we cannot match by check name here because pr_data only carries aggregate counts"), making this a conservative upper-bound approach. Consider adding a brief inline comment here explaining this design trade-off for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils.py` around lines 439 - 450, Add an inline comment near the reduced-penalty logic (the block using known_flaky_count, checks_failed, and fail_penalty) explaining the design trade-off: because pr_data only provides aggregate counts (not check names), the condition checks_failed <= known_flaky_count is a conservative upper-bound that may wrongly apply the reduced 0.20 penalty when failing checks are different from the known flaky ones; explicitly document this limitation and that this choice intentionally favors weaker penalization in ambiguous cases.

scripts/flakiness/tests/test_retry_failures.py (1)
289-318: Consider asserting on `mock_mark` for test completeness.

The static analysis correctly identifies that `mock_mark` is passed but never used. Since `job-a` is classified as `confirmed_flake`, `mark_flake_confirmed` should be called once. Adding an assertion would strengthen this test.

💚 Suggested assertion addition
```diff
     self.assertEqual(output['job-a'], 'confirmed_flake')
     self.assertEqual(output['job-b'], 'real_failure')
+    # Verify mark_flake_confirmed was called only for the flaky job
+    mock_mark.assert_called_once()
+    self.assertEqual(mock_mark.call_args[0][4], 'job-a')  # job_name argument
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/tests/test_retry_failures.py` around lines 289 - 318, The test test_multiple_jobs_classified_independently currently patches mark_flake_confirmed as mock_mark but never asserts it was called; add an assertion after the existing output checks to verify mark_flake_confirmed was invoked once for the confirmed flake (e.g., call mock_mark.assert_called_once() or mock_mark.assert_called_once_with(...) if you know the expected args) so the test verifies that retry_failures.mark_flake_confirmed was actually called when job-a is classified as 'confirmed_flake'.

scripts/flakiness/report_flakiness.py (1)
331-344: Use the parsed report as the artifact source of truth.

`report` is already loaded from `--flaky-report`/stdin above, but this branch rebuilds `all_scores` from D1. That can make `flakiness_report.md` and `flakiness_metrics.json` diverge from the exact analysis results the script just processed. If you still need DB-only fields like `last_updated`, merge them in after starting from `report`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/report_flakiness.py` around lines 331 - 344, The dry-run branch replaces the parsed input `report` with fresh DB rows by calling `d1_select`, which causes the generated artifacts to diverge from the exact analysis the script just processed; instead, keep `all_scores` sourced from the already-loaded `report` (use `report.get('flaky', []) + report.get('deterministic', []) + report.get('stable', [])`) and, if you still need DB-only fields such as `last_updated`, merge those fields into the items from `report` by looking them up via `d1_select` results (e.g., keyed by test id) and copying only the missing DB fields into the `report` entries before writing `flakiness_report.md` and `flakiness_metrics.json`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@migrations/0004_create_flakiness_tables.sql`:
- Around line 40-57: The migration seed for the known_infrastructure_issues
table is missing two patterns that are present in flakiness_config.yml; update
the INSERT INTO known_infrastructure_issues block in
migrations/0004_create_flakiness_tables.sql to include entries for the
"dependency" and "upstream" patterns (with category 'infrastructure' and
appropriate brief descriptions) so the database seeding matches the config;
ensure the new rows use INSERT OR IGNORE semantics and follow the same schema
(pattern, category, description) as the existing entries.
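The idempotent seeding semantics described above can be sketched against an in-memory SQLite stand-in for D1. The `(pattern, category, description)` shape follows the prompt; the exact schema and the example descriptions are assumptions:

```python
import sqlite3

# In-memory stand-in for the D1 table; schema is an assumption based
# on the (pattern, category, description) shape named in the prompt.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE known_infrastructure_issues ('
    '  pattern TEXT PRIMARY KEY,'
    '  category TEXT NOT NULL,'
    '  description TEXT)'
)

SEED = [
    ('dependency', 'infrastructure', 'Dependency service outage'),
    ('upstream', 'infrastructure', 'Upstream CI failure'),
]

def seed_patterns(conn, rows):
    # INSERT OR IGNORE keeps reruns of the migration idempotent.
    conn.executemany(
        'INSERT OR IGNORE INTO known_infrastructure_issues '
        '(pattern, category, description) VALUES (?, ?, ?)',
        rows,
    )

seed_patterns(conn, SEED)
seed_patterns(conn, SEED)  # second run inserts nothing
row_count = conn.execute(
    'SELECT COUNT(*) FROM known_infrastructure_issues').fetchone()[0]
```

Because the conflict falls on the `pattern` primary key, re-running the seed leaves the table unchanged.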
In `@scripts/flakiness/analyze_flakiness.py`:
- Around line 200-250: Selection, history lookup and UPSERT must use the same
identity: include workflow_name everywhere; update the history d1_select call to
filter by workflow_name (add AND workflow_name = ? and pass workflow_name in the
params used in the call) and change the UPSERT conflict target in d1_query from
ON CONFLICT(check_name, job_name) to ON CONFLICT(check_name, job_name,
workflow_name) so rows are per (check_name, job_name, workflow_name); touch the
related variables check_name, job_name, workflow_name and the functions
d1_select, analyze_check, d1_query in the loop to ensure the same tuple is used
for selection, analysis and the INSERT/UPDATE.
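The widened conflict target can be illustrated with SQLite. Column names follow the prompt; the simplified table shape and example values are assumptions:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE flakiness_scores ('
    '  check_name TEXT, job_name TEXT, workflow_name TEXT,'
    '  flakiness_score REAL,'
    '  PRIMARY KEY (check_name, job_name, workflow_name))'
)

def upsert_score(conn, check_name, job_name, workflow_name, score):
    # Conflict target includes workflow_name, so rows are unique per
    # (check_name, job_name, workflow_name) trio.
    conn.execute(
        'INSERT INTO flakiness_scores '
        '(check_name, job_name, workflow_name, flakiness_score) '
        'VALUES (?, ?, ?, ?) '
        'ON CONFLICT(check_name, job_name, workflow_name) '
        'DO UPDATE SET flakiness_score = excluded.flakiness_score',
        (check_name, job_name, workflow_name, score),
    )

upsert_score(conn, 'lint', 'build', 'ci.yml', 0.2)
upsert_score(conn, 'lint', 'build', 'nightly.yml', 0.9)  # distinct row
upsert_score(conn, 'lint', 'build', 'ci.yml', 0.4)       # updates first row
rows = conn.execute(
    'SELECT workflow_name, flakiness_score FROM flakiness_scores '
    'ORDER BY workflow_name').fetchall()
```

With the two-column conflict target of the original code, the second call would have overwritten the `ci.yml` row instead of creating a `nightly.yml` one.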
- Around line 71-102: The classification currently ignores flake_confirmed
counts; change the classification logic so that the 'flaky' branch requires both
failure_rate >= flaky_min AND flaky_count > 0 (i.e., at least one
'flake_confirmed' in window_rows). Concretely, update the conditional sequence
around variables consecutive, failure_rate, flaky_count, consec_det, flaky_max,
flaky_min so that deterministic branches remain the same but the flaky branch
becomes "elif failure_rate >= flaky_min and flaky_count > 0: classification =
'flaky'"; keep flakiness_score and severity computation using _get_severity only
for classification == 'flaky'.
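A sketch of the adjusted branch order. Threshold names come from the prompt; the config values are illustrative, and the fallback to 'stable' for a failing-but-unconfirmed check is this sketch's own assumption:

```python
def classify(failure_rate, flaky_count, consecutive, config):
    """Deterministic branches stay first; the flaky branch now also
    requires at least one 'flake_confirmed' row in the window."""
    consec_det = config['consecutive_deterministic']
    flaky_min = config['flaky_min']
    flaky_max = config['flaky_max']
    if consecutive >= consec_det or failure_rate > flaky_max:
        return 'deterministic'
    elif failure_rate >= flaky_min and flaky_count > 0:
        return 'flaky'
    return 'stable'

# Illustrative thresholds only, not the project's real config.
CONFIG = {'consecutive_deterministic': 3, 'flaky_min': 0.1, 'flaky_max': 0.8}
```

The key behavioral change is the second branch: a moderate failure rate alone no longer earns the 'flaky' label without a confirmed flake.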
In `@scripts/flakiness/report_flakiness.py`:
- Around line 55-83: The issue is that titles and lookups collapse identity to
check_name only; update _issue_title (and any callers like search_flaky_issue
and create_issue) to include job_name and workflow_name along with check_name
and prefix (e.g., "{prefix} {workflow_name} / {job_name} / {check_name}") so the
generated title is unique per job+workflow+check, and ensure the PR
summary/table generation uses the same composed identity (replace references to
only check_name with the trio job_name, workflow_name, check_name) so
search_flaky_issue finds/compares using the exact same title format.
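A minimal sketch of the composed title; the prefix value and separator styling are assumptions, only the trio-of-identifiers idea comes from the prompt:

```python
def _issue_title(prefix, workflow_name, job_name, check_name):
    # All three identifiers make the title unique per
    # workflow/job/check, so search_flaky_issue can match the exact
    # same string it generated.
    return f'{prefix} {workflow_name} / {job_name} / {check_name}'

title = _issue_title('[Flaky test]', 'CI', 'unit-tests', 'pytest')
```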
In `@scripts/flakiness/retry_failures.py`:
- Line 164: The print statement in retry_failures.py currently uses an
unnecessary f-string: change the line printing "[retry] Simulated retry result:
success" to use a normal string (remove the leading `f`) so it becomes a regular
print call; locate the print in the retry/logging area (the print statement
shown as `print(f'[retry] Simulated retry result: success', file=sys.stderr)`)
and remove the `f` prefix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: f37f3649-2296-49b3-8fde-998b8758ebe4
📒 Files selected for processing (19)
- .github/workflows/flakiness_detector.yml
- migrations/0004_create_flakiness_tables.sql
- package.json
- scripts/flakiness/analyze_flakiness.py
- scripts/flakiness/collect_ci_results.py
- scripts/flakiness/db_utils.py
- scripts/flakiness/flakiness_config.yml
- scripts/flakiness/report_flakiness.py
- scripts/flakiness/retry_failures.py
- scripts/flakiness/tests/__init__.py
- scripts/flakiness/tests/test_analyze_flakiness.py
- scripts/flakiness/tests/test_collect_ci_results.py
- scripts/flakiness/tests/test_db_utils.py
- scripts/flakiness/tests/test_report_flakiness.py
- scripts/flakiness/tests/test_retry_failures.py
- src/cache.py
- src/database.py
- src/handlers.py
- src/utils.py
Actionable comments posted: 6
🧹 Nitpick comments (5)
scripts/flakiness/tests/test_analyze_flakiness.py (1)
267-271: Strengthen this upsert assertion against schema drift.

This only proves that an upsert exists. It will still pass if `workflow_name` or any of the metric columns disappear from the generated SQL. Assert the full column set expected by `migrations/0004_create_flakiness_tables.sql`.

Suggested test hardening

```diff
     sql = mock_query.call_args[0][3]
     self.assertIn('INSERT INTO flakiness_scores', sql)
     self.assertIn('ON CONFLICT', sql)
+    for column in (
+        'check_name',
+        'job_name',
+        'workflow_name',
+        'flakiness_score',
+        'severity',
+        'classification',
+        'total_runs',
+        'failure_count',
+        'flaky_failures',
+        'consecutive_failures',
+        'last_updated',
+    ):
+        self.assertIn(column, sql)
```

scripts/flakiness/tests/test_report_flakiness.py (1)
77-79: Typo in test method name: `test_contains_nextg_steps_section`

The method name contains a typo; "nextg" should be "next". While this doesn't affect test execution, it impacts discoverability and readability.

✏️ Proposed fix

```diff
- def test_contains_nextg_steps_section(self):
+ def test_contains_next_steps_section(self):
      body = _build_issue_body(_entry())
      self.assertIn('Next steps', body)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/tests/test_report_flakiness.py` around lines 77 - 79, Rename the test method test_contains_nextg_steps_section to test_contains_next_steps_section to fix the typo; update the test function definition where it calls _build_issue_body(_entry()) and asserts 'Next steps' is in body so the test name accurately reflects its intent and improves readability/discoverability.

scripts/flakiness/db_utils.py (1)
83-89: Consider defensive handling for missing 'pattern' key.

If a `known_infrastructure_issues` row somehow lacks a `pattern` column, line 89 would raise `KeyError`. While unlikely given the schema, a defensive `.get()` with a filter could improve robustness.

🛡️ Defensive approach

```diff
  def get_infra_patterns(account_id, db_id, token):
      """Return a list of lowercase infrastructure pattern strings from D1."""
      rows = d1_select(
          account_id,
          db_id,
          token,
          'SELECT pattern FROM known_infrastructure_issues',
      )
-     return [row['pattern'].lower() for row in rows]
+     return [row['pattern'].lower() for row in rows if row.get('pattern')]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/db_utils.py` around lines 83 - 89, get_infra_patterns currently assumes every row has a 'pattern' key and will KeyError if not; update the return to defensively extract the value (use row.get('pattern')), filter out None/non-string values, and only call .lower() on legitimate strings so missing or malformed rows are skipped (modify the list comprehension in get_infra_patterns accordingly).

scripts/flakiness/tests/test_retry_failures.py (1)
289-317: Unused `mock_mark` should verify flake confirmation for the multi-job scenario.

The test patches `mark_flake_confirmed` as `mock_mark` but doesn't assert on it. Since `job-a` is classified as `confirmed_flake`, the test should verify `mark_flake_confirmed` was called for `job-a` but not for `job-b`.

✅ Proposed fix to add assertion

```diff
     self.assertEqual(output['job-a'], 'confirmed_flake')
     self.assertEqual(output['job-b'], 'real_failure')
+    # Verify mark_flake_confirmed was called only for the confirmed flake
+    mock_mark.assert_called_once()
+    call_args = mock_mark.call_args[0]
+    self.assertEqual(call_args[4], 'job-a')  # job_name argument
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/tests/test_retry_failures.py` around lines 289 - 317, The test_multiple_jobs_classified_independently test patches mark_flake_confirmed as mock_mark but never asserts it was called; update the test to assert mock_mark was called once for 'job-a' and not called for 'job-b' after retry_failures.main() runs by using mock_mark.assert_any_call(...) or equivalent and mock_mark.assert_not_called()/assert_called_once_with for the appropriate arguments matching how mark_flake_confirmed is invoked in the code; ensure you reference the patched mock_mark and the job identifiers 'job-a' and 'job-b' when adding the assertions.

scripts/flakiness/collect_ci_results.py (1)
98-99: Consider logging unknown conclusions rather than silently treating them as pass.

Unknown conclusions (e.g., `'action_required'`, `'stale'`) are treated as `'pass'`, which may hide unexpected job states. Consider logging these for observability.

🔍 Proposed enhancement

```diff
  # unknown conclusions (e.g. 'action_required') → treat as pass
+ if conclusion not in ('', 'success', 'failure', 'skipped', 'cancelled', 'neutral', 'timed_out'):
+     import sys
+     print(f'[warn] Unknown conclusion "{conclusion}" for job "{job.get("name")}" — treating as pass',
+           file=sys.stderr)
  return 'pass'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/collect_ci_results.py` around lines 98 - 99, The code currently treats any unknown conclusion by silently returning 'pass' (the line with return 'pass'); update this to log the unexpected conclusion value before returning so it's visible in CI logs — e.g., use the module's logger (logger.warning or logging.warn) or print a warning that includes the conclusion variable (e.g., f"Unexpected conclusion: {conclusion} -> treating as 'pass'") and then return 'pass'; place the log immediately before the existing return so behavior is unchanged but observable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/flakiness/analyze_flakiness.py`:
- Line 172: The print call uses an unnecessary f-string prefix in the message
printed to sys.stderr (the line containing print(f'[dry-run] → insufficient
data (need ≥5 runs)', file=sys.stderr)); remove the leading "f" so the literal
string is printed (change to print('[dry-run] → insufficient data (need ≥5
runs)', file=sys.stderr)) to avoid misleading use of f-strings.
In `@scripts/flakiness/flakiness_config.yml`:
- Around line 22-23: The YAML contains overly broad infrastructure patterns
"dependency" and "upstream" that will match many non-infra failures; replace
them with narrower, explicit patterns (e.g., more specific labels or prefixed
tokens like "infra:dependency" / "infra:upstream" or anchored regexes such as
^dependency-service- or ^upstream-ci- ) so only true infra-related test failures
are classified as noise; update the entries referencing "dependency" and
"upstream" in flakiness_config.yml (and any consumers that emit those tags) to
the chosen, more specific tokens.
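The anchored-pattern idea can be sketched with `re`. The tokens below are the examples suggested in the prompt above, not the project's final choices:

```python
import re

# Anchored prefixes, per the suggestion above; hypothetical tokens.
INFRA_PATTERNS = [re.compile(p) for p in (
    r'^dependency-service-',
    r'^upstream-ci-',
)]

def is_infra_failure(check_name):
    # Anchoring at the start avoids matching unrelated names that
    # merely contain the word "dependency" or "upstream".
    return any(p.search(check_name) for p in INFRA_PATTERNS)
```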
In `@scripts/flakiness/report_flakiness.py`:
- Line 261: The unpacking owner, repo_name = args.repo.split('/', 1) in
report_flakiness.py will crash if args.repo lacks a '/'—add validation before
unpacking: check that args.repo contains a single '/' (or that split('/',1)
returns two parts), and if not call parser.error or raise a clear ValueError
with a message like "Invalid --repo format, expected owner/repo"; update the
parsing flow where args.repo is consumed (same spot as owner, repo_name) to
perform this check and fail fast with a helpful message.
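A fail-fast validation sketch for the `owner/repo` unpacking described above; the function name and error message are illustrative:

```python
def parse_repo(repo):
    # Validate before unpacking so a malformed value produces a clear
    # message instead of a cryptic unpacking ValueError.
    parts = repo.split('/', 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(
            f'Invalid --repo format {repo!r}, expected owner/repo')
    return parts[0], parts[1]
```

In an argparse flow, the same check could call `parser.error(...)` instead of raising, so the script exits with a usage message and a non-zero status.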
In `@scripts/flakiness/retry_failures.py`:
- Around line 182-183: Validate and handle a malformed args.repo before
unpacking: check that args.repo contains a '/' (e.g., using 'in' or split
length) and, if not, print a clear error or raise a parser error/exit with
non-zero status so the script doesn't crash on unpacking; update the code around
the owner, repo_name = args.repo.split('/', 1) assignment to perform this
validation or wrap it in try/except and emit a helpful message (reference:
args.repo, owner, repo_name, and get_d1_credentials).
In `@src/cache.py`:
- Around line 337-347: The code currently sets _flakiness_cache['data'] = {} on
any exception from get_db/get_all_flakiness_scores which caches a failed D1
read; change it so the cache is only updated on a successful load: move the
assignments to _flakiness_cache['data'] and _flakiness_cache['timestamp'] inside
the try block after scores is retrieved, and in the except block do not write an
empty dict to the cache (just log the error). If you must handle the bootstrap
"table missing" case specially, detect that specific error from
get_all_flakiness_scores and decide whether to cache an explicit sentinel, but
do not globally memoize empty {} on any D1 failure; use the existing symbols
get_db, get_all_flakiness_scores, _flakiness_cache, env, and current_time to
locate and apply the change.
In `@src/database.py`:
- Around line 397-429: The flakiness helpers currently key and query only by
(check_name, job_name) causing collisions; update get_all_flakiness_scores and
get_flakiness_score to include repo (e.g., repository or repo_name) and
workflow_name in the SELECT, the dict key, and the WHERE clause (so keys become
(repo, workflow_name, check_name, job_name) and the prepared statement in
get_flakiness_score binds repo and workflow_name before check_name/job_name);
also update any callers consuming the dict to expect the extended key and modify
the DB schema/index (flakiness_scores table and its indexes) to store and index
repo and workflow_name for correct scoped lookups.
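The extended key can be sketched as a plain dict index over rows; the row shape is an assumption standing in for the D1 result set:

```python
def build_score_index(rows):
    # Key by the full identity so two repos (or workflows) with the
    # same check/job names never collide.
    return {
        (r['repo'], r['workflow_name'], r['check_name'], r['job_name']): r
        for r in rows
    }

rows = [
    {'repo': 'a/b', 'workflow_name': 'ci', 'check_name': 'lint',
     'job_name': 'build', 'flakiness_score': 0.7},
    {'repo': 'c/d', 'workflow_name': 'ci', 'check_name': 'lint',
     'job_name': 'build', 'flakiness_score': 0.1},
]
index = build_score_index(rows)
```

Keyed only by `(check_name, job_name)`, these two rows would collapse into one entry; the four-part key keeps them distinct.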
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 16b17373-fb92-4ce5-baa7-28db482584ad
📒 Files selected for processing (20)
- .github/workflows/flakiness_detector.yml
- .gitignore
- migrations/0004_create_flakiness_tables.sql
- package.json
- scripts/flakiness/analyze_flakiness.py
- scripts/flakiness/collect_ci_results.py
- scripts/flakiness/db_utils.py
- scripts/flakiness/flakiness_config.yml
- scripts/flakiness/report_flakiness.py
- scripts/flakiness/retry_failures.py
- scripts/flakiness/tests/__init__.py
- scripts/flakiness/tests/test_analyze_flakiness.py
- scripts/flakiness/tests/test_collect_ci_results.py
- scripts/flakiness/tests/test_db_utils.py
- scripts/flakiness/tests/test_report_flakiness.py
- scripts/flakiness/tests/test_retry_failures.py
- src/cache.py
- src/database.py
- src/handlers.py
- src/utils.py
```python
        file=sys.stderr)
result = analyze_check(history, config)
if result is None:
    print(f'[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr)
```
Remove unnecessary f-string prefix.
Line 172 has an f-string without any placeholders.
🔧 Fix
```diff
- print(f'[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr)
+ print('[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
print('[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr)
```
🧰 Tools
🪛 Ruff (0.15.5)
[error] 172-172: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/flakiness/analyze_flakiness.py` at line 172, The print call uses an
unnecessary f-string prefix in the message printed to sys.stderr (the line
containing print(f'[dry-run] → insufficient data (need ≥5 runs)',
file=sys.stderr)); remove the leading "f" so the literal string is printed
(change to print('[dry-run] → insufficient data (need ≥5 runs)',
file=sys.stderr)) to avoid misleading use of f-strings.
@Rudra-rps Solid concept and great test coverage. A few things to check:
infrastructure pattern scope, and validation issues.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/flakiness/analyze_flakiness.py (1)
172-172: ⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string prefix.
Line 172 has an f-string without any placeholders. This was flagged in a previous review.
🔧 Proposed fix
- print(f'[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr) + print('[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/analyze_flakiness.py` at line 172, The print call using an f-string at print(f'[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr) has no placeholders—remove the unnecessary f prefix and make it a normal string literal in the same print(...) call; update the expression in the analyze_flakiness.py print statement accordingly so it reads print('[dry-run] → insufficient data (need ≥5 runs)', file=sys.stderr) without changing behavior.
🧹 Nitpick comments (5)
scripts/flakiness/report_flakiness.py (1)
278-279: Verify: credentials variables may be unbound on certain code paths.

The variables `account_id`, `db_id`, `token` are only assigned when `not args.dry_run` (Lines 278-279), but they're used in the `d1_select` call on Lines 363-367, which is also guarded by `else` (not dry_run). The logic appears correct, but the variable scope could be clearer.

Consider initializing to `None` before the conditional or restructuring to make the scope explicit:

💡 Optional clarity improvement

```diff
+ account_id = db_id = token = None
  if not args.dry_run:
      account_id, db_id, token = get_d1_credentials()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/flakiness/report_flakiness.py` around lines 278 - 279, The variables account_id, db_id, and token are only set inside the if not args.dry_run branch which can make their scope unclear; to fix, explicitly initialize account_id, db_id, token = None, None, None before the if or move the d1-related logic (including the d1_select call) into the same not args.dry_run block so get_d1_credentials() and d1_select are in the same scope; update references to args.dry_run, get_d1_credentials, and d1_select accordingly to ensure the variables are always defined when used.src/database.py (1)
414-415: Minor: Use f-string conversion flag for cleaner exception formatting.

Per static analysis, prefer `{e!s}` over `{str(e)}` for slightly cleaner syntax.

🔧 Proposed fix

```diff
  except Exception as e:
-     print(f"Flakiness: Error loading all scores: {str(e)}")
+     print(f"Flakiness: Error loading all scores: {e!s}")
      return {}
```

```diff
  except Exception as e:
      print(
          "Flakiness: Error loading score for "
-         f"{repo}/{workflow_name}/{check_name}/{job_name}: {str(e)}"
+         f"{repo}/{workflow_name}/{check_name}/{job_name}: {e!s}"
      )
```

Also applies to: 431-434
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.py` around lines 414 - 415, Replace uses of str(e) in exception formatting with the f-string conversion flag {e!s} for cleaner syntax; specifically update the print/error messages that currently use print(f"Flakiness: Error loading all scores: {str(e)}") and the similar block around lines 431-434 to use print(f"Flakiness: Error loading all scores: {e!s}") (or the equivalent logging call) so all exception interpolations use {e!s} instead of str(e).migrations/0005_extend_flakiness_scores.sql (1)
41-44: Note: idx_flakiness_scores_lookup may duplicate the PRIMARY KEY index.

SQLite automatically creates an index for composite PRIMARY KEYs. The explicit idx_flakiness_scores_lookup on the same columns is likely redundant but harmless. Consider removing it to reduce index maintenance overhead, or keep it if you need a named index for explicit query hints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/0005_extend_flakiness_scores.sql` around lines 41 - 44, The explicit CREATE INDEX for idx_flakiness_scores_lookup duplicates the composite PRIMARY KEY index on flakiness_scores(repo, workflow_name, check_name, job_name); remove the CREATE INDEX IF NOT EXISTS idx_flakiness_scores_lookup ... statement from the migration to avoid redundant index maintenance (or if you intentionally need a named index for query hints, add a code comment explaining that decision instead of leaving the redundant index in place).

src/utils.py (1)
479-492: Consider: Aggregate count approach may be overly lenient.

The current logic counts all known-flaky checks for the repo and applies reduced penalty if checks_failed <= known_flaky_count. This could reduce penalties even when the actual failing checks aren't the known-flaky ones.

Since pr_data only carries aggregate counts (not individual check names), this is a pragmatic compromise. The comment at Lines 480-481 acknowledges this as a "conservative upper bound." If more precise matching becomes important later, the readiness endpoint could be enhanced to pass individual check identities.
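The clamp suggested by this review can be sketched as follows. The function and the `checks_failed` field name are assumptions taken from the review text, not the actual `src/utils.py` code:

```python
def effective_flaky_count(pr_data: dict, known_flaky_count: int) -> int:
    """Clamp the repo-wide flaky-check count to the PR's actual failing checks.

    Hypothetical helper illustrating the suggested fix; `checks_failed`
    mirrors the aggregate count described in the review comment.
    """
    checks_failed = pr_data.get('checks_failed', 0)
    # Never credit more "known flaky" failures than checks that actually failed.
    return min(known_flaky_count, checks_failed)

# Example: the repo has 5 known-flaky checks, but this PR only has 2 failures.
print(effective_flaky_count({'checks_failed': 2}, 5))  # → 2
```

With this clamp, a repo with many historical flaky checks cannot erase the penalty for a PR whose failures are fewer but unrelated.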
Verify each finding against the current code and only fix it if needed. In `@src/utils.py` around lines 479 - 492, The current aggregate matching can over-credit failures as "known flaky" because known_flaky_count may exceed the repo's failing checks; to avoid reducing penalties incorrectly, clamp known_flaky_count to the reported failing checks by using pr_data.get('checks_failed') when computing the effective flaky count (i.e., effective_flaky = min(known_flaky_count, pr_data.get('checks_failed', 0))); update the computation around known_flaky_count and add a short comment referencing flakiness_scores, pr_data, and checks_failed that explains this pragmatic clamp and TODO to switch to per-check identities when the readiness endpoint can provide them.

src/cache.py (1)
341-342: Minor: Use f-string conversion flag.

🔧 Proposed fix

      except Exception as e:
    -     print(f"Flakiness Cache: D1 load failed ({str(e)}), not caching")
    +     print(f"Flakiness Cache: D1 load failed ({e!s}), not caching")
          return {}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cache.py` around lines 341 - 342, In the except block that prints the D1 load failure message in src/cache.py (the handler that currently does: except Exception as e: print(f"Flakiness Cache: D1 load failed ({str(e)}), not caching")), replace the str(e) call with an f-string conversion flag so the exception is shown with its representation (use {e!r}) to preserve type and detail; keep the same print call and message context but use {e!r} instead of str(e).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@migrations/0005_extend_flakiness_scores.sql`:
- Around line 21-36: The INSERT currently INNER JOINs flakiness_scores to
ci_run_history which will drop any flakiness_scores rows with no matching
ci_run_history; change the JOIN to a LEFT JOIN and use a fallback for repo
(e.g., COALESCE(h.repo, s.repo)) so orphaned scores are preserved when inserting
into flakiness_scores_v2, updating the SELECT to pull repo from h when present
otherwise from s; alternatively, if dropping orphans is intentional add a
comment above the query explaining that behavior.
---
Duplicate comments:
In `@scripts/flakiness/analyze_flakiness.py`:
- Line 172: The print call using an f-string at print(f'[dry-run] →
insufficient data (need ≥5 runs)', file=sys.stderr) has no placeholders—remove
the unnecessary f prefix and make it a normal string literal in the same
print(...) call; update the expression in the analyze_flakiness.py print
statement accordingly so it reads print('[dry-run] → insufficient data (need
≥5 runs)', file=sys.stderr) without changing behavior.
---
Nitpick comments:
In `@migrations/0005_extend_flakiness_scores.sql`:
- Around line 41-44: The explicit CREATE INDEX for idx_flakiness_scores_lookup
duplicates the composite PRIMARY KEY index on flakiness_scores(repo,
workflow_name, check_name, job_name); remove the CREATE INDEX IF NOT EXISTS
idx_flakiness_scores_lookup ... statement from the migration to avoid redundant
index maintenance (or if you intentionally need a named index for query hints,
add a code comment explaining that decision instead of leaving the redundant
index in place).
In `@scripts/flakiness/report_flakiness.py`:
- Around line 278-279: The variables account_id, db_id, and token are only set
inside the if not args.dry_run branch which can make their scope unclear; to
fix, explicitly initialize account_id, db_id, token = None, None, None before
the if or move the d1-related logic (including the d1_select call) into the same
not args.dry_run block so get_d1_credentials() and d1_select are in the same
scope; update references to args.dry_run, get_d1_credentials, and d1_select
accordingly to ensure the variables are always defined when used.
In `@src/cache.py`:
- Around line 341-342: In the except block that prints the D1 load failure
message in src/cache.py (the handler that currently does: except Exception as e:
print(f"Flakiness Cache: D1 load failed ({str(e)}), not caching")), replace the
str(e) call with an f-string conversion flag so the exception is shown with its
representation (use {e!r}) to preserve type and detail; keep the same print call
and message context but use {e!r} instead of str(e).
In `@src/database.py`:
- Around line 414-415: Replace uses of str(e) in exception formatting with the
f-string conversion flag {e!s} for cleaner syntax; specifically update the
print/error messages that currently use print(f"Flakiness: Error loading all
scores: {str(e)}") and the similar block around lines 431-434 to use
print(f"Flakiness: Error loading all scores: {e!s}") (or the equivalent logging
call) so all exception interpolations use {e!s} instead of str(e).
In `@src/utils.py`:
- Around line 479-492: The current aggregate matching can over-credit failures
as "known flaky" because known_flaky_count may exceed the repo's failing checks;
to avoid reducing penalties incorrectly, clamp known_flaky_count to the reported
failing checks by using pr_data.get('checks_failed') when computing the
effective flaky count (i.e., effective_flaky = min(known_flaky_count,
pr_data.get('checks_failed', 0))); update the computation around
known_flaky_count and add a short comment referencing flakiness_scores, pr_data,
and checks_failed that explains this pragmatic clamp and TODO to switch to
per-check identities when the readiness endpoint can provide them.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 034ff890-a354-4901-af94-bbf77a7571b0
📒 Files selected for processing (9)
- migrations/0004_create_flakiness_tables.sql
- migrations/0005_extend_flakiness_scores.sql
- scripts/flakiness/analyze_flakiness.py
- scripts/flakiness/flakiness_config.yml
- scripts/flakiness/report_flakiness.py
- scripts/flakiness/retry_failures.py
- src/cache.py
- src/database.py
- src/utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/flakiness/flakiness_config.yml
- migrations/0004_create_flakiness_tables.sql
The system classifies a check as flaky only when the failure rate crosses the configured threshold and at least one intermittent flaky occurrence is detected.

Also, the system doesn't drop workflow_name; it is included in the distinct job combination, which consists of workflow_name, job_name, and check_name.
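That classification rule can be sketched as below. The threshold value and the pass/fail encoding are assumptions for illustration; the real logic lives in analyze_flakiness.py:

```python
def is_flaky(runs: list[bool], failure_rate_threshold: float = 0.1) -> bool:
    """Classify a check as flaky: failure rate over the threshold AND at
    least one intermittent occurrence (a failure followed later by a pass).

    `runs` holds True for pass, False for fail, oldest first; the 0.1
    threshold is illustrative, not the configured value.
    """
    if not runs:
        return False
    failure_rate = runs.count(False) / len(runs)
    # Intermittent: some failure has a later success, ruling out a hard break.
    intermittent = any(
        not passed and any(runs[i + 1:])
        for i, passed in enumerate(runs)
    )
    return failure_rate > failure_rate_threshold and intermittent

# A check that fails then recovers is flaky; one that stays broken is not.
print(is_flaky([True, False, True, True, False, True]))    # → True
print(is_flaky([True, True, False, False, False, False]))  # → False
```

The second example shows why the intermittency condition matters: a permanently broken check has a high failure rate but is a genuine failure, not flakiness.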
…it directly for a more optimal approach
I made a few optimizations to simplify the flakiness schema and lookup flow.

- Schema cleanup: flakiness_scores now uses (repo, workflow_name, check_name, job_name) as the primary key in 0004_create_flakiness_tables.sql, with repo as the leading column. The indexes previously introduced in 0005 were also moved into this migration.
- Removed redundant migration: Since the schema now includes repo from the start, 0005_extend_flakiness_scores.sql (which previously added the column via a create-copy-rename process) is no longer needed and has been removed.
- Simplified flakiness lookup: The in-memory 60-minute TTL cache layer was removed. handlers.py now directly calls get_all_flakiness_scores() instead of using get_cached_flakiness_scores() from cache.py. The behavior remains the same because get_all_flakiness_scores() already returns {} safely on failure.

Overall this reduces migration complexity and removes an unnecessary caching layer while keeping the runtime behavior unchanged.
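The described schema can be sketched with sqlite3 for local experimentation (the score column and sample values are assumptions; the real DDL targets Cloudflare D1 in 0004_create_flakiness_tables.sql):

```python
import sqlite3

# Composite primary key with repo as the leading column, as described above.
# The `score` column is illustrative only.
DDL = """
CREATE TABLE IF NOT EXISTS flakiness_scores (
    repo          TEXT NOT NULL,
    workflow_name TEXT NOT NULL,
    check_name    TEXT NOT NULL,
    job_name      TEXT NOT NULL,
    score         REAL NOT NULL DEFAULT 0.0,
    PRIMARY KEY (repo, workflow_name, check_name, job_name)
);
"""

conn = sqlite3.connect(':memory:')
conn.execute(DDL)
conn.execute(
    "INSERT INTO flakiness_scores VALUES (?, ?, ?, ?, ?)",
    ('OWASP-BLT/Leaf', 'ci', 'tests', 'unit', 0.35),
)
# The composite PK already serves repo-leading lookups; no extra index needed.
row = conn.execute(
    "SELECT score FROM flakiness_scores WHERE repo = ?",
    ('OWASP-BLT/Leaf',),
).fetchone()
print(row[0])  # → 0.35
```

Because repo leads the primary key, a per-repo lookup like the one above can use the implicit PK index directly, which is what makes the separate lookup index from 0005 unnecessary.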
🎓 A mentor has been assigned to this issue! Mentor: @dev-sanidhya. @dev-sanidhya — please provide guidance and support. @ojaswa072 — @dev-sanidhya will help you through this. Feel free to ask questions here. Happy coding! 🚀 — OWASP BLT-Pool
🧹 Nitpick comments (1)
src/handlers.py (1)
1587-1588: Reuse one DB handle inside _run_readiness_analysis.

This works, but opening another DB handle at Line 1588 is avoidable. Reusing one local db instance keeps the function tighter and reduces per-call overhead.

♻️ Proposed refactor

      async def _run_readiness_analysis(env, pr, pr_id, github_token):
    @@
    -     # Calculate and update review_status from timeline data
    +     db = get_db(env)
    +
    +     # Calculate and update review_status from timeline data
          # This ensures the database has the latest review status without making duplicate API calls
          original_review_status = pr.get('review_status', 'pending')
          review_status = calculate_review_status(timeline_data.get('reviews', []))
          if review_status != original_review_status:
              # Update review_status in database only if it actually changed
    -         db = get_db(env)
              await db.prepare(
                  'UPDATE prs SET review_status = ?, updated_at = CURRENT_TIMESTAMP WHERE id = ?'
              ).bind(review_status, pr_id).run()
              pr['review_status'] = review_status
    @@
    -     flakiness_scores = await get_all_flakiness_scores(get_db(env))
    +     flakiness_scores = await get_all_flakiness_scores(db)

Also applies to: 1591-1594
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers.py` around lines 1587 - 1588, In _run_readiness_analysis, avoid opening multiple DB handles by calling get_db(env) once into a local variable (e.g., db) and reuse that handle for subsequent calls such as get_all_flakiness_scores(db), get_latest_member_metrics(db), and get_member_merged_prs_by_repo(db) (and any other DB-using helpers in the same function), replacing the extra get_db(env) invocations so the function reuses a single DB instance for all operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/handlers.py`:
- Around line 1587-1588: In _run_readiness_analysis, avoid opening multiple DB
handles by calling get_db(env) once into a local variable (e.g., db) and reuse
that handle for subsequent calls such as get_all_flakiness_scores(db),
get_latest_member_metrics(db), and get_member_merged_prs_by_repo(db) (and any
other DB-using helpers in the same function), replacing the extra get_db(env)
invocations so the function reuses a single DB instance for all operations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: OWASP-BLT/coderabbit/.coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 90ce6247-fe73-4264-94cb-7b5c3ac17ef8
📒 Files selected for processing (2)
- migrations/0004_create_flakiness_tables.sql
- src/handlers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- migrations/0004_create_flakiness_tables.sql
This PR introduces an end-to-end CI flakiness detection system for BLT-Leaf that identifies nondeterministic CI failures and integrates flakiness scores into the PR readiness calculation.
Problem
Currently, all CI failures are treated as deterministic failures. In practice, many failures are caused by transient infrastructure issues or flaky tests, which can unfairly penalize contributors and distort readiness scoring.
Solution
This PR adds a pipeline that:
Collects CI job results
Retries failed jobs to confirm flakes
Computes statistical flakiness scores over a 20-run window
Generates reports and integrates flakiness scores into PR readiness scoring
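The scoring stage of the pipeline above can be sketched as a single helper. The 20-run window matches the description; the transition-ratio formula is an assumption, since the actual computation lives in analyze_flakiness.py:

```python
from collections import deque

WINDOW = 20  # rolling window of most recent runs, per the PR description


def flakiness_score(history: list[bool]) -> float:
    """Score a check by pass/fail transitions within the last WINDOW runs.

    `history` is oldest-first booleans (True = pass). A score of 0 means a
    stable check; values near 1 mean results alternate almost every run.
    This transition ratio is an illustrative metric, not the PR's exact one.
    """
    window = deque(history, maxlen=WINDOW)  # keep only the last 20 results
    if len(window) < 2:
        return 0.0
    runs = list(window)
    transitions = sum(a != b for a, b in zip(runs, runs[1:]))
    return transitions / (len(runs) - 1)


print(flakiness_score([True] * 20))         # → 0.0 (stable check)
print(flakiness_score([True, False] * 10))  # → 1.0 (maximally flaky)
```

Counting transitions rather than raw failures is one way to separate intermittent checks from consistently broken ones: a check that fails every run has a high failure rate but zero transitions.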
Key Changes
CI pipeline workflow (flakiness_detector.yml)
D1 schema migration for CI history
Flakiness detection scripts:
collect_ci_results.py
retry_failures.py
analyze_flakiness.py
report_flakiness.py
Worker integration to reduce penalties for known flaky checks
Testing Evidence
103 unit tests passing
Full pipeline verified locally using --dry-run
Retry-based flake detection validated
Flakiness scoring tested on a 20-run window
Worker readiness scoring integration demonstrated
Example behavior:
BEFORE: CI penalty = 0.50 → PR readiness = 61
AFTER: CI penalty = 0.20 → PR readiness = 63
This ensures contributors are not unfairly penalized for intermittent CI issues.
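The before/after numbers above are consistent with a penalty-scaling step roughly like the sketch below. The failure counts and the proportional formula are assumptions chosen to reproduce the example's direction, not the worker's actual code:

```python
def ci_penalty(base_penalty: float, failed: int, known_flaky: int) -> float:
    """Scale the CI penalty by the share of failures that are NOT known flaky.

    Illustrative only: with base_penalty 0.50, 5 failing checks and 3 of
    them known flaky, the penalty drops to 0.50 * 2/5 = 0.20, matching the
    direction of the example above.
    """
    if failed <= 0:
        return 0.0
    hard_failures = max(failed - known_flaky, 0)
    return base_penalty * hard_failures / failed

print(ci_penalty(0.50, 5, 3))  # → 0.2
print(ci_penalty(0.50, 5, 0))  # → 0.5
```

Under a scheme like this, flaky failures still carry some signal (the penalty only shrinks, it never flips to a bonus), which keeps readiness scores conservative.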
Artifacts generated:
flakiness_report.md
flakiness_metrics.json
Summary by CodeRabbit
Release Notes
New Features
Chores