Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
208 changes: 208 additions & 0 deletions S-CORE_Quality_Maintenance_Proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,208 @@
# S-CORE Quality Maintenance Proposal

**Author:** Mahale Komal
**Date:** May 6, 2026
**For:** Ahmed & Jan

---

## Summary

This document describes how S-CORE can define, track, and enforce software quality goals (KPIs).

**Quality goals we want to reach:**
- 100% line coverage
- No clang-tidy issues
- No CodeQL security issues
- No compiler warnings

The main question is not just which goals to set, but how we maintain them as a team.

**Recommended approach: Hybrid Quality Model**

| Check Type | When It Runs | Examples |
|------------|-------------|---------|
| Fast checks | Every PR | Build, unit tests, formatting, basic lint |
| Heavy checks | Once after review (manual) or every night | CodeQL, clang-tidy, sanitizers, coverage |
| Reporting | After every nightly run | Shared quality dashboard |

---

## Problem

Today we can set quality goals, but we have no clear way to:
- See the current status of each KPI
- Stop quality going down over time
- Block bad code from being merged

This proposal compares three options and recommends one for the team to decide.

---

## Options

### Option 1: Dashboard Only

Track quality numbers in a shared dashboard. No automatic blocking.

| Pros | Cons |
|------|------|
| Clear visibility of quality status | Does not block bad code from being merged |
| Easy to show trends over time | Needs manual team reaction |
| Good for management reporting | Not useful if nobody checks the dashboard |

---

### Option 2: CI Enforcement Only

Block merges in CI if quality thresholds are not met.

| Pros | Cons |
|------|------|
| Automatically enforces standards | Can slow development if thresholds are too strict |
| Prevents regressions | May cause friction if codebase is not ready |
| Makes expectations clear | Requires careful setup and baseline agreement |

---

### Option 3: Combined Approach — Recommended

Use both: dashboard for visibility and CI for enforcement of the most important checks.

Heavy checks (CodeQL, clang-tidy, coverage) run in one of three ways:
- **Manual trigger** — developer triggers once after review is done, before merge
- **Nightly schedule** — runs automatically every night for continuous monitoring
- **Mandatory on PR** — selected critical checks (e.g. clang-tidy) run automatically on every PR, while expensive checks (e.g. CodeQL, coverage) run only nightly or manually

This means expensive jobs do not run on every commit, but still run before merge.

| Pros | Cons |
|------|------|
| Balances speed and quality | Needs more setup |
| Prevents regressions and shows trends | Requires agreement on what to enforce vs. track |
| PR feedback stays fast | Developer must remember to trigger before merge |
| Avoids wasted CI runs on in-progress commits | |

---

## Measured CI Runtimes

These times were measured from a real CI run on this repository (PR #398):

| Check | Measured Time |
|-------|--------------|
| CodeQL analysis | ~5 hours 56 minutes |
| Clang-Tidy analysis | ~44 minutes |
| Coverage analysis | ~38 minutes |

**Key observations:**
- These timings are not fixed values. They can change from run to run based on code changes, cache hits/misses, runner load, and workflow configuration.
- CodeQL is very slow (~6 hours) because it traces all compilations, builds a security database, and runs queries. Running it on every commit is not practical.
- Clang-tidy and coverage analysis are moderately fast but still too slow for every PR commit.
- The manual post-review trigger is the best fit — runs once when it matters, does not slow down normal development.
- These are from one run. We need more runs to confirm average times.
- Future work: CodeQL runtime can be reduced by scoping targets and tuning cache settings.

---

## Findings Fix Policy

When a check finds an issue, this policy applies:

| Severity | Fix Deadline |
|----------|-------------|
| Critical (e.g. CodeQL security finding) | Must be fixed before merge |
| High | Owner assigned, fixed within 1–2 working days |
| Medium / Low | Added to backlog with owner and due date |

**Rules:**
- Every finding must have an assigned owner.
- Nightly failures are reviewed at the start of the next working day.
- If a critical finding is found after a merge, a fix PR must be raised immediately.

---

## Frequently Asked Questions

**Can we reduce CodeQL runtime?**
Yes. We can limit which targets CodeQL scans, improve caching, and run it only at controlled points (manual post-review or nightly).

**Should heavy checks block merge?**
Fast checks block merge on every PR. Heavy checks run manually after review or nightly. Critical findings from heavy checks must be fixed before the next merge.

**How does this affect developers?**
PR feedback stays fast (5–10 minutes). Heavy checks only run once after review is complete, not on every commit.

**How do we handle findings?**
Critical findings block merge. High severity fixed in 1–2 days. Medium and low tracked in backlog with owner and due date.

**How do we know the approach is working?**
We track per-job duration, pass/fail results, and findings count over time. We review whether quality improves and whether developer wait time is acceptable.

**What do you recommend right now?**
Start with the hybrid model. Run heavy checks manually after review and nightly. Focus first on reducing CodeQL runtime since it is the biggest bottleneck at ~6 hours.

---

## What Is Already in S-CORE

| Item | Status |
|------|--------|
| PR build and test checks | Present |
| Formatting checks | Present |
| Coverage and sanitizer workflows | Present but not PR-blocking |
| Dedicated CodeQL workflow | Missing |
| Nightly scheduled quality workflow | Missing |
| KPI export to shared dashboard | Missing |

---

## Implementation Plan

### Phase 1 — Start Heavy Quality Jobs
- [ ] Create nightly workflow for CodeQL and full clang-tidy
- [ ] Create nightly workflow for sanitizers (TSAN, ASAN, LSAN)
- [ ] Add manual trigger option for post-review runs
- [ ] Export coverage result from nightly job
- [ ] Define fix-time rules for failures

### Phase 2 — Add Shared Dashboard
- [ ] Agree on which numbers to show (coverage, warnings, test pass rate, defects)
- [ ] Choose dashboard tool (see options below)
- [ ] Push nightly results to dashboard
- [ ] Share dashboard with S-CORE teams

---

## Dashboard Tool Options

| Option | Cost | Best For |
|--------|------|---------|
| **Grafana** | Free (open source) | Trend charts, team dashboards |
| **Sphinx (Static Site Dashboard)** | Free (open source) | Publishing KPI summary pages from CI artifacts in a documentation-style site |
| **Static HTML Dashboard** | Free | Simple KPI reports, low setup effort |
| **GitHub Pages (with Static HTML/Sphinx)** | Free for public repositories | Hosting a permanent dashboard URL directly from CI output |
| **MkDocs + Material Theme** | Free (open source) | Clean documentation-style KPI dashboard with easy navigation |
| **SonarQube Community** | Free (self-hosted) | Code quality and coverage reporting |
| **Power BI** | Free desktop, paid cloud | Good visualization if company license exists |

---

## Recommendation

Use **Option 3: Combined Approach (Hybrid Model)**.

**Why:**
- Fast PR checks keep developer feedback under 10 minutes.
- Heavy checks run manually after review or nightly — not on every commit.
- CI enforces the most critical rules (no compiler warnings, no critical CodeQL findings).
- Dashboard tracks broader trends (coverage progress, quality over time).
- This approach is practical and can be rolled out step by step.

**Next Steps:**

1. Agree on the final list of KPIs to track and enforce.
2. Decide which KPIs block merge in CI and which are tracked in the dashboard only.
3. Run a short pilot: nightly CodeQL + coverage.
4. Review results with the team.
5. Choose a dashboard tool and start rollout.