Skip to content

CLI-627 Analyze change sets as multi-file requests with chunking#446

Draft
nquinquenel wants to merge 1 commit into
masterfrom
feature/nq/CLI-627-sqaa-multi-file-ux
Draft

CLI-627 Analyze change sets as multi-file requests with chunking#446
nquinquenel wants to merge 1 commit into
masterfrom
feature/nq/CLI-627-sqaa-multi-file-ux

Conversation

@nquinquenel

@nquinquenel nquinquenel commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary by Gitar

  • Core Analysis Engine:
    • Switched from concurrent worker-pool to a sequential chunked execution engine for SQAA analysis.
    • Added chunking logic to aggregate change-set files based on SQAA_MAX_FILE_BYTES and SQAA_MAX_FILES_PER_REQUEST limits.
    • Implemented fetchChunkWith413Split to handle API 413 Request Payload Too Large errors by automatically splitting chunks.
  • Progress & Reporting:
    • Replaced file-by-file progress with an animated live status line for better UX during chunked requests.
    • Consolidated text reports with printSqaaTextReport, including path collapsing for clean files (SQAA_COLLAPSE_CLEAN_THRESHOLD).
  • API Integration:
    • Updated SonarQubeClient to handle 413 errors and implemented DEEP analysis depth for all SQAA requests.
  • Hooks & Context:
    • Refactored format-sqaa-hook-context.ts to use shared reporting logic for IDE-style issue display and summary footers.

This will update automatically on new commits.

@netlify

netlify Bot commented Jun 12, 2026

Copy link
Copy Markdown

Deploy Preview for sonarqube-cli canceled.

Name Link
🔨 Latest commit 80aedde
🔍 Latest deploy log https://app.netlify.com/projects/sonarqube-cli/deploys/6a2c0c9b2e10080008da4dcf

@hashicorp-vault-sonar-prod

hashicorp-vault-sonar-prod Bot commented Jun 12, 2026

Copy link
Copy Markdown

CLI-627

Comment on lines +94 to +108
for (const item of items) {
const candidate = [...current, item];
const bytes = estimateRequestBytes({
organizationKey: '',
projectKey: '',
files: candidate.map(toAnalysisFile),
});

if (current.length > 0 && (bytes > maxRequestBytes || candidate.length > maxFilesPerRequest)) {
flush();
current = [item];
continue;
}

current = candidate;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance: packFilesIntoChunks re-serializes whole candidate per file (O(n²))

For every file added to the current chunk, packFilesIntoChunks builds candidate = [...current, item] and calls estimateRequestBytes, which runs JSON.stringify over the entire accumulated candidate array (and Buffer.byteLength over the whole serialized string). This makes packing O(n²) in the number of files per chunk and, worse, O(n²) in total content bytes — each of up to SQAA_MAX_FILES_PER_REQUEST (200) files triggers a full re-serialization of all previously packed file contents. For change sets with many sizeable files this repeatedly stringifies multi-megabyte payloads on the hot path of every multi-file analysis.

Track the byte size incrementally instead: compute a per-file byte cost once and a fixed envelope overhead, accumulating a running total rather than re-stringifying the growing array each iteration.

Accumulate byte size per file instead of re-serializing the entire candidate array each iteration.:

// Precompute per-file byte cost once, accumulate incrementally.
function fileBytes(file: SqaaChunkFile): number {
  return Buffer.byteLength(JSON.stringify(toAnalysisFile(file)), 'utf8');
}

// In packFilesIntoChunks:
let currentBytes = ENVELOPE_BYTES; // fixed overhead of the request wrapper
for (const item of items) {
  const itemBytes = fileBytes(item) + 1; // +1 for array comma separator
  if (current.length > 0 && (currentBytes + itemBytes > maxRequestBytes || current.length + 1 > maxFilesPerRequest)) {
    flush();
    currentBytes = ENVELOPE_BYTES;
  }
  current.push(item);
  currentBytes += itemBytes;
}
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

Comment on lines +259 to +273
export function distributeChunkResponse(
issues: SqaaIssue[],
errors: Array<{ code: string; message: string }> | null | undefined,
chunkPaths: ChunkPath[],
): FileSuccess[] {
const issuesByPath = new Map<string, SqaaIssue[]>();
for (const issue of issues) {
const path = issue.filePath ?? chunkPaths[0]?.filePath;
if (!path) continue;
const list = issuesByPath.get(path) ?? [];
list.push(issue);
issuesByPath.set(path, list);
}

return chunkPaths.map(({ file, filePath }, index) => ({

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: distributeChunkResponse silently drops issues with unmatched filePath

distributeChunkResponse groups issues by issue.filePath ?? chunkPaths[0]?.filePath, then rebuilds results by reading issuesByPath.get(filePath) for each chunk path. If the server returns an issue whose filePath does not exactly match any of the relative paths sent for the chunk (e.g. a different path normalization, a leading ./, or an absolute path), that issue is stored in the map under an unmatched key and is never read back — it is silently dropped and never counted in the tally. This would under-report issues to the user with no error or warning. The fake test server echoes the exact sent path so tests pass, but a real backend may not.

Consider attributing issues with an unrecognized filePath to the first file of the chunk (as already done for issues with no filePath) so they are never lost, or at minimum logging a debug warning when an issue cannot be matched.

Fall back to the first chunk file when an issue's filePath doesn't match any sent path, so issues are never dropped.:

const knownPaths = new Set(chunkPaths.map((p) => p.filePath));
for (const issue of issues) {
  const path =
    issue.filePath && knownPaths.has(issue.filePath)
      ? issue.filePath
      : chunkPaths[0]?.filePath;
  if (!path) continue;
  const list = issuesByPath.get(path) ?? [];
  list.push(issue);
  issuesByPath.set(path, list);
}
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

Comment on lines +101 to +111
function prepareChunks(ctx: RunContext): { chunks: SqaaChunk[]; chunkFileIndices: number[][] } {
const chunkFiles: SqaaChunkFile[] = ctx.files.map((file, idx) => ({
absolutePath: file,
relativePath: ctx.allPaths[idx],
content: readSqaaFileContent(file),
}));
const chunks = packFilesIntoChunks(chunkFiles);
const chunkFileIndices = chunks.map((chunk) =>
chunk.files.map((f) => ctx.files.indexOf(f.absolutePath)),
);
return { chunks, chunkFileIndices };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Performance: Repeated ctx.files.indexOf inside loops is O(n²)

prepareChunks builds chunkFileIndices via chunk.files.map((f) => ctx.files.indexOf(f.absolutePath)), and processChunk again calls ctx.files.indexOf(chunkFile.absolutePath) for every file in every part and every failed file. Each indexOf is a linear scan, so across a full change set this is O(n²) in the number of files. For large change sets this duplicates the per-file cost. Build a single Map<absolutePath, index> once and reuse it everywhere indices are resolved. (Also note: if indexOf ever returns -1, ctx.allPaths[-1] is undefined and progress.update(-1, ...) writes to a negative index — a Map lookup with an explicit guard would make this safer.)

Precompute an absolutePath→index Map and reuse it instead of repeated linear indexOf scans.:

// Build once, e.g. in RunContext setup or prepareChunks:
const fileIndexByAbsolutePath = new Map(ctx.files.map((f, i) => [f, i]));
// Then replace ctx.files.indexOf(absPath) with:
const idx = fileIndexByAbsolutePath.get(absPath);
if (idx === undefined) continue; // guard against unmatched paths
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

@gitar-bot

gitar-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown
Code Review ⚠️ Changes requested 0 resolved / 3 findings

Implements chunked multi-file request processing for SQAA analysis, but introduces O(n²) performance bottlenecks in file serialization and indexing while silently dropping unmatched issues.

⚠️ Performance: packFilesIntoChunks re-serializes whole candidate per file (O(n²))

📄 src/cli/commands/analyze/sqaa-chunking.ts:94-108 📄 src/cli/commands/analyze/sqaa-chunking.ts:65-67

For every file added to the current chunk, packFilesIntoChunks builds candidate = [...current, item] and calls estimateRequestBytes, which runs JSON.stringify over the entire accumulated candidate array (and Buffer.byteLength over the whole serialized string). This makes packing O(n²) in the number of files per chunk and, worse, O(n²) in total content bytes — each of up to SQAA_MAX_FILES_PER_REQUEST (200) files triggers a full re-serialization of all previously packed file contents. For change sets with many sizeable files this repeatedly stringifies multi-megabyte payloads on the hot path of every multi-file analysis.

Track the byte size incrementally instead: compute a per-file byte cost once and a fixed envelope overhead, accumulating a running total rather than re-stringifying the growing array each iteration.

Accumulate byte size per file instead of re-serializing the entire candidate array each iteration.
// Precompute per-file byte cost once, accumulate incrementally.
function fileBytes(file: SqaaChunkFile): number {
  return Buffer.byteLength(JSON.stringify(toAnalysisFile(file)), 'utf8');
}

// In packFilesIntoChunks:
let currentBytes = ENVELOPE_BYTES; // fixed overhead of the request wrapper
for (const item of items) {
  const itemBytes = fileBytes(item) + 1; // +1 for array comma separator
  if (current.length > 0 && (currentBytes + itemBytes > maxRequestBytes || current.length + 1 > maxFilesPerRequest)) {
    flush();
    currentBytes = ENVELOPE_BYTES;
  }
  current.push(item);
  currentBytes += itemBytes;
}
💡 Edge Case: distributeChunkResponse silently drops issues with unmatched filePath

📄 src/cli/commands/analyze/sqaa-analysis.ts:259-273

distributeChunkResponse groups issues by issue.filePath ?? chunkPaths[0]?.filePath, then rebuilds results by reading issuesByPath.get(filePath) for each chunk path. If the server returns an issue whose filePath does not exactly match any of the relative paths sent for the chunk (e.g. a different path normalization, a leading ./, or an absolute path), that issue is stored in the map under an unmatched key and is never read back — it is silently dropped and never counted in the tally. This would under-report issues to the user with no error or warning. The fake test server echoes the exact sent path so tests pass, but a real backend may not.

Consider attributing issues with an unrecognized filePath to the first file of the chunk (as already done for issues with no filePath) so they are never lost, or at minimum logging a debug warning when an issue cannot be matched.

Fall back to the first chunk file when an issue's filePath doesn't match any sent path, so issues are never dropped.
const knownPaths = new Set(chunkPaths.map((p) => p.filePath));
for (const issue of issues) {
  const path =
    issue.filePath && knownPaths.has(issue.filePath)
      ? issue.filePath
      : chunkPaths[0]?.filePath;
  if (!path) continue;
  const list = issuesByPath.get(path) ?? [];
  list.push(issue);
  issuesByPath.set(path, list);
}
💡 Performance: Repeated ctx.files.indexOf inside loops is O(n²)

📄 src/cli/commands/analyze/sqaa-analysis.ts:101-111 📄 src/cli/commands/analyze/sqaa-analysis.ts:151-165

prepareChunks builds chunkFileIndices via chunk.files.map((f) => ctx.files.indexOf(f.absolutePath)), and processChunk again calls ctx.files.indexOf(chunkFile.absolutePath) for every file in every part and every failed file. Each indexOf is a linear scan, so across a full change set this is O(n²) in the number of files. For large change sets this duplicates the per-file cost. Build a single Map<absolutePath, index> once and reuse it everywhere indices are resolved. (Also note: if indexOf ever returns -1, ctx.allPaths[-1] is undefined and progress.update(-1, ...) writes to a negative index — a Map lookup with an explicit guard would make this safer.)

Precompute an absolutePath→index Map and reuse it instead of repeated linear indexOf scans.
// Build once, e.g. in RunContext setup or prepareChunks:
const fileIndexByAbsolutePath = new Map(ctx.files.map((f, i) => [f, i]));
// Then replace ctx.files.indexOf(absPath) with:
const idx = fileIndexByAbsolutePath.get(absPath);
if (idx === undefined) continue; // guard against unmatched paths
🤖 Prompt for agents
Code Review: Implements chunked multi-file request processing for SQAA analysis, but introduces O(n²) performance bottlenecks in file serialization and indexing while silently dropping unmatched issues.

1. ⚠️ Performance: packFilesIntoChunks re-serializes whole candidate per file (O(n²))
   Files: src/cli/commands/analyze/sqaa-chunking.ts:94-108, src/cli/commands/analyze/sqaa-chunking.ts:65-67

   For every file added to the current chunk, `packFilesIntoChunks` builds `candidate = [...current, item]` and calls `estimateRequestBytes`, which runs `JSON.stringify` over the *entire* accumulated candidate array (and `Buffer.byteLength` over the whole serialized string). This makes packing O(n²) in the number of files per chunk and, worse, O(n²) in total content bytes — each of up to `SQAA_MAX_FILES_PER_REQUEST` (200) files triggers a full re-serialization of all previously packed file contents. For change sets with many sizeable files this repeatedly stringifies multi-megabyte payloads on the hot path of every multi-file analysis.
   
   Track the byte size incrementally instead: compute a per-file byte cost once and a fixed envelope overhead, accumulating a running total rather than re-stringifying the growing array each iteration.

   Fix (Accumulate byte size per file instead of re-serializing the entire candidate array each iteration.):
   // Precompute per-file byte cost once, accumulate incrementally.
   function fileBytes(file: SqaaChunkFile): number {
     return Buffer.byteLength(JSON.stringify(toAnalysisFile(file)), 'utf8');
   }
   
   // In packFilesIntoChunks:
   let currentBytes = ENVELOPE_BYTES; // fixed overhead of the request wrapper
   for (const item of items) {
     const itemBytes = fileBytes(item) + 1; // +1 for array comma separator
     if (current.length > 0 && (currentBytes + itemBytes > maxRequestBytes || current.length + 1 > maxFilesPerRequest)) {
       flush();
       currentBytes = ENVELOPE_BYTES;
     }
     current.push(item);
     currentBytes += itemBytes;
   }

2. 💡 Edge Case: distributeChunkResponse silently drops issues with unmatched filePath
   Files: src/cli/commands/analyze/sqaa-analysis.ts:259-273

   `distributeChunkResponse` groups issues by `issue.filePath ?? chunkPaths[0]?.filePath`, then rebuilds results by reading `issuesByPath.get(filePath)` for each chunk path. If the server returns an issue whose `filePath` does not exactly match any of the relative paths sent for the chunk (e.g. a different path normalization, a leading `./`, or an absolute path), that issue is stored in the map under an unmatched key and is never read back — it is silently dropped and never counted in the tally. This would under-report issues to the user with no error or warning. The fake test server echoes the exact sent path so tests pass, but a real backend may not.
   
   Consider attributing issues with an unrecognized `filePath` to the first file of the chunk (as already done for issues with no `filePath`) so they are never lost, or at minimum logging a debug warning when an issue cannot be matched.

   Fix (Fall back to the first chunk file when an issue's filePath doesn't match any sent path, so issues are never dropped.):
   const knownPaths = new Set(chunkPaths.map((p) => p.filePath));
   for (const issue of issues) {
     const path =
       issue.filePath && knownPaths.has(issue.filePath)
         ? issue.filePath
         : chunkPaths[0]?.filePath;
     if (!path) continue;
     const list = issuesByPath.get(path) ?? [];
     list.push(issue);
     issuesByPath.set(path, list);
   }

3. 💡 Performance: Repeated ctx.files.indexOf inside loops is O(n²)
   Files: src/cli/commands/analyze/sqaa-analysis.ts:101-111, src/cli/commands/analyze/sqaa-analysis.ts:151-165

   `prepareChunks` builds `chunkFileIndices` via `chunk.files.map((f) => ctx.files.indexOf(f.absolutePath))`, and `processChunk` again calls `ctx.files.indexOf(chunkFile.absolutePath)` for every file in every part and every failed file. Each `indexOf` is a linear scan, so across a full change set this is O(n²) in the number of files. For large change sets this duplicates the per-file cost. Build a single `Map<absolutePath, index>` once and reuse it everywhere indices are resolved. (Also note: if `indexOf` ever returns -1, `ctx.allPaths[-1]` is `undefined` and `progress.update(-1, ...)` writes to a negative index — a Map lookup with an explicit guard would make this safer.)

   Fix (Precompute an absolutePath→index Map and reuse it instead of repeated linear indexOf scans.):
   // Build once, e.g. in RunContext setup or prepareChunks:
   const fileIndexByAbsolutePath = new Map(ctx.files.map((f, i) => [f, i]));
   // Then replace ctx.files.indexOf(absPath) with:
   const idx = fileIndexByAbsolutePath.get(absPath);
   if (idx === undefined) continue; // guard against unmatched paths

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.
Unblock → Override a blocking verdict and allow merging.

Comment with these commands to change:

Auto-apply Compact Unblock
gitar auto-apply:on         
gitar display:verbose         
gitar unblock         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud

sonarqubecloud Bot commented Jun 12, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
5 New issues

🛠️ Remediation Agent ready

  • Fix automatically
    Creates a separate PR with fixes for eligible issues

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant