Skip to content

Remove UTF-8 BOM from changelog command outputs#3243

Open
lcawl wants to merge 6 commits intomainfrom
bom-fix
Open

Remove UTF-8 BOM from changelog command outputs#3243
lcawl wants to merge 6 commits intomainfrom
bom-fix

Conversation

@lcawl
Copy link
Copy Markdown
Contributor

@lcawl lcawl commented May 4, 2026

Fixes #3191

AI implementation details

The implementation ensures that:

  1. No existing functionality is changed - only BOM handling is added
  2. All outputs are normalized - both fresh generation and file copying operations
  3. Robust testing - comprehensive unit and integration test coverage
  4. Future-proof - any BOM introduced by external tools will be stripped

BOM Detection Utility Created

  • ChangelogUtf8Normalization class in /src/services/Elastic.Changelog/Utilities/
  • StripLeadingUtf8BomChar() - removes BOM character from strings
  • HasUtf8Bom() - detects BOM in byte arrays
  • Proper documentation explaining this is for tooling compatibility

File Writers Updated

All key YAML writing operations now strip BOMs before writing:

  • ChangelogFileWriter - changelog creation
  • ChangelogBundlingService - bundling operations
  • ChangelogBundleAmendService - bundle amendments
  • GitHubReleaseChangelogService - GitHub release processing

File Readers Updated

  • ChangelogPrepareArtifactService - replaced File.Copy with read-normalize-write for YAML files
  • ChangelogCommand init - normalizes content when updating existing changelog.yml

Comprehensive Tests Added

  • ChangelogUtf8NormalizationTests - 12 utility tests covering all scenarios
  • BundleChangelogsTests - BOM handling in bundling operations
  • ChangelogCreationServiceTests - BOM-free creation verification
  • ChangelogPrepareArtifactServiceTests - File copying normalization tests

Verification Complete

  • Confirmed existing YAML files in the repo contain no BOMs
  • All output paths now guarantee UTF-8 without BOM
  • Tests verify byte-level BOM detection and removal

Generative AI disclosure

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No
  1. If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini, OpenAI ChatGPT-4, etc.).

Tool(s) and model(s) used: composer-2, claude-4-sonnet-thinking

@lcawl lcawl marked this pull request as ready for review May 4, 2026 15:40
@lcawl lcawl requested a review from a team as a code owner May 4, 2026 15:40
@lcawl lcawl requested a review from cotti May 4, 2026 15:40
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

The PR adds UTF-8 BOM detection/stripping utilities and applies them across changelog workflows. New Utf8TextNormalization and a Changelog-facing forwarder expose BOM constants, StripLeadingUtf8Bom and HasUtf8Bom. Services that read or write changelog YAML (serialization, bundling, file creation, artifact preparation, GitHub release, and docs-builder command) now strip any leading U+FEFF and write UTF-8 without emitting a BOM. Tests were added to validate the normalization utilities and that generated/processed YAML files do not contain a BOM.

Sequence Diagram(s)

sequenceDiagram
  participant Producer as Docs-builder / Generator
  participant Service as Changelog Service
  participant Normalizer as Utf8Normalization Utility
  participant FS as File System
  Producer->>Service: provide YAML content (may include BOM)
  Service->>Normalizer: StripLeadingUtf8Bom(text)
  Normalizer-->>Service: normalized text (no leading U+FEFF)
  Service->>FS: Write file with UTF8Encoding(emitBOM:false)
  FS-->>Service: file persisted (no BOM bytes)
Loading
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: removing UTF-8 BOM from changelog command outputs, which is the core objective of this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the implementation, testing strategy, and verification performed.
Linked Issues check ✅ Passed The PR fully addresses issue #3191 by implementing BOM removal across all changelog YAML output paths, with comprehensive testing verifying byte-level correctness.
Out of Scope Changes check ✅ Passed All changes are directly scoped to UTF-8 BOM handling in changelog operations; no unrelated modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch bom-fix

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs (1)

378-383: 💤 Low value

Private helper duplicates the shared utility — consider consolidating.

ReleaseNotesSerialization.StripLeadingUtf8BomChar (private) is a local copy of ChangelogUtf8Normalization.StripLeadingUtf8BomChar. Because Elastic.Documentation.Configuration cannot reference Elastic.Changelog.Utilities, this duplication is understandable, but it means the two implementations could diverge.

If the BOM-normalization utility ever needs to handle edge cases (e.g., multiple leading BOMs, or ZERO-WIDTH NO-BREAK SPACE variants), the private copy will need a separate update. Consider moving ChangelogUtf8Normalization (or a minimal StringUtilities base) to Elastic.Documentation.Configuration so it can be shared from a lower-level assembly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs`
around lines 378 - 383, ReleaseNotesSerialization.StripLeadingUtf8BomChar
duplicates ChangelogUtf8Normalization.StripLeadingUtf8BomChar causing
maintenance drift; consolidate by moving the BOM/string normalization helper
into a lower-level/shared assembly consumed by both projects (either relocate
ChangelogUtf8Normalization or create a minimal StringUtilities in
Elastic.Documentation.Configuration) and update ReleaseNotesSerialization to
call the shared method (keep the same behavior but ensure edge cases like
multiple BOMs or related zero-width variants are handled in the single shared
implementation).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs`:
- Around line 378-383: ReleaseNotesSerialization.StripLeadingUtf8BomChar
duplicates ChangelogUtf8Normalization.StripLeadingUtf8BomChar causing
maintenance drift; consolidate by moving the BOM/string normalization helper
into a lower-level/shared assembly consumed by both projects (either relocate
ChangelogUtf8Normalization or create a minimal StringUtilities in
Elastic.Documentation.Configuration) and update ReleaseNotesSerialization to
call the shared method (keep the same behavior but ensure edge cases like
multiple BOMs or related zero-width variants are handled in the single shared
implementation).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2edaba19-33b8-49bc-b017-273766006b75

📥 Commits

Reviewing files that changed from the base of the PR and between 25c5de5 and de3f48a.

📒 Files selected for processing (12)
  • src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs
  • src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs
  • src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs
  • src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs
  • src/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cs
  • src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs
  • src/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cs
  • src/tooling/docs-builder/Commands/ChangelogCommand.cs
  • tests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.cs
  • tests/Elastic.Changelog.Tests/Creation/ChangelogCreationServiceTests.cs
  • tests/Elastic.Changelog.Tests/Evaluation/ChangelogPrepareArtifactServiceTests.cs
  • tests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cs

await _fileSystem.File.WriteAllTextAsync(amendFilePath, yaml, Encoding.UTF8, ctx);
// Strip any leading BOM to ensure clean UTF-8 output for tooling compatibility
var normalizedYaml = ChangelogUtf8Normalization.StripLeadingUtf8BomChar(yaml);
await _fileSystem.File.WriteAllBytesAsync(amendFilePath, Encoding.UTF8.GetBytes(normalizedYaml), ctx);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can simply do WriteAllText with new UTF8Encoding(false) instead of Encoding.UTF8.

That way we allocate less strings, see also: https://danielwertheim.se/utf-8-bom-adventures-in-c/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

460f419 is my attempt to address this comment as well as #3243 (review)

The AI summary for that change is as follows:

Summary of Changes in 460f419

This commit addresses PR review feedback by consolidating BOM helper duplication and optimizing UTF-8 write style:

🔄 Consolidated BOM/String Normalization

  • Created shared utility: New Utf8TextNormalization.cs in Elastic.Documentation/Text/ with enhanced BOM stripping that removes all consecutive leading U+FEFF characters (handles double-BOM scenarios)
  • Updated ReleaseNotesSerialization.cs: Removed private StripLeadingUtf8BomChar method, now uses shared Utf8TextNormalization.StripLeadingUtf8Bom
  • Converted ChangelogUtf8Normalization.cs: Now serves as a thin forwarder to the shared implementation, maintaining backward compatibility

Optimized UTF-8 File Writes

Replaced WriteAllBytesAsync(path, Encoding.UTF8.GetBytes(content)) with WriteAllTextAsync(path, content, Utf8NoBom) in all changelog writers:

  • ChangelogFileWriter.cs
  • ChangelogBundlingService.cs
  • ChangelogBundleAmendService.cs
  • GitHubReleaseChangelogService.cs
  • ChangelogPrepareArtifactService.cs

Each now uses static readonly UTF8Encoding Utf8NoBom = new(encoderShouldEmitUTF8Identifier: false) to avoid per-write byte array allocations.

Enhanced Testing

  • New test file: Utf8TextNormalizationTests.cs with comprehensive shared utility tests
  • Enhanced existing tests: Added consecutive BOM test cases in ChangelogUtf8NormalizationTests.cs
  • Edge case coverage: Tests for multiple BOMs, BOM-only strings, and BOMs in middle/end positions

The changes eliminate code duplication, improve memory efficiency, and provide more robust BOM handling across the codebase.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
src/Elastic.Documentation/Text/Utf8TextNormalization.cs (1)

20-20: ⚡ Quick win

Make Utf8BomBytes immutable to callers.

public static readonly byte[] is still externally mutable (Utf8TextNormalization.Utf8BomBytes[0] = ...). For a shared normalization constant, expose a read-only view instead.

Proposed change
-public static readonly byte[] Utf8BomBytes = [0xEF, 0xBB, 0xBF];
+private static readonly byte[] Utf8BomBytesValue = [0xEF, 0xBB, 0xBF];
+public static ReadOnlySpan<byte> Utf8BomBytes => Utf8BomBytesValue;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/Elastic.Documentation/Text/Utf8TextNormalization.cs` at line 20,
Utf8BomBytes is currently a public mutable byte[]; make the array immutable to
callers by keeping a private static readonly byte[] backing field (e.g.
_utf8BomBytes with the same byte values 0xEF,0xBB,0xBF) and replace the public
symbol Utf8BomBytes with a read-only view such as a public static
ReadOnlySpan<byte> Utf8BomBytes => _utf8BomBytes (or public static
ReadOnlyMemory<byte> if preferred); this prevents external mutation while
preserving the constant bytes.
src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs (1)

312-312: ⚡ Quick win

Add ConfigureAwait(false) to the async file write.

Line 312 introduces a new library async I/O await without ConfigureAwait(false), which can cause unnecessary context capture.

Suggested patch
-		await _fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx);
+		await _fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx).ConfigureAwait(false);

As per coding guidelines: "Use ConfigureAwait(false) in library code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs`
at line 312, The await call to _fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) captures the synchronization context; update
the await in GitHubReleaseChangelogService (method containing this call) to
append ConfigureAwait(false) so the Task does not capture context (i.e., await
_fileSystem.File.WriteAllTextAsync(...).ConfigureAwait(false)); keep the same
arguments (filePath, normalizedContent, Utf8NoBom, ctx) and preserve existing
error handling/flow.
src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs (1)

267-267: ⚡ Quick win

Use ConfigureAwait(false) on this new await.

Line 267 is a library async write and should avoid context capture.

Suggested patch
-			await _fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom, ctx);
+			await _fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom, ctx).ConfigureAwait(false);

As per coding guidelines: "Use ConfigureAwait(false) in library code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs` at
line 267, The await in ChangelogBundleAmendService (the call to
_fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom,
ctx)) is capturing the synchronization context; change the await to use
ConfigureAwait(false) to avoid context capture in library code (i.e., await
_fileSystem.File.WriteAllTextAsync(..., ctx).ConfigureAwait(false)) so the write
operates without capturing the caller context.
src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs (1)

56-56: ⚡ Quick win

Apply ConfigureAwait(false) for this library async I/O call.

Line 56 should use ConfigureAwait(false) to follow library async guidance.

Suggested patch
-		await fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx);
+		await fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx).ConfigureAwait(false);

As per coding guidelines: "Use ConfigureAwait(false) in library code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs` at line 56,
In ChangelogFileWriter (the async method that calls
fileSystem.File.WriteAllTextAsync), append ConfigureAwait(false) to the awaited
I/O call: change the await fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) usage so it calls .ConfigureAwait(false) on
the Task (keep the same parameters: filePath, normalizedContent, Utf8NoBom, ctx)
to avoid capturing the synchronization context in this library code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/Elastic.Documentation/Text/Utf8TextNormalization.cs`:
- Line 20: Utf8BomBytes is currently a public mutable byte[]; make the array
immutable to callers by keeping a private static readonly byte[] backing field
(e.g. _utf8BomBytes with the same byte values 0xEF,0xBB,0xBF) and replace the
public symbol Utf8BomBytes with a read-only view such as a public static
ReadOnlySpan<byte> Utf8BomBytes => _utf8BomBytes (or public static
ReadOnlyMemory<byte> if preferred); this prevents external mutation while
preserving the constant bytes.

In `@src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs`:
- Line 267: The await in ChangelogBundleAmendService (the call to
_fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom,
ctx)) is capturing the synchronization context; change the await to use
ConfigureAwait(false) to avoid context capture in library code (i.e., await
_fileSystem.File.WriteAllTextAsync(..., ctx).ConfigureAwait(false)) so the write
operates without capturing the caller context.

In `@src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs`:
- Line 56: In ChangelogFileWriter (the async method that calls
fileSystem.File.WriteAllTextAsync), append ConfigureAwait(false) to the awaited
I/O call: change the await fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) usage so it calls .ConfigureAwait(false) on
the Task (keep the same parameters: filePath, normalizedContent, Utf8NoBom, ctx)
to avoid capturing the synchronization context in this library code.

In
`@src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs`:
- Line 312: The await call to _fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) captures the synchronization context; update
the await in GitHubReleaseChangelogService (method containing this call) to
append ConfigureAwait(false) so the Task does not capture context (i.e., await
_fileSystem.File.WriteAllTextAsync(...).ConfigureAwait(false)); keep the same
arguments (filePath, normalizedContent, Utf8NoBom, ctx) and preserve existing
error handling/flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b11b7d00-8826-404a-96d5-625803a67b3c

📥 Commits

Reviewing files that changed from the base of the PR and between de3f48a and 460f419.

📒 Files selected for processing (11)
  • src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs
  • src/Elastic.Documentation/Text/Utf8TextNormalization.cs
  • src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs
  • src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs
  • src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs
  • src/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cs
  • src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs
  • src/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cs
  • src/tooling/docs-builder/Commands/ChangelogCommand.cs
  • tests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cs
  • tests/Elastic.Documentation.Configuration.Tests/Text/Utf8TextNormalizationTests.cs
✅ Files skipped from review due to trivial changes (1)
  • tests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cs
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs
  • src/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cs
  • src/tooling/docs-builder/Commands/ChangelogCommand.cs
  • src/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cs
  • src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

changelog generated yaml utf8 BOM

2 participants