Skip to content

Conversation

@addisonbeck
Copy link
Contributor

@addisonbeck addisonbeck commented Oct 28, 2025

🎟️ Tracking

Jira: https://bitwarden.atlassian.net/browse/PM-22218
Tech breakdown: https://bitwarden.atlassian.net/wiki/x/HIApfg
Calling workflow in the SDK: bitwarden/sdk-internal#538

📔 Objective

This PR introduces a GitHub Actions workflow that automatically detects TypeScript breaking changes when the SDK repository is updated. The workflow addresses the problem where SDK breaking changes are only discovered when clients attempt to update their SDK version, rather than during SDK development.

What this workflow does:

  • Triggers automatically via repository_dispatch events from the SDK repository when SDK PRs are created/updated
  • Downloads new SDK build artifacts using secure GitHub App authentication
  • Installs the SDK artifacts locally and runs the existing npm run test:types command
  • Reports success/failure status back to the SDK repository via workflow exit codes
  • Provides immediate feedback to SDK developers about potential breaking changes

Technical implementation:

  • Uses GitHub App (BW-GHAPP) authentication with Azure Key Vault for secure cross-repository access
  • Integrates with existing TypeScript type checking infrastructure (test:types)
  • Includes comprehensive error handling and status reporting
  • Designed for cross-repository workflow coordination using gh run watch

This enables SDK developers to catch breaking changes during development rather than discovering them later during client integration.

📸 Screenshots

I temporarily hardcoded some values in pointing to a build of sdk-internal that I had intentially broken, and here is a screenshot of it failing with expected errors

Screenshot 2025-10-30 at 2 41 17 PM

The job passes pointed at a build of the sdk without breaking changes in it. Other runs can be found here but there is some debugging noise. This will be better tested once merged and available to the calling sdk job.

⏰ Reminders before review

  • Contributor guidelines followed
  • All formatters and local linters executed and passed
  • Written new unit and / or integration tests where applicable
  • Protected functional changes with optionality (feature flags)
  • Used internationalization (i18n) for all UI strings
  • CI builds passed
  • Communicated to DevOps any deployment requirements
  • Updated any necessary documentation (Confluence, contributing docs) or informed the documentation team

🦮 Reviewer guidelines

  • 👍 (:+1:) or similar for great changes
  • 📝 (:memo:) or ℹ️ (:information_source:) for notes or general info
  • ❓ (:question:) for questions
  • 🤔 (:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion
  • 🎨 (:art:) for suggestions / improvements
  • ❌ (:x:) or ⚠️ (:warning:) for more significant problems or concerns needing attention
  • 🌱 (:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt
  • ⛏ (:pick:) for minor or nitpick changes

@github-actions
Copy link
Contributor

github-actions bot commented Oct 28, 2025

Logo
Checkmarx One – Scan Summary & Details41a1adb8-7d11-4aed-81af-b5e70748a869

Great job! No new security vulnerabilities introduced in this pull request

@codecov
Copy link

codecov bot commented Oct 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.44%. Comparing base (2ff9c23) to head (999a4d0).
⚠️ Report is 76 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17075      +/-   ##
==========================================
+ Coverage   40.17%   40.44%   +0.27%     
==========================================
  Files        3486     3513      +27     
  Lines       99633   100344     +711     
  Branches    14911    15030     +119     
==========================================
+ Hits        40031    40588     +557     
- Misses      57909    58045     +136     
- Partials     1693     1711      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@addisonbeck addisonbeck force-pushed the sdk-warn-on-breaking-changes branch 4 times, most recently from 3c9b984 to 5a5c746 Compare October 30, 2025 17:51
@addisonbeck addisonbeck changed the title add hook for the sdk ci: add SDK breaking change detection workflow Oct 30, 2025
Introduces GitHub Actions workflow that detects TypeScript breaking changes when SDK artifacts are updated. Workflow is triggered via repository_dispatch from SDK repository and runs npm test:types with newly built SDK artifacts.

The workflow downloads SDK build artifacts, installs them locally, and executes the existing TypeScript type checking process. Exit codes determine success/failure for SDK repository monitoring via gh run watch.

Addresses issue where breaking changes in SDK are discovered only when clients attempt SDK version updates, rather than during SDK development.
@addisonbeck addisonbeck force-pushed the sdk-warn-on-breaking-changes branch from 5a5c746 to 5ee5a4f Compare October 30, 2025 18:39
@addisonbeck addisonbeck marked this pull request as ready for review October 30, 2025 18:50
@claude
Copy link

claude bot commented Oct 30, 2025

Claude finished @addisonbeck's task —— View job


Code Review: SDK Breaking Change Detection Workflow

Summary of Changes Since Last Review

The author has addressed all of @coroiu's review feedback with the following improvements:

  • ✅ Removed unnecessary CI runner cleanup (commit 6ae3017)
  • ✅ Switched to npm link instead of manual copy/install (commit 218dab6)
  • ✅ Switched to bitwarden/gh-actions/download-artifacts custom action (commit 218dab6)
  • ✅ Added underscore prefixes to job-level env vars (commit cbc33b4)
  • ✅ Added retry logic for npm ci with error handling (commit 6b2d10d)

The workflow introduces a repository_dispatch triggered workflow that downloads SDK artifacts, installs them via npm link, and runs TypeScript type checking to detect breaking changes.


Critical Issues

1. Missing newline at end of file

Severity: Low
Location: .github/workflows/sdk-breaking-change-check.yml:167

The workflow file is missing a trailing newline, which violates standard text file conventions and may cause issues with some tools.

Fix: Add a newline at the end of the file.


Suggested Improvements

1. Inconsistent step structure ordering

Location: .github/workflows/sdk-breaking-change-check.yml:27-47

The Azure login/logout steps are split around the GitHub App token generation step. This creates an awkward flow where Azure is logged in, then GitHub App credentials are generated, then Azure is logged out - all before the main work begins.

Suggestion: Consider whether the Azure logout could be combined with the login step in a post-action, or add a comment explaining why the logout happens before checkout.

Current structure
- name: Log in to Azure
- name: Get Azure Key Vault secrets
- name: Generate GH App token
- name: Log out from Azure  # ← Logout before main work starts
- name: Validate inputs
- name: Check out clients repository

2. Retry logic discussion point

Location: .github/workflows/sdk-breaking-change-check.yml:82-104

As @coroiu noted, retry logic for npm commands is not standard practice in this repository. While the author mentioned it's good practice for transient network failures, this PR introduces a pattern that doesn't exist elsewhere in the codebase.

Discussion points:

  • If retries are valuable here, should they be added to other workflows?
  • If not valuable enough to add elsewhere, should they be removed for consistency?
  • The retry logic adds 23 lines of shell script complexity

Current practice: All other workflows in the repository use simple npm ci without retries (test.yml:51, build-desktop.yml:204, etc.)

3. Hardcoded timeout value

Location: .github/workflows/sdk-breaking-change-check.yml:154

The timeout 10m command uses a hardcoded value that's buried in a shell script. This makes it less discoverable and harder to adjust.

Suggestion: Extract to a job-level environment variable:

env:
  _TYPE_CHECK_TIMEOUT_MINUTES: 10

Then use: timeout ${_TYPE_CHECK_TIMEOUT_MINUTES}m npm run test:types

4. Potential false negatives from timeout command

Location: .github/workflows/sdk-breaking-change-check.yml:154

The timeout command exits with status 124 when a timeout occurs, which would be caught by the else branch as a type check failure. However, this conflates "breaking changes detected" with "type checking took too long."

Suggestion: Add explicit handling for timeout exit code:

if timeout 10m npm run test:types; then
  # success case
elif [ $? -eq 124 ]; then
  echo "::error::Type checking timed out after 10 minutes"
  echo "⏱️ **Result**: Type checking timed out" >> "$GITHUB_STEP_SUMMARY"
  exit 1
else
  # breaking changes detected case
fi

5. CLIENT_LABEL validation is incomplete

Location: .github/workflows/sdk-breaking-change-check.yml:52-60

The validation checks if _CLIENT_LABEL is empty but doesn't fail if it is - it just gets logged. This appears intentional since CLIENT_LABEL is used later but not required for the workflow to function. However, the validation message says "Missing required client_payload fields" which is misleading.

Suggestion: Either make CLIENT_LABEL truly required in the validation, or clarify in the error message which fields are required vs optional.

6. Working directory redundancy

Location: .github/workflows/sdk-breaking-change-check.yml:119

The step specifies working-directory: ./ which is redundant since ./ is the default.

Suggestion: Remove the working-directory line or change it to be meaningful if a different directory is intended.


Security Observations

✅ Good: Secure credential handling

  • Proper use of Azure Key Vault for sensitive credentials
  • GitHub App token properly scoped and not exposed
  • persist-credentials: false on checkout prevents credential leakage
  • Appropriate OIDC permissions (id-token: write)

✅ Good: Input validation

  • Client payload fields are validated before use
  • Proper error messages for debugging

✅ Good: Artifact verification

  • if_no_artifact_found: fail ensures the workflow fails fast if artifacts are missing
  • Artifact source is logged for audit trail

Performance Considerations

Job timeout is reasonable

The 15-minute job timeout (.github/workflows/sdk-breaking-change-check.yml:18) with a 10-minute type check timeout (.github/workflows/sdk-breaking-change-check.yml:154) provides a good balance between allowing sufficient time and preventing runaway jobs.

Efficient use of GitHub Actions cache

The workflow correctly uses actions/setup-node with npm cache enabled (.github/workflows/sdk-breaking-change-check.yml:76-80), which should speed up subsequent runs.


Good Practices Observed

  • Comprehensive logging with emojis for better readability
  • GitHub Step Summary integration for workflow results
  • Proper pinning of action versions with SHA comments
  • Consistent with repository's action version standards (matches test.yml)
  • Good error messages and debugging information
  • Timing measurements for type checking duration

Action Items

Must address:

  1. Add trailing newline to end of workflow file

Recommended:
2. Decide on npm ci retry logic - keep for robustness or remove for consistency with other workflows
3. Add explicit timeout handling to distinguish timeout from type errors
4. Clarify CLIENT_LABEL validation requirements

Optional:
5. Extract timeout value to environment variable
6. Remove redundant working-directory: ./
7. Add comment explaining Azure logout timing or restructure steps


Previous Review Status

All issues raised by @coroiu have been successfully addressed in recent commits. The workflow now follows repository conventions for artifact downloading and SDK overriding, uses proper environment variable naming, and includes appropriate error handling.

The one remaining discussion point from the review is about npm ci retry logic, where the author noted it's good practice but acknowledged it's not standard in this repository.

Add validation step early in the workflow to check for required
client_payload fields and prevent failures from malformed payloads.
This improvement was requested during code review to provide better
error handling and debugging information when the SDK workflow sends
incomplete data.

Validates SOURCE_REPO, SDK_VERSION, ARTIFACTS_RUN_ID, and ARTIFACT_NAME
before proceeding with artifact download and type checking.
Update GitHub Actions to consistent versions used across the clients
repository for better security and compatibility. This change was
requested during code review to align with existing patterns.

- actions/checkout: v4.2.2 → v5.0.0 with specific SHA hash
- actions/setup-node: v4.2.0 → v5.0.0 with specific SHA hash
- actions/create-github-app-token: v2.1.1 → v2.0.3 with specific SHA hash

Uses specific SHA hashes for all actions following repository security standards.
…tion

Wrap npm run test:types with 10-minute timeout to provide faster feedback
when type checking hangs and more predictable workflow behavior. This
improvement was requested during code review to prevent workflows from
running until the 15-minute job timeout.

Provides clearer indication when type checking itself fails versus other
workflow issues, improving debugging experience for developers.
Add CLIENT_LABEL to log messages and GitHub Step Summary output for
better traceability and debugging. This change
 was requested during
code review to make use of the defined CLIENT_LABEL environment
variable that was previously unused.

Improves workflow output clarity by showing which client type
(typescript, mobile, etc.) is being processed.
Implement shell-based retry logic (3 attempts with 5-second delays) for
npm ci command to handle temporary network issues without adding external
dependencies. This improvement was requested during code review to make
the workflow more resilient to transient failures.

Continues with existing npm install approach while adding robustness
for dependency installation in GitHub Actions environment.
Update shell script to use proper variable quoting syntax throughout
(${VARIABLE} instead of $VARIABLE) for better shell scripting practices
and consistency. This change was requested during code review to follow
shell scripting best practices.

While this won't cause problems in practice, it prevents potential
word splitting issues and improves code maintainability.
@addisonbeck addisonbeck force-pushed the sdk-warn-on-breaking-changes branch from e37c5da to 6b2d10d Compare October 30, 2025 20:43
@addisonbeck
Copy link
Contributor Author

I'm going to leave the rest of Claude's comments on the cutting room floor. Many are false positives, and the others are nits. Even the "high" severity one seems a little silly.

@addisonbeck addisonbeck requested review from a team and gitclonebrian and removed request for a team October 30, 2025 21:33
@addisonbeck
Copy link
Contributor Author

@gitclonebrian it would be nice to have a BRE review on this, but not required.

Comment on lines 86 to 102
while [ ${RETRY_COUNT} -lt ${MAX_RETRIES} ]; do
RETRY_COUNT=$((RETRY_COUNT + 1))
echo "🔄 npm ci attempt ${RETRY_COUNT} of ${MAX_RETRIES}..."

if npm ci; then
echo "✅ npm ci successful"
break
else
echo "❌ npm ci attempt ${RETRY_COUNT} failed"
[ ${RETRY_COUNT} -lt ${MAX_RETRIES} ] && sleep 5
fi
done

if [ ${RETRY_COUNT} -eq ${MAX_RETRIES} ]; then
echo "::error::npm ci failed after ${MAX_RETRIES} attempts"
exit 1
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: we don't usually retry npm commands do we? Is there any specific reason for doing it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually do this. I think using retries to protect against transient network failures is generally a good practice. I do sometimes see timeouts from things like npm i. I think we should be doing this more as a practice, but if you think it's not worth it I won't die on this hill and will take this out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, leave it 🤷

@addisonbeck addisonbeck requested a review from coroiu October 31, 2025 20:14
@addisonbeck addisonbeck merged commit c1dec40 into main Nov 3, 2025
35 of 37 checks passed
@addisonbeck addisonbeck deleted the sdk-warn-on-breaking-changes branch November 3, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants