Skip to content

feat(config): add yaml: and http: URI providers#2696

Open
mateenali66 wants to merge 5 commits intoopen-telemetry:mainfrom
mateenali66:feat/2598-yaml-http-providers
Open

feat(config): add yaml: and http: URI providers#2696
mateenali66 wants to merge 5 commits intoopen-telemetry:mainfrom
mateenali66:feat/2598-yaml-http-providers

Conversation

@mateenali66
Copy link
Copy Markdown
Contributor

Fixes #2598. Phase 2 follow-up to #2529 (which added the ConfigProvider trait and the file:/env: providers).

What this adds

yaml: provider — inline YAML on the CLI following the OTel Collector convention. :: expands into nested YAML, and the segment after the final :: is treated as a trailing YAML fragment (typically key: value). No :: means the content is passed through as literal YAML.

yaml:version: otel_dataflow/v1            ->  version: otel_dataflow/v1
yaml:exporters::debug::verbosity: detailed ->  exporters:
                                                 debug:
                                                   verbosity: detailed
yaml:engine::{}                            ->  engine:
                                                 {}

http: provider — unauthenticated HTTP GET with a 30s timeout. Response body is treated as YAML unless Content-Type is application/json. Standard HTTP redirects are followed. Uses reqwest::blocking so config resolution can happen at startup without a tokio runtime.

Out of scope (deferred to Phase 3 per the issue)

  • https: and TLS configuration
  • Bearer / mTLS authentication
  • Configurable timeouts
  • Multi-config merge semantics (which is where the :: inline-override example from the issue body really lands)

Tests

  • yaml: provider: literal content, single-level key path, nested key path, flow-style value, round-trip through default_resolver and serde_yaml
  • http: provider: yaml body, JSON content-type detection, non-2xx status error. Uses wiremock as an async mock server with tokio::task::spawn_blocking to drive the blocking reqwest client.
  • parse_scheme already accepts http:// from Phase 1 (test is in feat(config): add URI providers for file: and env: config sources #2529). Added a yaml: case alongside.

Validation

cargo test -p otap-df-config -> 273 tests pass.
cargo clippy -p otap-df-config --all-targets -- -D warnings -> clean.
cargo check -p otap-df-engine -p otap-df-controller -> clean (config crate still builds for its downstream consumers).

cargo xtask quick-check fails on an unrelated pre-existing df_engine main.rs issue: static GLOBAL: Jemalloc = Jemalloc; is used without the same not(clippy) cfg guard that gates the use tikv_jemallocator::Jemalloc; import. Reproduces on upstream main with this branch stashed, so not caused by this PR. Happy to file a separate issue if useful.

Note on rustls

The workspace's reqwest uses the rustls-no-provider feature, so callers must install a crypto provider before any reqwest::blocking::Client is built. Production code already handles this via otap-df-otap/crypto-ring. The config crate's http tests install rustls::crypto::ring::default_provider() lazily via a std::sync::Once, and pull in rustls with the ring feature only as a dev-dependency.

Phase 2 of the config URI provider work from open-telemetry#2529. Two new providers
riding on the existing ConfigProvider trait, plus a new error variant
for HTTP fetch failures.

yaml: inline YAML on the CLI. Follows the OTel Collector convention
where `::` expands into nested YAML and the segment after the final
`::` is a trailing YAML fragment. No `::` means the content is passed
through as literal YAML.

  yaml:version: otel_dataflow/v1
  yaml:exporters::debug::verbosity: detailed  ->
    exporters:
      debug:
        verbosity: detailed

http: unauthenticated HTTP GET with a 30s timeout. Response body is
YAML unless Content-Type is application/json. Standard redirects are
followed. https:, authentication, and configurable timeouts are
deferred to Phase 3 as per the issue.

parse_scheme already accepts http://. default_resolver now registers
all four providers. Tests use wiremock for the http: cases and a
lazy rustls ring provider install since the workspace's reqwest uses
rustls-no-provider.

Fixes open-telemetry#2598

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
@mateenali66 mateenali66 requested a review from a team as a code owner April 17, 2026 20:40
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Apr 17, 2026
@lquerel
Copy link
Copy Markdown
Contributor

lquerel commented Apr 17, 2026

That was fast!

Can you take into consideration the following comment #2598 (comment) Thanks

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 98.08917% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.17%. Comparing base (495588e) to head (6de5aa4).
⚠️ Report is 106 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##             main    #2696     +/-   ##
=========================================
  Coverage   88.17%   88.17%             
=========================================
  Files         633      636      +3     
  Lines      238646   239931   +1285     
=========================================
+ Hits       210428   211563   +1135     
- Misses      27694    27844    +150     
  Partials      524      524             
Components Coverage Δ
otap-dataflow 89.84% <98.08%> (-0.02%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.74% <ø> (ø)
otel-arrow-go 52.45% <ø> (ø)
quiver 92.27% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add a retry loop around the http: fetch so startup waits out transient
failures (network errors, 404, 5xx) instead of bailing on the first
attempt. Backoff starts at 500ms and doubles up to a 60s cap per sleep.
Default max attempt count is u64::MAX so the provider keeps retrying
until the remote config server becomes available; `with_max_attempts`
bounds this for tests and other constrained callers.

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
@mateenali66
Copy link
Copy Markdown
Contributor Author

That was fast!

Can you take into consideration the following comment #2598 (comment) Thanks

Thanks @lquerel , faster than the backoff will be. I was already prepared and was looking forward to triage.

exp backoff 500ms doubling to a 60s cap, retries on network errors and any non-2xx, configurable via HttpConfigProvider::with_max_attempts(n), default u64::MAX.

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Add three tests to close out the remaining branches in the new http:
provider: Default::default() matches new(), with_max_attempts(0) returns
the empty-attempts error, and a resolve against a closed port surfaces
the network error through ConfigHttpRequestFailed.

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Copy link
Copy Markdown
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these quick contributions. I just noted a few minor issues that need to be addressed before merging.

let client = reqwest::blocking::Client::builder()
.timeout(Duration::from_secs(30))
.build()
.expect("reqwest blocking client should always build");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is preferable to use expect only for errors that are not expected. In this specific case, this is user input, so we should use the error-handling mechanism (Result) in order to provide a unified error presentation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped the expect. HttpConfigProvider::new and with_max_attempts now return Result<Self, Error> with a new ConfigHttpClientBuildFailed variant, and default_resolver propagates that through.

let scaled = Self::BASE_BACKOFF
.checked_mul(multiplier.try_into().unwrap_or(u32::MAX))
.unwrap_or(Self::MAX_BACKOFF);
scaled.min(Self::MAX_BACKOFF)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified as follow:

    let shift = attempt.min(20) as u32;
    let multiplier = 1u32 << shift;

    Self::BASE_BACKOFF
        .checked_mul(multiplier)
        .unwrap_or(Self::MAX_BACKOFF)
        .min(Self::MAX_BACKOFF)

Explanation:

  • multiplier.try_into().unwrap_or(u32::MAX) is unnecessary in practice because 1u64 << 20 is only 1_048_576, which always fits in u32
  • the comment about staying below u32::MAX is true, but misleading, because with a cap of 20 you are not even close to the limit
  • the saturation behavior depends on checked_mul, which is fine, but the code is more complicated than needed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied your suggestion. also dropped the stale comment about u32::MAX since the cap at 20 makes it irrelevant.

Comment on lines +422 to +429
/// Install a rustls crypto provider so reqwest can build a blocking client.
/// Production code installs this at process startup; tests do it lazily.
static CRYPTO_INIT: Once = Once::new();
fn ensure_crypto_provider() {
CRYPTO_INIT.call_once(|| {
let _ = rustls::crypto::ring::default_provider().install_default();
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a similar initialization in main. Is it necessary to declare it again here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept the local helper because otap_df_otap::crypto isn't reachable here: otap-df-otap already depends on otap-df-config, so pulling it in as a dev-dep would create a workspace cycle. Happy to extract the init into a shared test-utility crate if you'd prefer, but that's a separate refactor.


/// An http: config URI could not be fetched.
#[error(
"Could not fetch config from '{uri}': {details}\n\nHint: Check the URL is reachable and responds within 30s. `https://` and authenticated endpoints are not supported yet."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is no longer accurate with the added retry mechanism

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrote the message to describe the retry behavior instead of the "30s" claim, and split off a ConfigHttpClientBuildFailed variant for the construction failure path.

- HttpConfigProvider::new/with_max_attempts now return Result instead of
  panicking when the underlying HTTP client cannot be built, via a new
  ConfigHttpClientBuildFailed error variant. default_resolver propagates
  the error.
- Dropped the Default impl since it cannot return Result.
- Simplified backoff_for per review (shift caps at 20 so u32 multiplier
  cannot overflow; no try_into dance needed).
- Updated the ConfigHttpRequestFailed message to describe the retry
  behavior instead of the stale 30s claim.

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
@mateenali66
Copy link
Copy Markdown
Contributor Author

@lquerel ping when you have a moment. pushed the feedback fixes in 6de5aa4:

  • dropped the .expect(...) on the reqwest client; HttpConfigProvider::new and with_max_attempts now return Result<Self, Error> with a new ConfigHttpClientBuildFailed variant, threaded through default_resolver
  • replaced the manual checked_shl pattern with your suggested shift.min(20) + BASE.checked_mul chain
  • rewrote the stale timeout error message and split the client-build failure into its own error variant
  • kept the local rustls crypto init helper because moving it out would create a workspace cycle (otap-df-otap already depends on otap-df-config); happy to extract into a shared test utility crate if you'd prefer

the build_nonrequired (otap-dataflow, ubuntu-24.04-arm) failure is unrelated to the PR (non-required CI, all required builds and clippy pass on arm64). would appreciate a re-review when you get a slot.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

This pull request has been marked as stale due to lack of recent activity. It will be closed in 30 days if no further activity occurs. If this PR is still relevant, please comment or push new commits to keep it active.

@github-actions github-actions Bot added stale Not actively pursued and removed stale Not actively pursued labels May 8, 2026
@mateenali66
Copy link
Copy Markdown
Contributor Author

@lquerel could you take another pass when you have a slot? pushed the requested fixes in 6de5aa4 on apr 23 (return Result from HttpConfigProvider::new/with_max_attempts with a new ConfigHttpClientBuildFailed variant, exp backoff with shift.min(20) + checked_mul, reworked timeout error). all required CI is green, the only red check is build_nonrequired (ubuntu-24.04-arm) which is unrelated runner flake. stale bot pinged the PR yesterday so figured i'd nudge before it auto-closes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

feat(config): Phase 2 config URI providers — yaml: and http:

2 participants