feat(config): add yaml: and http: URI providers#2696
feat(config): add yaml: and http: URI providers#2696mateenali66 wants to merge 5 commits intoopen-telemetry:mainfrom
Conversation
Phase 2 of the config URI provider work from open-telemetry#2529. Two new providers riding on the existing ConfigProvider trait, plus a new error variant for HTTP fetch failures. yaml: inline YAML on the CLI. Follows the OTel Collector convention where `::` expands into nested YAML and the segment after the final `::` is a trailing YAML fragment. No `::` means the content is passed through as literal YAML. yaml:version: otel_dataflow/v1 yaml:exporters::debug::verbosity: detailed -> exporters: debug: verbosity: detailed http: unauthenticated HTTP GET with a 30s timeout. Response body is YAML unless Content-Type is application/json. Standard redirects are followed. https:, authentication, and configurable timeouts are deferred to Phase 3 as per the issue. parse_scheme already accepts http://. default_resolver now registers all four providers. Tests use wiremock for the http: cases and a lazy rustls ring provider install since the workspace's reqwest uses rustls-no-provider. Fixes open-telemetry#2598 Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
|
That was fast! Can you take into consideration the following comment #2598 (comment) Thanks |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2696 +/- ##
=========================================
Coverage 88.17% 88.17%
=========================================
Files 633 636 +3
Lines 238646 239931 +1285
=========================================
+ Hits 210428 211563 +1135
- Misses 27694 27844 +150
Partials 524 524
🚀 New features to boost your workflow:
|
Add a retry loop around the http: fetch so startup waits out transient failures (network errors, 404, 5xx) instead of bailing on the first attempt. Backoff starts at 500ms and doubles up to a 60s cap per sleep. Default max attempt count is u64::MAX so the provider keeps retrying until the remote config server becomes available; `with_max_attempts` bounds this for tests and other constrained callers. Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Thanks @lquerel , faster than the backoff will be. I was already prepared and was looking forward to triage. exp backoff 500ms doubling to a 60s cap, retries on network errors and any non-2xx, configurable via |
Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Add three tests to close out the remaining branches in the new http: provider: Default::default() matches new(), with_max_attempts(0) returns the empty-attempts error, and a resolve against a closed port surfaces the network error through ConfigHttpRequestFailed. Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
lquerel
left a comment
There was a problem hiding this comment.
Thank you for these quick contributions. I just noted a few minor issues that need to be addressed before merging.
| let client = reqwest::blocking::Client::builder() | ||
| .timeout(Duration::from_secs(30)) | ||
| .build() | ||
| .expect("reqwest blocking client should always build"); |
There was a problem hiding this comment.
It is preferable to use expect only for errors that are not expected. In this specific case, this is user input, so we should use the error-handling mechanism (Result) in order to provide a unified error presentation.
There was a problem hiding this comment.
dropped the expect. HttpConfigProvider::new and with_max_attempts now return Result<Self, Error> with a new ConfigHttpClientBuildFailed variant, and default_resolver propagates that through.
| let scaled = Self::BASE_BACKOFF | ||
| .checked_mul(multiplier.try_into().unwrap_or(u32::MAX)) | ||
| .unwrap_or(Self::MAX_BACKOFF); | ||
| scaled.min(Self::MAX_BACKOFF) |
There was a problem hiding this comment.
Could be simplified as follow:
let shift = attempt.min(20) as u32;
let multiplier = 1u32 << shift;
Self::BASE_BACKOFF
.checked_mul(multiplier)
.unwrap_or(Self::MAX_BACKOFF)
.min(Self::MAX_BACKOFF)Explanation:
multiplier.try_into().unwrap_or(u32::MAX)is unnecessary in practice because 1u64 << 20 is only 1_048_576, which always fits in u32- the comment about staying below u32::MAX is true, but misleading, because with a cap of 20 you are not even close to the limit
- the saturation behavior depends on
checked_mul, which is fine, but the code is more complicated than needed
There was a problem hiding this comment.
applied your suggestion. also dropped the stale comment about u32::MAX since the cap at 20 makes it irrelevant.
| /// Install a rustls crypto provider so reqwest can build a blocking client. | ||
| /// Production code installs this at process startup; tests do it lazily. | ||
| static CRYPTO_INIT: Once = Once::new(); | ||
| fn ensure_crypto_provider() { | ||
| CRYPTO_INIT.call_once(|| { | ||
| let _ = rustls::crypto::ring::default_provider().install_default(); | ||
| }); | ||
| } |
There was a problem hiding this comment.
We already have a similar initialization in main. Is it necessary to declare it again here?
There was a problem hiding this comment.
kept the local helper because otap_df_otap::crypto isn't reachable here: otap-df-otap already depends on otap-df-config, so pulling it in as a dev-dep would create a workspace cycle. Happy to extract the init into a shared test-utility crate if you'd prefer, but that's a separate refactor.
|
|
||
| /// An http: config URI could not be fetched. | ||
| #[error( | ||
| "Could not fetch config from '{uri}': {details}\n\nHint: Check the URL is reachable and responds within 30s. `https://` and authenticated endpoints are not supported yet." |
There was a problem hiding this comment.
The error message is no longer accurate with the added retry mechanism
There was a problem hiding this comment.
rewrote the message to describe the retry behavior instead of the "30s" claim, and split off a ConfigHttpClientBuildFailed variant for the construction failure path.
- HttpConfigProvider::new/with_max_attempts now return Result instead of panicking when the underlying HTTP client cannot be built, via a new ConfigHttpClientBuildFailed error variant. default_resolver propagates the error. - Dropped the Default impl since it cannot return Result. - Simplified backoff_for per review (shift caps at 20 so u32 multiplier cannot overflow; no try_into dance needed). - Updated the ConfigHttpRequestFailed message to describe the retry behavior instead of the stale 30s claim. Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
|
@lquerel ping when you have a moment. pushed the feedback fixes in
the |
|
This pull request has been marked as stale due to lack of recent activity. It will be closed in 30 days if no further activity occurs. If this PR is still relevant, please comment or push new commits to keep it active. |
|
@lquerel could you take another pass when you have a slot? pushed the requested fixes in |
Fixes #2598. Phase 2 follow-up to #2529 (which added the ConfigProvider trait and the
file:/env:providers).What this adds
yaml:provider — inline YAML on the CLI following the OTel Collector convention.::expands into nested YAML, and the segment after the final::is treated as a trailing YAML fragment (typicallykey: value). No::means the content is passed through as literal YAML.http:provider — unauthenticated HTTP GET with a 30s timeout. Response body is treated as YAML unlessContent-Typeisapplication/json. Standard HTTP redirects are followed. Usesreqwest::blockingso config resolution can happen at startup without a tokio runtime.Out of scope (deferred to Phase 3 per the issue)
https:and TLS configuration::inline-override example from the issue body really lands)Tests
yaml:provider: literal content, single-level key path, nested key path, flow-style value, round-trip throughdefault_resolverandserde_yamlhttp:provider: yaml body, JSON content-type detection, non-2xx status error. Useswiremockas an async mock server withtokio::task::spawn_blockingto drive the blocking reqwest client.parse_schemealready acceptshttp://from Phase 1 (test is in feat(config): add URI providers for file: and env: config sources #2529). Added ayaml:case alongside.Validation
cargo test -p otap-df-config-> 273 tests pass.cargo clippy -p otap-df-config --all-targets -- -D warnings-> clean.cargo check -p otap-df-engine -p otap-df-controller-> clean (config crate still builds for its downstream consumers).cargo xtask quick-checkfails on an unrelated pre-existingdf_enginemain.rs issue:static GLOBAL: Jemalloc = Jemalloc;is used without the samenot(clippy)cfg guard that gates theuse tikv_jemallocator::Jemalloc;import. Reproduces on upstream main with this branch stashed, so not caused by this PR. Happy to file a separate issue if useful.Note on rustls
The workspace's
reqwestuses therustls-no-providerfeature, so callers must install a crypto provider before anyreqwest::blocking::Clientis built. Production code already handles this viaotap-df-otap/crypto-ring. The config crate's http tests installrustls::crypto::ring::default_provider()lazily via astd::sync::Once, and pull inrustlswith theringfeature only as a dev-dependency.