feat(agentd): introduce AgentdConfig to read env vars once at startup #507
dijdzv wants to merge 3 commits into superradcompany:main
You are right that
As for the flaky PTY failures, I believe the root cause is that multiple unit tests read and set the same env vars, which is not thread-safe. So there is an opportunity for a proper fix here. Maybe we should introduce an AgentdConfig that the env vars are read into when agentd starts, so we don't have to go through std::env::var every single time.
Replaced the ENV_USER removal with AgentdConfig, which reads all MSB_* env vars once at startup. This preserves the ENV_USER defense-in-depth fallback while eliminating the repeated std::env::var calls that caused the flaky test.
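To make the idea concrete, here is a minimal sketch of the read-once pattern, not the code in this PR. The `AgentdConfig` below is a trimmed-down, hypothetical stand-in with a single field, the `MSB_USER` name is taken from the PR description, and the signature of `resolve_requested_user` is an assumption. The point is that the resolver takes the config as an argument, so tests can build a config value directly and never need `set_var`.

```rust
use std::env;

// Env var name as mentioned in the PR description.
const ENV_USER: &str = "MSB_USER";

// Hypothetical, single-field stand-in for the real AgentdConfig.
#[derive(Debug, Default, Clone)]
pub struct AgentdConfig {
    pub user: Option<String>,
}

impl AgentdConfig {
    /// Read the process environment once, at startup.
    pub fn from_env() -> Self {
        Self {
            // Treating empty values as unset is an assumption of this sketch.
            user: env::var(ENV_USER).ok().filter(|v| !v.is_empty()),
        }
    }
}

/// The per-request user wins; the startup config is only a fallback.
/// (Assumed signature, for illustration only.)
pub fn resolve_requested_user<'a>(
    request_user: Option<&'a str>,
    config: &'a AgentdConfig,
) -> Option<&'a str> {
    request_user.or(config.user.as_deref())
}
```

With this shape, a unit test can pass `AgentdConfig { user: Some("1000:1000".into()) }` straight to the resolver, so nothing it does can leak into a test that forks in parallel.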
… parallel test runs
The SDK already resolves config.user into ExecRequest.user before sending — this env var fallback never fires in normal SDK/CLI usage and caused an env race that made the PTY test flaky.
b487fab to 7ec3813
pub fn from_env() -> Self {
    Self {
        block_root: read_env(ENV_BLOCK_ROOT),
        mounts: read_env(ENV_MOUNTS),
        tmpfs: read_env(ENV_TMPFS),
        hostname: read_env(ENV_HOSTNAME),
        net: read_env(ENV_NET),
        net_ipv4: read_env(ENV_NET_IPV4),
        net_ipv6: read_env(ENV_NET_IPV6),
        user: read_env(ENV_USER),
    }
}
Just one more nit 😅.
It would be nice if all the env parsing operations were moved into this file as well.
That way we parse all the env vars once in from_env, and the fields in AgentdConfig can be the actual types. Something like:
pub(crate) struct AgentdConfig<'a> {
    block_root: Option<BlockRootSpec<'a>>,
    mounts: Option<VolumeMountSpec<'a>>,
    ...
}
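One way this suggestion could look, as a rough sketch under stated assumptions: `MountSpec`, `TypedConfig`, the `tag:guest_path` format, and the `MSB_MOUNTS` name are all hypothetical, since the real `BlockRootSpec` / `VolumeMountSpec` types and their formats are not shown here. The sketch also uses owned `String` fields instead of the `<'a>` borrows above, purely to avoid deciding where the raw env strings would live.

```rust
// Hypothetical env var name; the actual value of ENV_MOUNTS is not shown in the PR.
const ENV_MOUNTS: &str = "MSB_MOUNTS";

// Hypothetical typed spec standing in for the real mount spec type.
#[derive(Debug, PartialEq)]
pub struct MountSpec {
    pub tag: String,
    pub guest_path: String,
}

impl MountSpec {
    /// Parse an assumed `tag:guest_path` format.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let (tag, guest_path) = raw
            .split_once(':')
            .ok_or_else(|| format!("expected `tag:path`, got `{raw}`"))?;
        if tag.is_empty() || guest_path.is_empty() {
            return Err(format!("empty field in `{raw}`"));
        }
        Ok(Self {
            tag: tag.to_string(),
            guest_path: guest_path.to_string(),
        })
    }
}

#[derive(Debug, Default)]
pub struct TypedConfig {
    pub mounts: Option<MountSpec>,
}

impl TypedConfig {
    /// Build the config from any key lookup, parsing values up front.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let mounts = lookup(ENV_MOUNTS)
            .filter(|raw| !raw.is_empty())
            .map(|raw| MountSpec::parse(&raw))
            .transpose()?; // Option<Result<_, _>> -> Result<Option<_>, _>
        Ok(Self { mounts })
    }

    /// Production entry point: look keys up in the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Splitting out `from_lookup` is a design choice of this sketch: malformed values fail once at startup, and tests can feed a closure instead of touching the process environment.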
Summary
test_pty_reader_drains_ready_fd fails intermittently (~10%) under cargo test in crates/agentd/. test_request_user_overrides_env_default sets MSB_USER=0:0 via set_var, which leaks into the parallel PTY test's fork(): the child picks it up, attempts a privilege switch, and _exit(1)s.
The root cause is that resolve_requested_user reads ENV_USER as a fallback when ExecRequest.user is None. But the SDK already resolves config.user into ExecRequest.user in build_exec_request(), so this env var fallback never fires in normal SDK/CLI usage; it is effectively dead code.
Changes
- Remove the ENV_USER fallback from resolve_requested_user, relying solely on ExecRequest.user
- Remove test_request_user_overrides_env_default (the code path it tested was removed above)
- Shorten sleep (1s → 0.1s); the original delay was unnecessary for verifying output ordering
Verification
Before: 3 failures in 30 runs
After: 0 failures in 80 runs
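As a quick sanity check on these numbers (a back-of-the-envelope sketch, assuming runs are independent and the failure rate is constant, which flaky tests do not strictly guarantee): 3 failures in 30 runs is a 10% rate, and if that rate still applied, 80 clean runs in a row would happen with probability 0.9^80, which works out to roughly 0.02%. The helper names below are hypothetical.

```rust
/// Observed failure rate from a batch of runs.
pub fn failure_rate(failures: u32, runs: u32) -> f64 {
    f64::from(failures) / f64::from(runs)
}

/// Probability of seeing zero failures in `runs` independent runs
/// when each run fails with probability `rate`.
pub fn prob_zero_failures(rate: f64, runs: i32) -> f64 {
    (1.0 - rate).powi(runs)
}
```

Calling `prob_zero_failures(failure_rate(3, 30), 80)` gives the figure above, so the clean streak is unlikely to be luck alone under these assumptions.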