-
Notifications
You must be signed in to change notification settings - Fork 7.2k
scheduler: config-threaded TLS/backoff + due window; rand + justfile fixes #4501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler: config-threaded TLS/backoff + due window; rand + justfile fixes #4501
Conversation
…y + filter; panic hook for alt-screen cleanup; tests green Signed-off-by: Graham Anderson <[email protected]>
…bootstrap; rustls reqwest
…otify/events batch
…ow; add tests for redaction, URL policy, cron proximity, timeout
…ue window; fix rand deprecations; justfile fix for just 1.42.x
|
Added CI to monitor Copilot reviews:
Notes/Limitations for this PR:
If desired, I can open a small follow-up PR (or amend this one) to include any naming tweaks you prefer for the check context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR wires runtime behavior to configuration in codex-scheduler while preserving current defaults unless explicitly enabled in config. It adds configuration-driven TLS enforcement, retry/backoff settings, and due window timing along with some quality-of-life improvements.
- Adds threaded
strict_tlsanddatabase.arango.allow_insecureinto HTTPS-only enforcement with rustls - Adds configurable retry/backoff tunables with validation bounds
- Replaces hardcoded proximity window with configurable
due_window_ms
Reviewed Changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/pr.sh | New PR automation script for formatting, branching, and GitHub PR operations |
| codex-rs/tui/src/resume_picker.rs | Enhanced resume picker with plain mode, project tags, and environment filtering |
| codex-rs/scheduler/tests/config_redaction.rs | Unit tests for config validation and password redaction |
| codex-rs/scheduler/src/runner.rs | Job execution engine with timeout handling and event batching |
| codex-rs/scheduler/src/lib.rs | Main scheduler entry point and handle management |
| codex-rs/scheduler/src/db.rs | Database layer with configurable TLS and retry logic |
| codex-rs/scheduler/src/cronloop.rs | Cron scheduling loop with configurable due window |
| codex-rs/scheduler/src/config.rs | Configuration structs with validation and TLS enforcement |
| codex-rs/scheduler/Cargo.toml | New scheduler crate dependencies |
| codex-rs/mcp-server/src/scheduler_bootstrap.rs | Feature-gated scheduler initialization |
| codex-rs/mcp-server/src/notify_watcher.rs | Feature-gated notification watcher stub |
| codex-rs/mcp-server/src/main.rs | MCP server integration with scheduler bootstrap |
| codex-rs/mcp-server/src/context_injector.rs | Context injection hooks for notifications |
| codex-rs/mcp-server/Cargo.toml | Added optional scheduler dependency |
| codex-rs/justfile | New PR helper commands and review bundle generation |
| codex-rs/Cargo.toml | Added scheduler to workspace members |
| AGENTS.md | Added PR autopilot guidelines and helper script documentation |
| .github/workflows/wait-for-copilot.yml | Workflow to wait for Copilot review before proceeding |
| .github/workflows/copilot-review-notify.yml | Workflow to notify when Copilot submits a review |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
| let now = chrono::Utc::now(); | ||
| for (job, schedule) in &compiled { | ||
| if let Some(next) = schedule.after(&now).next() { | ||
| if is_due_within(next, now, due_window) { | ||
| if let Ok(permit) = sem.clone().try_acquire_owned() { | ||
| let dbc = db.clone(); | ||
| let jobc = job.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve due jobs when concurrency limit reached
In the cron loop, jobs that are due but fail to grab the semaphore are dropped. The code calls try_acquire_owned() and, if it returns an error because all permits are in use, simply skips the job and continues. Because Schedule::after(now) produces the next future tick, that particular invocation is never revisited, so spikes that exceed max_concurrency permanently lose scheduled work. Consider waiting for capacity or re‑checking the job rather than discarding it outright.
Useful? React with 👍 / 👎.
| let status = match timeout(Duration::from_secs(req.max_duration_seconds), run_future).await { | ||
| Ok(Ok(())) => { | ||
| let exit = child.wait().await?; | ||
| if exit.success() { | ||
| "succeeded" | ||
| } else { | ||
| "failed" | ||
| } | ||
| } | ||
| Ok(Err(e)) => { | ||
| db.update_run_finished(&run_id, "failed", Some(&e.to_string())) | ||
| .await | ||
| .ok(); | ||
| "failed" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kill subprocess when log reader fails
If the stdout-reading task returns an error (Ok(Err(e)) branch) the run is marked failed, but the spawned codex process is neither killed nor awaited. The function exits, the scheduler reports completion, yet the child process continues running in the background, leaking resources and potentially issuing more actions. Terminate and wait on child in this branch (similar to the timeout path) and keep the error message when doing so.
Useful? React with 👍 / 👎.
| arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move { | ||
| run_main(codex_linux_sandbox_exe, CliConfigOverrides::default()).await?; | ||
| // Launch the MCP server. | ||
| let res = run_main(codex_linux_sandbox_exe, CliConfigOverrides::default()).await; | ||
| // Opportunistically start scheduler when feature is enabled; it is no-op otherwise. | ||
| #[cfg(feature = "scheduler")] | ||
| { | ||
| // Note: this call does not change behavior unless [scheduler] is enabled in ~/.codex/config.toml | ||
| // and Arango config is present. | ||
| super::scheduler_bootstrap::start_if_enabled(); | ||
| } | ||
| res?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start scheduler before blocking on run_main
The scheduler bootstrap is invoked only after run_main completes. run_main runs the MCP server and does not return until stdin/stdout tasks have shut down, so start_if_enabled() is executed while the runtime is exiting. The spawned task is immediately dropped and the scheduler never actually runs during normal server operation. Bootstrap the scheduler before awaiting run_main (or spawn both concurrently) so it can operate while the server is alive.
Useful? React with 👍 / 👎.
|
Thanks for the contribution, and apologies for the slow response. We've received many PRs, and we don't have the bandwidth on the codex team to review all of them. We've updated our contribution guidelines to clarify that we're currently accepting contributions for bugs and security fixes, but we're not generally accepting new features at this time. We need to make sure that all new features compose well with both existing and upcoming features and fit into our roadmap. If you would like to propose a new feature, please file or upvote an enhancement request in the issue tracker. We will generally prioritize new features based on community feedback. |
Scheduler: Config-Threaded TLS, Backoff, and Due Window
This PR wires runtime behavior to configuration in
codex-schedulerwhile preserving current defaults unless explicitly enabled in config.What Changed
strict_tlsanddatabase.arango.allow_insecureintoreqwest::ClientBuilder::https_onlywith rustls.[scheduler]and validated bounds:max_write_retries(default 5)backoff_base_ms(default 200)backoff_jitter(0.0–1.0, default 0.25)notify()andinsert_events_batch().due_window_msfrom config.randdeprecations (thread_rng→rng,gen_range→random_range).review-bundlerecipe forjust 1.42.x.Safety/Defaults
allow_insecure=trueand notstrict_tls.strict_tls=true.Tests
cargo test -p codex-schedulerall passing (5 tests).How to Verify
cd codex-rs && cargo test -p codex-scheduler.[scheduler]enabled and[database.arango]URL:strict_tlsandallow_insecureto observe validation.due_window_msto a small value and observe job triggering cadence.If desired, I can follow up with: