Skip to content

Conversation

@grahama1970
Copy link

Scheduler: Config-Threaded TLS, Backoff, and Due Window

This PR wires runtime behavior to configuration in codex-scheduler while preserving current defaults unless explicitly enabled in config.

What Changed

  • Threaded strict_tls and database.arango.allow_insecure into reqwest::ClientBuilder::https_only with rustls.
  • Added retry/backoff tunables to [scheduler] and validated bounds:
    • max_write_retries (default 5)
    • backoff_base_ms (default 200)
    • backoff_jitter (0.0–1.0, default 0.25)
  • Centralized POST retry logic and used it for notify() and insert_events_batch().
  • Replaced hardcoded proximity window with due_window_ms from config.
  • Fixed small rand deprecations (thread_rngrng, gen_rangerandom_range).
  • Justfile: fixed default arg parsing and simplified review-bundle recipe for just 1.42.x.

Safety/Defaults

  • Defaults maintain previous behavior: HTTPS enforced unless allow_insecure=true and not strict_tls.
  • Config validation rejects insecure combinations when strict_tls=true.

Tests

  • Unit tests: cargo test -p codex-scheduler all passing (5 tests).
  • Manual: exercised config parse/validation and due-window path.

How to Verify

  • cd codex-rs && cargo test -p codex-scheduler.
  • Provide a TOML with [scheduler] enabled and [database.arango] URL:
    • Toggle strict_tls and allow_insecure to observe validation.
    • Set due_window_ms to a small value and observe job triggering cadence.

If desired, I can follow up with:

  • Lock TTL hardening (single-owner UPSERT/transaction).
  • Idempotency keys for events/notifications (unique indexes are created already).
  • Additional tracing spans around DB ops.

@grahama1970
Copy link
Author

Added CI to monitor Copilot reviews:

  • wait-for-copilot (PR check): Polls for up to 15 minutes for a Copilot review or comment and passes/fails accordingly.
  • copilot-review-notify: Posts an automated comment when Copilot submits a review; optional Slack notify via SLACK_WEBHOOK_URL secret.

Notes/Limitations for this PR:

  • These checks will run for future PRs once merged into the base branch. For this PR, they won’t appear as required checks until the workflow files are present on openai/codex’s default branch and branch protection marks the check as required.
  • To require the check, an admin can add the status context: wait-for-copilot / Wait for Copilot review to branch protections for main.

If desired, I can open a small follow-up PR (or amend this one) to include any naming tweaks you prefer for the check context.

@grahama1970 grahama1970 marked this pull request as ready for review September 30, 2025 14:40
Copilot AI review requested due to automatic review settings September 30, 2025 14:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR wires runtime behavior to configuration in codex-scheduler while preserving current defaults unless explicitly enabled in config. It adds configuration-driven TLS enforcement, retry/backoff settings, and due window timing along with some quality-of-life improvements.

  • Adds threaded strict_tls and database.arango.allow_insecure into HTTPS-only enforcement with rustls
  • Adds configurable retry/backoff tunables with validation bounds
  • Replaces hardcoded proximity window with configurable due_window_ms

Reviewed Changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated no comments.

Show a summary per file
File Description
scripts/pr.sh New PR automation script for formatting, branching, and GitHub PR operations
codex-rs/tui/src/resume_picker.rs Enhanced resume picker with plain mode, project tags, and environment filtering
codex-rs/scheduler/tests/config_redaction.rs Unit tests for config validation and password redaction
codex-rs/scheduler/src/runner.rs Job execution engine with timeout handling and event batching
codex-rs/scheduler/src/lib.rs Main scheduler entry point and handle management
codex-rs/scheduler/src/db.rs Database layer with configurable TLS and retry logic
codex-rs/scheduler/src/cronloop.rs Cron scheduling loop with configurable due window
codex-rs/scheduler/src/config.rs Configuration structs with validation and TLS enforcement
codex-rs/scheduler/Cargo.toml New scheduler crate dependencies
codex-rs/mcp-server/src/scheduler_bootstrap.rs Feature-gated scheduler initialization
codex-rs/mcp-server/src/notify_watcher.rs Feature-gated notification watcher stub
codex-rs/mcp-server/src/main.rs MCP server integration with scheduler bootstrap
codex-rs/mcp-server/src/context_injector.rs Context injection hooks for notifications
codex-rs/mcp-server/Cargo.toml Added optional scheduler dependency
codex-rs/justfile New PR helper commands and review bundle generation
codex-rs/Cargo.toml Added scheduler to workspace members
AGENTS.md Added PR autopilot guidelines and helper script documentation
.github/workflows/wait-for-copilot.yml Workflow to wait for Copilot review before proceeding
.github/workflows/copilot-review-notify.yml Workflow to notify when Copilot submits a review

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines +49 to +55
let now = chrono::Utc::now();
for (job, schedule) in &compiled {
if let Some(next) = schedule.after(&now).next() {
if is_due_within(next, now, due_window) {
if let Ok(permit) = sem.clone().try_acquire_owned() {
let dbc = db.clone();
let jobc = job.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve due jobs when concurrency limit reached

In the cron loop, jobs that are due but fail to grab the semaphore are dropped. The code calls try_acquire_owned() and, if it returns an error because all permits are in use, simply skips the job and continues. Because Schedule::after(now) produces the next future tick, that particular invocation is never revisited, so spikes that exceed max_concurrency permanently lose scheduled work. Consider waiting for capacity or re‑checking the job rather than discarding it outright.

Useful? React with 👍 / 👎.

Comment on lines +104 to +118
let status = match timeout(Duration::from_secs(req.max_duration_seconds), run_future).await {
Ok(Ok(())) => {
let exit = child.wait().await?;
if exit.success() {
"succeeded"
} else {
"failed"
}
}
Ok(Err(e)) => {
db.update_run_finished(&run_id, "failed", Some(&e.to_string()))
.await
.ok();
"failed"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Kill subprocess when log reader fails

If the stdout-reading task returns an error (Ok(Err(e)) branch) the run is marked failed, but the spawned codex process is neither killed nor awaited. The function exits, the scheduler reports completion, yet the child process continues running in the background, leaking resources and potentially issuing more actions. Terminate and wait on child in this branch (similar to the timeout path) and keep the error message when doing so.

Useful? React with 👍 / 👎.

Comment on lines 6 to +16
arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move {
run_main(codex_linux_sandbox_exe, CliConfigOverrides::default()).await?;
// Launch the MCP server.
let res = run_main(codex_linux_sandbox_exe, CliConfigOverrides::default()).await;
// Opportunistically start scheduler when feature is enabled; it is no-op otherwise.
#[cfg(feature = "scheduler")]
{
// Note: this call does not change behavior unless [scheduler] is enabled in ~/.codex/config.toml
// and Arango config is present.
super::scheduler_bootstrap::start_if_enabled();
}
res?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Start scheduler before blocking on run_main

The scheduler bootstrap is invoked only after run_main completes. run_main runs the MCP server and does not return until stdin/stdout tasks have shut down, so start_if_enabled() is executed while the runtime is exiting. The spawned task is immediately dropped and the scheduler never actually runs during normal server operation. Bootstrap the scheduler before awaiting run_main (or spawn both concurrently) so it can operate while the server is alive.

Useful? React with 👍 / 👎.

@etraut-openai
Copy link
Collaborator

Thanks for the contribution, and apologies for the slow response. We've received many PRs, and we don't have the bandwidth on the codex team to review all of them.

We've updated our contribution guidelines to clarify that we're currently accepting contributions for bugs and security fixes, but we're not generally accepting new features at this time. We need to make sure that all new features compose well with both existing and upcoming features and fit into our roadmap. If you would like to propose a new feature, please file or upvote an enhancement request in the issue tracker. We will generally prioritize new features based on community feedback.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 31, 2025
@openai openai unlocked this conversation Oct 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants