Conversation
This is a breaking change. torc-server used to bind to localhost by default, which was not appropriate for HPC deployments; it now defaults to 0.0.0.0, which is routable from other machines. The option that specifies the bind address was also renamed for clarity (`--url` -> `--host`).
Pull request overview
Updates Torc’s server binding configuration and HPC deployment guidance, while also extending CLI/dashboard functionality and adding a new built-in HPC profile.
Changes:
- Rename the torc-server bind setting/flag from `url` to `host`, adjust defaults, and update related docs/tests.
- Extend `torc jobs update` to support `--runtime` and `--resource-requirements-id`, with new integration tests.
- Add LLNL "Dane" as a built-in HPC profile; add torc-dash behavior to save inline workflow specs.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| torc-server/src/service.rs | Renames service config field and service install args from url→host. |
| torc-server/src/main.rs | Introduces --host CLI option and uses it for bind address construction. |
| torc-dash/src/main.rs | Adds inline spec saving utilities and hooks into create handlers. |
| src/config/server.rs | Renames server config field to host with serde(alias = "url") for backward compat. |
| src/client/commands/jobs.rs | Adds jobs update --runtime and --resource-requirements-id support. |
| tests/test_jobs.rs | Adds tests covering new jobs update behaviors. |
| tests/test_config.rs | Updates expected server default bind value to 0.0.0.0. |
| src/client/hpc.rs | Exposes new dane module. |
| src/client/hpc/dane.rs | Adds the Dane cluster profile definition. |
| src/client/hpc/profiles.rs | Registers the Dane profile in the built-in registry. |
| docs/src/specialized/tools/dashboard-deployment.md | Updates deployment examples/host guidance and adds troubleshooting note for routable hostnames. |
| docs/src/specialized/hpc/hpc-deployment.md | Replaces -u references with --host and updates examples/config snippets. |
| docs/src/specialized/admin/server-deployment.md | Updates server/service examples from --url to --host. |
| docs/src/getting-started/quick-start-remote.md | Adjusts quick-start to rely on new default bind behavior. |
| docs/src/getting-started/quick-start-hpc.md | Adds --host example and port conflict guidance. |
| docs/src/core/reference/cli-cheatsheet.md | Wraps cheatsheet content in a .cheatsheet container for targeted styling. |
| docs/custom.css | Refines table styling and scopes fixed column widths to cheatsheet tables. |
| Cargo.toml | Bumps workspace version to 0.11.0. |
torc-dash/src/main.rs
Outdated
```rust
async fn save_workflow_spec(spec_json: &str, workflow_id: Option<&str>) -> Option<String> {
    // Parse the spec to extract the workflow name
    let spec: serde_json::Value = match serde_json::from_str(spec_json) {
        Ok(v) => v,
        Err(e) => {
            warn!("Failed to parse workflow spec for saving: {}", e);
            return None;
```
save_workflow_spec parses the inline spec as JSON and always writes a .json file, but cli_create_handler/cli_create_slurm_handler accept inline specs with arbitrary file_extension (YAML/KDL/etc). This will prevent saving non-JSON specs and/or save them with the wrong extension. Consider saving the raw spec_content using the provided extension, or parsing based on file_extension rather than assuming JSON.
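A minimal sketch of the raw-content approach suggested above. The parameter names (`spec_content`, `file_extension`, `base_filename`) mirror identifiers mentioned in this review and are assumptions about the surrounding handler code:

```rust
use tokio::fs;

/// Save the inline spec exactly as received, using the caller-provided extension
/// instead of assuming JSON. Returns the saved path, or None on failure.
async fn save_raw_workflow_spec(
    spec_content: &str,
    file_extension: &str, // e.g. ".json", ".yaml", ".kdl"
    base_filename: &str,
) -> Option<String> {
    let filename = format!("{}{}", base_filename, file_extension);
    // Preserve the spec verbatim; only ensure a trailing newline.
    let content = if spec_content.ends_with('\n') {
        spec_content.to_string()
    } else {
        format!("{}\n", spec_content)
    };
    match fs::write(&filename, &content).await {
        Ok(()) => Some(filename),
        Err(_) => None,
    }
}
```

This sidesteps format detection entirely; if pretty-printing is still wanted for JSON specs, it can be done only when `file_extension == ".json"`.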
torc-dash/src/main.rs
Outdated
```rust
// Try to save to the current directory (with trailing newline)
let filename = format!("{}.json", base_filename);
let content = format!("{}\n", pretty_json);
match tokio::fs::write(&filename, &content).await {
    Ok(()) => {
        info!("Saved workflow spec to: {}", filename);
        Some(filename)
    }
    Err(e) => {
        warn!("Failed to save workflow spec to {}: {}", filename, e);
```
This writes the saved workflow spec directly into the process current working directory and will overwrite any existing file with the same name (and follow symlinks). Because this is triggered by an HTTP request, it can allow clobbering files on the torc-dash host (within process permissions). Prefer writing into a dedicated, configurable directory and use an atomic create-new strategy (fail if exists / add unique suffix) to avoid overwrites and symlink attacks.
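One way to implement the create-new strategy, sketched under the assumption that a dedicated, configurable `spec_dir` directory is added to torc-dash's settings (the function name and suffix scheme below are illustrative):

```rust
use std::path::{Path, PathBuf};
use tokio::io::AsyncWriteExt;

/// Write `content` into `spec_dir` without clobbering existing files.
/// `create_new(true)` maps to O_CREAT|O_EXCL, so it fails if the path already
/// exists and will not follow a pre-planted symlink to another file.
async fn write_spec_no_clobber(
    spec_dir: &Path,
    base_filename: &str,
    extension: &str,
    content: &str,
) -> std::io::Result<PathBuf> {
    for attempt in 0..100u32 {
        let name = if attempt == 0 {
            format!("{}{}", base_filename, extension)
        } else {
            // Add a unique suffix instead of overwriting.
            format!("{}-{}{}", base_filename, attempt, extension)
        };
        let path = spec_dir.join(name);
        match tokio::fs::OpenOptions::new()
            .write(true)
            .create_new(true) // fail if the file already exists
            .open(&path)
            .await
        {
            Ok(mut file) => {
                file.write_all(content.as_bytes()).await?;
                return Ok(path);
            }
            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
    Err(std::io::Error::new(
        std::io::ErrorKind::AlreadyExists,
        "too many existing spec files with this name",
    ))
}
```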
```diff
 host: if cli_config.host != "0.0.0.0" {
     cli_config.host
 } else {
-    server_file_config.url.clone()
+    server_file_config.host.clone()
 },
```
CLI/file-config merging uses a sentinel default string ("0.0.0.0") to decide precedence. This makes it impossible to override a config-file server.host back to 0.0.0.0 via CLI, since passing --host 0.0.0.0 is indistinguishable from “not provided”. Prefer making host an Option<String> in the CLI struct (no default), then unwrap_or_else to the file/default value, or use clap’s default_value_t only at the final merged config layer.
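A sketch of the `Option<String>` approach described above (the struct and field names are illustrative, not the actual torc-server types):

```rust
use clap::Parser;

#[derive(Parser)]
struct Cli {
    /// Hostname or IP address to bind the server to.
    /// No default here, so "not provided" is distinguishable from an explicit "0.0.0.0".
    #[arg(short = 'H', long)]
    host: Option<String>,
}

fn resolve_host(cli_host: Option<String>, file_host: Option<String>) -> String {
    // CLI wins when given, then the config file, then the hard-coded default.
    cli_host
        .or(file_host)
        .unwrap_or_else(|| "0.0.0.0".to_string())
}

fn main() {
    let cli = Cli::parse();
    let file_host: Option<String> = None; // stand-in for the value read from the config file
    println!("binding to {}", resolve_host(cli.host, file_host));
}
```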
```rust
// Handle runtime update (requires updating resource requirements)
if let Some(new_runtime) = runtime {
    let rr_id = job.resource_requirements_id.unwrap_or_else(|| {
        eprintln!(
            "Error: Cannot update runtime - job {} has no resource requirements assigned.",
            id
        );
        eprintln!(
            "Hint: First assign resource requirements with --resource-requirements-id"
        );
        std::process::exit(1);
    });

    // Get and update the resource requirements
    match default_api::get_resource_requirements(config, rr_id) {
        Ok(mut rr) => {
            rr.runtime = new_runtime.clone();
            match default_api::update_resource_requirements(config, rr_id, rr) {
                Ok(_) => {
                    if format != "json" {
                        println!(
```
jobs update --runtime ... updates the resource requirements first, then calls update_job. If update_job fails, the command exits with an error even though the runtime change was already applied, leaving a partial update that’s hard to reason about. Consider either (a) skipping update_job when only runtime changed, or (b) updating the job first (when RR id changes) and only updating the RR after update_job succeeds, and/or making the output clearly reflect which parts succeeded.
torc-dash/src/main.rs
Outdated
```rust
/// Save a workflow spec to a file in the current directory.
/// The filename is derived from the workflow name in the spec.
/// Returns the path to the saved file, or None if saving failed.
async fn save_workflow_spec(spec_json: &str, workflow_id: Option<&str>) -> Option<String> {
```
torc-dash still spawns torc-server using the old --url flag (see earlier in this file: args include "--url"), but torc-server CLI now expects --host. Standalone mode will fail to start torc-server unless this is updated (or torc-server keeps --url as an alias).
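For illustration, a sketch of what the updated spawn call could look like if torc-server drops `--url`. The use of `tokio::process::Command`, the variable names, and the exact flag set are assumptions; only `run`, `--database`, `--port`, and `--host` are confirmed by this PR's docs:

```rust
use tokio::process::{Child, Command};

/// Spawn the managed torc-server from torc-dash standalone mode,
/// passing --host instead of the removed --url flag.
fn spawn_server(server_host: &str, database: &str, port: u16) -> std::io::Result<Child> {
    let port_str = port.to_string();
    Command::new("torc-server")
        .args(["run", "--database", database, "--host", server_host, "--port", port_str.as_str()])
        .spawn()
}
```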
````diff
 Start a Torc server. By default, it binds to `0.0.0.0` so it's accessible from remote machines:

 ```console
-torc-server run --database torc.db --url 0.0.0.0 --port 8080
+torc-server run --database torc.db --port 8080
````
These Quick Start instructions start torc-server bound to 0.0.0.0 without --auth-file/--require-auth, and the authentication docs state that in this mode authentication is disabled and all requests are allowed. On any network where this host is reachable, an attacker can point their own torc CLI or HTTP client at the server and create, modify, or read workflows and jobs without credentials. To avoid exposing a fully unauthenticated orchestration API, update the example to enable authentication (e.g. using --auth-file with --require-auth and, on untrusted networks, --https) or clearly scope it to localhost or an isolated network instead of 0.0.0.0.
````diff
 **Note**: This uses a specific hostname routable from compute nodes, which may vary across HPC
 systems. Adjust as necessary or exclude `--host` to use the default.

 ```console
-torc-server run --database torc.db --completion-check-interval-secs 5
+torc-server run --database torc.db --host kl1.hsn.cm.kestrel.hpc.nrel.gov --completion-check-interval-secs 5
````
This Quick Start runs torc-server on an HPC login node with a routable hostname but without --auth-file/--require-auth, which per the authentication docs means authentication is disabled and all requests are allowed. On a multi-user cluster, any other user who can reach that hostname and port can call the server API (via torc or plain HTTP) to submit, inspect, or control workflows under the owner’s account. The guide should recommend starting the server with authentication enabled (and HTTPS where appropriate) or explicitly constraining it to a trusted, isolated network/IP to prevent unauthorized access.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
```diff
 // Determine API URL
 // - Standalone mode: dashboard connects to the managed server via localhost
 // - Non-standalone: use the user-provided --api-url
 let final_api_url = if standalone {
-    // In standalone mode, we need to determine the correct host for the API URL.
-    // If the user provided a custom --api-url, extract the host from it.
-    // Otherwise, use the host the dashboard is binding to (or localhost as fallback).
-    let api_host = if api_url != "http://localhost:8080/torc-service/v1" {
-        // User provided a custom API URL, extract the host
-        url::Url::parse(&api_url)
-            .ok()
-            .and_then(|u| u.host_str().map(|h| h.to_string()))
-            .unwrap_or_else(|| "localhost".to_string())
-    } else if host != "127.0.0.1" {
-        // User specified a non-default host, use it
-        host.clone()
-    } else {
-        "localhost".to_string()
-    };
-    format!("http://{}:{}/torc-service/v1", api_host, actual_server_port)
+    // In standalone mode, the dashboard connects to the server locally.
+    // The server_host controls external accessibility, but we always connect via localhost.
+    format!("http://localhost:{}/torc-service/v1", actual_server_port)
 } else {
```
torc-dash starts the managed torc-server with --host {server_host}, but in standalone mode it always sets final_api_url to http://localhost:{port}/.... If server_host is a specific interface hostname/IP (not 0.0.0.0), the server will not be listening on 127.0.0.1 and the dashboard’s proxy calls will fail. Consider computing the connect host from server_host (use localhost/127.0.0.1 only when binding to 0.0.0.0/::).
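A minimal sketch of deriving the connect host from the bind host (the helper name is illustrative):

```rust
/// Pick the host the dashboard should dial when proxying to the managed server.
/// Loopback only works when the server binds a wildcard address; otherwise the
/// server is listening solely on the specific interface it was given.
fn connect_host(server_host: &str) -> &str {
    match server_host {
        "0.0.0.0" | "::" | "[::]" => "localhost",
        specific => specific,
    }
}

fn main() {
    let server_host = "0.0.0.0";
    let actual_server_port = 8080u16;
    let final_api_url = format!(
        "http://{}:{}/torc-service/v1",
        connect_host(server_host),
        actual_server_port
    );
    assert_eq!(final_api_url, "http://localhost:8080/torc-service/v1");
}
```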
```rust
let server_host = if cli.server_host != "0.0.0.0" {
    cli.server_host.clone()
} else {
    dash_config.server_host.clone()
};
```
The CLI/file-config merge logic treats "0.0.0.0" as a sentinel for “not provided”. This makes it impossible to explicitly override a config-file server_host back to 0.0.0.0 via CLI (because clap always populates the default). Prefer Option<String> for --server-host (no default) and apply defaults only after merging, or use clap’s “value source” APIs to detect whether the arg was set.
Suggested change:
```diff
-let server_host = if cli.server_host != "0.0.0.0" {
-    cli.server_host.clone()
-} else {
-    dash_config.server_host.clone()
-};
+let server_host = cli.server_host.clone();
```
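As an alternative to the suggestion above, if keeping a visible default is preferred over `Option<String>`, clap's value-source API can tell whether the flag was actually passed. A sketch with illustrative struct and field names:

```rust
use clap::parser::ValueSource;
use clap::{CommandFactory, FromArgMatches, Parser};

#[derive(Parser)]
struct Cli {
    /// Host the managed torc-server should bind to.
    #[arg(long, default_value = "0.0.0.0")]
    server_host: String,
}

fn main() {
    let matches = Cli::command().get_matches();
    let cli = Cli::from_arg_matches(&matches).expect("parsed above");

    // Only treat the CLI value as an override when the user actually typed it.
    let cli_overrides = matches.value_source("server_host") == Some(ValueSource::CommandLine);
    let file_server_host = "login1.example.org".to_string(); // stand-in for the config-file value
    let server_host = if cli_overrides {
        cli.server_host
    } else {
        file_server_host
    };
    println!("server host: {}", server_host);
}
```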
torc-dash/src/main.rs
Outdated
```diff
@@ -607,9 +677,8 @@ async fn cli_create_handler(
         .await
 } else {
     // Spec is inline content - write to temp file with correct extension
-    let extension = req.file_extension.as_deref().unwrap_or(".json");
     let unique_id = uuid::Uuid::new_v4();
-    let temp_path = format!("/tmp/torc_spec_{}{}", unique_id, extension);
+    let temp_path = format!("/tmp/torc_spec_{}{}", unique_id, file_extension);
     if let Err(e) = tokio::fs::write(&temp_path, &req.spec).await {
```
file_extension is taken directly from the request and concatenated into both the temp path (/tmp/torc_spec_*{file_extension}) and the saved filename. If it contains path separators (e.g. "/../../...") this can escape the intended directory or create unexpected paths. Validate/sanitize file_extension (e.g., allowlist {.json, .yaml, .yml, .kdl} and reject anything containing / or ..).
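A sketch of the allowlist check (the helper name is hypothetical):

```rust
/// Accept only known spec extensions and reject anything that could change the path.
fn sanitize_extension(ext: &str) -> Option<&'static str> {
    match ext {
        ".json" => Some(".json"),
        ".yaml" => Some(".yaml"),
        ".yml" => Some(".yml"),
        ".kdl" => Some(".kdl"),
        _ => None, // rejects "/", "..", empty strings, and anything else
    }
}

fn main() {
    assert_eq!(sanitize_extension(".yaml"), Some(".yaml"));
    assert_eq!(sanitize_extension("/../../etc/passwd"), None);
}
```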
torc-dash/src/main.rs
Outdated
```rust
if let Some(caps) = stdout
    .lines()
    .find(|line| line.contains("Created workflow"))
{
    // Extract the number after "Created workflow"
    caps.split_whitespace().last().map(|s| s.trim().to_string())
} else {
    None
}
```
The comment says it handles output patterns like "ID: 123", but the implementation only searches for lines containing "Created workflow". Either implement the additional pattern(s) or update the comment so it matches behavior.
Suggested change:
```diff
-if let Some(caps) = stdout
-    .lines()
-    .find(|line| line.contains("Created workflow"))
-{
-    // Extract the number after "Created workflow"
-    caps.split_whitespace().last().map(|s| s.trim().to_string())
-} else {
-    None
-}
+for line in stdout.lines() {
+    if line.contains("Created workflow") {
+        // Extract the number after "Created workflow"
+        if let Some(id) = line.split_whitespace().last() {
+            return Some(id.trim().to_string());
+        }
+    } else if let Some(pos) = line.find("ID:") {
+        // Extract the ID after "ID:"
+        let after = &line[pos + "ID:".len()..];
+        if let Some(id) = after.trim().split_whitespace().next() {
+            return Some(id.to_string());
+        }
+    }
+}
+None
```
torc-server/src/main.rs
Outdated
```rust
/// Hostname or IP address to bind the server to
#[arg(short = 'H', long, default_value = "0.0.0.0")]
```
The --host CLI option replaces the previous --url/-u bind option. This is a breaking change for existing scripts. Consider adding clap aliases (e.g., long alias url and possibly a short alias u) with help text marking them deprecated, to preserve backward compatibility.
Suggested change:
```diff
-/// Hostname or IP address to bind the server to
-#[arg(short = 'H', long, default_value = "0.0.0.0")]
+/// Hostname or IP address to bind the server to.
+/// Deprecated aliases: `--url`, `-u` (use `--host` instead).
+#[arg(short = 'H', long, alias = "url", short_alias = 'u', default_value = "0.0.0.0")]
```
```diff
 host: if cli_config.host != "0.0.0.0" {
     cli_config.host
 } else {
-    server_file_config.url.clone()
+    server_file_config.host.clone()
 },
```
The CLI/file-config merge treats "0.0.0.0" as “not provided”. With clap’s default, this prevents users from explicitly overriding a config-file host back to 0.0.0.0. Using Option<String> for --host (and merging with unwrap_or_else(...)) or checking clap’s value source would avoid this ambiguity.