Skip to content

Conversation

djeebus
Copy link
Contributor

@djeebus djeebus commented Oct 4, 2025

Pre-requisite for running multiple orchestrators on a single node.


Note

Introduce orchestrator config model and network config, switch port types to uint16 across services, and move Nomad jobs to env-driven ports (drop CLI flags).

  • Config refactor (orchestrator):
    • Add internal/cfg with env-parsed Config (e.g., GRPC_PORT, PROXY_PORT, REDIS_URL|REDIS_CLUSTER_URL, CLICKHOUSE_CONNECTION_STRING, ALLOW_SANDBOX_INTERNET, ORCHESTRATOR_SERVICES, plus NetworkConfig).
    • Use cfg.Parse() in main and propagate config (lock path, force stop, services, ports, Redis/ClickHouse, hyperloop port).
    • Add tests for config parsing.
  • Network config:
    • Introduce network.Config and ParseConfig; pass through pool/storage/slot; remove internal/consts.go and read hyperloop IP/port from config.
    • Update iptables redirect to use per-slot hyperloop port.
  • Port type unification:
    • Change port parameters to uint16 in proxy (shared), orchestrator grpcserver, hyperloop server, sandbox proxy, and client proxy; adjust logs and validation.
  • Client proxy:
    • Validate proxy port range; update NewClientProxy signature and calls.
  • Jobs (Nomad):
    • Add env GRPC_PORT/PROXY_PORT and stop passing --port/--proxy-port args in orchestrator.hcl and template-manager.hcl.
  • Misc:
    • Update benchmarks and build-template command to use network.ParseConfig and uint16 ports.
    • Add github.com/caarlos0/env/v11 dependency.

Written by Cursor Bugbot for commit 6209e35. This will update automatically on new commits. Configure here.

cursor[bot]

This comment was marked as outdated.

@tomassrnka tomassrnka requested a review from jakubno October 6, 2025 07:45
@jakubno jakubno self-assigned this Oct 6, 2025
jakubno
jakubno previously requested changes Oct 6, 2025
Copy link
Member

@jakubno jakubno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also some issue with build in integration tests

cursor[bot]

This comment was marked as outdated.

@djeebus
Copy link
Contributor Author

djeebus commented Oct 6, 2025

The CodeQL issues will be resolved in #1296 , so this is dependent on that one.

# Conflicts:
#	packages/client-proxy/internal/proxy/proxy.go
#	packages/client-proxy/main.go
cursor[bot]

This comment was marked as outdated.

@djeebus djeebus requested a review from jakubno October 7, 2025 18:28
cursor[bot]

This comment was marked as outdated.

@djeebus
Copy link
Contributor Author

djeebus commented Oct 7, 2025

CodeQL's wrong, %T can't possibly emit sensitive data.

@djeebus djeebus dismissed jakubno’s stale review October 7, 2025 23:03

changes were implemented

@jakubno jakubno enabled auto-merge (squash) October 8, 2025 08:34
cursor[bot]

This comment was marked as outdated.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Config Parameter Omission in Slot Creation

The Config parameter, recently added to network.Pool and network.StorageKV, isn't consistently passed to NewSlot calls. This results in network slots being created without essential configuration, such as hyperloop IP and port, leading to compilation errors or runtime issues. This affects both initial slot creation and fallback acquisition.

packages/orchestrator/internal/sandbox/network/pool.go#L81-L88

}
zap.L().Info("network slot pool populate closed")
}()
return pool, nil
}

packages/orchestrator/internal/sandbox/network/storage_kv.go#L67-L68

if status {
return NewSlot(key, slotIdx, s.config)

packages/orchestrator/internal/sandbox/network/storage_kv.go#L89-L96

if slot == nil {
// This is a fallback for the case when all slots are taken.
// There is no Consul lock so it's possible that multiple sandboxes will try to acquire the same slot.
// In this case, only one of them will succeed and other will try with different slots.
reservedKeys, _, keysErr := kv.Keys(s.nodeID+"/", "", nil)
if keysErr != nil {
return nil, fmt.Errorf("failed to read Consul KV: %w", keysErr)

Fix in Cursor Fix in Web


@jakubno jakubno merged commit b7ce7cb into main Oct 8, 2025
25 checks passed
@jakubno jakubno deleted the orchestrator-config branch October 8, 2025 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants