Skip to content

Conversation

@commonware-llm
Copy link
Contributor

Add a Manager and Instance API for creating multiple isolated runtime
instances that share a single deterministic executor. Each instance has:

  • Its own IP address for network operations
  • Its own storage namespace (partitions are prefixed)
  • Its own metrics namespace (metrics are prefixed)

This enables simulating multiple distributed system nodes in tests where
each node needs network isolation while sharing the same simulated time
and task scheduler.

Closes #2417

Add a Manager and Instance API for creating multiple isolated runtime
instances that share a single deterministic executor. Each instance has:

- Its own IP address for network operations
- Its own storage namespace (partitions are prefixed)
- Its own metrics namespace (metrics are prefixed)

This enables simulating multiple distributed system nodes in tests where
each node needs network isolation while sharing the same simulated time
and task scheduler.

Closes #2417
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

///
/// This shares the ephemeral port counter and listeners with other networks
/// created from the same base, but uses a different IP for its own connections.
pub fn with_ip(&self, ip: Ipv4Addr) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Require an IP (no defaults)

The Instance::context() method was incorrectly cloning the tree directly
with self.context.tree.clone() instead of creating a proper child tree
via Tree::child(). This caused multiple contexts from the same instance
to share the same tree node, leading to panics when spawning from
multiple contexts.

Fixed by using Tree::child() to create proper child trees, matching the
behavior of the standard Context::clone() method.

Added tests to verify:
- Dropping one context does not affect others
- Multiple contexts from the same instance can spawn independently
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Dec 13, 2025

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: 5f23038
Status: ✅  Deploy successful!
Preview URL: https://b5f32c03.monorepo-eu0.pages.dev
Branch Preview URL: https://claude-issue-2417-0167rvrpew.monorepo-eu0.pages.dev

View logs

// Return a context with the instance namespace as its label
// Create a proper child tree for supervision
let (child, _) = Tree::child(&self.context.tree);
let name = format!("i{}", self.namespace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling encode() on this spawned context won't lead to isolated metrics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we wan't joined (just a guarantee not to conflict)?

It would be cool to have both...

Add encode() methods to Instance and Manager:
- Instance::encode() returns only that instance's metrics (filtered by prefix)
- Manager::encode() returns all metrics from all instances

This ensures proper isolation where each instance only sees its own
metrics, while the manager can observe the complete picture.

Updated test to verify:
- Instance1's encode() only shows instance1's metrics
- Instance2's encode() only shows instance2's metrics
- Manager's encode() shows metrics from all instances
@patrick-ogrady patrick-ogrady marked this pull request as draft December 13, 2025 02:33
- Manager now tracks allocated IPs and panics on duplicate allocation
- Instance::encode() strips the instance prefix from metric names,
  returning clean metrics without the namespace prefix
- Manager::encode() still returns all metrics with their full prefixes

Added test for duplicate IP panic behavior.
Migrate the link, bandwidth, and transmitter modules from p2p::simulated
to runtime::simulated, making them generic over peer and channel types.
This allows the network simulation infrastructure to be reused across
different contexts (e.g., the multihead runtime with Ipv4Addr peers).

Changes:
- Add runtime/src/simulated/ module with Link, bandwidth, and transmitter
- Make transmitter generic over peer type P and channel type C
- Re-export runtime's Link from p2p::simulated::ingress
- Update p2p::simulated::transmitter to use type aliases wrapping runtime
- Delete p2p/src/simulated/bandwidth.rs (now in runtime)
- Simplify multihead module by removing incompatible LinkAwareNetwork wrapper
- Add required dependencies (num-rational, num-traits, rand_distr) to runtime
@codecov
Copy link

codecov bot commented Dec 13, 2025

Codecov Report

❌ Patch coverage is 94.94128% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.51%. Comparing base (b9d9c51) to head (19e283a).

Files with missing lines Patch % Lines
runtime/src/simulated/transmitter.rs 95.83% 28 Missing ⚠️
runtime/src/deterministic.rs 92.65% 23 Missing ⚠️
runtime/src/network/deterministic.rs 76.92% 3 Missing ⚠️
runtime/src/simulated/link.rs 97.89% 2 Missing ⚠️
@@            Coverage Diff             @@
##             main    #2492      +/-   ##
==========================================
- Coverage   92.52%   92.51%   -0.02%     
==========================================
  Files         336      338       +2     
  Lines       96743    96999     +256     
==========================================
+ Hits        89510    89734     +224     
- Misses       7233     7265      +32     
Files with missing lines Coverage Δ
p2p/src/simulated/ingress.rs 99.33% <ø> (ø)
p2p/src/simulated/mod.rs 98.05% <ø> (ø)
p2p/src/simulated/transmitter.rs 100.00% <100.00%> (+2.93%) ⬆️
runtime/src/lib.rs 96.95% <ø> (-0.14%) ⬇️
runtime/src/simulated/bandwidth.rs 95.01% <ø> (ø)
runtime/src/simulated/link.rs 97.89% <97.89%> (ø)
runtime/src/network/deterministic.rs 94.87% <76.92%> (-3.66%) ⬇️
runtime/src/deterministic.rs 95.08% <92.65%> (-0.95%) ⬇️
runtime/src/simulated/transmitter.rs 95.83% <95.83%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9d9c51...19e283a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add a high-level Router struct to runtime::simulated that combines link
management with message scheduling. The Router handles latency sampling,
success rate determination, and bandwidth-limited delivery.

Refactor p2p::simulated to use the runtime's Router, making it a thinner
wrapper that focuses on peer identity and network connections. The
DeliveryLink struct now only handles actual network delivery since
scheduling logic moved to the Router.
Add test examples demonstrating how to use multihead runtime instances
with authenticated networks. Each peer runs on its own Instance with:
- A unique IP address (same port, different IPs)
- Isolated storage namespace
- Isolated metrics prefix

The lookup test (test_multihead_connectivity) demonstrates full peer-to-peer
messaging where each peer sends its public key to all others.

The discovery test (test_multihead_instances) demonstrates instance creation
and isolation verification, showing how discovery networks can be configured
with unique IPs per instance.
Add two tests to runtime::simulated::router demonstrating that:

1. `messages_only_flow_between_linked_ips`: Messages only flow between
   IPs that have explicit links configured. Links are unidirectional
   (A→B doesn't imply B→A), and removing a link stops message flow.

2. `network_partition_simulation`: Shows how to simulate network
   partitions by removing cross-partition links. Nodes within each
   partition can still communicate, but cross-partition messages fail.

These tests confirm that the Router properly enforces link topology
when used with IP addresses as peer identifiers.
///
/// Link can be called before a peer is registered or bandwidth is specified.
pub async fn add_link(&mut self, sender: P, receiver: P, config: Link) -> Result<(), Error> {
pub async fn add_link(&mut self, sender: P, receiver: P, link: Link) -> Result<(), Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, it is probably better to just remove all link management from p2p::simulated and use runtime for it?

Simplify p2p::simulated by removing explicit link management. Peers in
the same peer set can now communicate directly without needing to call
add_link/remove_link.

Key changes:
- Remove Link struct and link-related methods from p2p::simulated
- Remove bandwidth.rs and transmitter.rs from p2p::simulated
- Move link/bandwidth simulation logic to runtime::simulated module
- Add Router struct to runtime::simulated with add_link, update_link,
  remove_link, and bandwidth limiting capabilities
- Update all dependent crates to remove explicit link calls
- Update AGENTS.md documentation to reflect new architecture

The network simulation (latency, jitter, bandwidth limiting) is now
handled at the runtime layer via runtime::simulated::Router, enabling
the upcoming multi-headed runtime to provide realistic network conditions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[runtime/deterministic] Add Support for Multi-Headed Runtime

4 participants