-
Notifications
You must be signed in to change notification settings - Fork 169
[runtime] Add multi-headed deterministic runtime support #2492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add a Manager and Instance API for creating multiple isolated runtime instances that share a single deterministic executor. Each instance has: - Its own IP address for network operations - Its own storage namespace (partitions are prefixed) - Its own metrics namespace (metrics are prefixed) This enables simulating multiple distributed system nodes in tests where each node needs network isolation while sharing the same simulated time and task scheduler. Closes #2417
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /// | ||
| /// This shares the ephemeral port counter and listeners with other networks | ||
| /// created from the same base, but uses a different IP for its own connections. | ||
| pub fn with_ip(&self, ip: Ipv4Addr) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Require an IP (no defaults)
The Instance::context() method was incorrectly cloning the tree directly with self.context.tree.clone() instead of creating a proper child tree via Tree::child(). This caused multiple contexts from the same instance to share the same tree node, leading to panics when spawning from multiple contexts. Fixed by using Tree::child() to create proper child trees, matching the behavior of the standard Context::clone() method. Added tests to verify: - Dropping one context does not affect others - Multiple contexts from the same instance can spawn independently
Deploying monorepo with
|
| Latest commit: |
5f23038
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b5f32c03.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://claude-issue-2417-0167rvrpew.monorepo-eu0.pages.dev |
| // Return a context with the instance namespace as its label | ||
| // Create a proper child tree for supervision | ||
| let (child, _) = Tree::child(&self.context.tree); | ||
| let name = format!("i{}", self.namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling encode() on this spawned context won't lead to isolated metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we wan't joined (just a guarantee not to conflict)?
It would be cool to have both...
Add encode() methods to Instance and Manager: - Instance::encode() returns only that instance's metrics (filtered by prefix) - Manager::encode() returns all metrics from all instances This ensures proper isolation where each instance only sees its own metrics, while the manager can observe the complete picture. Updated test to verify: - Instance1's encode() only shows instance1's metrics - Instance2's encode() only shows instance2's metrics - Manager's encode() shows metrics from all instances
- Manager now tracks allocated IPs and panics on duplicate allocation - Instance::encode() strips the instance prefix from metric names, returning clean metrics without the namespace prefix - Manager::encode() still returns all metrics with their full prefixes Added test for duplicate IP panic behavior.
Migrate the link, bandwidth, and transmitter modules from p2p::simulated to runtime::simulated, making them generic over peer and channel types. This allows the network simulation infrastructure to be reused across different contexts (e.g., the multihead runtime with Ipv4Addr peers). Changes: - Add runtime/src/simulated/ module with Link, bandwidth, and transmitter - Make transmitter generic over peer type P and channel type C - Re-export runtime's Link from p2p::simulated::ingress - Update p2p::simulated::transmitter to use type aliases wrapping runtime - Delete p2p/src/simulated/bandwidth.rs (now in runtime) - Simplify multihead module by removing incompatible LinkAwareNetwork wrapper - Add required dependencies (num-rational, num-traits, rand_distr) to runtime
Codecov Report❌ Patch coverage is @@ Coverage Diff @@
## main #2492 +/- ##
==========================================
- Coverage 92.52% 92.51% -0.02%
==========================================
Files 336 338 +2
Lines 96743 96999 +256
==========================================
+ Hits 89510 89734 +224
- Misses 7233 7265 +32
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Add a high-level Router struct to runtime::simulated that combines link management with message scheduling. The Router handles latency sampling, success rate determination, and bandwidth-limited delivery. Refactor p2p::simulated to use the runtime's Router, making it a thinner wrapper that focuses on peer identity and network connections. The DeliveryLink struct now only handles actual network delivery since scheduling logic moved to the Router.
Add test examples demonstrating how to use multihead runtime instances with authenticated networks. Each peer runs on its own Instance with: - A unique IP address (same port, different IPs) - Isolated storage namespace - Isolated metrics prefix The lookup test (test_multihead_connectivity) demonstrates full peer-to-peer messaging where each peer sends its public key to all others. The discovery test (test_multihead_instances) demonstrates instance creation and isolation verification, showing how discovery networks can be configured with unique IPs per instance.
Add two tests to runtime::simulated::router demonstrating that: 1. `messages_only_flow_between_linked_ips`: Messages only flow between IPs that have explicit links configured. Links are unidirectional (A→B doesn't imply B→A), and removing a link stops message flow. 2. `network_partition_simulation`: Shows how to simulate network partitions by removing cross-partition links. Nodes within each partition can still communicate, but cross-partition messages fail. These tests confirm that the Router properly enforces link topology when used with IP addresses as peer identifiers.
p2p/src/simulated/ingress.rs
Outdated
| /// | ||
| /// Link can be called before a peer is registered or bandwidth is specified. | ||
| pub async fn add_link(&mut self, sender: P, receiver: P, config: Link) -> Result<(), Error> { | ||
| pub async fn add_link(&mut self, sender: P, receiver: P, link: Link) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, it is probably better to just remove all link management from p2p::simulated and use runtime for it?
Simplify p2p::simulated by removing explicit link management. Peers in the same peer set can now communicate directly without needing to call add_link/remove_link. Key changes: - Remove Link struct and link-related methods from p2p::simulated - Remove bandwidth.rs and transmitter.rs from p2p::simulated - Move link/bandwidth simulation logic to runtime::simulated module - Add Router struct to runtime::simulated with add_link, update_link, remove_link, and bandwidth limiting capabilities - Update all dependent crates to remove explicit link calls - Update AGENTS.md documentation to reflect new architecture The network simulation (latency, jitter, bandwidth limiting) is now handled at the runtime layer via runtime::simulated::Router, enabling the upcoming multi-headed runtime to provide realistic network conditions.
Add a Manager and Instance API for creating multiple isolated runtime
instances that share a single deterministic executor. Each instance has:
This enables simulating multiple distributed system nodes in tests where
each node needs network isolation while sharing the same simulated time
and task scheduler.
Closes #2417