@ludfjig ludfjig commented Sep 5, 2025

This PR improves error handling between host and guest functions to prevent memory leaks and ensure more reliable deserialization.

Previously, when a host function (invoked from the guest) returned an error, the error was reported immediately without unwinding the guest stack. This left guest-side allocations in an inconsistent state, leading to memory leaks on subsequent entries into the guest.
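The difference between unwinding and not unwinding can be sketched in plain Rust. This is only an illustration under stated assumptions: `TrackedBuffer`, `failing_host_call`, and both guest functions are hypothetical names, and `std::mem::forget` merely simulates leaving the guest without running destructors. It is not how the old code path was actually implemented.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical guest-side allocation that tracks how many instances are live.
struct TrackedBuffer {
    live: Rc<Cell<usize>>,
}

impl TrackedBuffer {
    fn new(live: &Rc<Cell<usize>>) -> Self {
        live.set(live.get() + 1);
        TrackedBuffer { live: Rc::clone(live) }
    }
}

impl Drop for TrackedBuffer {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Stand-in for a host function that always fails.
fn failing_host_call() -> Result<i32, String> {
    Err("Host function error!".to_string())
}

/// Propagating the error with `?` returns through this frame,
/// so `_buffer` is dropped on the way out.
fn guest_function_unwinding(live: &Rc<Cell<usize>>) -> Result<i32, String> {
    let _buffer = TrackedBuffer::new(live);
    let value = failing_host_call()?;
    Ok(value + 1)
}

/// Simulates reporting the error without unwinding: the destructor for
/// `buffer` never runs on the error path, so the allocation stays live.
fn guest_function_without_unwinding(live: &Rc<Cell<usize>>) -> Result<i32, String> {
    let buffer = TrackedBuffer::new(live);
    let result = failing_host_call();
    if result.is_err() {
        std::mem::forget(buffer); // simulation only: skip the destructor
    }
    result
}
```

In this sketch, the live count returns to zero after the unwinding variant fails, while the non-unwinding variant leaves one allocation behind per failed call.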

In addition, error reporting relied on a fragile mechanism: guest errors were manually serialized into a buffer, and the host detected them by attempting to deserialize the buffer as an error. If deserialization succeeded, an error was assumed to have occurred. This approach is risky because GuestError and FunctionCallResult are completely separate types, so nothing prevented them from having the same serialized format.
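To make the risk concrete, here is a toy sketch of "detect an error by trying to deserialize one". It assumes a made-up byte layout (one code byte followed by a UTF-8 message) rather than the FlatBuffers schemas the project actually uses, and every name in it (`ToyGuestError`, `try_decode_error`, `encode_ok_string`, `heuristic_misclassifies`) is hypothetical.

```rust
/// Hypothetical stand-in for a guest error: one code byte, then a UTF-8 message.
/// This toy layout is an assumption for illustration, not the real wire format.
#[derive(Debug, PartialEq)]
struct ToyGuestError {
    code: u8,
    message: String,
}

/// Heuristic detection: if the buffer parses as an error, assume an error happened.
fn try_decode_error(buf: &[u8]) -> Option<ToyGuestError> {
    let (&code, rest) = buf.split_first()?;
    let message = String::from_utf8(rest.to_vec()).ok()?;
    Some(ToyGuestError { code, message })
}

/// A successful call returning a plain string, encoded as its raw bytes.
fn encode_ok_string(value: &str) -> Vec<u8> {
    value.as_bytes().to_vec()
}

/// True when the heuristic would report an error for this buffer.
fn heuristic_misclassifies(buf: &[u8]) -> bool {
    try_decode_error(buf).is_some()
}
```

With this layout, the successful return value "hello" also parses as an error (code `b'h'`, message "ello"), so the heuristic reports a failure that never happened.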

Changes

  • Guest/host function calls now always return a FunctionCallResult, which explicitly represents either Ok or Err.
  • If a host function returns an error, it is serialized back into the guest; the guest then unwinds properly and reports the error back to the host, which fixes the memory leak.
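The idea of a Result-like call result can be sketched with an explicit discriminant. This is a minimal illustration, assuming a toy tag-byte encoding instead of the FlatBuffers union the PR introduces; `ToyCallResult`, `TAG_OK`, `TAG_ERR`, `encode`, and `decode` are hypothetical names and do not mirror the generated code.

```rust
/// Hypothetical stand-in for a Result-like call result.
#[derive(Debug, PartialEq)]
enum ToyCallResult {
    Ok(Vec<u8>),
    Err { code: u8, message: String },
}

// Assumed toy discriminants; the real schema uses a FlatBuffers union instead.
const TAG_OK: u8 = 0;
const TAG_ERR: u8 = 1;

/// Encode the result with a leading tag byte so Ok and Err can never be confused.
fn encode(result: &ToyCallResult) -> Vec<u8> {
    match result {
        ToyCallResult::Ok(payload) => {
            let mut buf = vec![TAG_OK];
            buf.extend_from_slice(payload);
            buf
        }
        ToyCallResult::Err { code, message } => {
            let mut buf = vec![TAG_ERR, *code];
            buf.extend_from_slice(message.as_bytes());
            buf
        }
    }
}

/// Decode by inspecting the tag first; unknown tags and truncated buffers are rejected.
fn decode(buf: &[u8]) -> Option<ToyCallResult> {
    let (&tag, rest) = buf.split_first()?;
    match tag {
        TAG_OK => Some(ToyCallResult::Ok(rest.to_vec())),
        TAG_ERR => {
            let (&code, msg) = rest.split_first()?;
            let message = String::from_utf8(msg.to_vec()).ok()?;
            Some(ToyCallResult::Err { code, message })
        }
        _ => None,
    }
}
```

Because the receiver always decodes the same type and branches on the tag, an Ok payload that happens to look like an error message is still decoded as Ok.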

TODO:

For C guests: if a host function returns an error, the guest will panic when trying to read the expected good value (because there is only an error instead) and leak memory as a result. This will be fixed in a follow-up PR.

Closes #826
Closes #497

@ludfjig ludfjig force-pushed the host_error_leak_fix branch 2 times, most recently from d346b32 to 2e45037 Compare September 8, 2025 18:07
@ludfjig ludfjig added the kind/bugfix For PRs that fix bugs label Sep 8, 2025
@ludfjig ludfjig force-pushed the host_error_leak_fix branch 4 times, most recently from 2e83739 to fac5e92 Compare September 9, 2025 17:55
- Add all.fbs to include all schema files in one place
- Restructure function_call_result.fbs to use Result-like union
- Add HostError variant to ErrorCode enum in guest_error.fbs
- Update flatbuffer generation command in Justfile to use all.fbs
- Update documentation for new generation process

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update all generated Rust code based on the new schema definitions.
This includes new types for error handling and result structures.

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update function_types.rs to handle Result-like return values
- Simplify guest_error.rs wrapper implementation
- Update util.rs for new generated types
- Update mod.rs for new generated types

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Remove guest_err.rs from hyperlight_host (replaced by new error handling)
- Remove guest_err.rs from hyperlight_guest_bin (replaced by new error handling)
- Update func/mod.rs to remove obsolete import

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update initialized_multi_use.rs to use new Result-like error handling
- Update mem/mgr.rs to handle host function errors properly
- Update sandbox/outb.rs for new error propagation pattern

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update guest/host_comm.rs to use new Result-like return values
- Update guest_bin/call.rs to properly handle host function errors
- Update guest_bin/lib.rs to remove obsolete error handling import and make GUEST_HANDLE public (for use in C-API)
- Update guest_capi/error.rs to support new error types

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update sandbox_host_tests.rs to use the new Result-like error handling pattern.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update Cargo.lock and Cargo.toml files to reflect the dependency changes
needed for the new error handling implementation.

Signed-off-by: Ludvig Liljenberg <[email protected]>
@ludfjig ludfjig force-pushed the host_error_leak_fix branch from fac5e92 to 8818cf5 Compare September 9, 2025 17:56
Signed-off-by: Ludvig Liljenberg <[email protected]>
@jsturtevant jsturtevant left a comment

Great catch! Overall, this looks like an improvement, but others should probably take a look, as this was my first time reviewing this code.

@@ -81,10 +82,12 @@ pub(crate) fn call_guest_function(function_call: FunctionCall) -> Result<Vec<u8>

// This function is marked as no_mangle/inline to prevent the compiler from inlining it , if its inlined the epilogue will not be called
// and we will leak memory as the epilogue will not be called as halt() is not going to return.
//
// This function may panic, as we have no other ways of dealing with errors at this level

This seems like a tricky change, although the previous version looks to have just ignored errors?

@@ -374,6 +374,7 @@ fn host_function_error() -> Result<()> {
assert!(
matches!(&res, HyperlightError::GuestError(_, msg) if msg == "Host function error!") // rust guest
|| matches!(&res, HyperlightError::GuestAborted(_, msg) if msg.contains("Host function error!")) // c guest
|| matches!(&res, HyperlightError::StackOverflow()) // c guest. TODO fix this. C guest leaks when host func returns error guest panics.

Do we have an issue for this? Was this the case prior to this PR?

// .map_err(|e| anyhow!("Failed to get ReturnValue from bytes: {:?}", e))?;
// function_call_result_fb.try_into()
// }
// }

remove this?

/// The builder should not be reused after a call to encode, since this function
/// does not reset the state of the builder. If you want to reuse the builder,
/// you'll need to reset it first.
pub fn encode<'a>(&self, builder: &'a mut flatbuffers::FlatBufferBuilder) -> &'a [u8] {

Should we have some sanity tests like we do with fn read_from_flatbuffer() -> Result<()>?

for _ in 0..1000 {
let res = init_sandbox
.call::<i32>("GuestMethod1", msg.to_string())

Could we also add a test that demonstrates how a guest would successfully handle a host function error and continue processing with this new type? Do we have that already?

Development

Successfully merging this pull request may close these issues.

Memory Leak in Guest on Error Calling Host Function
Confusing ERROR tracing when not actually an error