Skip to content

Conversation

eddyashton
Copy link
Member

First change to solidify snapshot fetching, hoping to be iteratively backportable:

Change the redirect response header from Location: /node/snapshot/XXX to Location: https://1.2.3.4/node/snapshot/XXX. This prevents the client returning to the load balancer and being silently redirected to a different node, and is consistent with other nodefrontend-specific redirects.

Requires a frustratingly large diff, I'll justify/explain that inline.

@eddyashton eddyashton requested a review from a team as a code owner October 17, 2025 15:24
@Copilot Copilot AI review requested due to automatic review settings October 17, 2025 15:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR changes snapshot redirect responses to use full URLs instead of relative paths, ensuring clients connect directly to the correct node rather than potentially being redirected through a load balancer. The change updates both the redirect generation logic and the client-side handling to use full URLs.

  • Modifies the node frontend to generate full HTTPS URLs in redirect responses for snapshot endpoints
  • Updates the snapshot fetching client code to properly handle redirect URLs using CURL's built-in redirect following
  • Updates tests to validate the new full URL redirect behavior

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/node/rpc/node_frontend.h Generates full HTTPS URLs for snapshot redirects using node's published address
src/snapshots/fetch.h Refactors redirect handling to use CURL's CURLINFO_REDIRECT_URL and follows redirects iteratively
src/http/curl.h Changes URL parameter from rvalue reference to const reference for reusability
tests/e2e_operations.py Updates tests to expect full URLs in redirect responses and uses path for subsequent requests

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +1942 to +1943
auto redirect_url =
fmt::format("https://{}/node/snapshot/{}", address, snapshot_path);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core goal - changing this value.

}

const auto& address =
info->rpc_interfaces[interface_id.value()].published_address;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: This is a buggy pattern - what if the interface isn't present on this node? In this specific instance it should be fine (we're currently only asking for an interface on ourselves, based on a request that we are parsing). But we'll fix this (and other uses of the same pattern) separately, and before this redirects to other nodes which may not have the same interfaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be an error at this point, there is no reliable way to guess a suitable alternative.

Comment on lines +53 to +54
// Make initial requests, following redirects to a specific snapshot,
// resulting in final path and snapshot size
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff here looks big but semantic change is smaller.

In pseudocode, previously:

string snapshot_path;
{
  request = do_head_request(initial_url);
  assert(request.response == REDIRECT);
  snapshot_path = request.headers["Location"];
}

size_t content_size;
{
  request = do_head_request(snapshot_path);
  assert(request.response == OK);
  content_size = request.headers["Content-size"];
}

do_actual_fetch_loop(snapshot_url, content_size);

We need to change the calling pattern slightly to handle the location header being a full URL, not just a path, and potentially on a different host. Curl's FOLLOW_REDIRECT would do the magic, but annoyingly won't then tell us the final path! So we do the loop ourself, updating snapshot_url each time, and extract content_size from the first non-redirect:

string snapshot_url = initial_url;
size_t content_size;
while (redirect_count < max_redirects)
{
  request = do_head_request(initial_url);
  if (request.response == OK)
  {
    content_size = request.headers["Content-size"];
    break;
  }
  
  assert(request.response == REDIRECT);
  snapshot_url = curl_helper_to_get_location_as_full_url(request);
  ++redirect_count;
}

do_actual_fetch_loop(snapshot_url, content_size);

content_size);
if (ec != std::errc())
{
throw std::runtime_error(fmt::format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not clear on why it matters to get the size ahead of time, and why the HEAD requests needs to exist. Isn't there a way to go straight for the snapshot, with a range restriction, and if we get lucky get it right back, or otherwise be redirected?
Why do we need HEAD first to get the size?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard HTTP flow seems to be to return a 416 if the Range cannot be satisfied, but if the Range is strictly bigger than the document, we could opt to return the bytes we have without error.

If say a connecting node asks for bytes=0-1000, and the snapshot is only 500 bytes, we could just return 500 bytes. If it's longer we can ask for the next chunk etc. We can return a Content-Range on the first response to give the full size and avoid an error on the last chunk (Client: "give me 0-1000", Server: "Here is 0-1000, by the way the document is 1700 long", Client: "Ok, give me 1001-1700 now then").

It's not standard, but it's going to save at least one round trip and possibly more - we always gate that behaviour behind a User-Agent if we want to preserve compatibility with standard clients (which we don't expect will call this interface).

Copy link
Member

@achamayou achamayou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely goes in the right direction, but I think we want a clear error when we have an interface mismatch. Also I wonder if we can do something more efficient (see comment above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants