CDN-friendly HTTP API #20
Comments
Putting the query text in a header value I understand the reason for it is to retain all the benefits that come along with the caching semantics of
So, it's unexpected, and therefore you probably couldn't expect an intermediate cache to respect it. But AFAICT, we don't expect the intermediate cache to do anything with the query text anyway, right? Or is it necessary because we want to That Stackoverflow discussion also points out that Elasticsearch uses |
I got this working with Varnish (see https://github.com/splitgraph/seafowl/blob/20-cdn-friendly-http-api/src/http.rs#L109):
Setup: Varnish running on :80, proxying to Seafowl
vcl 4.1;
backend seafowl {
    .host = "127.0.0.1";
    .port = "3030";
}
sub vcl_recv {
    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
}
sub vcl_backend_response {
    set beresp.ttl = 10s;
    set beresp.grace = 1h;
    if (beresp.status >= 400) {
        set beresp.ttl = 0s;
        set beresp.grace = 0s;
    }
}
Attempt 1: query doesn't match the hash
Correct hash
Same curl again (Varnish is in its grace period, so it gives us the cached result but also sends a query to Seafowl, using the stored ETag even though we didn't pass one)
curl with changed etag -- note varnish ignores the etag that we pass
Update the dataset in Seafowl by inserting an empty row to it
Make the same request again to Varnish (twice, since the first time the reply is in grace)
Note the changed table version and the different etag (which caused the query to execute)
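To make the "twice, since the first time the reply is in grace" step easier to follow, here is a rough Python model of how a cached object's age maps to cache behaviour under the `ttl` and `grace` values from the VCL in this comment. It is only a sketch of the general TTL-plus-grace idea, not Varnish's actual implementation; `classify` and `CacheDecision` are hypothetical names.

```python
from enum import Enum

TTL_SECONDS = 10          # beresp.ttl = 10s in the VCL
GRACE_SECONDS = 60 * 60   # beresp.grace = 1h in the VCL


class CacheDecision(Enum):
    FRESH_HIT = "serve from cache, do not contact the backend"
    GRACE_HIT = "serve the stale copy now, refresh from the backend in the background"
    MISS = "fetch from the backend before answering"


def classify(age_seconds: float,
             ttl: float = TTL_SECONDS,
             grace: float = GRACE_SECONDS) -> CacheDecision:
    """Classify a cached object by its age (seconds since it was stored)."""
    if age_seconds < ttl:
        return CacheDecision.FRESH_HIT
    if age_seconds < ttl + grace:
        # The client still sees the old result; only the next request
        # benefits from the background refresh.
        return CacheDecision.GRACE_HIT
    return CacheDecision.MISS
```

Under this model, a request made after the table changes but within the grace window still returns the old body, which is why the second request is the one that shows the new ETag.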
|
It's a good question. For reference, some limits that I got from CDN docs:
The request body limits are much higher (e.g. 100MB on the CF Free plan). Do we trust intermediate caches to forward the GET request body more than X-... headers?
The current implementation doesn't do that, but there is a case where, if the query doesn't match the hash in the URL, it will return a 400 error. This seems to be cached by Varnish by default (breaking that URL hash). I had to explicitly tell it to not cache 4xx responses; an alternative is varying on X-Seafowl-Query which we can't do if the query is the body. |
Here's an interesting thing I found yesterday: Cloudflare, by default, without getting (metered) page rules involved, caches assets based on their extension and doesn't cache HTML files: https://developers.cloudflare.com/cache/about/default-cache-behavior/#default-cached-file-extensions. So having a URL like |
(keeping it open, branch merged to keep main up to date) |
After sketching out the docs and thinking about what a good first tutorial would be, as well as how people would add data to Seafowl, this looks like a decent HTTP API for this:
some other considerations:
|
Makes sense to me. That's basically in line with how Elasticsearch handles it (in the sense of offering the more "normal" API for basic curl usage). I guess the risk is that if developers write their initial hello world demo with the We might be able to make the |
That's a decent idea actually, though the 301 would have to indicate to the requester to not only resubmit to |
I'm not aware of a way for the server to provide the client with a new body to send. But a 307 redirect will reuse the method and body of the request, so it's not perfect but it works if the client provides a copy of the query in both the body and URL parameter (or even only in the body). |
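As a rough illustration of the 307 idea, the sketch below builds a redirect response for a query that arrived in the request body. The `/q/<hash>` path, the use of SHA-256, and the function name are all assumptions made for the example; the thread does not pin any of them down.

```python
import hashlib
from typing import Dict, Tuple


def redirect_to_hashed_url(query: str) -> Tuple[int, Dict[str, str]]:
    """Build a 307 response pointing a query at a hash-addressed URL.

    The "/q/<hash>" path and SHA-256 are illustrative assumptions.
    """
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    # 307 asks the client to repeat the request with the same method
    # and the same body at the new location.
    return 307, {"Location": f"/q/{digest}"}
```

Because the server cannot hand the client a new body, this only helps when the original request already carried the query text that the target URL expects.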
There is 303 See Other (wiki) which is interesting but doesn't seem to allow changing the body of the next request. Worth a read though |
I'm closing this to get the satisfaction of ticking something off and because this is now usable (see https://observablehq.com/@mildbyte/hello-seafowl). I created some follow-up issues which we might address before the first release: |
Implement a read-only HTTP API that obeys HTTP cache semantics and can benefit from any CDN / cache like Varnish:
Client sends a query like:
The server receives the query, checks that it's a SELECT query, checks that the hash matches, and executes it. It sends the result back with an ETag that is a function of the versions of all tables that participate in this query:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#directives
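A minimal sketch of the two checks described here, assuming SHA-256 for the query hash and a mapping of table name to integer version (neither is specified in this issue, and all function names are hypothetical):

```python
import hashlib
from typing import Mapping


def query_hash(query: str) -> str:
    """Hash of the query text that the client puts in the URL (SHA-256 assumed)."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def hash_matches(query: str, url_hash: str) -> bool:
    """Server-side check that the URL hash really belongs to this query."""
    return query_hash(query) == url_hash.lower()


def etag_for(table_versions: Mapping[str, int]) -> str:
    """ETag derived only from the versions of the tables the query reads."""
    # Sort so the ETag does not depend on the order the tables were listed in.
    canonical = ",".join(
        f"{name}={version}" for name, version in sorted(table_versions.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest}"'  # ETag values are quoted strings
```

Since the ETag depends only on table versions and not on the result rows, it can be recomputed without running the query.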
When the client's browser queries the data again, it can then pass the ETag to see if the query results have changed:
Intermediate caches/CDNs might not even forward the query to the origin, instead serving it from cache if it's not stale. If it reaches the server, the server can cheaply revalidate the entry by recomputing the ETag and responding with a 304 Not Modified if the tables in the query haven't changed their versions (without having to execute the query).
This only works for SELECT queries. INSERT and other writes should come in via POST requests.
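The revalidation step can be sketched as follows. This is a simplified illustration rather than Seafowl's implementation: `respond` is a hypothetical helper, `current_etag` is assumed to have been computed from the current table versions, and `If-None-Match` is treated as a single strong ETag (real requests may carry a list or weak validators).

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Optional[str]]


def respond(current_etag: str,
            if_none_match: Optional[str],
            execute: Callable[[], str]) -> Response:
    """Answer a GET for a query, skipping execution when the client's copy is current."""
    headers = {"ETag": current_etag}
    if if_none_match is not None and if_none_match == current_etag:
        # Table versions are unchanged: no body, and the query is never run.
        return 304, headers, None
    # First request, or the tables changed: run the query and send fresh results.
    return 200, headers, execute()
```

Passing the query as a callable keeps the expensive part lazy, so the 304 path costs only the ETag comparison.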