Caching layer #1073
This is interesting. |
The entire URL would be the key |
Clearly, this is ideal for static endpoints that have immutable responses. But if the interface also allowed passing a custom asynchronous cache-revalidation function, this would become a really smart cache layer: the revalidation function receives all the request parameters and returns true or false. If it returns true, the client can be served the old response; if false, the route handler must be called to produce a new response. Thank you, this is a good idea. |
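A rough sketch of how such an option could look (entirely hypothetical; neither the `revalidate` option nor `lastUpdateAt` exists in uWS.js):

```js
// Hypothetical revalidation hook on a cached route (not a real uWS.js API).
app.get('/gold-price', handler, {
  // Receives the request parameters; returning true serves the cached
  // response, returning false re-runs the route handler to refresh it.
  revalidate: async ({ url, method, headers }) => {
    return Date.now() - lastUpdateAt < 30_000; // lastUpdateAt: app-defined timestamp
  },
});
```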
All public HTTP web APIs, like those you access when browsing the stock market, would be ideal candidates for at least a 30-second cache. It's not just static endpoints; it's literally all endpoints that deal with public data. Every single financial app that lets you browse some market would be an ideal use case.
This kills the point, though. Calling into JS to decide whether you need to call into JS doesn't fix anything. The idea is to entirely avoid calling any JS in the hot path, only doing so at sparse intervals to update the cache. |
Arguments against the cache would be that:
The arguments for it would be:
|
Yes, you are probably right, but in our case you cannot serve from the cache once it is no longer valid; the cache lifespan is not constant and depends on external factors, so in this case a revalidation function is needed.
I understand, but I see greater potential for optimizing the revalidation function: inside it there is no need to register an onAbort handler and no need to call cork(), i.e. two fewer calls from C++ to JS. It would also be very cool if you corked requests on your side: if the revalidation function has been called once and its promise has not yet resolved, and identical requests (matching the URL) arrive from other clients, you don't call the revalidation function again but keep waiting for the first call's promise to resolve. This greatly reduces the call count. Such an optimization cannot be done with endpoint handlers, but it can be done with a revalidation function; that's why I suggested it. It makes sense if you optimize its invocation on your side (no onAbort, no cork(), no repeated calls while the promise is pending). |
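What is being described is the classic single-flight pattern; a minimal userland sketch (all names made up for illustration):

```js
// Concurrent requests for the same URL share one pending revalidation
// promise instead of each triggering its own C++ -> JS call.
const inFlight = new Map();

function revalidateOnce(url, revalidate) {
  let promise = inFlight.get(url);
  if (!promise) {
    promise = revalidate(url).finally(() => inFlight.delete(url));
    inFlight.set(url, promise);
  }
  return promise; // every matching request awaits the same promise
}
```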
The cache absolutely cannot call any JS whatsoever. That kills the entire purpose. What you can do is:
The cache must operate entirely separate from any JS, so that optimal performance can be delivered. |
I don't disagree: without calling into JS it is the ideal case, it gives +50%. But if the revalidation function gives +20% compared to calling the endpoint handler, it also makes sense. |
So, the API would be something like this?

```js
require('uWebSockets.js')
  .App()
  .get(
    '/some-route',
    (res, req) => {
      // Implementation here.
    },
    {
      // Cache on the "Authorization" header in addition to the URL.
      cacheHeaders: ['Authorization'],
      // Cache for thirty seconds.
      cacheTtl: 1000 * 30,
    },
  );
```

Would there be other options to control cache size, by number of entries or by memory used? |
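Size control could perhaps look like this (purely speculative; none of these option names exist in uWS.js):

```js
{
  cacheHeaders: ['Authorization'],
  cacheTtl: 1000 * 30,
  cacheMaxEntries: 10_000,         // hypothetical: evict once N entries are stored
  cacheMaxBytes: 64 * 1024 * 1024, // hypothetical: evict once the memory budget is hit
}
```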
Something like that yes |
I propose to auto-assign caching HTTP headers if caching is configured for the route. |
It is also possible to automatically generate a response |
I came up with a better option: when creating caching routes, return a controller object for that route, so that the contents of the response body (and possibly the response headers) can be changed later. Then we can update the cache without calling JS functions on every request; this is excellent for performance and makes it possible to always serve a current cache.

```js
const route = CachingApp().get('/some', (res, req) => body, { ...cacheOptions });

// Later in the application life cycle,
// when an event occurs that requires the cache to be updated,
// we simply update the cache of the desired route:
route.setHeader(newHeader);
route.setBody(newBody);
```
|
This behavior will be specific to CachingApp and CachingSSLApp only. |
I'm thinking about how the cache would work if something like middlewares were added. Maybe it would just append, just like it would without the cache. So middlewares would also take the same arguments: expiry and headers. And they would follow the same logic: if cached, just use the cache and continue. OK, sounds simple. |
The problem with the cache is that it boosts performance A LOT, but all the "web frameworks" built on uWS.js use their own router and middlewares and crap like that, so they (which would be the most affected by a cache) can't use it. So eventually, middlewares need to be added to uWS.js itself, so that these sluggish derivatives can go away, or at least drastically reduce their involvement. And for middlewares to work, this automatic copy of Request needs to be added. So there are 3 parts to this:
All of this probably warrants a v21 release |
@uasan you are overcomplicating it. Invalidating the cache is just App.invalidate("the pattern string") or similar |
Oh heck no. Async middlewares will never be supported. They are complete lunacy. Sync middlewares should be easy to add. |
Actually, we already support sync middlewares. They are the same as adding a handler that calls setYield(true). So it would be simple syntactic sugar to add App.use. Lol |
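A minimal sketch of that sugar, built on req.setYield(true), which uWS.js exposes for falling through to the next matching route:

```js
// App.use as sugar over a catch-all route that always yields.
function use(app, middleware) {
  app.any('/*', (res, req) => {
    middleware(res, req); // run the sync middleware
    req.setYield(true);   // keep matching further routes
  });
  return app;
}

// Usage: use(app, (res, req) => console.log(req.getUrl()));
```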
But why would you even need sync middlewares? They are literally the exact same thing as functions. Web devs need to learn what functions are, and just start putting common tasks in functions, and call them. It really is that simple. |
TLDR; I can't solve all the problems in the world. People need to help themselves, and if they refuse to change their ways. Well, good luck then. I can only add a cache, that benefits uWS.js users. Whatever people build on top is not my problem. |
Ok, "web frameworks" that use uWS.js can still enjoy the cache, but not with as fine-grained control: they can enable the cache per-app, as they most likely do any("/*") to get inputs. They could also make it slightly more fine-grained by having "cache namespaces": instead of just any("/*"), they could do something like the sketch below, and they could add as many of them as needed. |
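A sketch, using the hypothetical cache options from earlier in the thread; the handlers are placeholders:

```js
// One cached catch-all per "namespace", each with its own TTL.
App()
  .any('/market/*', marketHandler, { cacheTtl: 30 * 1000 }) // public data, 30 s
  .any('/news/*', newsHandler, { cacheTtl: 5 * 1000 })      // fresher data, 5 s
  .any('/*', fallbackHandler);                              // uncached catch-all
```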
App.invalidate("/the pattern string"): yes, you are right, I am completely satisfied with this. |
Regarding middlewares, I think those who use this pattern in their servers do not need the full performance of uWS; it is important to them to use a familiar pattern while being faster than Express, that is their goal. If you try to implement their patterns you will drown in issues from them ) |
You probably want a way to invalidate a very specific part of the cache, not a whole route (as it could be holding many users via parameter match). So as long as there is some function App.invalidate(method, URL) or whatever, which is super easy to make, this should be solved. How exactly it will look is undecided. |
Yes, ideally it would be possible to invalidate both by route mask and by specific URL; the second is more necessary. It is also important that, when there is no cache or it is invalid, only one request calls the route handler, exactly once; until the async handler resolves, parallel requests to this URL should wait for the cache to appear instead of calling the handler again. |
Makes sense, but that will probably not be in the first revision. |
More questions about limits and cache evictions.
For your business, it would be best to make the cache a free feature with low limits; the paid version is the same cache but with higher limits. |
The confusing part is that nginx is entirely separate from the app logic, but uws is not, since we have to define the cache-layer options on the JS side. Nginx allows serving static files automatically from the external proxy, but here the proxy is also uws itself, which made me remember the movie Inception. Well, I got what you mean. After some brainstorming:

```ts
import uws from 'uWebSockets.js';

interface MatchedParams
{
  ip: string;
  url: string;
  qs: string;
  headers: Record<string, string>;
  bodyLength: number;
}

const app = uws
  .App()
  .cacheRules('key', {
    matches(params: MatchedParams): boolean
    {
      if (params.url === '/my.js') return true;
      return false;
    },
    methods: ['GET'],
    statusCodes: [200, 204],
    excludeHeaders: [],
    maxEntries: Infinity,
    revalidate: Infinity, // when to revalidate the cache entry
    expires: 5, // in seconds
    minUses: 5, // same as proxy_cache_min_uses in nginx: cache only once hit "minUses" times
    defaultStorage: 'memory', // memory has the highest priority if fs params are also set
    memory:
    {
      totalSize: Infinity, // in KB
      maxSize: Infinity, // bodyLength
      adapter: { type: 'redis', ... },
    },
    fs:
    {
      totalSize: Infinity, // in KB
      maxSize: Infinity, // bodyLength
      path: '/my_static_cache',
    },
  })
  .cacheRules('another_key', {
    // another params
    ...
  });

// Purge the cache somewhere, anytime
app.purgeCache('key');
```

In the above example: NGINX supports many things that let us validate the cache, like variables or something like that, so I'm not sure we can solve this without a matches function. I guess it's going to execute on the JS stack, right? |
No, you're not following. The cache is entirely separate from JS. No JS code must ever run in the hot path. |
I think many, like me, will look for ways to control the cache, but @uNetworkingAB is right that the cache only makes sense if you don't call JS code. This means we must work with the cache, not the handlers; uWS just needs to push changes, not try to handle requests. |
@uNetworkingAB I tried the snippet from CachedHelloWorld.js but it seems it does not work. |
It will print that it is using the cache, and you will get a uWS.CachedHttpResponse if you have the latest. But macOS is broken on GitHub Actions right now, so I can't update the binaries. |
@uNetworkingAB Tested on Debian bullseye and it crashes. Seems related to the code in the #1073 (comment) comment above:

```
vscode ➜ /workspaces/uws (master) $ curl http://localhost:4000/no-cache --output -
{"status":"ok"}vscode ➜ /workspaces/uws (master) $ curl http://alhost:4000/no-cache --output -
curl: (52) Empty reply from server
vscode ➜ /workspaces/uws (master) $
```

```
Restarting 'uws-app.js'
Registering cached get handler
Listening at 4000
file:///workspaces/uws/uws-app.js:9
    return res.cork(() => {
               ^

TypeError: res.cork is not a function
    at file:///workspaces/uws/uws-app.js:9:16

Node.js v22.5.1
```
|
Removed
|
Consider using the provided example. |
Example works. And if use only |
I know, I made the example. |
Yes, now I know why it was not working; found it here, thank you. |
Can you post your benchmark results of it? |
Yes, sure.
Between a 20% and 25% performance boost. |
20%-25% is the increase relative to a simple synchronous JS response; if you cache the response of an SQL request, the increase will be an order of magnitude greater. |
How would you do that (cache the response of the SQL request)? Can you give an example of a situation? In which context would you use that? |
One example is if you write a service that provides public data like the price of gold. With the cache you can write it very lazily with direct calls to the source, without fearing poor performance. The cache makes your lazy coding fast (faster than what you could do in JS yourself). 10 seconds cache is more than enough to make it go from crappy to supercharged, and you need no business logic other than some crappy getter client. |
It's basically rate limiting, the cache. So you can use it to rate limit slow endpoints. |
Well, this is a classic case: the database is usually the bottleneck, so it makes sense to cache queries. The concept is simple: respondHTTP(await sql`SELECT ...`); You send only one request to the database; many clients will receive this response from the cache. When the data in the database has changed, you can use Postgres's subscription and notification mechanism to invalidate the cache. |
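A sketch of that flow using the pg client's LISTEN/NOTIFY support; App.invalidate() is the still-hypothetical API discussed earlier, and prices_changed is a made-up channel name:

```js
import pg from 'pg';

// Subscribe once at startup.
const client = new pg.Client();
await client.connect();
await client.query('LISTEN prices_changed');

// A Postgres trigger runs NOTIFY prices_changed, '<id>' on writes;
// the payload names the resource whose cached response is now stale.
client.on('notification', (msg) => {
  app.invalidate(`/price-of/${msg.payload}`); // hypothetical uWS.js API
});
```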
Yes and then you have basically reimplemented https://rethinkdb.com/ 😄 |
@uNetworkingAB there is also a useful caching mode: many people now use proxies, Cloudflare or nginx, which can cache, but they need to be given a 304 status if the Etag has not changed. |
You just add Etag and If-None-Match to the list of headers. Or just don't use cache for those endpoints |
The whole point is in the 304 status: from the server to the proxy and/or browser there is zero traffic in the response body, and the client's private cache is used.
Cache has no logic. If you want to handle such a case, you need to invalidate the cache by adding Etag and If-None-Match to the cache key and implement that logic in your handler. |
304-by-Etag is implemented by all caching servers; calling JS code for this purpose reduces the efficiency of caching. Etag is not business logic, it is HTTP caching logic, clearly described in the HTTP protocol. |
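For comparison, the "implement it in your handler" route looks roughly like this today; currentEtag and currentBody are placeholders:

```js
app.get('/data', (res, req) => {
  const etag = currentEtag(); // placeholder: some version identifier
  // Header names are lowercased in uWS.js.
  if (req.getHeader('if-none-match') === etag) {
    res.writeStatus('304 Not Modified').end(); // no body: the client reuses its copy
  } else {
    res.writeHeader('Etag', etag).end(currentBody()); // placeholder body
  }
});
```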
IMHO it's possible to build a public cache (no auth, headers, or query validation) with DeclarativeResponse:

```js
// example: quite bad but simple code
const { App, DeclarativeResponse } = require("uWebSockets.js");
const { setTimeout } = require("node:timers/promises");

const app = App();
app.listen(3000, () => {});

// http://127.0.0.1:3000/price-of/au
app.get("/price-of/:thing", async (res, req) => {
  const ac = new AbortController();
  const signal = ac.signal;
  res.onAborted(() => ac.abort());

  // Read request data before the first await (req is stack-allocated).
  const thing = req.getParameter("thing");
  const path = req.getUrl();

  const data = await getPriceInfo(thing, ac.signal);
  pricesToCheck.push(thing);
  setJsonCache(path, data);

  if (!signal.aborted) {
    res.cork(() => {
      res.writeStatus("200 OK");
      res.writeHeader("content-type", "application/json");
      res.end(JSON.stringify(data));
    });
  }
});

// Set cache using DeclarativeResponse / Uint8Array
function setJsonCache(path, data) {
  app.get(
    path,
    new DeclarativeResponse()
      .writeHeader("content-type", "application/json")
      .end(JSON.stringify(data))
  );
}

// Fetch from DB or something
async function getPriceInfo(thing, signal) {
  await setTimeout(5, null, { signal }); // simulate 5 ms of latency
  return {
    id: thing,
    value: parseFloat((Math.random() * 100_000).toFixed(4)),
    date: new Date().toISOString(),
  };
}

// Refresh cache every 5s
const pricesToCheck = [];
setInterval(async () => {
  for (const thing of pricesToCheck) {
    setJsonCache(`/price-of/${thing}`, await getPriceInfo(thing));
  }
}, 5_000);
```
|
This is understandable, but the closer the cache is to the consumer (private cache in the browser, public cache in the load balancer), the better and faster. Between the server and the consumer, where possible, only 304 statuses without a response body should be transmitted, because the consumer already has that body. |
I'm poisoned by the microservices environment: no HTTP response caching on the client side, to avoid memory leaks (which in many cases is just stupid). |
Without the 304 status the cache will still work, but there will be increased memory consumption and network load, and as a result lower speed. It's a pity that TechEmpower does not have a cache test with and without status 304, because solutions with status 304 will always be faster than solutions without it; a low position in the benchmark rating could motivate @uNetworkingAB ) |
any news? |
Another solution to excessive JS calls could be to simply add a caching layer that stores whatever the JS callback wrote, for a duration of X seconds, sending it back from cache until the next update.
This could be entirely separate, as a wrapper of SSLApp / App like CachingApp, CachingSSLApp. Like a proxy but built-in.
It could be good for benchmarks while not having to change existing code. It also makes it possible to build real world examples like "what's the price of gold" being updated every 30 seconds, hitting the cache all other times.
Alternatively the API could be that you just add an integer after the handler, for how many seconds the cache is valid, defaulting to 0. Then you could mix cached endpoints with non-cached ones seamlessly.
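That variant might look like this (speculative, mirroring the options-object example earlier in the thread):

```js
App()
  .get('/gold-price', goldHandler, 30) // cached for 30 seconds
  .get('/live-feed', liveHandler);     // no integer: defaults to 0, uncached
```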