-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change SET LOCAL gucs to set_config #1600
Conversation
Turns out that the Using this setup and this k6 script(n=5), I'm getting:
It could be that the Not sure why |
Ouch. Good thing you're testing that.
Maybe it's also because the WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;
Hm. I suppose this could work quite well for all the custom gucs we have. I'm not sure how that would interact with the postgres guc's, though. Role especially - that would need some careful testing with privileges etc... not sure which of those are used at planning time already maybe? In the main query you could do something like: SELECT set_config(key, value, true) FROM json_each_text('{ ... json object with key value pairs ... }'); And then the variables could be put in the query as a json object, just like the body. Of course the pre request would have to be called in here as well, but that should be no problem. |
No luck :(
Looks interesting. I'll revisit this later. The logging improvements(other PR) are looking like a certain improvement for a new release. |
But that's interesting as well. Seems like the added complexity of the It could be, that even when we add this to the main query, this still increases the queries complexity and would in fact be slower. But that's something we'd have to test. In any case I would assume that joining the pre-request to the main query should certainly help, because it is executed with |
6bebfbc
to
e21824d
Compare
I didn't get an increase in performance even when enabling prepared statements for the Perhaps the json approach(#1600 (comment)) could work, but only using I'll close this now, but I'll keep the branch for historical purposes. |
It’s not about the speed for these statements, it’s about security. All the “free form” not “strictly parsed” input should be sent as a parameter, not an inline string. There is a lot of “junk” comming in through headers. |
This is a good change, just like the prepared statements for the main query (but not for the same reason) |
I agree with @ruslantalpa.
Can you clarify that a bit? "increase in performance" compared to what? The original implementation for |
I just did some more testing with only the SQL side of this. The json approach is not going to help. Both with and without json the result without prepared statements is about the same. This might change with prepared statements a bit in favour of json, because the statement would always be the same - but both will still be slower then the But not even considering peformance: The json approach essentially poses the same risk for injection. Either by injecting errors into the json and maybe getting secrets as part of the error message - or by replacing the "role" value with something else. JSON certainly seems less prone to that, because the format is simpler - but still, parametrized statements are the real deal. So from a security perspective, we should definitely go for the prepared statement approach you used here. Each My expectation is, that the best we could do would be to put the set configs into a materialized CTE in front of the main query. This should give us about 50 req/s more. We should not treat this as "making things slower". I feel preparing the |
@wolfgangwalther Yes, I got around 150req/s drop on the SET LOCALs. This was never about security. The So, the main argument for parameterization on our case is performance. Why should we include a change that just makes things slower? To put it simply: Can anyone come up with an input that defeats |
You're of course right, that this is probably more of a theoretical question, not a practical one. Although "guaranteed safe input" is a bit too strong in my opinion. Or if that's the case, then "server-side parametrized queries guarantee this even more" is also true. The thing is, even with the best escaping mechanism implemented - there is still a difference between escaping literals and using parametrized queries. A bug in escaping can have fatal consequences. Parametrizing is just a very different concept. I don't even want to try to come up with an example to defeat Ultimately this might not be only a question of security, but also about trust in postgrest to be secure. If we were able to say: "All our queries are properly parametrized", this would immediately make clear that injections are not possible. I feel it is worth the 50-100 req/s (if optimized in the main query). #294 is also related. |
"escape literal" functions were present in every major language for decades yet the debate was settled a long time ago, to never use them, especially for "user input" when it's not strictly parsed (parsed as we do with sql identifiers). pgFmtLit was "informally" tested by begriffs and me, it was more like: from begriffs "it looks right and it looks like the C code from PG", and from me "hey i ran this automated injection tool and it did not break anything", it was never tested by "true" experts in this field, the kind that can dig a lot lower then the function definition (haskell compiler, llvm, processor code ...), and none of us has the "smarts" to say with 100% certainty that it's safe. If there is a problem with this function, the reason no one found any problems is because no one (smart in this field) had the interest to look, since postgrest was not such a big thing until recently, and there is always the risk that even if they do, they might have the incentive to not disclose that info. to put it simply from my point of view :): i don't trust this function (not even the PG implementation), i trust i telling PG what is a query and what is a parameter. There risk/benefit case here is decisively in the parameterized statements favor. just read what @wolfgangwalther ... what he said |
@wolfgangwalther This is the key point, and I absolutely agree. If we can put this on our docs(say in the Under the Hood page) then that would give PostgREST users a safety guarantee(escaping doesn't look as good, I concur).
I'll reopen. Let's see if this can be made faster. |
We will have to be very careful with moving the I had a similar situation today in a non-PostgREST (yeah, those still exist...) project, where I wanted to change |
@wolfgangwalther Saw the same problem here as well. I tried changing the query to your suggestion: WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set; And though it seemed a bit faster, it gave a lot of errors on the test suite( |
Interesting - but that was not on the main query but still separate, right? |
I think the problem with this is, that it is just not executed at all. The "select" part of the This would also explain, why it is faster. Doing nothing is faster :D. Does adding |
I did a few SQL-only tests. I would not expect any CTE to help with performance as long as it's a separate query. I previously did tests as well, but my JSON test was faulty before. I had a better one now and the performance of that is really good actually! Almost as fast as select
set_config(key, value, true)
from
json_each_text($1); The JSON would look like this: {
"search_path": "test, public",
"role": "postgres",
"request.jwt.claim.role": "postgres",
"request.method": "GET",
"request.path": "/projects",
"request.header.host": "localhost:3000",
"request.header.connection": "keep-alive",
"request.header.upgrade-insecure-requests": "1",
"request.header.user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36",
"request.header.sec-fetch-user": "?1",
"request.header.accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
"request.header.sec-fetch-site": "none",
"request.header.sec-fetch-mode": "navigate",
"request.header.accept-encoding": "gzip, deflate, br",
"request.header.accept-language": "en-US,en;q=0.9,es;q=0.8",
"app.settings.jwt-secret": "a_test",
"app.settings.other": "another",
"app.settings.test": "a_test"
} Of course this depends on how fast you can build that JSON in haskell. But building JSON should not be slow as it's just string concat.. right? If this performs well, we might need to slightly adjust it though.. we should not send |
I've been bitten by Aeson before, if it turns out to be slow we could try with jsonifier.
Yes, in fact I think the main thing that should be inside the JSON are the headers(which can vary).
I'll try with Aeson for now. Let's see.. |
I did some explain analyze tests for comparing the select * from timeit(100,
$$
select
set_config('search_path', 'test', true)
, set_config('role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.method', 'GET', true)
, set_config('request.path', '/projects', true)
, set_config('request.header.host', 'localhost:3000', true)
, set_config('request.header.user-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36', true)
, set_config('request.header.sec-fetch-user', '?1', true)
, set_config('request.header.accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3', true)
, set_config('request.header.sec-fetch-site', 'none', true)
, set_config('request.header.sec-fetch-mode', 'navigate', true)
, set_config('request.header.accept-encoding', 'navigate', true)
, set_config('request.header.accept-language', 'en-US,en;q=0.9,es;q=0.8', true)
, set_config('app.settings.jwt-secret', 'a_test', true)
, set_config('app.settings.other', 'another', true)
, set_config('app.settings.test', 'a_test', true)
$$,
$$
select set_config(key, value, true) from json_each_text('{ "search_path": "test, public", "role": "postgres", "request.jwt.claim.role": "postgres", "request.method": "GET", "request.path": "/projects", "request.header.host": "localhost:3000", "request.header.connection": "keep-alive", "request.header.upgrade-insecure-requests": "1", "request.header.user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36", "request.header.sec-fetch-user": "?1", "request.header.accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", "request.header.sec-fetch-site": "none", "request.header.sec-fetch-mode": "navigate", "request.header.accept-encoding": "gzip, deflate, br", "request.header.accept-language": "en-US,en;q=0.9,es;q=0.8", "app.settings.jwt-secret": "a_test", "app.settings.other": "another", "app.settings.test": "a_test" }'::json)
$$); Results:
Besides the query, we also need to create the json on Haskell(additional operation). So I'm pretty much certain that we'll not gain performance down this route. |
e21824d
to
ac8cc13
Compare
Just asked about SET LOCAL vs SELECT set_config on pg freenode. They basically told me: SET LOCAL is a utility statement with relatively low overhead, whereas SELECT means doing the whole executor startup/shutdown dance as well as returning a result Chat logsteve-chavez | Hi guys. So I have an application that uses `SET LOCAL "some.var"` for application | variables. When changing these `SET LOCALs` to use `SELECT set_config(key,val,true)` I | get slower execution times. I was wondering, is SET LOCAL faster than SELECT set_config? RhodiumToad | mmmmaybe. RhodiumToad | did you try preparing the select? steve-chavez | RhodiumToad: Hey! Yes, I also did that and it's also slower. In fact I converted to | set_config for preparing the query. RhodiumToad | SET LOCAL is a utility statement with relatively low overhead, whereas SELECT means | doing the whole executor startup/shutdown dance as well as returning a result RhodiumToad | I wonder how it would compare using the function-call fastpath to call set_config RhodiumToad | not many clients support that tho steve-chavez | Ahh.. that makes sense. Even with prepare I get slower execution times. steve-chavez | Could I speed up the `set_config` in any way? steve-chavez | any other way* RhodiumToad | hm, maybe try wrapping it in a plpgsql procedure and using CALL. RhodiumToad | no idea if that would be any better. |
Anyway, I think it should be good to merge. @wolfgangwalther WDYT? |
Well... that's exactly what I did, where I found that the JSON query was very fast. I used a slightly different approach, though: To compare with For the repetition I just did, I also added the two queries you had "bare", so without function call. I ran the queries 10,000 times, because they are really fast. But the results were not much different from running 100 times. I pasted the full code here: Here are the results for me with
A couple of things stand out:
Overall, I think running the queries bare is giving wrong results. However, even with that we have quite a difference in relative numbers between the two queries: bare json is ~2x slower for me, while it's 4-5x slower for Steve. But that could be due to lower I am running on PG 12.5. @steve-chavez, you? Can you give the code I posted above a run and see whether that still reports different numbers for both of us? |
@wolfgangwalther Just ran your pastebin sample. Here are my results:
(Ran them on pg 11.5) |
Btw, yesterday I checked the queries with pg_stat_statements and saw that the total time of the
So maybe this is not about that SQL, perhaps the slowness is in the haskell code. |
Hm. Those numbers look pretty similar to mine. Not sure what happened on your first run with
Just looking at the pastebin you posted: That query has 24 parameters. Maybe those hasql |
That's weird. Because as in this PR, I'm only parametrizing the 2nd(the value). I'll see what's going on there. Not sure if it's a |
Argh. When I said this, I likely made the test on the wrong version(the unprepared one I believe). At the time I probably looked at this PR from the wrong approach: wanted more performance, instead of more security. So anyway, I've repeated the tests and this is what I get. Locally, with a
On an AWS
SET LOCAL is still faster, but just slightly over a prepared |
Yeah, true. I didn't have a deeper look at the code, yet. Of course, you're not parametrizing the |
As stated in #1609 I gave Here are the numbers for current master, this branch and the commit right before we introduced many more prepared statements. The request is just a simple
So this confirms your results. Parametrizing each However, I still think the json approach could be a bit tiny faster - probably most notably with workloads that run different queries. Different queries will result in different prepared queries in the current implementation, while the json query would always stay the same allowing more re-use. Not sure how much of that would be lost because of json encoding in haskell, though. |
The first argument in |
Note: I just pushed the cirrus fix directly to master, because all other branches are still failing with the low memory error. So this is taken care of now. |
4d0fd6e
to
691d604
Compare
That would indeed happen on different number of headers/cookies/claims. The number of headers can vary easily. This one is kinda like the IN vs ANY optimization I made on #1633 (comment). But I'm also not sure of the json encoding speed. I'll merge this one. I'll leave the json optimization for another PR :) |
the "set env" query can be merged with the main query, the important part is that the main query (and any other query) that needs to be executed after the "set" CTE, need to select from "set" CTE like so
|
The query I had in #1600 (comment) was roughly like so:
In theory, this is exactly what you say: Everywhere I need the new values I need to reference When doing this with smaller examples like you showed here, I always had the expected results as well. But not on the big set. It might have been caching issues, because I tried different things in the same transaction. Still, this feels like something that we need to be 100% sure about. Maybe we can check back wiith PG developers about this. |
I asked on the pgsql general mailing list and Tom Lane told me better not to do this: https://www.postgresql.org/message-id/flat/cf303847-0bd1-4eca-2b3c-3055416df8bb%40technowledgy.de. |
Changes how we set the GUCs from using SET LOCALs to using set_config.
We go from this:
To this:
Note that it saves around ~5ms.
I'm working on an updated benchmark and this might get us a few more requests per second.