Change SET LOCAL gucs to set_config #1600

Merged (2 commits) on Dec 8, 2020

steve-chavez
Member

Changes how we set the GUCs from using SET LOCALs to using set_config.

We go from this:

SET LOCAL search_path = 'test', 'public';SET LOCAL "role" = 'postgres';SET LOCAL "request.jwt.claim.role" = 'postgres';
SET LOCAL "request.method" = 'GET';SET LOCAL "request.path" = '/projects';SET LOCAL "request.header.host" = 'localhost:3000';
SET LOCAL "request.header.connection" = 'keep-alive';SET LOCAL "request.header.upgrade-insecure-requests" = '1';
SET LOCAL "request.header.user-agent" = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36';
SET LOCAL "request.header.sec-fetch-user" = '?1';SET LOCAL "request.header.accept" = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
SET LOCAL "request.header.sec-fetch-site" = 'none';SET LOCAL "request.header.sec-fetch-mode" = 'navigate';
SET LOCAL "request.header.accept-encoding" = 'gzip, deflate, br';SET LOCAL "request.header.accept-language" = 'en-US,en;q=0.9,es;q=0.8';
SET LOCAL "app.settings.jwt-secret" = 'a_test';
SET LOCAL "app.settings.other" = 'another';
SET LOCAL "app.settings.test" = 'a_test';

-- Time: 0.265 ms
-- Time: 0.231 ms
-- Time: 0.249 ms
-- Time: 0.300 ms
-- Time: 0.213 ms
-- Time: 0.283 ms
-- Time: 0.351 ms
-- Time: 0.196 ms
-- Time: 0.230 ms
-- Time: 0.338 ms
-- Time: 0.251 ms
-- Time: 0.236 ms
-- Time: 0.208 ms
-- Time: 0.333 ms
-- Time: 0.265 ms
-- Time: 0.359 ms
-- Time: 0.263 ms
-- Time: 0.305 ms
-- Time: 0.273 ms
-- Time: 0.451 ms
-- Time: 0.350 ms
-- Time: 0.359 ms

-- Total: 6.309 ms

To this:

select
  set_config('search_path', 'test', true)
, set_config('role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.method', 'GET', true)
, set_config('request.path', '/projects', true)
, set_config('request.header.host', 'localhost:3000', true)
, set_config('request.header.user-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36', true)
, set_config('request.header.sec-fetch-user', '?1', true)
, set_config('request.header.accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3', true)
, set_config('request.header.sec-fetch-site', 'none', true)
, set_config('request.header.sec-fetch-mode', 'navigate', true)
, set_config('request.header.accept-encoding', 'navigate', true)
, set_config('request.header.accept-language', 'en-US,en;q=0.9,es;q=0.8', true)
, set_config('app.settings.jwt-secret', 'a_test', true)
, set_config('app.settings.other', 'another', true)
, set_config('app.settings.test', 'a_test', true);

--   Time: 1.841 ms

Note that this saves about 4.5 ms (6.309 ms vs. 1.841 ms).
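
As a quick check of the figures above, the individual timings can be summed and compared with the single set_config query. This is only arithmetic on the numbers reported in this comment, written as a small Python sketch; the variable names are made up for illustration.

```python
import math

# The 22 individual timings (ms) reported above for the SET LOCAL run.
set_local_ms = [
    0.265, 0.231, 0.249, 0.300, 0.213, 0.283, 0.351, 0.196,
    0.230, 0.338, 0.251, 0.236, 0.208, 0.333, 0.265, 0.359,
    0.263, 0.305, 0.273, 0.451, 0.350, 0.359,
]
# Timing (ms) reported above for the single SELECT with all set_config calls.
set_config_ms = 1.841

# fsum avoids accumulating floating-point error over the list.
total_set_local_ms = round(math.fsum(set_local_ms), 3)
saved_ms = round(total_set_local_ms - set_config_ms, 3)

print(f"SET LOCAL total: {total_set_local_ms} ms")
print(f"saved: {saved_ms} ms")
```

The sum matches the reported total of 6.309 ms, so the saving in this psql measurement is a little under 4.5 ms.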

I'm working on an updated benchmark and this might get us a few more requests per second.

@steve-chavez
Member Author

Turns out that the set_configs are slower than the SET LOCALs, according to some load tests I've done.

Using this setup and this k6 script (n=5), I'm getting:

  • set_config: 63 req/s less on average.
  • do $$begin set_config... $$: 120 req/s less on average.

The do begin overhead could be due to additional parsing time in pg.

Not sure why set_config is slower, maybe because SET LOCAL is more low level and set_config is a wrapper function?

@wolfgangwalther
Member

Turns out that the set_configs are slower than the SET LOCALs, according to some load tests I've done.

Ouch. Good thing you're testing that.

Not sure why set_config is slower, maybe because SET LOCAL is more low level and set_config is a wrapper function?

Maybe it's also because the SELECT returns a result. You could try something like:

WITH set AS (
  SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;

Doing the gucs on the WITH might be worth exploring. Perhaps we can even include them on our main query somehow.

Hm. I suppose this could work quite well for all the custom gucs we have. I'm not sure how that would interact with the postgres gucs, though. Role especially: that would need some careful testing with privileges etc. I'm not sure which of those are already used at planning time.

In the main query you could do something like:

SELECT set_config(key, value, true) FROM json_each_text('{ ... json object with key value pairs ... }');

And then the variables could be put in the query as a json object, just like the body.

Of course the pre request would have to be called in here as well, but that should be no problem.

@steve-chavez
Member Author

WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;

No luck :(
I'm getting 86 req/s less with this variant.

SELECT set_config(key, value, true) FROM json_each_text('{ ... json object with key value pairs ... }');

Looks interesting. I'll revisit this later. The logging improvements (other PR) look like a sure improvement for a new release.

@wolfgangwalther
Member

WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;

No luck :(
I'm getting 86 req/s less with this variant.

But that's interesting as well. Seems like the added complexity of the SELECT statement causes a further decrease. I would assume that it's not the set_config call itself that is taking longer, but the fact that a SELECT is used compared to just SET (which is probably implemented much more simply).

It could be that even when we add this to the main query, it still increases the query's complexity and would in fact be slower. But that's something we'd have to test.

In any case I would assume that joining the pre-request to the main query should certainly help, because it is executed with SELECT anyway. I see you have some "pre-request included benchmark" planned as well - so that might give us some idea on that.

@steve-chavez
Member Author

I didn't get an increase in performance even when enabling prepared statements for the set_configs.

Perhaps the json approach (#1600 (comment)) could work, but only using set_configs definitely didn't help.

I'll close this now, but I'll keep the branch for historical purposes.

@ruslantalpa
Contributor

It’s not about the speed of these statements, it’s about security. All the “free form”, not “strictly parsed” input should be sent as a parameter, not an inline string. There is a lot of “junk” coming in through headers.

@ruslantalpa
Contributor

ruslantalpa commented Nov 28, 2020

This is a good change, just like the prepared statements for the main query (but not for the same reason)

@wolfgangwalther
Member

I agree with @ruslantalpa.

I didn't get an increase in performance even when enabling prepared statements for the set_configs.

Can you clarify that a bit? "increase in performance" compared to what? The original implementation for SET or the other approaches for set_config? I understand the latter, so the prepared set_config is in fact still slower than the current SET, right?

@wolfgangwalther
Member

Perhaps the json approach (#1600 (comment)) could work, but only using set_configs definitely didn't help.

I just did some more testing with only the SQL side of this. The json approach is not going to help. Both with and without json the result without prepared statements is about the same. This might change with prepared statements a bit in favour of json, because the statement would always be the same, but both will still be slower than the SET.

But not even considering performance: the json approach essentially poses the same risk for injection, either by injecting errors into the json and maybe getting secrets as part of the error message, or by replacing the "role" value with something else. JSON certainly seems less prone to that, because the format is simpler. But still, parametrized statements are the real deal.

So from a security perspective, we should definitely go for the prepared statement approach you used here, with each set_config value as its own parameter. I did some rough calculations with the timings I got from SQL, and those correspond well with the ~100 req/s drop you get on AWS. The question is: How can we get this to be fast?

My expectation is, that the best we could do would be to put the set configs into a materialized CTE in front of the main query. This should give us about 50 req/s more.

We should not treat this as "making things slower". I feel preparing the SET is part of "preparing everything", so this is part of the other PR before. So instead of getting a benefit of 500 req/s, we might only get a benefit of 450 req/s. That's fine.

@steve-chavez
Member Author

I understand the latter, so the prepared set_config is in fact still slower than the current SET, right?

@wolfgangwalther Yes, I got around a 150 req/s drop compared to the SET LOCALs.

This was never about security. The pgFmtLit function already guaranteed safe inputs. We've been working with that for many versions.

So, the main argument for parameterization in our case is performance. Why should we include a change that just makes things slower?

To put it simply: Can anyone come up with an input that defeats pgFmtLit? Or PQescapeLiteral for that matter?

@wolfgangwalther
Member

wolfgangwalther commented Nov 28, 2020

This was never about security. The pgFmtLit function already guaranteed safe inputs. We've been working with that for many versions.
[...]
To put it simply: Can anyone come up with an input that defeats pgFmtLit? Or PQescapeLiteral for that matter?

You're of course right, that this is probably more of a theoretical question, not a practical one. Although "guaranteed safe input" is a bit too strong in my opinion. Or if that's the case, then "server-side parametrized queries guarantee this even more" is also true. The thing is, even with the best escaping mechanism implemented - there is still a difference between escaping literals and using parametrized queries. A bug in escaping can have fatal consequences. Parametrizing is just a very different concept.
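
To make the distinction concrete, here is a minimal Python sketch. `naive_quote_literal` is a hypothetical stand-in for an escaping function (it is not pgFmtLit or PQescapeLiteral, and only doubles single quotes); the `$1` placeholder follows PostgreSQL's parameter syntax. With escaping, the user-supplied value becomes part of the SQL text; with a parameter, the SQL text is fixed and the value travels separately.

```python
def naive_quote_literal(value: str) -> str:
    """Hypothetical, simplified escaper for illustration only.

    It is NOT pgFmtLit or PQescapeLiteral; it just doubles single quotes.
    """
    return "'" + value.replace("'", "''") + "'"


def inlined_statement(value: str) -> str:
    # The value is escaped and spliced into the SQL text itself.
    return f'SET LOCAL "request.header.user-agent" = {naive_quote_literal(value)};'


def parametrized_statement(value: str) -> tuple[str, list[str]]:
    # The SQL text never changes; the value is sent as a separate parameter.
    sql = "select set_config('request.header.user-agent', $1, true)"
    return sql, [value]


benign = "Mozilla/5.0"
hostile = "x'; SET LOCAL role = 'postgres'; --"

# With escaping, safety depends entirely on the escaper being correct.
print(inlined_statement(hostile))

# With parameters, the statement text is identical for any input.
sql_benign, _ = parametrized_statement(benign)
sql_hostile, params = parametrized_statement(hostile)
print(sql_benign == sql_hostile)
print(params)
```

In the second form there is nothing for a bug in an escaping routine to break, because the header value is never part of the statement text.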

I don't even want to try to come up with an example to defeat pgFmtLit, because I will surely fail.

Ultimately this might not be only a question of security, but also about trust in postgrest to be secure. If we were able to say: "All our queries are properly parametrized", this would immediately make clear that injections are not possible. I feel it is worth the 50-100 req/s (if optimized in the main query). #294 is also related.

@ruslantalpa
Contributor

"escape literal" functions were present in every major language for decades yet the debate was settled a long time ago, to never use them, especially for "user input" when it's not strictly parsed (parsed as we do with sql identifiers).
SQL injection is the number 1 problem in the OWASP Top 10, so why take the risk for a 5% drop in performance (and I doubt it's even that)?

pgFmtLit was "informally" tested by begriffs and me. It was more like, from begriffs: "it looks right and it looks like the C code from PG", and from me: "hey i ran this automated injection tool and it did not break anything". It was never tested by "true" experts in this field, the kind that can dig a lot lower than the function definition (haskell compiler, llvm, processor code ...), and none of us has the "smarts" to say with 100% certainty that it's safe. If there is a problem with this function, the reason no one has found it is that no one (smart in this field) had the interest to look, since postgrest was not such a big thing until recently. And there is always the risk that even if they do look, they might have an incentive not to disclose that info.

to put it simply from my point of view :): i don't trust this function (not even the PG implementation), i trust in telling PG what is a query and what is a parameter. The risk/benefit case here is decisively in favor of parameterized statements.

just read what @wolfgangwalther ... what he said

@steve-chavez
Member Author

If we were able to say: "All our queries are properly parametrized", this would immediately make clear that injections are not possible. I feel it is worth the 50-100 req/s (if optimized in the main query).

@wolfgangwalther This is the key point, and I absolutely agree. If we can put this in our docs (say in the Under the Hood page) then that would give PostgREST users a safety guarantee (escaping doesn't look as good, I concur).

The question is: How can we get this to be fast?

I'll reopen. Let's see if this can be made faster.

@steve-chavez steve-chavez reopened this Nov 28, 2020
@wolfgangwalther
Member

We will have to be very careful with moving the SET LOCAL to the main query.

I had a similar situation today in a non-PostgREST (yeah, those still exist...) project, where I wanted to change LC_TIME to format some month names, just for one query / transaction. Brave as I was, I tried the CTE approach. The result was that some of my rows had German names and some of them English, in the same query. I don't even want to imagine what could happen when doing the same thing with SET ROLE... :)

@steve-chavez
Member Author

@wolfgangwalther Saw the same problem here as well. I tried changing the query to your suggestion:

WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;

And though it seemed a bit faster, it gave a lot of errors on the test suite (postgrest_test_authenticator errors).

@wolfgangwalther
Member

Interesting - but that was not on the main query but still separate, right?

@wolfgangwalther
Member

wolfgangwalther commented Dec 3, 2020

WITH set AS (
SELECT set_config.... -- just like you did before
)
SELECT NULL FROM set;

I think the problem with this is, that it is just not executed at all. The "select" part of the SELECT is executed lazily and since the outer query does not select any of the columns from the inner query...

This would also explain, why it is faster. Doing nothing is faster :D.

Does adding MATERIALIZED help? Doing SELECT * on the outer query should certainly help, but that is exactly what we want to avoid...

@wolfgangwalther
Member

I did a few SQL-only tests. I would not expect any CTE to help with performance as long as it's a separate query.

I previously did tests as well, but my JSON test was faulty before. I have a better one now and the performance of that is really good actually! Almost as fast as SET LOCAL in SQL. I think it can even be just as fast in PostgREST because of prepared statements:

select
  set_config(key, value, true)
from
  json_each_text($1);

The JSON would look like this:

{
  "search_path": "test, public",
  "role": "postgres",
  "request.jwt.claim.role": "postgres",
  "request.method": "GET",
  "request.path": "/projects",
  "request.header.host": "localhost:3000",
  "request.header.connection": "keep-alive",
  "request.header.upgrade-insecure-requests": "1",
  "request.header.user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36",
  "request.header.sec-fetch-user": "?1",
  "request.header.accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
  "request.header.sec-fetch-site": "none",
  "request.header.sec-fetch-mode": "navigate",
  "request.header.accept-encoding": "gzip, deflate, br",
  "request.header.accept-language": "en-US,en;q=0.9,es;q=0.8",
  "app.settings.jwt-secret": "a_test",
  "app.settings.other": "another",
  "app.settings.test": "a_test"
}

Of course this depends on how fast you can build that JSON in Haskell. But building JSON should not be slow as it's just string concat... right?

If this performs well, we might need to slightly adjust it though.. we should not send search_path and role just like that with the other params - otherwise those values could - in theory - still be manipulated through SQL injection... but let's test the performance first!
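
As a rough sketch of the client side of this idea (in Python rather than Haskell, purely for illustration), the settings can be serialised into one JSON object and sent as the single `$1` parameter of the query above. The function name `build_guc_payload` is hypothetical and the settings are a subset of the example; how PostgREST would actually build this is not shown here.

```python
import json

# The query text from above; it stays the same for every request.
SET_GUCS_SQL = "select set_config(key, value, true) from json_each_text($1)"


def build_guc_payload(settings: dict[str, str]) -> str:
    """Serialise GUC names and values into the JSON text passed as $1.

    json.dumps handles quoting and escaping, so values containing quotes
    or backslashes still produce valid JSON.
    """
    return json.dumps(settings)


# search_path and role are deliberately left out of the payload here.
settings = {
    "request.method": "GET",
    "request.path": "/projects",
    "request.header.host": "localhost:3000",
    "request.header.accept-language": "en-US,en;q=0.9,es;q=0.8",
}

payload = build_guc_payload(settings)
print(SET_GUCS_SQL)
print(payload)
```

Note that building the JSON is slightly more than plain string concatenation, because header values still need JSON escaping; a serialiser takes care of that.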

@steve-chavez
Member Author

Of course this depends on how fast you can build that JSON in Haskell. But building JSON should not be slow as it's just string concat... right?

I've been bitten by Aeson before. If it turns out to be slow, we could try jsonifier.

we should not send search_path and role just like that with the other params - otherwise those values could - in theory - still be manipulated through SQL injection

Yes, in fact I think the main thing that should be inside the JSON are the headers (which can vary).

but let's test the performance first!

I'll try with Aeson for now. Let's see..

@steve-chavez
Member Author

I did some explain analyze tests for comparing the select set_configs query vs the select set_configs from json_each and found that the latter is slower. To confirm this, I used Wolfgang's timeit function:

select * from timeit(100,
$$
select
  set_config('search_path', 'test', true)
, set_config('role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.jwt.claim.role', 'postgres', true)
, set_config('request.method', 'GET', true)
, set_config('request.path', '/projects', true)
, set_config('request.header.host', 'localhost:3000', true)
, set_config('request.header.user-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36', true)
, set_config('request.header.sec-fetch-user', '?1', true)
, set_config('request.header.accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3', true)
, set_config('request.header.sec-fetch-site', 'none', true)
, set_config('request.header.sec-fetch-mode', 'navigate', true)
, set_config('request.header.accept-encoding', 'navigate', true)
, set_config('request.header.accept-language', 'en-US,en;q=0.9,es;q=0.8', true)
, set_config('app.settings.jwt-secret', 'a_test', true)
, set_config('app.settings.other', 'another', true)
, set_config('app.settings.test', 'a_test', true)
$$,
$$
select set_config(key, value, true) from json_each_text('{ "search_path": "test, public", "role": "postgres", "request.jwt.claim.role": "postgres", "request.method": "GET", "request.path": "/projects", "request.header.host": "localhost:3000", "request.header.connection": "keep-alive", "request.header.upgrade-insecure-requests": "1", "request.header.user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36", "request.header.sec-fetch-user": "?1", "request.header.accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", "request.header.sec-fetch-site": "none", "request.header.sec-fetch-mode": "navigate", "request.header.accept-encoding": "gzip, deflate, br", "request.header.accept-language": "en-US,en;q=0.9,es;q=0.8", "app.settings.jwt-secret": "a_test", "app.settings.other": "another", "app.settings.test": "a_test" }'::json)
$$);

Results:

query                 plan_avg     execution_avg
set_config            0.028364 ms  0.014586 ms
set_config json_each  0.050322 ms  0.080428 ms

Besides the query, we also need to create the JSON in Haskell (an additional operation). So I'm pretty much certain that we won't gain performance down this route.

@steve-chavez
Member Author

Just asked about SET LOCAL vs SELECT set_config on pg freenode. They basically told me:

SET LOCAL is a utility statement with relatively low overhead, whereas SELECT means doing the whole executor startup/shutdown dance as well as returning a result

Chat log
steve-chavez | Hi guys. So I have an application that uses `SET LOCAL "some.var"` for application       
             | variables. When changing these `SET LOCALs` to use `SELECT set_config(key,val,true)` I   
             | get slower execution times. I was wondering, is SET LOCAL faster than SELECT set_config? 
 RhodiumToad | mmmmaybe.                                                                                
 RhodiumToad | did you try preparing the select?                                                                                                          
steve-chavez | RhodiumToad: Hey! Yes, I also did that and it's also slower. In fact I converted to      
             | set_config for preparing the query.                                                      
 RhodiumToad | SET LOCAL is a utility statement with relatively low overhead, whereas SELECT means      
             | doing the whole executor startup/shutdown dance as well as returning a result            
 RhodiumToad | I wonder how it would compare using the function-call fastpath to call set_config        
 RhodiumToad | not many clients support that tho                                                        
steve-chavez | Ahh.. that makes sense. Even with prepare I get slower execution times.                  
steve-chavez | Could I speed up the `set_config` in any way?                                            
steve-chavez | any other way*                                                                           
 RhodiumToad | hm, maybe try wrapping it in a plpgsql procedure and using CALL.                         
 RhodiumToad | no idea if that would be any better.                                                     

@steve-chavez
Member Author

Anyway, I think it should be good to merge. @wolfgangwalther WDYT?

@wolfgangwalther
Member

wolfgangwalther commented Dec 5, 2020

I did some explain analyze tests for comparing the select set_configs query vs the select set_configs from json_each and found that the latter is slower. To confirm this, I used Wolfgang's timeit function:

Well... that's exactly what I did, where I found that the JSON query was very fast. I used a slightly different approach, though: To compare with SET LOCAL directly and because you can't EXPLAIN SET LOCAL, I wrapped every one of those different options in a function. This creates the same overhead. I chose a VOLATILE function, because this forces PG to re-run the function each time.

For the repetition I just did, I also added the two queries you had "bare", so without function call. I ran the queries 10,000 times, because they are really fast. But the results were not much different from running 100 times.

I pasted the full code here:
https://pastebin.com/1qkQpUPq

Here are the results for me with n=10000:

query                       plan   execution  total
set1(): set local           0.011  0.021      0.032
set2(): set_config          0.033  0.048      0.081
set3(): json_each           0.015  0.034      0.048
set4(): json_each prepared  0.005  0.027      0.032
set_config                  0.007  0.010      0.018
json_each                   0.004  0.018      0.022

A couple of things stand out:

  • The function calls do add an overhead compared to the "bare" queries. Not surprising. That's why I added them to all 4 options.
  • The "bare" queries have a lot faster plan times - this looks much like those query plans are cached. The plan times are very much comparable to the prepared query - so that makes a lot of sense.
  • The same applies for the execution times - without the function wrapper, the set_config is faster than json_each. But with the wrapper it's the other way around. I'm pretty sure that the bare query is cached somehow.
  • The total time of json_each prepared is the same as set local - that's what I was reporting earlier.
  • The difference in total time between set local and set_config aligns with the decrease in req/s on the AWS load test. I think the numbers we see here are real.

Overall, I think running the queries bare is giving wrong results. However, even with that we have quite a difference in relative numbers between the two queries: bare json is ~2x slower for me, while it's 4-5x slower for Steve. But that could be due to lower n as well.
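
The relative slowdowns mentioned above can be recomputed from the two result tables. This Python sketch is only arithmetic on the reported execution times of the bare queries (the ratios are unitless); the variable names are illustrative.

```python
# Execution times of the "bare" queries, as reported in the two tables.
# Wolfgang's run (n=10000):
wolfgang_set_config = 0.010
wolfgang_json_each = 0.018
# Steve's run (n=100):
steve_set_config = 0.014586
steve_json_each = 0.080428

wolfgang_ratio = wolfgang_json_each / wolfgang_set_config
steve_ratio = steve_json_each / steve_set_config

print(f"Wolfgang: json_each is {wolfgang_ratio:.1f}x slower in execution")
print(f"Steve:    json_each is {steve_ratio:.1f}x slower in execution")
```

Comparing total times (plan plus execution) instead gives smaller ratios for both runs, so the exact factor depends on which column is compared.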

I am running on PG 12.5. @steve-chavez, you? Can you give the code I posted above a run and see whether that still reports different numbers for both of us?

@steve-chavez
Member Author

@wolfgangwalther Just ran your pastebin sample. Here are my results:

┌────────────────────────┬────────────────────────┬────────────────────────┐
│        plan_avg        │     execution_avg      │       total_avg        │
├────────────────────────┼────────────────────────┼────────────────────────┤
│ 0.06022010000000000000 │ 0.10559520000000000000 │ 0.16581530000000000000 │
│ 0.14403730000000000000 │ 0.21926060000000000000 │ 0.36329790000000000000 │
│ 0.05731850000000000000 │ 0.14283680000000000000 │ 0.20015530000000000000 │
│ 0.02109200000000000000 │ 0.11160300000000000000 │ 0.13269500000000000000 │
│ 0.02998940000000000000 │ 0.05250830000000000000 │ 0.08249770000000000000 │
│ 0.01480250000000000000 │ 0.07916370000000000000 │ 0.09396620000000000000 │
└────────────────────────┴────────────────────────┴────────────────────────┘

(Ran them on pg 11.5)

@steve-chavez
Member Author

Btw, yesterday I checked the queries with pg_stat_statements and saw that the total time of the set locals is about the same time as the select set_config(..), set_config(..). I pasted the results here https://pastebin.com/nkMPjs6A (badly formatted).

-- set locals
0.054802 + 0.075372 + 0.00975 + 0.059812 + 0.052282 + 0.052081 + 0.0043 + 0.087992 = 0.396391

-- Also, it's surprising that they came as different statements
-- I thought all would run in one go (because of the `;` separator)

-- set config
0.375921

So maybe this is not about the SQL, perhaps the slowness is in the Haskell code.

@wolfgangwalther
Member

@wolfgangwalther Just ran your pastebin sample. Here are my results:

Hm. Those numbers look pretty similar to mine. Not sure what happened on your first run with timeit() then. Maybe it was really the low n. Looking at the numbers from both of us, my bet is still on the prepared json_each to be the best.

So maybe this is not about the SQL, perhaps the slowness is in the Haskell code.

Just looking at the pastebin you posted: That query has 24 parameters. Maybe those hasql Snippets will slow down with so many of them? You could test this theory very quickly: In each of those set_config($9, $1, $10), the third argument does not need to be prepared at all. This is not user input. In fact it is always true, right? Is there any difference in performance when you hardcode true here?

@steve-chavez
Member Author

set_config($9, $1, $10)

That's weird, because in this PR I'm only parametrizing the 2nd argument (the value). I'll see what's going on there. Not sure if it's a pg_stat_statements thing.

@steve-chavez
Member Author

Yes, I got around a 150 req/s drop compared to the SET LOCALs.

Argh. When I said this, I likely ran the test on the wrong version (the unprepared one, I believe). At the time I probably looked at this PR from the wrong angle: I wanted more performance instead of more security.

So anyway, I've repeated the tests and this is what I get.

Locally, with a db-pool=1, running 1 min tests:

test  master        set-configs
1     584.020996/s  575.862611/s
2     587.405064/s  574.643421/s
3     586.816592/s  577.845985/s

On an AWS t3a.nano, with a db-pool=10(default), running 1 min tests:

test  master         set-configs
1     2078.474632/s  2045.558764/s
2     2071.123305/s  2049.89745/s
3     2073.203621/s  2044.819097/s

SET LOCAL is still faster, but only slightly faster than a prepared set_config.
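
To put a number on "slightly", the mean req/s of the three runs can be compared per setup. The Python sketch below uses only the figures from the two tables above; the helper name `relative_drop_percent` is made up for illustration.

```python
from statistics import mean

# Requests per second from the two tables above (three 1-minute runs each).
results = {
    "local, db-pool=1": {
        "master": [584.020996, 587.405064, 586.816592],
        "set-configs": [575.862611, 574.643421, 577.845985],
    },
    "AWS t3a.nano, db-pool=10": {
        "master": [2078.474632, 2071.123305, 2073.203621],
        "set-configs": [2045.558764, 2049.89745, 2044.819097],
    },
}


def relative_drop_percent(baseline: list[float], candidate: list[float]) -> float:
    """Percentage by which the candidate's mean req/s is below the baseline's."""
    base = mean(baseline)
    return (base - mean(candidate)) / base * 100


for setup, runs in results.items():
    drop = relative_drop_percent(runs["master"], runs["set-configs"])
    print(f"{setup}: set-configs is {drop:.1f}% slower than master")
```

On these numbers the drop comes out at under 2% in both setups.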

@wolfgangwalther
Member

set_config($9, $1, $10)

That's weird, because in this PR I'm only parametrizing the 2nd argument (the value). I'll see what's going on there. Not sure if it's a pg_stat_statements thing.

Yeah, true. I didn't have a deeper look at the code, yet. Of course, you're not parametrizing the true :). So this seems to be the way pg_stat_statements groups queries for reporting. It makes no sense to treat every query with a slightly different constant value as different for that.

@wolfgangwalther
Member

As stated in #1609 I gave vegeta a go today. While req/s are the right metric for the AWS tests, after a lot of trial and error, I concluded that for devtools and CI the best metric would be some kind of peak performance, not average. I will go into detail about this in the other issue. With vegeta the best metric for peak performance is the minimal latency measured ("the fastest response").

Here are the numbers for current master, this branch and the commit right before we introduced many more prepared statements. The request is just a simple GET http://postgrest/orders?select=name. 30s each branch.

branch      min latency [µs]
pre prep    263
master      188
set-config  196

So this confirms your results.

Parametrizing each set_config separately is the best way in terms of security. It's the easiest to prove it's properly parametrized. Given the numbers we have here, I would say the trade-off is ok.

However, I still think the json approach could be a tiny bit faster - probably most notably with workloads that run different queries. Different queries will result in different prepared queries in the current implementation, while the json query would always stay the same, allowing more re-use. Not sure how much of that would be lost because of json encoding in Haskell, though.

@wolfgangwalther
Member

Anyway, I think it should be good to merge. @wolfgangwalther WDYT?

The first argument in set_config needs to be parametrized as well. But other than that, it looks good to merge.
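
A minimal sketch of what parametrizing both arguments could look like, written in Python for illustration (the PR itself is Haskell, and `build_set_config_query` is a hypothetical helper, not PostgREST code). Only the third argument, which is always `true`, stays hardcoded.

```python
def build_set_config_query(settings: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Build one SELECT that sets every GUC, with name and value as parameters.

    Returns the SQL text and the flat parameter list: $1 and $2 belong to the
    first setting, $3 and $4 to the second, and so on.
    """
    if not settings:
        raise ValueError("at least one setting is required")

    calls = []
    params: list[str] = []
    for index, (name, value) in enumerate(settings):
        name_pos = 2 * index + 1
        value_pos = 2 * index + 2
        # The third argument (is_local) is a constant, so it is not a parameter.
        calls.append(f"set_config(${name_pos}, ${value_pos}, true)")
        params.extend([name, value])

    return "select " + ", ".join(calls), params


sql, params = build_set_config_query([
    ("request.method", "GET"),
    ("request.path", "/projects"),
])
print(sql)
print(params)
```

With this shape, neither the setting name nor its value ever appears in the statement text.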

@wolfgangwalther
Member

Note: I just pushed the cirrus fix directly to master, because all other branches are still failing with the low memory error. So this is taken care of now.

@steve-chavez
Member Author

However, I still think the json approach could be a tiny bit faster - probably most notably with workloads that run different queries. Different queries will result in different prepared queries in the current implementation, while the json query would always stay the same, allowing more re-use. Not sure how much of that would be lost because of json encoding in Haskell, though.

That would indeed happen with a different number of headers/cookies/claims. The number of headers can vary easily. This one is kinda like the IN vs ANY optimization I made on #1633 (comment). But I'm also not sure of the json encoding speed.
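
The re-use argument can be illustrated with a small Python sketch. It assumes one `set_config($n, $m, true)` call per setting for the per-setting variant (the helper name is hypothetical) and counts how many distinct statement texts three requests with different numbers of settings would produce. Each distinct text would need its own prepared statement, while the JSON variant always uses the same one.

```python
JSON_SQL = "select set_config(key, value, true) from json_each_text($1)"


def per_setting_sql(setting_count: int) -> str:
    """Hypothetical statement text with one set_config call per setting."""
    calls = [
        f"set_config(${2 * i + 1}, ${2 * i + 2}, true)"
        for i in range(setting_count)
    ]
    return "select " + ", ".join(calls)


# Three requests carrying 5, 8 and 12 settings (headers, claims, ...).
setting_counts = [5, 8, 12]

per_setting_texts = {per_setting_sql(n) for n in setting_counts}
json_texts = {JSON_SQL for _ in setting_counts}

print(len(per_setting_texts))  # one statement text per distinct count
print(len(json_texts))         # always the same statement text
```

Whether this re-use outweighs the cost of encoding the JSON is exactly the open question raised above; the sketch only shows where the difference in statement texts comes from.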

I'll merge this one. I'll leave the json optimization for another PR :)

@steve-chavez steve-chavez merged commit 11d62a8 into PostgREST:master Dec 8, 2020
@ruslantalpa
Contributor

the "set env" query can be merged with the main query. The important part is that the main query (and any other query) that needs to be executed after the "set" CTE needs to select from the "set" CTE, like so:


begin;

with set as (
  select 
    set_config('my.var', '1', true),
    set_config('role', 'anonymous', true),
    'test' as t
),
main as (
    select
        set.t,
        current_setting('my.var', true) as myvar,
        current_user as u
    from set
)
select
    set.t,
    current_setting('my.var', true) as myvar,
    current_user as u,
    main.u AS m_u,
    main.myvar AS m_myvar
from set, main;

commit;

@wolfgangwalther
Member

the "set env" query can be merged with the main query, the important part is that the main query (and any other query) that needs to be executed after the "set" CTE, need to select from "set" CTE like so

The query I had in #1600 (comment) was roughly like so:

with set as (
  select 
    set_config('lc_time', 'de_DE@utf-8', true)
),
main as (
  select
    <date formatting function here, that uses lc_time>
    other stuff
  from set, other tables
)
select
  main.*,
  other stuff
from main;

In theory, this is exactly what you say: Everywhere I need the new values I need to reference set first. But then I had mixed month names returned by that date formatting function, some English, some German. I did try both materialized and not materialized on the set, but I think materialized is much better / needed.

When doing this with smaller examples like you showed here, I always had the expected results as well. But not on the big set. It might have been caching issues, because I tried different things in the same transaction. Still, this feels like something that we need to be 100% sure about. Maybe we can check back with PG developers about this.

@wolfgangwalther
Member

the "set env" query can be merged with the main query, the important part is that the main query (and any other query) that needs to be executed after the "set" CTE, need to select from "set" CTE like so

I asked on the pgsql general mailing list and Tom Lane told me better not to do this: https://www.postgresql.org/message-id/flat/cf303847-0bd1-4eca-2b3c-3055416df8bb%40technowledgy.de.
