Significant performance regression going from Postgres 13 → 14+ in test code #506
Comments
@finestructure Can you help me investigate this? Can you provide me with a SQL query and a matching database that becomes slower with Postgres 14?
I'll try and come up with something that involves less of our stack (and who knows, maybe I'll find the cause isn't PostgresNIO in the process 😅).
So the good news is that a perf test I created here: https://github.com/finestructure/pg-perf-regression.git does not show any performance regression in PostgresNIO. I've got more details in the original issue. As I conclude there, I think that leaves us with two possible explanations for the 3x performance regression:
I'll close this issue for now while I try to figure out which it is - or if it's something else entirely 😅 Any ideas on what else I could try would be greatly appreciated!
I believe I know why I couldn't reproduce the issue at first. I'll try to create a reproducer that better reflects what we're doing. Some details here: SwiftPackageIndex/SwiftPackageIndex-Server#3360 (comment)
I'll start by saying that this issue is quite bizarre 😅 There seem to be multiple factors at play. For one, the issue seems to happen when running in an XCTest. That's why I couldn't reproduce it standalone at first. There are also varying degrees of slowdown in an SQL query depending on how many times another SQL query ran before it. Best I can tell this is not a measurement effect. I see this same slowdown in our actual tests. I've set up a pure PostgresNIO project here with my tests that resemble our unit tests: https://github.com/finestructure/pg-perf-regression.git Simply running
should yield all the figures. I've plotted my results here https://www.icloud.com/numbers/0945R88oKsGdVM5kY89hT7eMA#Issue_3360_perf_regression and am attaching screenshots below for easier viewing. The core bit of the test is this bit here:
The timing is only done within Observations are:
Please double check my work here! While I've run these tests and variants hundreds of times now, and have found them reproducible, it's easy to get stuck in a groove and overlook something obvious. I think I'm testing the right thing and in a sensible fashion, and it does reflect our real-world observations, but it could still well be that I'm timing the wrong thing or something like that. It's important to note that this is not a micro-benchmark regressing. We're seeing a large (3x) real-world increase in our test runtime, to the degree that we keep testing with Pg 13 locally for now.
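For readers who want to check the measurement shape described in this comment (run one query a varying number of times, then time only the query of interest), here is a minimal sketch. It is not the code from the linked repository: `timeAfterPriorRuns`, `priorRuns`, `setup` and `measured` are hypothetical names, and the closures stand in for whatever database calls the real tests make.

```swift
/// Hypothetical harness, not taken from the pg-perf-regression repository.
/// Runs `setup` `priorRuns` times without timing it, then times a single
/// call of `measured` with a monotonic clock and returns the elapsed time.
func timeAfterPriorRuns(
    priorRuns: Int,
    setup: () throws -> Void,
    measured: () throws -> Void
) rethrows -> Duration {
    // Untimed phase: the "other" query that runs before the one we care about.
    for _ in 0..<priorRuns {
        try setup()
    }
    // Timed phase: only this call contributes to the returned duration.
    let clock = ContinuousClock()
    return try clock.measure(measured)
}
```

Calling this with increasing `priorRuns` values (for example 0, 10, 100) and comparing the returned durations across Postgres versions would show whether the slowdown tracks the number of preceding queries. `ContinuousClock` needs Swift 5.7 or later.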
The numbers change a little but show generally the same trends when running in release mode:
I've added the figures and graphs in new tabs in the Numbers document: https://www.icloud.com/numbers/0945R88oKsGdVM5kY89hT7eMA#Issue_3360_perf_regression
The latest test is essentially just running
DROP DATABASE IF EXISTS \(databaseName) WITH (FORCE)
CREATE DATABASE \(databaseName)
followed by
DROP DATABASE IF EXISTS \(snapshot) WITH (FORCE)
CREATE DATABASE \(snapshot) TEMPLATE \(original)
I wouldn't expect changes to
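The drop/create sequence described in this comment can be sketched as two small helpers that only build the SQL strings. The function names are hypothetical and this is not the test suite's actual code. The identifiers are interpolated verbatim, as in the comment, so the sketch assumes trusted, already-valid database names.

```swift
/// Hypothetical helper: statements that recreate an empty database.
/// Names are interpolated as-is (no quoting), so only pass trusted identifiers.
func recreateStatements(databaseName: String) -> [String] {
    [
        "DROP DATABASE IF EXISTS \(databaseName) WITH (FORCE)",
        "CREATE DATABASE \(databaseName)",
    ]
}

/// Hypothetical helper: statements that recreate `snapshot` as a copy of
/// `original` by using it as the template database.
func snapshotStatements(snapshot: String, original: String) -> [String] {
    [
        "DROP DATABASE IF EXISTS \(snapshot) WITH (FORCE)",
        "CREATE DATABASE \(snapshot) TEMPLATE \(original)",
    ]
}
```

Each statement would be sent to the server on its own. Keeping string construction separate from execution makes it easy to run the same statements through `psql` and through the client library when comparing timings.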
Fair enough. I wouldn't expect dropping and creating the database to be affected by vacuum. So the plot thickens...
Just to follow up on discussions we had at the conference, @fabianfett, and to recap where we're at with this rather bizarre and confusing issue!
Hope that helps with tracking this down! Short of converting our tests to run in parallel (which is tricky when running against db instances), I don't see a way for us to "fix" this regression on our end.
Hi Sven 👋 thanks for digging so deep into this issue. I brought this up with my team.
As this only affects test code, we don't plan to investigate this issue in the near future, as we currently have higher priority work that we need to handle. Of course, PRs from the community are always welcome and we'd be happy to review them if they solve this issue.
I just want to stress that my tests don't allow that conclusion. Tests are where we first saw the regression, and they're the only place I've been able to demonstrate it so far, but that doesn't mean it only affects tests.
I've been investigating a big runtime regression in our test suite after switching the Postgres version in our Docker container from 13 to 16, which I've subsequently narrowed down to the step from 13 to 14.
It's been suggested that this might be due to how PostgresNIO is adopting changes in Postgres 14, perhaps "pipelining support".
There are more details in the tracking issue on our end (the one linked above) which I'll briefly summarise here:
Here are the figures from the issue:
SPI test suite, running locally on a Mac Studio:
There's a clear regression going from Postgres 13 → 14.
This is not the case when running plain SQL against those db versions (with copies of our production data, so it's a real world scenario):
I don't know if or how this could be an issue in PostgresNIO but it doesn't seem to be a db issue. Are there any other tests I could be running to help figure out where the slowdown could be occurring?