[for discussion] Integrate a load test into the local dev tools and CI #1609
Conversation
I love the idea. That will really be helpful. I wonder how comparable the loadtest will be in CI from run to run - but that's something that we should be able to find out pretty quickly.
If it were possible to somehow run both the current master build and the PR build at the same time, that could increase comparability a lot and also allow some automatic comparisons that lead to a test failure above/below a certain threshold.
I have no idea about the whole CI pipeline that is set up (and especially the nix stuff), but there seems to be a bit of caching involved everywhere. Is caching possible across runs? And can you set up your own caches? If yes to both, something like the following could maybe work:
So basically the current build of master would always be kept in a different cache, to be able to run it against the current PR build. Even if CircleCI performance differs between runs, running both builds at the same time should eliminate that. Maybe I'm completely off here and this doesn't work at all... just wanted to throw that out!
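To make that a bit more concrete, a rough sketch of what this could look like in the CircleCI config - the cache keys, step names and the compare script are all invented, and I don't know how well this would play with the nix/cachix setup:

```yaml
# On master only: save the freshly built binary into a dedicated cache.
- save_cache:
    key: loadtest-master-{{ .Revision }}
    paths:
      - result-master
# On PR builds: restore the newest master build (the prefix key matches the
# most recent cache) and load test it side by side with the PR build.
- restore_cache:
    keys:
      - loadtest-master-
- run:
    name: Load test master and PR builds side by side
    command: postgrest-loadtest-compare result-master result-pr  # made-up script name
```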
Oh and I'm not sure you know about the stuff Steve is doing, see #1600 (comment)? Maybe those can play together?
@wolfgangwalther Agree, it would be great to get a direct performance comparison! E.g. in nixpkgs, the build bot Ofborg generates a report that is accessible directly in Github: https://github.com/NixOS/nixpkgs/pull/99650/checks?check_run_id=1211150408 No idea how they are doing this though, will need to look into it. By the way, results from this first run in CI (a bit hidden in the output right now, need to see how to make it more visible):
--> 222 reqs/s and a response time of 1.9s at the 99.99th percentile. I'll fix the styling fail and we'll see what comes out in a second run :-) @wolfgangwalther happy to guide you through the Nix and CI setup if you're interested - it would also be good to get your pointers on where we can add to the documentation of that setup!
Second run:
I rebased on the Postgres 13 PR that got merged earlier, so let me re-run once more! Edit:
Not very consistent so far - might need to run the test for longer to even things out in CI.
Also love the idea!
The benchmark I'm doing is mostly about how many req/s we can support on different EC2 instances. It shouldn't conflict with this PR. I use k6 for my load tests, but it's cool to use locust on our CI :).
This should definitely be the goal! Failing on the threshold would let us know of drops in performance when changes happen. Including a couple more scenarios on this PR would be ideal:
The above are done on a different db (Chinook). Using any of the tables from our fixtures for those would be cool.
```yaml
          when: always
      - run:
          name: Build all derivations from default.nix
          command: nix-build
```
The `nix-build` is now run after all the tests in the pipeline. How are the tests now run against the new build? It looks like they are run against the cached version from cachix, which would always be the last build from master? What am I missing?
Good point @wolfgangwalther, I'll need to add a comment to document this. This setup works correctly and makes the tests run a bit earlier in the job, failing faster if needed.
`nix-env -iA` in the earlier step also uses `nix-build` under the hood. At this point, only the things that are really needed to run the tests are built or pulled from the cache if available. `nix-env -iA` will never install anything stale from the cache.
This `nix-build` call at this step then just builds whatever is not yet built or pulled from the cache.
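In other words, the job roughly boils down to something like this - the attribute and script names here are only illustrative, not the exact ones from the config:

```bash
# "Install testing scripts": evaluate the current default.nix and build/pull
# only what the tests need - nothing stale can come out of this step.
nix-env -f default.nix -iA tests           # illustrative attribute name

# run the test suite against exactly those freshly evaluated derivations
postgrest-test-spec                        # illustrative script name

# "Build all derivations from default.nix": build whatever the earlier
# step did not already build or pull from the cache.
nix-build
```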
Ah cool. I just had a comment on that previous step, saying that it should maybe be renamed then. Deleted it again after I realized there are actually 2 steps that have `nix-env -iA` in them. So a comment + rename of one of the steps would sure help, I guess.
I understand that we could basically run the tests and `nix-build` in parallel after we are done with "Install testing scripts"? Have you tried doing that with CircleCI's "workspaces" feature?
It looks like you could split this into 3 jobs, where the test and build jobs run in parallel after the first one. I guess you could just copy the whole /nix folder between those jobs via persist_to_workspace / attach_workspace.
See here for reference:
https://circleci.com/blog/persisting-data-in-workflows-when-to-use-caching-artifacts-and-workspaces/
https://circleci.com/blog/deep-diving-into-circleci-workspaces/
One benefit of that would be that those jobs would both show up separately in the check list on GitHub. I think this could then be extended to all the tests against different pg versions - to have them show up one by one on the check list. Only if that's wanted, of course.
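A rough sketch of what that 3-job split could look like - executors are left out and the attribute/script names are invented:

```yaml
jobs:
  setup:
    steps:
      - checkout
      - run: nix-env -f default.nix -iA tests   # install the test tooling into /nix (illustrative attribute)
      - persist_to_workspace:
          root: /
          paths:
            - nix
  test:
    steps:
      - checkout
      - attach_workspace:
          at: /
      - run: postgrest-test-spec                # placeholder for the spec tests
  build:
    steps:
      - checkout
      - attach_workspace:
          at: /
      - run: nix-build

workflows:
  version: 2
  pipeline:
    jobs:
      - setup
      - test:
          requires: [setup]
      - build:
          requires: [setup]
```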
Using workspaces is a cool idea! We'll need to test how this performs, as the /nix folder can easily be a few GB large, having a few GHC versions for the static builds etc. Not sure if 'saving' and 'loading' the workspace could be a bottleneck (it would be with the CircleCI caching feature, but maybe workspaces work differently).
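A quick way to get a rough number for that locally is to just check the store size:

```bash
# approximate size of what would need to be persisted between jobs
du -sh /nix
```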
> as the /nix folder can easily be a few GB large, having a few GHC versions for the static builds etc

Shouldn't those be fetched only at the `nix-build` stage?
But yeah, performance needs to be tested. Very likely that it won't actually perform better.
Another question: Looking at the report at https://app.circleci.com/pipelines/github/PostgREST/postgrest/426/workflows/6fb87532-cdd6-4936-a3ca-bfb181d088b3/jobs/6779, the step "Run the test specs against PostgreSQL 9.5" took > 2 minutes. Looking at the output, it shows that the lib and test-suite are rebuilt completely. However, I understood that this should have happened in the step before ("Install testing scripts") already, because of the `nix-env -iA` there. Is the same thing happening here twice? Or is there something else going on that I'm missing?
Will also need to clarify/document that somewhere :-) Nix builds the whole environment required for running the tests (including all library dependencies, ghc, cabal-install and postgres), but the tests themselves are run using cabal-install.
Thanks, I think I understand it a whole lot better now.
@monacoremo I was surprised by the results I got when running the loadtest. I expect the performance to be significantly less without prepared statements, as that's what Steve's tests have shown. Is there anything that I need to do differently?
Ah - I did something differently now: I stashed your changes and applied them to the master branch to be able to launch the loadtest there. So this means that the executable that is used with the load/memory/io-tests is actually built when launching nix-shell. That seems to limit the use of those tests for comparing different commits. Any way we can change this?
It looked like the numbers were quite consistent at the beginning - they ranged around ±5 req/s. I expected to get better numbers when running longer, so I took a couple of 30 min runs. Variance did not change. That's not what I would have expected, and it questions the reliability (repeatability) of those tests.
I wonder whether we can implement proper A/B testing where we always compare two postgrest executables with each other. Ideally in an alternating setting, so something like 5 sec A, 5 sec B, ... repeated 10 times or so. That should be a lot more independent of current machine load and give a delta of req/s as a result. This delta should be much better to infer conclusions from.
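A sketch of what such an alternating run could look like - the wrapper name and binary paths are made up, just to show the shape:

```bash
# Alternate between binary A and binary B in short slices, so both see
# roughly the same background load on the machine.
# postgrest-loadtest-run and the result-* paths are made-up names.
for round in $(seq 1 10); do
  postgrest-loadtest-run --binary ./result-a/bin/postgrest --duration 5s >> a.log
  postgrest-loadtest-run --binary ./result-b/bin/postgrest --duration 5s >> b.log
done
# afterwards: aggregate the req/s from a.log and b.log and look at the delta
```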
Another question: Why is there nginx running at all? Can the clients not make the requests directly to PostgREST? I understand that nginx (or another reverse proxy) is probably part of most setups - but we can't do much about its performance, at least in this repo. Including it feels like taking resources (and focus) off of testing PostgREST itself.
Haven't looked at this for a while, it's next up on my list. There are a few interesting options here for maximum performance tests, like wrk - I got 3000 rps on my local machine with it. Thinking about including a balanced load test with locust, and a max unbalanced load test with another tool.
nginx helped make postgrest more stable with very highly concurrent loads, if I remember correctly - will test again. We will definitely be able to run the incremental build, now that we've figured out cabal v2-exec. A/B testing could be possible based on our nightly releases - let's think about this.
I did a few tests as well last week, but haven't been able to write them up. Some interesting numbers for sure, not exactly sure what to make of those.
I tested it without the nginx stuff and it's working fine. Performance is actually the same, so it does not seem to hurt. The log output is a lot better (cleaner), because all the error logging from nginx is gone. For the simple case, I don't think we need nginx.
I also experimented a bit with different numbers of users, comparing last week's master branch and the commit right before prepared statements were introduced, with 5s each step. Data set:
Up to 30 users the numbers are exactly the same. Although there is a bit more variation, the same trend still continues up to 55 users. From 60 users on, the variation starts to increase massively. It makes no sense that perf first decreases and then increases again with more users - so this is random variation for sure. I observed this variation with a lot longer run times as well. The trend is clear, though: the performance improvement with prepared statements is clearly visible across runs. However:
From the plot, one can tell that this makes the current locust test basically useless, because even with a huge performance difference you could end up randomly in a spot like the ones shown in the plot. I'm pretty sure that the reason is that with lower numbers of users locust is not able to saturate PostgREST, so we have idle times and all requests can be served. So the number of requests is basically the number of requests that locust can send, not what PostgREST can handle. Clearly locust requires a lot of resources, and when we increase the number of users, locust and PostgREST have to share those, since we're running everything on the same machine. The most likely cause is just random scheduling differences between runs, I assume. We can't really fight those. Plus, when we want to run those tests in CI, we only have 2 cores available (I looked that up, I think it was 2 - very few for sure). I tried a few other things as well to reduce variation - with no success.

There's one other thought I had: Why are we testing with a multi-user setup? We are forcing postgrest to use only 1 database connection anyway. For our dev and CI tools use case, where we want to know how our "request to query" code performs, it would be best to just use a single-threaded test runner that performs 1 request, waits for the response and then sends the next request. This would need to be implemented most efficiently - so no python :D. We can then just hammer PostgREST with one query for 5s, then the next query for 5s etc. to get an idea of different code paths. We could do this with a simple single-threaded client.
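As a sketch of the shape of such a runner - plain curl in a loop here only for illustration (the endpoints are made up, and a real runner would reuse the connection and be much leaner):

```bash
# One request at a time, 5 seconds per endpoint, no concurrency at all.
# The paths are just examples; 3000 is PostgREST's default port.
for path in "/projects?select=id,name" "/clients?select=*"; do
  end=$(( $(date +%s) + 5 ))
  count=0
  while [ "$(date +%s)" -lt "$end" ]; do
    curl -s -o /dev/null "http://localhost:3000$path"
    count=$(( count + 1 ))
  done
  echo "$path: $count requests in 5s"
done
```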
I think what I asked for above is what you mean by "unbalanced" load test, right?
I will now narrow down the list of benchmark tools in the linked repo to those that might make sense for us. I am only looking for tools that support http/2, unix sockets and all request methods.
Only very few of the tools support all of that. Some came close, but only one of the tools looks really promising: https://github.com/tsenart/vegeta. Some highlights:
- Unix socket support seems to be - if at all supported - trouble everywhere else. Overriding the host address in target URLs is exactly what is needed, though - no fiddling with URIs to support sockets...
- The option to set the number of CPUs used will be a good one to limit us to 1 CPU in CI.
- If I understand correctly, we can just write our test cases as plain text files in HTTP message format. This is awesome, because it has exactly no boilerplate code at all.

I will give vegeta a try.
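For illustration, a first attempt could look roughly like this - the target paths are made up and the flags should be double-checked against the vegeta docs:

```bash
# targets.http: requests as plain HTTP messages, one target per block
# (the paths below are only examples)
cat > targets.http <<'EOF'
GET http://postgrest/projects?select=id,name
Accept: application/json

GET http://postgrest/clients?select=*
Accept: application/json
EOF

# attack for 5s at a fixed rate; -unix-socket overrides the host address
# in the target URLs and sends everything over the socket instead
vegeta attack -targets=targets.http -duration=5s -rate=500 \
  -unix-socket /tmp/postgrest.sock | vegeta report
```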
That's a nice find @wolfgangwalther. I also read about locust being slow before, but I didn't think it'd make that much of a difference.
If vegeta has issues, I can vouch for k6 - it's what I use for my local load tests.
We have a lot of unrelated factors that contribute to the performance of a single request (other processes running, CPU scheduling, even automatically adjusted CPU frequency and stuff like that) that create a lot of noise. Every once in a while there will be an outlier with huge response times - and that will kill the whole average. I found that the median response time (reported as the 50th percentile) was already much more reliable. After upgrading nix, I used this PR in my tests and made a couple of changes.
I didn't test again with locust, but I would assume if we could find some kind of peak performance metric there, this would also be more reliable. However, I think the upside for vegeta as our test-runner is quite high, see my comment above about criteria.
I couldn't find any unix socket support mentioned in the k6 docs. But apart from that, it looks like it supports everything we need here as well. The "1 virtual user" mode would probably work as well. However, the virtual user concept plus the requests being written in JavaScript surely add overhead. With vegeta I am getting ~2400 req/s with a single process and no concurrency. For peak performance measurements this is good, because the more requests we can make, the higher the chance to find that "true minimum". Also, to be fair, I did not run k6 on my machine and don't know how well it performs here.
Nightly builds could be a problem, because those will be the static build - while for the actual test we want to switch to the incremental build, right? Maybe we can do that on a commit basis instead. This could be done by something like this (inspired from here), in parts pseudo-code as comments:
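Something along these lines - the helper names, attributes and the way the loadtest picks up a binary are just placeholders:

```bash
# rough sketch, partly pseudo-code - script names and output paths are invented
sha1="${1:-master}"   # baseline commit to compare against, defaults to master

# build the baseline in a separate worktree so the current checkout stays untouched
git worktree add /tmp/loadtest-baseline "$sha1"
nix-build /tmp/loadtest-baseline --out-link result-baseline

# build the current checkout (the PR)
nix-build --out-link result-pr

# run the same load test against both binaries and compare the reports
postgrest-loadtest result-baseline/bin/postgrest > report-baseline.txt
postgrest-loadtest result-pr/bin/postgrest > report-pr.txt
diff report-baseline.txt report-pr.txt || true
```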
Default for sha1 would be "master". So all PRs would automatically compare against current master.
That's really interesting @wolfgangwalther. I've just found that the median is pretty stable in my tests on #1600 (comment). I'll share the key numbers from the k6 output of 5 runs here:

| Run | http_reqs | req/s | avg | min | med | max | p(90) | p(95) |
|----:|----------:|------:|----:|----:|----:|----:|------:|------:|
| 1 | 17433 | 578.878085 | 1.52ms | 957.23µs | 1.41ms | 145.69ms | 1.66ms | 1.76ms |
| 2 | 17555 | 582.899933 | 1.51ms | 971.63µs | 1.41ms | 81.52ms | 1.66ms | 1.75ms |
| 3 | 17189 | 570.758333 | 1.54ms | 982.29µs | 1.41ms | 184.67ms | 1.7ms | 1.86ms |
| 4 | 17577 | 583.66937 | 1.51ms | 941.61µs | 1.4ms | 153.5ms | 1.64ms | 1.73ms |
| 5 | 17285 | 573.985382 | 1.54ms | 951.16µs | 1.42ms | 286.92ms | 1.67ms | 1.76ms |

(The latency columns are http_req_duration; all runs had 0.00% failed requests with 1 virtual user.)

(For my tests, the minimum response time […].) Prior to the prepared statements change I'm getting a stable […].
This is an experiment on integrating a load test into the local dev tools and CI.
To run the loadtest locally, run `postgrest-loadtest` in nix-shell. This will run for 60s by default. This is based on https://locust.io/, but happy to try other frameworks out if you have suggestions!
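Concretely, that would be something like:

```bash
# enter the project's nix-shell and run the load test (60s by default)
nix-shell --run postgrest-loadtest
```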
The locustfile could be extended with more complex reads, writes etc. that can then be selected via tags to simulate different workloads.
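For example, a sketch of what tagged scenarios could look like in the locustfile - the endpoints and table names below are just placeholders, not necessarily our fixtures:

```python
# locustfile.py - scenarios selectable via tags,
# e.g. `locust --tags read` or `locust --tags write`
# (the /projects endpoint is only an example)
from locust import HttpUser, task, tag, constant


class PostgrestUser(HttpUser):
    wait_time = constant(0)  # no think time, hammer as fast as possible

    @tag("read")
    @task
    def simple_read(self):
        self.client.get("/projects?select=id,name")

    @tag("write")
    @task
    def simple_insert(self):
        self.client.post(
            "/projects",
            json={"name": "loadtest project"},
            headers={"Prefer": "return=minimal"},
        )
```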
The Postgres database, PostgREST server, Nginx reverse proxy and the load generator all run on the same machine with this setup, which makes it easy to quickly run locally and in CI, but it impairs comparability.
Todos:
- Make `postgrest-loadtest` an option in nix-shell in order to maintain fast nix-shell startup without cachix