Request handling periodically blocked #2
Comments
Thanks for the report! Do you know roughly how many objects are being created? Are they all in the same bucket and/or the same account? And which Docker tag are you using?

I did some local load testing where I created about 80,000 small objects in a single bucket, and serializing the data took about 14 seconds. Obviously this would vary massively depending on system load, but it did indeed prevent any API requests from being handled while serialization was ongoing. The fact that persistence blocks API requests while it's running is not intentional - part of the reason it happens is some overzealous locking, so I can improve that.

It should also be feasible to switch to a more efficient serialization mechanism. localstack-persist currently uses jsonpickle, which is fantastic for observability and debuggability, but not great for performance. I whipped up a proof of concept that used the pickle module instead, and it took only 0.5 seconds for the same data - a 28x speedup! I'll need to clean it up a bit and handle backward compatibility before merging it in, but I have a basic implementation working, so I don't foresee any major problems.
localstack-persist v3.0.2 should improve this, as persistence no longer blocks request handling 🙂 However, it can still use up most of the available CPU while it's occurring - you may want to try setting
No problem (sorry for the wall of text, btw 😅). I don't really know how many objects are created each test run, but I'm pretty sure it's well below the amount you tested (80,000). It's all in the same bucket and account. I used 3.0.1 (when it was tagged latest).
Yeah, I suspected it was something like that (not that I know where to look in the source, but anyway).
Awesome! 🥇
Fantastic! I'll take it for a spin.
I tested 3.0.2. Both versions were tested with and without the setting mentioned above.

The tests took roughly the same time to finish, and I ran them all multiple times (both several runs in a row and several runs with a fresh container in between). I'd like to say it feels more stable, though the tests took about the same time to finish either way.

I still got the 504 Gateway Timeout several times, but it was more or less random and seemed to happen a lot less (before, I think I could trigger it consistently). There was no obvious pattern besides the timeout happening around the time state was being persisted (stuck for a minute, and then the test script fails).

I noticed that the Container CPU chart in Docker Desktop went down to about 0.03% for both my API container and the localstack container (it basically flatlined) while the API was waiting for a response (and the last log statement was about persisting state), so it still seems something is being blocked.

I'm working from home now, but I'll try it out at the office as well. I also want to see if it's possible to set a timeout in the S3 client so I can get a proper error before nginx aborts the request.
When you get a chance, could you try running it again with the debug environment variable enabled?
I enabled the debug flag and ran lots of tests, but it didn't really say much. However, I may have found the cause, and it's on the client side.

When initializing the S3 client, it is possible to specify two different timeouts: one for establishing the connection and one for the request as a whole. It's not entirely clear exactly what each of them covers, but it seems to be sufficient to set the overall request timeout - with it set, the call fails with a proper error instead of waiting indefinitely.

For reference, this is how one might initialize the S3 client using the PHP SDK (v3):
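A minimal sketch (the endpoint, credentials, and timeout values here are hypothetical, not taken from this thread). In the PHP SDK (v3), the two timeouts are passed via the `http` option, which maps to Guzzle's `connect_timeout` and `timeout` request options:

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Hypothetical values for a local LocalStack setup.
$s3 = new S3Client([
    'version'                 => 'latest',
    'region'                  => 'us-east-1',
    'endpoint'                => 'http://localstack:4566',
    'use_path_style_endpoint' => true,
    'credentials'             => ['key' => 'test', 'secret' => 'test'],
    'http' => [
        'connect_timeout' => 5,   // seconds to establish the TCP connection
        'timeout'         => 30,  // seconds for the entire request/response
    ],
]);
```

With `timeout` set, a blocked call throws an exception after 30 seconds instead of hanging until nginx gives up on the upstream request.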
Also, the S3 client has a retry mechanism with (by default) three retries per call, so it could be that when the timeout is set, one of the retries succeeds (as opposed to waiting indefinitely on the first try). That could explain why the tests kept running instead of failing when the timeout was triggered.
As I mentioned in #1, I have a set of tests which makes requests to a PHP-based API that in turn uses S3 for file storage.
Since the tests make actual requests (i.e. no functionality is mocked), each request has to finish within about one minute, or the nginx server in front of the API times out.
Now, the tests create a lot of requests to exercise the various API endpoints in different ways. The requests are made synchronously, so there should only be one request in flight at a time. Also, not all requests make use of S3, but those that do usually make multiple calls to it.
For example: checking if a directory exists, creating it if it doesn't, and then uploading a file to that directory (while S3 is not a file system, the AWS PHP SDK has functionality that allows working with S3 as if it were a regular file system, using familiar operations). This is why there are a lot of HeadObject operations in the log below (404 -> the file or directory does not exist).
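The "check, create, upload" flow described above might look roughly like this with the PHP SDK (the function, bucket, and key names are hypothetical; `doesObjectExist()` is what issues the HeadObject call that shows up as a 404 in the logs when the object is missing):

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function uploadToDir(S3Client $s3, string $bucket, string $dir, string $file, string $body): void
{
    // S3 has no real directories; a zero-byte object whose key ends in "/"
    // is the usual convention. doesObjectExist() performs a HeadObject
    // request under the hood - a 404 here just means "not found".
    if (!$s3->doesObjectExist($bucket, "$dir/")) {
        $s3->putObject(['Bucket' => $bucket, 'Key' => "$dir/", 'Body' => '']);
    }

    $s3->putObject(['Bucket' => $bucket, 'Key' => "$dir/$file", 'Body' => $body]);
}
```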
So to the issue: when running the tests and looking at the test script logs, it is clear that the API request is blocked for a while (the test log output stops). Most of the time it returns a response (after whatever blocked it finishes).
Looking at the localstack logs, it stops at the same time the test stops, and it seems to coincide with the
Persisting state of service s3...
log entry. It's not clear exactly what state is persisted, but it usually blocks the caller for a little while (a couple of seconds or so - not a big deal). The problem is that sometimes it is stuck for much longer (>1 minute), which is enough for the nginx server in front of the API to abort the request and return a 504 Gateway Timeout to the client (in this case, the test script). I verified this by raising the timeouts in the nginx server until I could run the same tests multiple times without a timeout.
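Raising the nginx timeouts as described might look like this (the upstream name and values are hypothetical; `proxy_read_timeout` defaults to 60s, which matches the roughly one-minute aborts seen here):

```nginx
location / {
    proxy_pass http://php-api;   # hypothetical upstream
    # Defaults are 60s; raise them to outlast long persistence pauses.
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```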
In the logs below, I marked the time the last test client request was blocked. After about a minute the request is aborted, but I believe the PHP script continues executing, which is why some S3 requests are made afterwards.