Uploading 100k Entities, frontend reports 504 and stays on preview screen, upload has actually succeeded #691

Open
lognaturel opened this issue Jul 24, 2024 · 3 comments

lognaturel commented Jul 24, 2024

Problem description

I uploaded https://drive.google.com/file/d/1y2Z9ZwHcX60FRW5F2vxbolooj3F6-bgY/view?usp=drive_link, which has 100k Entities. After a minute, I saw a 504 at the top of the modal, above the append button. I exited the modal and saw that my Entities had been created successfully.

URL of the page

https://staging.getodk.cloud/#/projects/93/entity-lists/entities_100k/entities

Expected behavior

In the case of a 504, I think we should close the modal if possible. Because there's no duplicate detection, a user is at high risk of uploading the same Entities twice and then they're stuck with them.

Alternatively, could the server send back something to say it's still working on it?

Central version shown in version.txt

versions:
e49518adb84f88d7bc6c3626fc77584dfc935435 (v2024.1.0-6-ge49518a)
+988780e2d439894ef6e5af15692a8916c7c8d8e5 client (v2024.1.0-23-g988780e2)
+fb96aa3333d3442ea43365755766262f21de3969 server (v2024.1.0-23-gfb96aa33)

Browser version

Around when did you see the problem (in UTC)?

Other notes (if any)

matthew-white commented Jul 24, 2024

If the upload succeeded, then I don't think that Backend itself would have returned an error response. Otherwise the database transaction should have been rolled back. I think that nginx is returning a 504 after a set amount of time without waiting for Backend to finish.

> After a minute, I saw a 504

Could it have been 2 minutes? That's how nginx is configured. Was the error message "Something went wrong: error code 504." ? If so, that's another sign that it's nginx. An error from Backend would be more specific.
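The frontend could act on that distinction. Below is a hypothetical TypeScript helper (not existing Central code) that classifies a failed upload response. It assumes that Backend's error responses are JSON objects with a `message` string, while the nginx 504 page is not JSON. A `gateway-timeout` result means the upload may still finish, so the modal could close and tell the user to check the Entity list before trying again.

```typescript
// Hypothetical helper: did a failed upload response come from nginx
// (Backend may still be working) or from Backend itself?
type UploadFailure = "gateway-timeout" | "backend-error" | "unknown";

// Assumption: Backend errors are JSON with a string "message" field,
// whereas nginx's 504 page is HTML (or empty), not JSON.
function classifyUploadFailure(status: number, body: string): UploadFailure {
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(body);
  } catch {
    parsed = null; // Not JSON: most likely generated by the proxy.
  }

  const fromBackend =
    typeof parsed === "object" &&
    parsed !== null &&
    typeof (parsed as { message?: unknown }).message === "string";

  if (fromBackend) return "backend-error";
  if (status === 504) return "gateway-timeout";
  return "unknown";
}
```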

If it's nginx, here are a couple of ideas for how to address it:

  • Turn off or increase the nginx timeout for all non-GET requests
  • Have the entity upload endpoint trickle a response, similar to what the /v1/backup endpoint does, so that nginx doesn't time it out

I'm not sure that closing the modal would necessarily help, because the user could just reopen the modal or even refresh the page in order to try again. Backend would still be working on the original request. getodk/central-backend#663 is another example of how Backend can be working on concurrent requests even after a 504 response.
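Since Backend keeps processing after the 504, one hypothetical mitigation is to refuse a second upload to the same Entity list while the first request is still running. A minimal sketch, assuming a single Backend process; `UploadGuard` and the `projectId/datasetName` key format are illustrative names, not part of Backend.

```typescript
// Hypothetical in-memory guard: reject a concurrent upload to the same
// Entity list while an earlier upload request is still being processed.
class UploadGuard {
  private readonly inProgress = new Set<string>();

  // Returns false if an upload for this key is already running.
  tryStart(key: string): boolean {
    if (this.inProgress.has(key)) return false;
    this.inProgress.add(key);
    return true;
  }

  finish(key: string): void {
    this.inProgress.delete(key);
  }

  // Runs the upload work and releases the key whether it succeeds or throws.
  async run<T>(
    key: string,
    work: () => Promise<T>,
  ): Promise<{ started: false } | { started: true; result: T }> {
    if (!this.tryStart(key)) return { started: false };
    try {
      return { started: true, result: await work() };
    } finally {
      this.finish(key);
    }
  }
}
```

A request that gets `{ started: false }` could be answered with an error saying an upload is already in progress. Because the set lives in memory, this would not cover several Backend processes; a shared lock, for example a Postgres advisory lock, would be needed there.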

lognaturel commented Jul 25, 2024

> "Something went wrong: error code 504."

Yes, exactly. I’m quite sure it’s nginx; there was nothing in the service log.

Trickling a response like the backup endpoint would make a lot of sense.

I’m less sure of the implications of modifying the timeout.

matthew-white commented Jul 25, 2024

> I’m less sure of the implications of modifying the timeout.

I feel like we've considered this idea before, though I don't remember why we didn't make this change. How long does it take Postgres to time out? Would it be reasonable to change the nginx timeout to match the Postgres timeout, at least for non-GET requests?

> Trickling a response like the backup endpoint would make a lot of sense.

With the backup endpoint, we trickle random data that winds up in the backup .zip file. I don't think we'd want to return random data or a .zip file from the upload endpoint. But I think the current response from the upload endpoint is {"success":true}, and we could trickle that out, returning a character every minute or so. Maybe that would be the easiest change to make.
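A minimal TypeScript sketch of that idea, assuming the body stays `{"success":true}` and that the nginx timeout resets whenever Backend sends more bytes (nginx's `proxy_read_timeout`, for instance, applies between two successive reads). `TrickledResponse`, `ResponseSink`, and `respondWithTrickle` are hypothetical names. The last character is held back, so the body only becomes valid JSON once the work has finished.

```typescript
// Sketch: trickle a fixed JSON body one character at a time while the upload
// is processed, holding back the final character until the work is done.
class TrickledResponse {
  private sent = 0;

  constructor(
    private readonly body: string,
    private readonly write: (chunk: string) => void,
  ) {}

  // Called on a timer. Sends the next character, but never the last one,
  // so the body stays incomplete until finish() is called.
  tick(): void {
    if (this.sent < this.body.length - 1) {
      this.write(this.body.charAt(this.sent));
      this.sent += 1;
    }
  }

  // Called once the database work has committed: flush the remainder.
  finish(): void {
    this.write(this.body.slice(this.sent));
    this.sent = this.body.length;
  }
}

// Minimal stand-in for an HTTP response object (assumption).
interface ResponseSink {
  write(chunk: string): void;
  end(): void;
}

async function respondWithTrickle(
  res: ResponseSink,
  work: () => Promise<void>,
  intervalMs = 60_000,
): Promise<void> {
  const trickle = new TrickledResponse('{"success":true}', (c) => res.write(c));
  const timer = setInterval(() => trickle.tick(), intervalMs);
  try {
    await work();
    trickle.finish();
  } finally {
    clearInterval(timer);
    res.end(); // On failure the client receives truncated, invalid JSON.
  }
}
```

One consequence of this design: the first trickled byte sends the response headers, so the status code is already 200 before the database work completes. If the work then fails, the client sees a truncated body rather than an error status and would need to treat a body that fails to parse as a failure. Also, 16 characters at one per minute only cover about 15 minutes; trickling leading whitespace instead, which JSON parsers ignore, would not have that limit.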


For some reason, I'm a little surprised that an upload request would take as long as you're seeing. Maybe we knew that already and I'm just forgetting. 😅 Once we allow the request to take more than 2 minutes, I'm wondering whether we should do more to signal to the user that the request really is in progress and that they definitely shouldn't refresh the page and try again. For example, after 30 seconds or a minute, we could change "Processing file..." to "Still processing...", or we could show an alert that mentions "don't refresh".
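The message change could be driven by elapsed time, as in this TypeScript sketch. The 30-second and one-minute thresholds come from the suggestion above; the constant names and exact wording are assumptions.

```typescript
// Hypothetical thresholds for escalating the upload status text.
const STILL_PROCESSING_AFTER_MS = 30_000;
const DONT_REFRESH_AFTER_MS = 60_000;

// Returns the status text to show, given how long the upload request
// has been in flight.
function uploadStatusMessage(elapsedMs: number): string {
  if (elapsedMs >= DONT_REFRESH_AFTER_MS) {
    return "Still processing... Please don't refresh the page or upload the file again.";
  }
  if (elapsedMs >= STILL_PROCESSING_AFTER_MS) {
    return "Still processing...";
  }
  return "Processing file...";
}
```

The modal could call this from a timer started when the request is sent and stop the timer once a response arrives.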
