[production] Production site is intermittently timing out #722
Thanks for this report. In the past (see #489), we have also had issues with bots/scrapers hitting the download links that have caused performance issues. May be worth checking for this.

Generally, if users are trying to download a lot of data and it breaks, we get a polite email asking if we're ok (our users are lovely!). I agree we need to rethink how we handle downloads, as this has caused huge headaches. Our basic use case, which could need updating, is that users want all the data for a country and aren't particularly interested in per-entity slices. My inclination is to sidestep the programmatic aspects and simply offer up a copy of the spreadsheet used to create the data in WWIC. So, after import is complete, place the import spreadsheet as a static asset the user can just grab. That needs some thinking about as well, though.
I've disabled downloads, but performance is still suffering. Consulting the access logs, we're getting a lot of traffic to search pages from Bing and Petal bots, specifically with assorted parameters for sort and number of rows. Short of barring crawling of search results entirely, I think adding the …
Thanks, we're fine being indexed of course, but this seems quite excessive. If the additional …
Beginning to wonder if #357 (making ourselves more discoverable to search engines by offering better metadata) isn't starting to bite a bit as well.
Yes to revisiting SEO. In addition to improving the metadata, I assume we're allowing crawling of the search results in order to expose links to, and thereby index, personnel, units, and incidents, but we could achieve that without the performance hit by adding a site map. In the meantime, I've added the `nofollow` directive to most links on the search page. I'll give the bots in question a little time to act right and, if they don't comply by tomorrow morning (8 a.m. Central), I'll go ahead and block them from crawling the site.
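For reference, the sitemap-plus-disallow approach could look something like this in `robots.txt` (the search path and sitemap URL below are illustrative assumptions, not the site's actual routes):

```
# Sketch: keep crawlers off parameterized search pages while
# pointing them at a sitemap for the entities we do want indexed.
User-agent: *
Disallow: /search/

# Assumed example URL for the sitemap.
Sitemap: https://example.org/sitemap.xml
```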
Looks to me like the bot-caused performance issue is resolved by the addition of
What else do you see in this issue?
The big task I see is performance tuning across the site. That would entail setting up automated load testing based on the expected number of users (analytics should give us a good idea of what's normal, as well as some signal re: extremes, e.g., traffic spikes after you promote a launch) and making improvements, such as caching, to accommodate those patterns of use. N.b., SEO improvements can also net performance benefits if they allow us to, e.g., disallow crawling of the search pages.

There are some changes in progress, namely the search refactor and upcoming migration to Heroku, that will affect site performance, so I would wait to tune until after those are complete.

I'll also add that I wonder if the issue with downloads was a red herring: it isn't that downloads are particularly heavy or sluggish, but that heavy bot traffic was causing sluggish requests across the site. With that said, I do like your idea of making the source sheets available, especially now that a canonical version of location data will be available and per-country slices would be more useful to users than the current entity-level slices.
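As a starting point for the automated load testing mentioned above, here is a minimal sketch using only the Python standard library. The URL, request count, and concurrency are placeholders; a real setup would likely use a dedicated tool (e.g., Locust or k6) driven by the traffic patterns analytics reveal.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_load_test(url, n_requests=50, concurrency=10, timeout=5):
    """Fire n_requests at url with up to `concurrency` in flight.

    Returns a list of per-request latencies in seconds, which can be
    summarized (p50/p95/max) to spot regressions after tuning changes.
    """
    def hit(_):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so timing covers the full response
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(hit, range(n_requests)))
```

Running this against a staging copy of the site, before and after changes like caching, gives a cheap apples-to-apples comparison.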
We're getting alerts of timeouts again, and I'm still seeing the problematic bot traffic. I'm thinking it's time to block Bing and PetalBot, at least temporarily. (We might decide to re-allow them if we disable search result crawling.) |
Okay. Bing and PetalBot begone!
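For the record, blocking specific crawlers can be done in `robots.txt` with the bots' documented user-agent tokens (`bingbot` for Bing, `PetalBot` for Petal). Note this is advisory: well-behaved bots honor it, but nothing is enforced server-side.

```
# Block Bing's and Petal's crawlers from the whole site.
User-agent: bingbot
Disallow: /

User-agent: PetalBot
Disallow: /
```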
Done and done. Looking snappy! Given the huge difference blocking the bots has made, @tlongers, I'm even more convinced downloads were not the culprit. I've turned them back on and confirmed that they're just as snappy as the rest of the site. That's not to say we can't improve upon them later. 🙂

If the site is still stable in a few days, say Monday AM my time, I'll go ahead and close this issue. Meanwhile, I think we have a few things to spin off here:
Have I missed anything?
These look great, @tlongers, thank you so much for your issue farming. 🐮🤠
We've set up downtime notifications for our properties and it looks like WWIC is timing out from time to time. (We're at 99+ percent uptime over the last 30 days, FWIW.)
I started to look into this and noticed we were getting exceptions like this every time a download was requested:
Also, download files were empty. Looking at #678, it looked like the materialized views backing downloads weren't created, so I ran the `make_materialized_views` command manually. That addressed the errors and fixed downloads. We should double-check that that management command gets run when it needs to be, so views exist when we expect them to.

Even with that fix, download requests are pretty sluggish. So, I looked at the config for the process that runs the app, and it seems like we're only using one worker to fulfill requests:
```
gunicorn -w 1 -t 180 --log-level debug -b 127.0.0.1:8000 sfm_pc.wsgi:application
```
If there's sufficient traffic when a download is requested, this could lead to a bottleneck and some requests may time out. I think we should address this in the Heroku migration rather than investing additional time into our current setup (especially since we're up 99 percent of the time). We might also consider ways of making the downloads more efficient or pushing them off into asynchronous work so they don't block other requests.
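A sketch of the asynchronous approach floated above, using only Python's standard library. The job/worker names and `build_fn` callable are illustrative; in production this would more likely be a proper task queue (e.g., Celery, or a Heroku worker dyno) with persistent storage for the finished files.

```python
import queue
import threading

download_jobs = queue.Queue()
completed = {}  # job_id -> finished download payload


def download_worker():
    """Build download files off the request thread, so heavy exports
    don't tie up the single gunicorn worker serving page requests."""
    while True:
        job_id, build_fn = download_jobs.get()
        try:
            completed[job_id] = build_fn()  # e.g., render a country's CSV
        finally:
            download_jobs.task_done()


threading.Thread(target=download_worker, daemon=True).start()


def request_download(job_id, build_fn):
    """Called from the request handler: enqueue the job and return
    immediately; the client polls (or gets a link) for the result."""
    download_jobs.put((job_id, build_fn))
```

The handler responds right away, and a follow-up endpoint would serve `completed[job_id]` once the worker finishes, so a slow export never blocks other requests.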