[DOCS] Document how to deal with bots on live sites #2286
Comments
We've experienced this as well and it's brought our site to its knees.
Same. TikTok ignores robots.txt. We have one site that was getting several hits per second before we stuck a user agent filter in.
Drupal specific info here: https://dev.acquia.com/blog/automated-bot-traffic-strategies-handle-and-manage-it
Suggestions from tech call below: Blocking bots by user agent:
Stopping legit bots from crawling facets:
Remaining questions:
Nginx allows for multiple conf files. We could add an include in nginx.conf to point to a file in /var/www/drupal which would eliminate the need for a separate mount.
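As a rough sketch of that idea, assuming the stock nginx `include` directive and a made-up filename (`nginx-bots.conf` is hypothetical, not something the project ships), the line added to nginx.conf might look like this:

```nginx
# Pull bot-blocking rules from a file that lives alongside the Drupal code.
# The filename is hypothetical; place this line in whichever context
# (http or server) suits the directives inside the included file.
include /var/www/drupal/nginx-bots.conf;
```

nginx only reads included files at startup or reload, so a change to the file would still need something like `nginx -s reload` inside the container to take effect.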
fwiw, I've found the patch for facets at https://www.drupal.org/node/2937191 useful; it renders the facets as actual checkboxes. By default they are rendered as links (which bots can follow) and only converted to checkboxes by JS.
@kayakr that would be awesome if we could get that patched into the facets module. I really like the checkboxes for facets but am having the same issue with bots.
I just wanted to share a link to a presentation on the topic of bots, created by the Islandora community itself and given during the October 2024 "Open Meeting". Maybe we can use the presentation as a source of content for the official docs. At the very least we can link to a PDF version of this document. "Bots - Islandora Open Meeting - Oct 29 2024"
We should write up some docs about how to block bots from crawling a live site that is set up with Docker. This has come up a few times in Slack and it would be good to have something to explain how to deal with it.
One option some of us have been using is to edit drupal.defaults.conf to return a 403 based on user agent. I have done this by adding the following to my Dockerfile, but you could also mount the conf file and edit it manually:
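For illustration only, here is a minimal sketch of what such a rule could look like. It assumes drupal.defaults.conf is included inside a `server` block, and the user agent names are example placeholders rather than a vetted blocklist; it is not the exact snippet from the comment above.

```nginx
# Hypothetical user agent filter; swap in the agents that show up in your own logs.
# "~*" makes the regular expression match case-insensitively.
if ($http_user_agent ~* "(Bytespider|TikTokSpider)") {
    # Refuse the request before it reaches Drupal.
    return 403;
}
```

Because the check runs in nginx, blocked requests never reach PHP or the database, which is where the load from crawlers tends to hurt most.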
It would also be nice to document how to block by IP address using Docker.
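One possible starting point, sketched with nginx's standard `deny` and `allow` directives, is shown below. The addresses are taken from the reserved documentation ranges and are placeholders, not real offenders.

```nginx
# Hypothetical addresses; replace them with the ones seen in your access logs.
deny 192.0.2.15;        # a single address
deny 198.51.100.0/24;   # a whole CIDR range
allow all;              # everyone else is let through
```

These directives are valid in the `http`, `server`, and `location` contexts. One caveat: if a reverse proxy sits in front of nginx, the address nginx sees is the proxy's unless the real IP module (`set_real_ip_from`, `real_ip_header`) is configured, so it may be simpler to block at the proxy or firewall instead.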
Related, but possibly a separate issue, is that bots are getting stuck looping over facets. I'm seeing this on my site with legit bots as well, like bingbot. If there is a way to prevent this we should document that as well.