@Uzlopak (Contributor) commented Sep 26, 2025

In the fallout of issue #4588, I went to the FSF IRC channel and we discussed the issue and potential solutions. cURL's user agent is also blocked by many servers, because poorly implemented crawlers can be identified by it.

I offered to add a page to the documentation about crawling, recommending that users change the user agent to some random value, but that would not solve the problem of bots that ignore robots.txt.

It was further pointed out that a plain curl request against bgp.tools (curl https://bgp.tools) returns the following response:

Requests from default user agents are not allowed, please set a descriptive user-agent. For example: 'acmeco bgp.tools - [email protected]', This is so that if your program gets out of control I can contact you.

Setting a dishonest user agent will result in bans.

I used this information to write a reasonable documentation page about crawling.

I used ChatGPT and Claude to refine the text.

I only added the test to ensure that the example in the docs is not a fantasy, lol.

Realistically, the default user agent can always be banned, so we should recommend using a custom user agent anyway. Instead of giving a recommendation like "generate something random as the user agent", we give users reasonable guidance. A malicious crawler will use a fake user agent anyway, and a legitimate crawler will set a reasonable one.
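
To make that guidance concrete, here is a minimal sketch of how a crawler could assemble a descriptive user agent in the "name purpose - contact" shape quoted in the bgp.tools response. The names `CrawlerIdentity`, `buildUserAgent`, and `crawlerHeaders`, as well as the example values, are hypothetical and are not taken from the documentation page added in this PR.

```typescript
// Hypothetical helper types and functions for illustration only.
interface CrawlerIdentity {
  name: string;    // project or organisation name, e.g. "acmeco"
  purpose: string; // short description of what the crawler does
  contact: string; // address where a server operator can reach you
}

// Build a descriptive User-Agent value: "<name> <purpose> - <contact>".
function buildUserAgent(id: CrawlerIdentity): string {
  const parts = [id.name, id.purpose, id.contact].map((p) => p.trim());
  // Refuse to produce an anonymous or half-filled user agent.
  if (parts.some((p) => p.length === 0)) {
    throw new Error("name, purpose and contact must all be non-empty");
  }
  // Header values must not contain line breaks.
  if (parts.some((p) => /[\r\n]/.test(p))) {
    throw new Error("User-Agent parts must not contain line breaks");
  }
  const [name, purpose, contact] = parts;
  return `${name} ${purpose} - ${contact}`;
}

// Wrap the value in a plain headers object that an HTTP client can accept.
function crawlerHeaders(id: CrawlerIdentity): Record<string, string> {
  return { "user-agent": buildUserAgent(id) };
}

// Placeholder identity; replace with your real project and contact details.
const headers = crawlerHeaders({
  name: "acmeco",
  purpose: "docs-crawler",
  contact: "crawler-admin@example.com",
});
```

The resulting object can be passed as the `headers` option of whichever HTTP client the crawler uses, for example `fetch(url, { headers })`. Failing early on empty fields is a design choice of this sketch, so that a crawler never falls back silently to an anonymous user agent.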

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@7321451).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4590   +/-   ##
=======================================
  Coverage        ?   92.94%           
=======================================
  Files           ?      106           
  Lines           ?    32973           
  Branches        ?        0           
=======================================
  Hits            ?    30648           
  Misses          ?     2325           
  Partials        ?        0           

@TechnologyClassroom

I think this documentation is well written and good advice for people building crawlers.

@metcoder95 metcoder95 merged commit 6d912de into main Oct 8, 2025
29 checks passed