This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Write a script or scrapy spider that retrieves a list of (mastodon) instances. #40

Open
berkes opened this issue Mar 3, 2021 · 0 comments
Labels
fedifind Issues related to the intermediate "Fedi Find" project. scrapy task

berkes commented Mar 3, 2021

Write a script or scrapy spider that retrieves a list of (mastodon) instances.

Details

For #37, as input for #38, we need an up-to-date list of Mastodon instances, later extendable with Pleroma, Friendica and other fediverse instances. The source must permit fetching this data (i.e. don't just copy the first fediverse list you find, as its terms may not allow copying).

This list acts as input (a text file, JSON or STDIN) to the spider in #38 that finds "for-hire" profiles on each instance.
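
A minimal sketch of how that input could be normalised before it reaches the spider. It assumes the list arrives either as a JSON array of hostnames or as plain text with one hostname per line; both formats and the name `parse_instances` are illustrative assumptions, not something decided in #38.

```python
import json


def parse_instances(raw):
    """Return a de-duplicated, order-preserving list of hostnames.

    Assumed formats (hypothetical, not specified in this issue):
    a JSON array of strings, or plain text with one hostname per line.
    """
    text = raw.strip()
    if text.startswith("["):
        candidates = json.loads(text)
    else:
        candidates = text.splitlines()

    seen = set()
    hosts = []
    for item in candidates:
        host = str(item).strip().lower()
        # Skip blank lines and '#' comments in plain-text input.
        if not host or host.startswith("#"):
            continue
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts
```

For STDIN, the spider could call `parse_instances(sys.stdin.read())`; for a file, pass the file's contents instead.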

Deliverable

  • A command that fetches instances and presents them in plain text, preferably on STDOUT, so the integration can choose to pipe it elsewhere or redirect it into a file.
  • If this command requires tooling and environment setup, we need additional commands to set that environment up (in CI and on a server). Simpler (i.e. bash+curl, or a single binary) is preferred over pipenv, rbenv, nodejs/npm/npx and so on.
  • It should run in reasonable time: minutes or seconds, not days, and without using gigabytes of bandwidth to fetch a list.
  • It should not hammer servers: if it needs to crawl (e.g. with Scrapy), it must set conservative delays.
  • It should advertise itself transparently to the server in an HTTP header, so that admins of services can contact us instead of just blocking us.
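
As a rough illustration of these requirements, here is a sketch using only the Python standard library. Everything specific in it is an assumption: the `User-Agent` string and contact address are placeholders, the two-second delay is an arbitrary conservative value, and the JSON shape (`[{"domain": "..."}, ...]`) is hypothetical, since no source for the list has been chosen yet.

```python
import json
import time
import urllib.request

# Hypothetical values: replace with the project's real name and contact address.
USER_AGENT = "fedifind-instance-lister/0.1 (+mailto:admin@example.org)"
DELAY_SECONDS = 2.0  # assumed conservative pause between requests


def build_request(url):
    """Create a GET request that identifies the crawler to the server."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def extract_domains(payload):
    """Pull hostnames out of an assumed JSON shape: [{"domain": "..."}, ...]."""
    records = json.loads(payload)
    return sorted({r["domain"].strip().lower() for r in records if r.get("domain")})


def fetch_instances(urls, delay=DELAY_SECONDS, opener=urllib.request.urlopen):
    """Fetch each source URL in turn, sleeping between requests."""
    domains = set()
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay)  # never hit sources back to back
        with opener(build_request(url), timeout=30) as response:
            domains.update(extract_domains(response.read().decode("utf-8")))
    return sorted(domains)


def main(argv):
    """Print one hostname per line to STDOUT; argv holds the source URLs."""
    for domain in fetch_instances(argv):
        print(domain)
    return 0
```

Wiring `main(sys.argv[1:])` to a script entry point would give a command whose output can be piped or redirected. The `opener` parameter exists only so the fetch logic can be exercised without network access.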
@berkes berkes added the task label Mar 3, 2021
@berkes berkes added this to the Search milestone Mar 3, 2021
@berkes berkes added fedifind Issues related to the intermediate "Fedi Find" project. scrapy labels Mar 3, 2021