This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Intermediate Search: Implement a crawler that spans Mastodon instances. #38

Open
berkes opened this issue Mar 3, 2021 · 0 comments
Labels
fedifind Issues related to the intermediate "Fedi Find" project. scrapy task
Milestone
Search

Comments

@berkes
Contributor

berkes commented Mar 3, 2021

Expand the crawler so that it can crawl a list of Mastodon instances and extract public profiles tagged #for-hire.
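
A minimal sketch of the tag check, assuming each profile bio arrives as an HTML fragment (as in the `note` field of Mastodon's account JSON) and that both `#for-hire` and `#forhire` should count. The tag set and the helper names `extract_hashtags` and `is_for_hire` are hypothetical, not taken from the proof of concept.

```python
import re
from html.parser import HTMLParser
from typing import Iterable, Set

# Assumption: both spellings count as the "for hire" tag.
FOR_HIRE_TAGS = {"for-hire", "forhire"}
# A '#' followed by word characters or hyphens, e.g. "#for-hire".
HASHTAG_RE = re.compile(r"#([\w-]+)")


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_hashtags(note_html: str) -> Set[str]:
    """Return the lowercase hashtags found in a profile bio."""
    parser = _TextExtractor()
    parser.feed(note_html)
    parser.close()  # flush any text still buffered by the parser
    # Joined without separators so that markup like "#<span>tag</span>"
    # becomes "#tag" again.
    text = "".join(parser.parts)
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def is_for_hire(note_html: str, tags: Iterable[str] = FOR_HIRE_TAGS) -> bool:
    """True if the bio contains at least one of the wanted tags."""
    return not extract_hashtags(note_html).isdisjoint(tags)
```

Matching on extracted text rather than on link URLs keeps the check working for both linked hashtags and plain `#for-hire` text in a bio.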

Details

For the intermediate product "for hire search", we need to extend the Scrapy spider to crawl across multiple Mastodon instances. Currently, the proof of concept only crawls a single instance. TODO: release this proof of concept scraper in a flockingbird repo.
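
One way the spider could fan out across instances is sketched below. It assumes the crawler reads Mastodon's profile directory API (`/api/v1/directory`, which only lists accounts that opted in to discovery) rather than scraping HTML pages as the proof of concept may do; the page size and the helpers `normalize_host` and `directory_page_urls` are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlencode, urlsplit

# Assumption: start from each instance's profile directory API.
DIRECTORY_PATH = "/api/v1/directory"
PAGE_SIZE = 40  # assumed page size; the API documents a maximum of 80


def normalize_host(entry: str) -> Optional[str]:
    """Turn 'mastodon.social' or 'https://Mastodon.social/' into a bare host."""
    entry = entry.strip()
    if not entry:
        return None
    if "://" not in entry:
        entry = "https://" + entry
    return urlsplit(entry).hostname  # .hostname is already lowercased


def directory_page_urls(instances: Iterable[str], pages: int = 1) -> List[str]:
    """Build the first `pages` directory URLs for every distinct instance."""
    seen = set()
    urls = []
    for entry in instances:
        host = normalize_host(entry)
        if host is None or host in seen:
            continue  # skip blanks and duplicate instances
        seen.add(host)
        for page in range(pages):
            query = urlencode(
                {"local": "true", "limit": PAGE_SIZE, "offset": page * PAGE_SIZE}
            )
            urls.append(f"https://{host}{DIRECTORY_PATH}?{query}")
    return urls
```

In a Scrapy spider, these URLs could be yielded as `scrapy.Request` objects from `start_requests()`; `local=true` keeps each request to accounts hosted on that instance, so the same profile is not collected twice through federation.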

"Intermediate search" is explained in #37.

Deliverable

  • Given a list of instances (as JSON, plain text, or on STDIN), we crawl each instance on that list for public profiles.
  • As in the proof of concept scraper, we only index public data.
  • As in the proof of concept scraper, we adhere to noindex, robots.txt, etc.
  • It returns a JSON document, either one per instance or one covering all instances, structurally similar to the output of the proof of concept scraper.
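
The deliverables above could be supported by a few small helpers, sketched here under assumptions: the list is either a JSON array of hosts or plain text with one host per line, "noindex" means a `<meta name="robots">` tag on the profile page, and the combined output shape is hypothetical rather than the proof of concept's actual format. All function names are illustrative.

```python
import json
import sys
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.robotparser import RobotFileParser


def read_instance_list(source: str) -> List[str]:
    """Parse a JSON array of hosts, or plain text with one host per line."""
    source = source.strip()
    if source.startswith("["):
        entries = [str(item) for item in json.loads(source)]
    else:
        # Plain text: lines starting with '#' are treated as comments.
        entries = [line for line in source.splitlines()
                   if not line.strip().startswith("#")]
    return [entry.strip() for entry in entries if entry.strip()]


def load_instances(path: Optional[str] = None) -> List[str]:
    """Read the list from a file, or from STDIN when no path (or '-') is given."""
    if path is None or path == "-":
        return read_instance_list(sys.stdin.read())
    with open(path, encoding="utf-8") as handle:
        return read_instance_list(handle.read())


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "fedifind") -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page asks not to be indexed; such profiles are skipped."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def combine_results(per_instance: Dict[str, list]) -> str:
    """Merge per-instance profile lists into one JSON document (hypothetical shape)."""
    return json.dumps(
        {"instances": [{"instance": host, "profiles": profiles}
                       for host, profiles in sorted(per_instance.items())]},
        indent=2,
    )
```

When the crawl runs inside Scrapy, the `ROBOTSTXT_OBEY = True` setting already applies robots.txt to every request the spider makes, so `allowed_by_robots` mainly matters for code paths outside Scrapy. Emitting one document per instance instead would simply mean calling `json.dumps` on each entry of `per_instance` separately.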
@berkes berkes added task fedifind Issues related to the intermediate "Fedi Find" project. labels Mar 3, 2021
@berkes berkes added this to the Search milestone Mar 3, 2021
@berkes berkes added the scrapy label Mar 3, 2021