This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Write a Scrapy spider that fetches a hashtag (#vacancy, others?) and extracts the toots/updates found for that tag.
Details
This spider should take a list of seed instances to start from and follow links across instances to fetch toots/updates for a given hashtag (e.g. #vacancy, #job).
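A minimal sketch of the seeding and cross-instance step, assuming the instances expose the Mastodon-compatible public hashtag timeline at `/api/v1/timelines/tag/<tag>` and that each returned status carries a `uri` field pointing at its home instance. The helper names `tag_timeline_url` and `discover_instances`, and the `limit` value, are hypothetical and only for illustration; in a Scrapy spider they would feed the initial requests and the follow-up requests from the parse callback.

```python
from urllib.parse import quote, urlencode, urlsplit


def tag_timeline_url(instance, tag, limit=20):
    """Build the public hashtag timeline URL for one instance.

    Assumes a Mastodon-compatible API; `limit` is an illustrative page size.
    """
    tag = tag.lstrip("#")
    query = urlencode({"limit": limit})
    return f"https://{instance}/api/v1/timelines/tag/{quote(tag)}?{query}"


def discover_instances(statuses, known):
    """Return hosts found in status URIs that are not in `known` yet."""
    found = set()
    for status in statuses:
        # The `uri` of a federated status points at its home instance.
        host = urlsplit(status.get("uri", "")).hostname
        if host and host not in known:
            found.add(host)
    return found


# Seeds are the instances the crawl starts from.
seeds = ["example.com"]
start_urls = [tag_timeline_url(seed, "#vacancy") for seed in seeds]
```

Every host returned by `discover_instances` can be turned into a new timeline URL with `tag_timeline_url`, which is how the crawl would move from the seeds to other instances.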
Deliverable
It should try to deduplicate toots. When instance "example.com" has a toot by "@[email protected]" and "example.org" has this toot too, it should appear only once in the datafile.
If an update is manually re-tooted (i.e. the text is copied into a new update), it may appear multiple times. Deduplicating based on the content of an update is not important.
Boosts and/or replies should be ignored (for now).
If tooling is required to set up the environment (pipenv etc.), a command should be provided that gets this running for devs and CI.
It should be one command, so that integration is easy. Preferably a command that runs and then stops, rather than a daemon.
Scrapy is preferred, as other parts of this project already use it.
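The "appear only once" and "ignore boosts and replies" rules above could be sketched as follows. This assumes Mastodon-style status objects, where `uri` is the same on every instance that has a copy of the toot, `reblog` is non-null for a boost, and `in_reply_to_id` is non-null for a reply. The function names `is_original` and `dedupe` are hypothetical; in a Scrapy project the same logic would probably live in an item pipeline.

```python
def is_original(status):
    """True when the status is neither a boost nor a reply."""
    return status.get("reblog") is None and status.get("in_reply_to_id") is None


def dedupe(statuses, seen=None):
    """Yield each original status once, keyed on its `uri`.

    `seen` can be shared between calls so that duplicates are dropped
    across instances, not just within one response.
    """
    seen = set() if seen is None else seen
    for status in statuses:
        uri = status.get("uri")
        if not uri or uri in seen or not is_original(status):
            continue
        seen.add(uri)
        yield status
```

Keying on `uri` rather than on the text means a manually copied update still shows up as a separate entry, which matches the note that content-based deduplication is not important.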
Usage: target/debug/hunter2 [options]
Options:
-h, --help print this help menu
-r, --register register hunter2 with your instance.
-f, --follow follow live updates.
-p, --past fetch past updates.
Using this, I've filled an initial MeiliSearch index. It now runs on 178.62.220.231 (this will change, will go down, and will be replaced with a proper, HTTPS-backed, domain-named instance).
Labels
fedifind (Issues related to the intermediate "Fedi Find" project.), scrapy, task