Provide a setting to run a spider against a specific archive #29

Open
leewesleyv opened this issue Jan 21, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@leewesleyv
Collaborator

Ideally, we want to be able to optionally run the spider against the most recent archive by passing a CLI option or an environment variable (picked up in settings).
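For illustration, here is a minimal sketch of how a settings module could pick such a value up from the environment. The variable name `SPIDER_ARCHIVE`, the setting name `ARCHIVE_TARGET`, and the special value `"latest"` are all hypothetical; nothing in this issue fixes those names.

```python
import os


def archive_target_from_env(environ=os.environ):
    """Return the requested archive target, or None when nothing is set.

    Hypothetical convention: the value is either the literal "latest"
    or an explicit archive URI. SPIDER_ARCHIVE is an assumed name.
    """
    value = environ.get("SPIDER_ARCHIVE", "").strip()
    return value or None


# In settings.py this would become an ordinary (hypothetical) setting.
ARCHIVE_TARGET = archive_target_from_env()
```

Because it ends up as a regular setting, it could also be overridden per run with Scrapy's existing `-s NAME=VALUE` command-line option, which may cover the CLI case without a dedicated flag.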

@leewesleyv leewesleyv added the enhancement New feature or request label Jan 21, 2025
@wvengen
Member

wvengen commented Jan 21, 2025

Since the spider knows where to locate object storage, it is relatively easy for it to figure this out. If a system talking to Scrapy has to figure this out by itself, it needs to know the container storage details.

As an addition to this feature, one could perhaps also provide a date/timestamp to locate the last archive before that point (but that may depend on the configured storage path, so it could be tricky).
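A rough sketch of the selection step, assuming the object keys have already been listed from storage and that each key embeds a 14-digit timestamp (`YYYYMMDDHHMMSS`). Both are assumptions: the real layout depends on the configured storage path, which is exactly the tricky part. The function name `latest_archive` and the example keys are made up, and the cutoff is treated as inclusive here.

```python
import re
from datetime import datetime

# Assumption: every archive key contains a 14-digit timestamp somewhere.
TIMESTAMP_RE = re.compile(r"(\d{14})")


def latest_archive(keys, before=None):
    """Return the newest key, optionally limited to timestamps <= `before`."""
    dated = []
    for key in keys:
        match = TIMESTAMP_RE.search(key)
        if not match:
            continue  # skip objects without a recognisable timestamp
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
        except ValueError:
            continue  # 14 digits that do not form a valid date
        if before is None or stamp <= before:
            dated.append((stamp, key))
    # max() compares timestamps first; None means nothing qualified.
    return max(dated)[1] if dated else None
```

Calling `latest_archive(keys)` would cover the "most recent archive" case, while `latest_archive(keys, before=datetime(2025, 1, 10))` would cover the date-limited variant.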

@wvengen wvengen changed the title from "Provide a CLI option or setting to run a spider against a specific archive" to "Provide a setting to run a spider against a specific archive" Jan 21, 2025