[Change]: Possibility to crawl a page and only archive the hops out #2807

@thsm-kb

Description

Browsertrix Host

Self-Hosted

What change would you like to see?

I would like to be able to crawl a page with the crawl scope 'Single Page' plus one hop out, without archiving the seed page itself.

This would let me maintain online lists of pages that should be harvested without adding noise to the archive.

For example:
I maintain a list at domain.dk/somelist.htm. I use that URL as the seed and the crawler follows all the links on that list, but domain.dk/somelist.htm itself is not archived.

If the above is not possible, I would instead like an option on each exclusion to choose whether it is applied before or after link extraction.
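
To make the requested behaviour concrete, here is a minimal sketch in plain Python. It is not Browsertrix code and does not use any Browsertrix API. The `plan` function, the `(pattern, phase)` exclusion tuples, and the phase names `"before"` and `"after"` are all hypothetical, chosen only to illustrate the difference between excluding a page before link extraction (the page is never loaded, so its links are lost) and after link extraction (the links are followed, but the page is left out of the archive).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def plan(seed_url, seed_html, excludes):
    """Return the URLs that would be archived for 'single page + one hop out'.

    excludes is a list of (regex, phase) tuples. The phase values are
    hypothetical: "before" skips a page entirely, "after" still extracts
    its links but keeps the page out of the archive.
    """
    def matches(url, phase):
        return any(p == phase and re.search(rx, url) for rx, p in excludes)

    if matches(seed_url, "before"):
        return []  # seed is never loaded, so no links are discovered

    parser = LinkParser(seed_url)
    parser.feed(seed_html)

    archive = []
    if not matches(seed_url, "after"):
        archive.append(seed_url)
    for link in parser.links:
        excluded = matches(link, "before") or matches(link, "after")
        if not excluded and link not in archive:
            archive.append(link)
    return archive


seed = "https://domain.dk/somelist.htm"
html = '<a href="/a.htm">A</a> <a href="https://other.example/b">B</a>'

# Seed excluded after link extraction: only the hops out are archived.
print(plan(seed, html, [(r"somelist\.htm$", "after")]))
```

With the seed matched by an `"after"` exclusion, only the linked pages end up in the result, which is the outcome asked for above. Switching the same exclusion to `"before"` yields an empty list, which is why a per-exclusion choice would be needed.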

Additional details

No response

Metadata

Assignees: No one assigned
Labels: enhancement (Requests a change to a feature), idea (Idea for a feature in consideration)
Projects: Status: Triage
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests