feat: adding crawlee's EnqueueStrategy config #176

Open
wants to merge 2 commits into
base: main

@muzafferkadir muzafferkadir commented Sep 6, 2024

PR Description

Summary:
This pull request adds a crawlStrategy option to the configuration schema and passes it to the crawling logic, so users can choose which discovered links the crawler enqueues. For more information: https://crawlee.dev/api/core/enum/EnqueueStrategy#All
Changes Made:

  1. Updated Configuration Schema (config.ts):

    • Added crawlStrategy field to the configuration schema.
      • This field selects the Crawlee EnqueueStrategy, which decides which of the URLs found on a page are enqueued by comparing them with the URL of the page being crawled.
      • Possible values are "all", "same-origin", "same-hostname", and "same-domain".
      • This field is optional.
  2. Updated Crawling Logic (core.ts):

    • Integrated the crawlStrategy configuration into the PlaywrightCrawler setup.
      • The strategy parameter in enqueueLinks now uses the config.crawlStrategy value if provided.
      • Ensures that the crawling strategy defined in the configuration is applied during the crawling process.
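
A minimal, dependency-free sketch of the two changes above, for illustration only. The names CrawlConfig, parseCrawlStrategy, buildEnqueueOptions, and the match field are assumptions made for this example, not the code in config.ts or core.ts. The point it shows is that strategy is only set when crawlStrategy is provided, so Crawlee's own default applies otherwise.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's actual code.
const CRAWL_STRATEGIES = ["all", "same-origin", "same-hostname", "same-domain"] as const;
type CrawlStrategy = (typeof CRAWL_STRATEGIES)[number];

interface CrawlConfig {
  match: string; // hypothetical glob pattern for links to follow
  crawlStrategy?: CrawlStrategy; // optional, as described in the PR
}

// Validates a raw config value against the four allowed strings.
function parseCrawlStrategy(value: unknown): CrawlStrategy | undefined {
  if (value === undefined) return undefined;
  if (typeof value === "string" && (CRAWL_STRATEGIES as readonly string[]).indexOf(value) !== -1) {
    return value as CrawlStrategy;
  }
  throw new Error(`Invalid crawlStrategy: ${String(value)}`);
}

// Builds the options object that would be handed to enqueueLinks.
// The strategy key is omitted when crawlStrategy is not configured.
function buildEnqueueOptions(config: CrawlConfig): { globs: string[]; strategy?: CrawlStrategy } {
  const options: { globs: string[]; strategy?: CrawlStrategy } = { globs: [config.match] };
  if (config.crawlStrategy !== undefined) {
    options.strategy = config.crawlStrategy;
  }
  return options;
}
```

In the real request handler, the resulting object would be passed along the lines of `await enqueueLinks(buildEnqueueOptions(config))`.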

Impact:

  • These changes give users control over the crawling behavior: discovered links can be restricted to the same origin, hostname, or domain as the page being crawled, or left unrestricted.

Examples:

  • When crawlStrategy is set to "same-origin", the crawler will only follow links within the same origin.
  • When crawlStrategy is set to "all", the crawler will follow all links regardless of their origin.
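
To make the difference between the four values concrete, here is a rough model of how each one compares a discovered link with the page it was found on. This is a simplified sketch, not Crawlee's implementation: wouldFollow and naiveDomain are hypothetical helpers, and the "same-domain" check just compares the last two hostname labels, which is wrong for suffixes such as co.uk.

```typescript
// Illustrative model only; Crawlee's real matching logic is not reproduced here.
type CrawlStrategy = "all" | "same-origin" | "same-hostname" | "same-domain";

// Naive assumption: the registrable domain is the last two labels of the hostname.
function naiveDomain(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Returns true if a link found on pageUrl would be followed under the given strategy.
function wouldFollow(strategy: CrawlStrategy, pageUrl: string, linkUrl: string): boolean {
  const page = new URL(pageUrl);
  const link = new URL(linkUrl);
  if (strategy === "all") {
    return true; // no restriction
  }
  if (strategy === "same-origin") {
    return link.origin === page.origin; // protocol, hostname, and port must match
  }
  if (strategy === "same-hostname") {
    return link.hostname === page.hostname; // subdomains count as different hosts
  }
  return naiveDomain(link.hostname) === naiveDomain(page.hostname); // subdomains allowed
}

const page = "https://example.com/docs";
console.log(wouldFollow("same-origin", page, "http://example.com/blog"));
console.log(wouldFollow("same-hostname", page, "https://blog.example.com/"));
console.log(wouldFollow("same-domain", page, "https://blog.example.com/"));
console.log(wouldFollow("all", page, "https://other.org/"));
```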

@steve8708
Contributor

thanks @muzafferkadir ! looks like build is failing, so will need that in to merge. otherwise, great update

@muzafferkadir
Author

> thanks @muzafferkadir ! looks like build is failing, so will need that in to merge. otherwise, great update

thanks, i updated
