
Prioritize links in hcf-backend #27

Open
Nishant-Bansal-777 opened this issue Nov 9, 2023 · 0 comments
Thanks for the quick response. How exactly can those slots be leveraged to influence the order in which requests are processed?

I have 1 spider with both producer and consumer settings as shown below:

{
    'HCF_AUTH': _scrapy_cloud_key,
    'HCF_PROJECT_ID': project_id,
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': 1,
    'HCF_PRODUCER_BATCH_SIZE': 300,
    'HCF_PRODUCER_SLOT_PREFIX': 'links',
    'HCF_CONSUMER_FRONTIER': 'frontier',
    'HCF_CONSUMER_SLOT': 'links0',
}

I am running the spider for an interval of 10 minutes at depth 1. For some URLs it finishes before 10 minutes, but other URLs take longer, so the consumer part of the crawler does not consume all the links. The issue I am facing is that when I run the spider more than once, it does not start from the start URL but from a URL that was previously saved to the frontier (it reads a frontier batch before reaching the start URL). Also, how many slots can be created inside a frontier?
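One possible way to use slots to influence ordering, sketched under assumptions: slot names appear to follow a prefix-plus-index convention (prefix 'links' with 1 slot gives 'links0' above), so links could be written to separate slot prefixes by priority and consumed high-priority slot first. The prefixes 'links-high'/'links-low' and this routing scheme are illustrative assumptions, not documented hcf-backend behavior.

```python
# Sketch (assumptions, not documented hcf-backend behavior): route
# high- and low-priority links to separate slot prefixes, then drain
# the high-priority slot before the low-priority one. Slot names are
# assumed to be <prefix><index>, matching 'links' + 1 slot -> 'links0'.

BASE = {
    'HCF_AUTH': '<scrapy_cloud_key>',       # placeholder credential
    'HCF_PROJECT_ID': '<project_id>',       # placeholder project id
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_CONSUMER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': 1,
    'HCF_PRODUCER_BATCH_SIZE': 300,
}

# Run 1: produce and consume the high-priority slot ('links-high0').
high_priority_run = dict(
    BASE,
    HCF_PRODUCER_SLOT_PREFIX='links-high',
    HCF_CONSUMER_SLOT='links-high0',
)

# Run 2: once the high-priority slot is drained, consume the rest.
low_priority_run = dict(
    BASE,
    HCF_PRODUCER_SLOT_PREFIX='links-low',
    HCF_CONSUMER_SLOT='links-low0',
)
```

Within a single slot, batches are read in the order the frontier returns them, so priority here comes only from which slot each crawl run is pointed at.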

Originally posted by @Nishant-Bansal-777 in #26 (comment)
