
Prioritize links in hcf-backend #27

Open
Nishant-Bansal-777 opened this issue Nov 9, 2023 · 0 comments
Thanks for the quick response. How exactly can those slots be leveraged to influence the order in which requests are processed?

I have 1 spider with both producer and consumer settings as shown below:

{
    'HCF_AUTH': _scrapy_cloud_key,
    'HCF_PROJECT_ID': project_id,
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': 1,
    'HCF_PRODUCER_BATCH_SIZE': 300,
    'HCF_PRODUCER_SLOT_PREFIX': 'links',
    'HCF_CONSUMER_FRONTIER': 'frontier',
    'HCF_CONSUMER_SLOT': 'links0',
}

I am running the spider for an interval of 10 minutes at depth 1. For some URLs it finishes before 10 minutes, but other URLs take longer, so the consumer part of the crawler does not consume all the links. The issue I am facing is that when I run the spider more than once, it does not start from the start URL but from a URL that was previously saved to the frontier (it reads a frontier batch before reaching the start URL). Also, how many slots can be created inside a frontier?
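One possible way to use slots to influence ordering, sketched under assumptions: slot names appear to follow a prefix-plus-index convention (prefix 'links' with 1 slot gives 'links0' above), so links could be written to separate slot prefixes by priority and consumed high-priority slot first. The prefixes 'links-high'/'links-low' and this routing scheme are illustrative assumptions, not documented hcf-backend behavior.

```python
# Sketch (assumptions, not documented hcf-backend behavior): route
# high- and low-priority links to separate slot prefixes, then drain
# the high-priority slot before the low-priority one. Slot names are
# assumed to be <prefix><index>, matching 'links' + 1 slot -> 'links0'.

BASE = {
    'HCF_AUTH': '<scrapy_cloud_key>',       # placeholder credential
    'HCF_PROJECT_ID': '<project_id>',       # placeholder project id
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_CONSUMER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': 1,
    'HCF_PRODUCER_BATCH_SIZE': 300,
}

# Run 1: produce and consume the high-priority slot ('links-high0').
high_priority_run = dict(
    BASE,
    HCF_PRODUCER_SLOT_PREFIX='links-high',
    HCF_CONSUMER_SLOT='links-high0',
)

# Run 2: once the high-priority slot is drained, consume the rest.
low_priority_run = dict(
    BASE,
    HCF_PRODUCER_SLOT_PREFIX='links-low',
    HCF_CONSUMER_SLOT='links-low0',
)
```

Within a single slot, batches are read in the order the frontier returns them, so priority here comes only from which slot each crawl run is pointed at.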

Originally posted by @Nishant-Bansal-777 in #26 (comment)
