I have one spider with both producer and consumer settings, as shown below:
{
    'HCF_AUTH': _scrapy_cloud_key,
    'HCF_PROJECT_ID': project_id,
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': 1,
    'HCF_PRODUCER_BATCH_SIZE': 300,
    'HCF_PRODUCER_SLOT_PREFIX': 'links',
    'HCF_CONSUMER_FRONTIER': 'frontier',
    'HCF_CONSUMER_SLOT': 'links0',
}
I am running the spider for an interval of 10 minutes at depth 1. For some URLs it finishes before 10 minutes, but other URLs take more time, so the consumer part of the crawler does not consume all the links. The issue I am facing is that when I run the spider more than once, it does not start from the start URL but from a URL that was previously saved to the frontier (it reads a frontier batch before reaching the start URL). Also, how many slots can be created inside a frontier?
Originally posted by @Nishant-Bansal-777 in #26 (comment)
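For context, the slot question refers to configurations like the sketch below: a hypothetical variant of the same settings in which the producer writes to several slots and each consumer job reads a single slot. The slot count (4 here) and the 'links0'..'links3' names are assumptions based on the prefix-plus-index pattern visible in the settings above ('links' / 'links0'), not confirmed values or limits.

# Hypothetical sketch, not a confirmed configuration. Placeholders are marked;
# slot names are assumed to be HCF_PRODUCER_SLOT_PREFIX followed by an index.

_scrapy_cloud_key = "<your Scrapy Cloud API key>"   # placeholder
project_id = "<your project id>"                    # placeholder

NUMBER_OF_SLOTS = 4  # assumed value, for illustration only

# Producer side: same settings as above, but distributing links across 4 slots.
producer_settings = {
    'HCF_AUTH': _scrapy_cloud_key,
    'HCF_PROJECT_ID': project_id,
    'HCF_PRODUCER_FRONTIER': 'frontier',
    'HCF_PRODUCER_NUMBER_OF_SLOTS': NUMBER_OF_SLOTS,
    'HCF_PRODUCER_BATCH_SIZE': 300,
    'HCF_PRODUCER_SLOT_PREFIX': 'links',
}

# Consumer side: one settings dict per slot, so each consumer job reads one slot.
consumer_settings = [
    {
        'HCF_AUTH': _scrapy_cloud_key,
        'HCF_PROJECT_ID': project_id,
        'HCF_CONSUMER_FRONTIER': 'frontier',
        'HCF_CONSUMER_SLOT': 'links%d' % i,  # assumed naming: links0, links1, ...
    }
    for i in range(NUMBER_OF_SLOTS)
]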