feat: Add keep_alive flag to crawler.__init__
#921
base: master
Conversation
Add a test for this use case.
```python
@pytest.mark.parametrize(
    ('keep_alive', 'max_requests_per_crawl', 'should_process_added_request'),
    [
        pytest.param(True, 1, True, id='keep_alive'),
        pytest.param(True, 0, False, id='keep_alive, but max_requests_per_crawl achieved'),
        pytest.param(False, 1, False, id='Crawler without keep_alive (default)'),
    ],
)
```
Could you add a test case with `max_requests_per_crawl > 1`?
Done
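For reference, the added case might look something like this (hypothetical values and id; the actual commit may differ):

```python
pytest.param(True, 2, True, id='keep_alive, with max_requests_per_crawl > 1'),
```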
```python
crawler_run_task = asyncio.create_task(crawler.run())

# Give the crawler some time to finish (or be in the keep_alive state) and add a new request.
await asyncio.sleep(1)
```
Isn't 1 second like... a lot?
Well, any time-related test is tricky if there is no event to wait for. How do I make sure that the crawler is alive because `keep_alive=True`, and not just because it is randomly slow and takes time to shut down?

I could wrap `basic_crawler.__is_finished_function` in a mock and wait until it is called at least once, instead of waiting a fixed time. That would make the test faster, but it would leak implementation details. Do you prefer that, or some other option?
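For illustration, a minimal sketch of that mock-based approach. It assumes `__is_finished_function` is a private async method on `BasicCrawler` (name-mangled to `_BasicCrawler__is_finished_function` from outside the class), that an asyncio pytest plugin is configured, and an import path that may differ from the actual package layout:

```python
import asyncio

import pytest

from crawlee.basic_crawler import BasicCrawler  # assumed import path


@pytest.mark.asyncio  # assumes an asyncio pytest plugin is configured
async def test_keep_alive_waits_instead_of_sleeping() -> None:
    finish_checked = asyncio.Event()
    original = BasicCrawler._BasicCrawler__is_finished_function  # name-mangled private method

    async def spy(self: BasicCrawler) -> bool:
        finish_checked.set()  # the crawler has evaluated its finish condition
        return await original(self)

    BasicCrawler._BasicCrawler__is_finished_function = spy
    try:
        crawler = BasicCrawler(keep_alive=True)
        run_task = asyncio.create_task(crawler.run())

        # The crawler is demonstrably alive once it has checked its finish
        # condition at least once without shutting down.
        await asyncio.wait_for(finish_checked.wait(), timeout=5)

        crawler.stop()
        await run_task
    finally:
        BasicCrawler._BasicCrawler__is_finished_function = original
```

As discussed above, this trades a fixed sleep for a wait on an observable event, at the cost of coupling the test to a private attribute.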
Description
Add `keep_alive` flag to `crawler.__init__`. If True, this flag keeps the crawler alive even when there are no more requests in the queue; the crawler then waits for more requests to be added, or to be stopped explicitly via `crawler.stop()`.

Add test.
Issues

`keep_alive` flag to `BasicCrawler` #891