[434] Allow to filter jobs in ZyteJobsComparisonMonitor by close_reason #440
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #440      +/-   ##
==========================================
+ Coverage   79.51%   79.54%   +0.03%
==========================================
  Files          76       76
  Lines        3237     3242       +5
  Branches      537      539       +2
==========================================
+ Hits         2574     2579       +5
  Misses        593      593
  Partials       70       70

View full report in Codecov by Sentry.
Changes look good to me!
We should add some unit tests to cover the new addition, and it would be nice to run a test job in Scrapy Cloud to verify these changes (installing Spidermon in requirements.txt from git+https://github.com/shafiq-muhammad/spidermon@shafiq-434 should get you these changes). After that's addressed we should be good to go 👍
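For reference, the corresponding requirements.txt line (using the branch mentioned above) would be:

```
git+https://github.com/shafiq-muhammad/spidermon@shafiq-434
```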
Just a quick FYI @shafiq-muhammad @curita, I believe the pipeline is failing due to a regression from pytest 8.1.1 to 8.2.0. My validation was:
Hope this helps (and is correct regardless of environment).
Yes, it looks like the pytest 8.2.0 release has caused a lot of trouble, according to their issues page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, though we might want to change how we fetch the jobs, depending on whether we want to make fewer API calls. I'm inclined toward fewer API calls, but let's wait for Victor's opinion.
Other than that, I suggested a small documentation addition, which should be quick to include.
if len(current_jobs) < MAX_API_COUNT or len(total_jobs) == number_of_jobs:
    for job in current_jobs:
thought: This makes sense since it seems we can't filter by close_reason in client.spider.jobs.list().
I wonder if it makes sense to retrieve MAX_API_COUNT jobs each time from the API and append to total_jobs the jobs that meet the criteria until we reach number_of_jobs. This should result in fewer API calls, but those calls will be larger. What do you think, @VMRuiz?
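The strategy suggested above can be sketched roughly as follows. This is a hypothetical illustration, not Spidermon's actual implementation: the names `client`, `MAX_API_COUNT`, `fetch_jobs_by_close_reason`, and the dict-shaped job records are all assumptions made for the example.

```python
MAX_API_COUNT = 1000  # assumed per-call limit of the jobs API


def fetch_jobs_by_close_reason(client, number_of_jobs, close_reasons):
    """Collect up to number_of_jobs jobs whose close_reason matches.

    Since the jobs listing endpoint seemingly can't filter by
    close_reason server-side, each page is filtered client-side.
    """
    total_jobs = []
    start = 0
    while len(total_jobs) < number_of_jobs:
        # Fetch one full page of jobs per API call.
        current_jobs = client.spider.jobs.list(start=start, count=MAX_API_COUNT)
        for job in current_jobs:
            if job.get("close_reason") in close_reasons:
                total_jobs.append(job)
                if len(total_jobs) == number_of_jobs:
                    break
        if len(current_jobs) < MAX_API_COUNT:
            # The API returned a short page: no more jobs to paginate over.
            break
        start += len(current_jobs)
    return total_jobs
```

This trades fewer, larger API calls for some over-fetching when matching jobs are dense, which is the trade-off discussed below.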
If the default close reason is finished, I would keep it as it is, as the most common scenario will be fetching the last few jobs, and those will have the finished status. Even in the case where a second call is needed (due to a few unexpected failed jobs in between), it could still be faster for lower volumes than making one very big call.
On the contrary, if it's expected that uncommon close reasons like timeout are used, then it makes sense to fetch as many jobs as possible per call, as matching jobs will be few or even none.
Looks ready to me!
Added a new setting, SPIDERMON_JOBS_COMPARISON_CLOSE_REASONS, to allow ZyteJobsComparisonMonitor to filter Scrapy Cloud jobs by their close_reason stat.
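As a sketch, enabling the setting in a project's settings.py might look like this; the tuple value shown is an illustrative example, not a documented default:

```python
# settings.py: only compare against jobs that closed with these reasons.
# The value here is illustrative; "finished" is Scrapy's usual close
# reason for a successful run.
SPIDERMON_JOBS_COMPARISON_CLOSE_REASONS = ("finished",)
```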