PPY-68: Added multithreading support to neptune sync
#1897
base: dev/1.x
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           dev/1.x    #1897      +/-  ##
===========================================
- Coverage    77.53%   75.59%    -1.95%
===========================================
  Files          303      303
  Lines        15384    15378        -6
===========================================
- Hits         11928    11625      -303
- Misses        3456     3753      +297

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Thing to consider here, no strong opinion:
Python is effectively single-threaded anyway (GIL) and these operations are I/O-bound, I think, so perhaps the multi-threading approach should be replaced with asyncio-based concurrency?
@PatrykGala
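For context, a minimal sketch of what the asyncio-based variant could look like, built around the names in the diff below. The helper name `_sync_offline_concurrently` is hypothetical and this is only an illustration of the suggestion, not the PR's code; note that `asyncio.to_thread` still runs each blocking call in a worker thread, so a "pure" asyncio version would also need a non-blocking backend client.

```python
import asyncio

# Hedged sketch: fan out the existing blocking sync_selected_offline call via asyncio.
# Parameter names (backend, base_path, containers, project_name, offline_selected)
# are taken from the diff below.
async def _sync_offline_concurrently(backend, base_path, containers, project_name, offline_selected):
    tasks = [
        asyncio.to_thread(
            sync_selected_offline,
            backend=backend,
            base_path=base_path,
            container_names=[selected_offline],
            containers=containers.offline_containers,
            project_name=project_name,
        )
        for selected_offline in offline_selected
    ]
    # Propagates the first exception, much like calling future.result() on each future.
    await asyncio.gather(*tasks)

# Called from the synchronous CLI entry point:
# asyncio.run(_sync_offline_concurrently(backend, base_path, containers, project_name, offline_selected))
```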
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
    futures = [
        executor.submit(
            sync_selected_offline,
            backend=backend,
            base_path=base_path,
            container_names=[selected_offline],
            containers=containers.offline_containers,
            project_name=project_name,
        )
        for selected_offline in offline_selected
    ]
    for future in concurrent.futures.as_completed(futures):
        future.result()
I feel that this exact piece of logic (with slightly different parameters each time) is repeated so often it deserves a separate function ;)
Will create a reusable function if we stick with this approach :)
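A minimal sketch of what such a reusable helper might look like, assuming the submit/as_completed pattern from the diff above stays; `run_in_thread_pool` is a hypothetical name, not something that exists in the codebase:

```python
import concurrent.futures
from typing import Callable, Iterable, Optional

# Hypothetical helper factored out of the repeated submit/as_completed pattern.
def run_in_thread_pool(fn: Callable, kwargs_per_call: Iterable[dict], num_threads: Optional[int] = None) -> None:
    """Run fn once per kwargs dict, in parallel, re-raising the first failure."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(fn, **kwargs) for kwargs in kwargs_per_call]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # surface exceptions raised in worker threads

# Example call site, mirroring the offline-sync loop above:
# run_in_thread_pool(
#     sync_selected_offline,
#     (
#         dict(
#             backend=backend,
#             base_path=base_path,
#             container_names=[selected_offline],
#             containers=containers.offline_containers,
#             project_name=project_name,
#         )
#         for selected_offline in offline_selected
#     ),
#     num_threads=num_threads,
# )
```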
project = get_project(
    project_name_flag=QualifiedName(project_name) if project_name else None,
    backend=backend,
)
I assume this change is related to automatic formatting, right?
yup
if not project:
    raise CannotSynchronizeOfflineRunsWithoutProject

for container in containers.offline_containers:
    container.sync(base_path=base_path, backend=backend, project=project)
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
What if num_threads is None? I think we should perform an ordinary loop then, right?
ThreadPoolExecutor uses its default if max_workers is None.
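For reference (not part of the PR), a tiny snippet illustrating that default: with max_workers=None the executor picks min(32, os.cpu_count() + 4) workers on CPython 3.8+, so leaving num_threads unset still means multiple threads rather than a sequential loop.

```python
import concurrent.futures
import os

# With max_workers=None, ThreadPoolExecutor falls back to its own default:
# min(32, os.cpu_count() + 4) on CPython 3.8+.
with concurrent.futures.ThreadPoolExecutor(max_workers=None) as executor:
    # _max_workers is a private attribute, inspected here only to show the chosen default.
    print(executor._max_workers, min(32, (os.cpu_count() or 1) + 4))
```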
So what if a user doesn't want to use threads?
neptune sync -n 1 ;)
True, but:
- setting up a thread pool might have some cost compared to an ordinary for loop
- it's counter-intuitive that if I don't specify threads I still end up with threaded behaviour (a possible guard is sketched below)
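A small sketch of the fallback being suggested here. The guard on num_threads is hypothetical and only illustrates the idea; the variable names (num_threads, containers, project, backend, base_path) are taken from the diff above.

```python
import concurrent.futures

# Hypothetical guard: keep the plain loop unless the user explicitly asks
# for more than one worker thread.
if num_threads is None or num_threads <= 1:
    for container in containers.offline_containers:
        container.sync(base_path=base_path, backend=backend, project=project)
else:
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [
            executor.submit(
                container.sync,
                base_path=base_path,
                backend=backend,
                project=project,
            )
            for container in containers.offline_containers
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any exception from a worker thread
```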
But that's a design choice the team should discuss internally anyway. Aside from that and the other points, LGTM ;)
Before submitting checklist