
PPY-68: Added multithreading support to neptune sync #1897

Open · wants to merge 1 commit into dev/1.x

Conversation

SiddhantSadangi (Member)

Before submitting checklist

  • Did you update the CHANGELOG? (not for test updates, internal changes/refactors or CI/CD setup)
  • Did you ask the docs owner to review all the user-facing changes?

@SiddhantSadangi SiddhantSadangi requested a review from a team as a code owner December 27, 2024 11:22
@SiddhantSadangi SiddhantSadangi self-assigned this Dec 27, 2024
@SiddhantSadangi SiddhantSadangi requested a review from a team December 27, 2024 11:22
@SiddhantSadangi SiddhantSadangi added this to the 1.14 milestone Dec 27, 2024

codecov bot commented Dec 27, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 10 lines in your changes missing coverage. Please review.

Project coverage is 75.59%. Comparing base (be19b8e) to head (fecb9cc).

Files with missing lines       Patch %   Lines
src/neptune/cli/commands.py    16.66%    5 Missing ⚠️
src/neptune/cli/sync.py        82.75%    5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           dev/1.x    #1897      +/-   ##
===========================================
- Coverage    77.53%   75.59%   -1.95%     
===========================================
  Files          303      303              
  Lines        15384    15378       -6     
===========================================
- Hits         11928    11625     -303     
- Misses        3456     3753     +297     
Flag            Coverage Δ
e2e             ?
e2e-management  ?
e2e-s3          ?
e2e-s3-gcs      ?
macos           75.32% <71.42%> (-1.89%) ⬇️
py3.12          ?
py3.8           75.59% <71.42%> (-1.95%) ⬇️
ubuntu          75.46% <71.42%> (-1.91%) ⬇️
unit            75.59% <71.42%> (-0.01%) ⬇️
windows         74.50% <71.42%> (-1.95%) ⬇️

Flags with carried forward coverage won't be shown.


@AleksanderWWW (Contributor) left a comment:

Thing to consider here, no strong opinion:

Python is single-threaded anyway (GIL), and these operations are I/O-bound, I think, so perhaps the multithreading approach should be replaced with asyncio-based concurrency?
@PatrykGala
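
For reference, a minimal sketch of the asyncio alternative being suggested, not the PR's actual code: it assumes sync_selected_offline were rewritten as a coroutine, with the hypothetical sync_one_offline stub below only simulating the awaited I/O.

import asyncio

# Hypothetical stand-in for an async variant of sync_selected_offline;
# a real implementation would await an async HTTP client instead of sleeping.
async def sync_one_offline(container_name: str) -> None:
    await asyncio.sleep(0.1)  # simulate waiting on network I/O

async def sync_all_offline(container_names) -> None:
    # All syncs run concurrently on a single thread: each task yields control
    # while it waits on I/O, so no ThreadPoolExecutor is needed.
    await asyncio.gather(*(sync_one_offline(name) for name in container_names))

if __name__ == "__main__":
    asyncio.run(sync_all_offline(["RUN-1", "RUN-2", "RUN-3"]))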

Comment on lines +156 to +169
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
    futures = [
        executor.submit(
            sync_selected_offline,
            backend=backend,
            base_path=base_path,
            container_names=[selected_offline],
            containers=containers.offline_containers,
            project_name=project_name,
        )
        for selected_offline in offline_selected
    ]
    for future in concurrent.futures.as_completed(futures):
        future.result()
Contributor:

I feel that this exact piece of logic (with slightly different parameters each time) is repeated so often it deserves a separate function ;)

Member Author:

Will create a reusable function if we stick with this approach :)
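
One possible shape of such a reusable helper (a sketch with hypothetical names, not the PR's actual code): it submits one call per kwargs dict to a shared thread pool and re-raises the first worker exception.

import concurrent.futures

def run_in_thread_pool(fn, per_item_kwargs, num_threads=None):
    """Submit fn(**kwargs) once per kwargs dict and wait for all calls to finish."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(fn, **kwargs) for kwargs in per_item_kwargs]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # propagate any exception raised in a worker thread

Usage mirroring the snippet above:

run_in_thread_pool(
    sync_selected_offline,
    (
        dict(
            backend=backend,
            base_path=base_path,
            container_names=[selected_offline],
            containers=containers.offline_containers,
            project_name=project_name,
        )
        for selected_offline in offline_selected
    ),
    num_threads=num_threads,
)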

Comment on lines +63 to +66
project = get_project(
    project_name_flag=QualifiedName(project_name) if project_name else None,
    backend=backend,
)
Contributor:

I assume this change is related to automatic formatting, right?

Member Author:

yup

if not project:
    raise CannotSynchronizeOfflineRunsWithoutProject

for container in containers.offline_containers:
    container.sync(base_path=base_path, backend=backend, project=project)
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
Contributor:

What if num_threads is None? I think we should perform an ordinary loop then, right?

Member Author:

ThreadPoolExecutor uses its own default if max_workers is None
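
For context, a quick check of what that default means: since Python 3.8, ThreadPoolExecutor picks min(32, os.cpu_count() + 4) workers when max_workers is None, so passing num_threads=None still produces a thread pool rather than a sequential loop. The _max_workers attribute below is a CPython internal, used here only for illustration.

import concurrent.futures
import os

with concurrent.futures.ThreadPoolExecutor(max_workers=None) as executor:
    print(executor._max_workers)               # internal attribute, for illustration only
    print(min(32, (os.cpu_count() or 1) + 4))  # documented default, matches the value above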

Contributor:

So what if a user doesn't want to use threads?

Member Author:

neptune sync -n 1 ;)

Contributor:

True, but

  • setting up a thread pool has some cost compared to an ordinary for loop
  • it's counterintuitive that if I don't specify a thread count I still end up with threaded behaviour (a sketch of a sequential fallback is shown below)
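
A minimal sketch of the sequential fallback being argued for, assuming a hypothetical sync_offline_containers helper rather than the PR's actual code: a pool is only built when the user explicitly asks for more than one thread.

import concurrent.futures

def sync_offline_containers(containers, base_path, backend, project, num_threads=None):
    if num_threads is None or num_threads <= 1:
        # Plain loop: no executor setup cost, same behaviour as before the PR.
        for container in containers:
            container.sync(base_path=base_path, backend=backend, project=project)
        return
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [
            executor.submit(container.sync, base_path=base_path, backend=backend, project=project)
            for container in containers
        ]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any exception from a worker thread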

Contributor:

But it's a design choice that the team should discuss internally anyway. Apart from that and the other points, LGTM ;)
