Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: use generic.copy in put() and get() #214

Closed
wants to merge 1 commit into from

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Jul 4, 2023

Fixes iterative/dvc#8964
Closes #212

  • Replaces fsspec fs.get()/fs.put() with our generic.copy implementation
    • Skips unneeded fsspec overhead in fs.expand_path (since we already do any needed recursive dir expansion via find())
    • Skips unwanted fsspec glob behavior (see original dvc issue) since we already deal with raw paths and not glob patterns
    • Ensures we use consistent sync/async batching (so get()/put() work the same as generic.transfer)
    • Makes get() atomic at the file level

@pmrowla pmrowla self-assigned this Jul 4, 2023
@pmrowla
Copy link
Contributor Author

pmrowla commented Jul 4, 2023

@hemker can you try this patch and verify that it resolves your issue with [] in S3 paths?

(Your existing PR works, but this way we don't have to deal with monkey-patching fsspec)

in your DVC env:

pip install 'dvc-objects @ git+https://github.com/iterative/dvc-objects.git@refs/pull/214/head'

executor = ThreadPoolExecutor(max_workers=jobs, cancel_on_error=True)
with executor:
list(executor.imap_unordered(get_file, from_infos, to_infos))
copy(
Copy link
Contributor Author

@pmrowla pmrowla Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use transfer here to get reflinking instead of copy in the event we are only doing local->local operation on an fs that supports reflinks, but using copy keeps us consistent with existing get()/put() behavior

@codecov-commenter
Copy link

codecov-commenter commented Jul 4, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.46 🎉

Comparison is base (76c3665) 64.48% compared to head (6efef88) 64.95%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
+ Coverage   64.48%   64.95%   +0.46%     
==========================================
  Files          25       25              
  Lines        1940     1926      -14     
  Branches      307      303       -4     
==========================================
  Hits         1251     1251              
+ Misses        634      620      -14     
  Partials       55       55              
Impacted Files Coverage Δ
src/dvc_objects/fs/base.py 54.40% <0.00%> (+2.29%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@hemker
Copy link

hemker commented Jul 4, 2023

@pmrowla Thanks for your fast reply!
Tested the fix with our data - problem solved :)

@efiop
Copy link
Contributor

efiop commented Jul 21, 2023

Hey @pmrowla , great stuff! What's the status here, how do you want to proceed?

@pmrowla
Copy link
Contributor Author

pmrowla commented Jul 22, 2023

One of the fsspec maintainers was looking into fixing the issue upstream and adding support for escaping the glob characters, so I was waiting to see what happens on that end fsspec/filesystem_spec#1306

As we discussed a while back I'm still not sure it's a good idea for us to completely replace fsspec put/get with our own behavior, it might be better for us to just make DVC dependency.download() use our generic.copy directly (instead of calling fsspec fs.get)

@pmrowla
Copy link
Contributor Author

pmrowla commented Jul 24, 2023

Closing in favor of iterative/dvc#9753

@pmrowla pmrowla closed this Jul 24, 2023
@pmrowla pmrowla deleted the put-get-copy branch July 24, 2023 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File names containing [ ] in s3 causes import-url/update to fail
4 participants