API for conditional / exclusive write #1693

Open
TomAugspurger opened this issue Sep 27, 2024 · 3 comments
Comments

@TomAugspurger
Collaborator

Over in zarr-developers/zarr-python#2262, we'd like to write a file but only if it doesn't already exist. On a local file system, this would be open(path, mode="xb"), which will fail with a FileExistsError if the file already exists.
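For reference, the local-filesystem behavior described above can be sketched with the standard library alone. The helper name `write_exclusive` and the file name are made up for illustration; only `open(..., mode="xb")` and `FileExistsError` come from the description.

```python
import os
import tempfile


def write_exclusive(path: str, data: bytes) -> bool:
    """Write data to path only if path does not exist yet.

    Returns True if the file was created, False if it already existed.
    """
    try:
        # "x" requests exclusive creation; open() raises FileExistsError
        # if the path is already present.
        with open(path, mode="xb") as f:
            f.write(data)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "zarr.json")  # hypothetical file name
    first = write_exclusive(target, b"first")
    second = write_exclusive(target, b"second")  # refused, file exists
    with open(target, "rb") as f:
        content = f.read()  # still the bytes from the first write
```

The second call leaves the original bytes untouched, which is the property the remote backends would need to reproduce.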

Now that S3 supports conditional writes, it should be possible to implement this for s3fs, gcsfs (if_generation_match=0), and adlfs (overwrite=False).

Would there be any appetite for standardizing this behavior? I'm not sure what API is best, but I lean towards something like an overwrite: bool parameter on pipe and similar methods. We could also try to support mode="xb" in some open-like methods, but I'm less sure about that.
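To make the proposal concrete, here is a rough sketch of what an `overwrite` flag could look like, using a toy in-memory store rather than any real fsspec backend. `ToyMemoryFS` and the `overwrite` keyword are hypothetical and only illustrate the idea; they are not an existing API.

```python
import threading


class ToyMemoryFS:
    """Hypothetical in-memory store, used only to illustrate the proposed flag."""

    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def pipe_file(self, path, value, overwrite=True):
        # The existence check and the write happen under one lock, so the
        # check-and-set is atomic with respect to other threads.
        with self._lock:
            if not overwrite and path in self._store:
                raise FileExistsError(path)
            self._store[path] = bytes(value)

    def cat_file(self, path):
        return self._store[path]


fs = ToyMemoryFS()
fs.pipe_file("a/zarr.json", b"v1", overwrite=False)  # created
try:
    fs.pipe_file("a/zarr.json", b"v2", overwrite=False)
    raised = False
except FileExistsError:
    raised = True
after_conflict = fs.cat_file("a/zarr.json")

fs.pipe_file("a/zarr.json", b"v3")  # default overwrite=True replaces the value
after_overwrite = fs.cat_file("a/zarr.json")
```

Defaulting to `overwrite=True` in this sketch keeps existing callers unchanged; the exclusive behavior is opt-in.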

@martindurant
Member

If this is only to apply to open, then the mode= would be fine, and probably the check would happen at open time. But I think you mean for methods put/pipe, right? A bool argument on those methods and their one-file variants would be enough.

A couple of thoughts:

  • how does this interact with on_error when writing multiple files to a remote? Is it like any other IO error? Probably yes: the other files would still get written (concurrently), so this would not act as a lock on the whole operation
  • on S3, I assume you are looking at If-None-Match; is if_generation_match=0 really the same, or does it mean "if no such filename ever existed"?
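The first point, treating an "already exists" conflict like any other per-file IO error, might look roughly like the sketch below. It runs against a local temporary directory and writes sequentially for simplicity, whereas the comment above has concurrent writes in mind. The helper name `pipe_exclusive` and the `on_error` values `"raise"` / `"return"` are assumptions chosen for illustration.

```python
import os
import tempfile


def pipe_exclusive(files, on_error="raise"):
    """Write each path -> bytes pair with exclusive-create semantics.

    on_error="raise":  stop at the first conflict.
    on_error="return": keep going and return {path: exception} for failures.
    """
    errors = {}
    for path, data in files.items():
        try:
            with open(path, "xb") as f:
                f.write(data)
        except FileExistsError as exc:
            if on_error == "raise":
                raise
            # A conflict on one path does not block the remaining writes.
            errors[path] = exc
    return errors


with tempfile.TemporaryDirectory() as tmp:
    pa, pb, pc = (os.path.join(tmp, name) for name in ("a", "b", "c"))
    with open(pb, "wb") as f:
        f.write(b"old")  # pre-existing file that should not be replaced

    errors = pipe_exclusive({pa: b"1", pb: b"2", pc: b"3"}, on_error="return")
    failed = [os.path.basename(p) for p in errors]
    written = sorted(os.listdir(tmp))
    with open(pb, "rb") as f:
        pb_content = f.read()
```

Here only `b` is reported as failed, while `a` and `c` are still created, so the conflict does not act as a lock on the batch.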

@martindurant
Member

Do you know how this interacts with multipart uploads, where many bytes might already have been sent but the file is not really written to the remote path until a final commit? At what point is the existence condition applied?
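One way to picture the question is a toy model in which the existence check runs only at commit time. This is purely illustrative and says nothing about how any particular backend behaves; `ToyMultipartStore` and its `if_none_match` flag are hypothetical names.

```python
class ToyMultipartStore:
    """Hypothetical store where parts are staged and only commit() publishes."""

    def __init__(self):
        self.objects = {}    # published objects: path -> bytes
        self._uploads = {}   # in-flight uploads: id -> (path, [parts])
        self._next_id = 0

    def start(self, path):
        # Starting an upload does not check or reserve the target path.
        self._next_id += 1
        self._uploads[self._next_id] = (path, [])
        return self._next_id

    def upload_part(self, upload_id, chunk):
        self._uploads[upload_id][1].append(bytes(chunk))

    def commit(self, upload_id, if_none_match=False):
        path, parts = self._uploads.pop(upload_id)
        # In this model the condition is evaluated here, after all bytes
        # have already been transferred.
        if if_none_match and path in self.objects:
            raise FileExistsError(path)
        self.objects[path] = b"".join(parts)


store = ToyMultipartStore()
u1 = store.start("big.bin")
u2 = store.start("big.bin")       # a second writer starts before u1 commits
store.upload_part(u1, b"aaaa")
store.upload_part(u2, b"bbbb")
store.commit(u2, if_none_match=True)   # first to commit wins
try:
    store.commit(u1, if_none_match=True)
    late_conflict = False
except FileExistsError:
    late_conflict = True               # u1 fails only at the very end
```

Under this model a writer can upload every part and still lose at the final step, which is the cost the question is pointing at.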

@TomAugspurger
Collaborator Author

I'm not sure offhand.
