Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: await on size, assuming it can be an async function #1281

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mukhery
Copy link
Contributor

@mukhery mukhery commented May 29, 2023

Checking size may require a call to the async _info function. On line 277, _cp_file checks if f1.size is async (callback.set_size(await maybe_await(f1.size))), but then later it simply references f1.size which fails if this function is async. As such, this PR corrects the code to ensure f1.size references are wrapped in await maybe_await().

@mukhery
Copy link
Contributor Author

mukhery commented Jun 1, 2023

@martindurant what do you think?

@martindurant
Copy link
Member

I think you are right in your argument. However, I have a feeling that f.size is never async actually, so it won't matter. open_async/AsyncStreamedFiles could use work!

fsspec/generic.py Outdated Show resolved Hide resolved
@mukhery
Copy link
Contributor Author

mukhery commented Jun 1, 2023

I think you are right in your argument. However, I have a feeling that f.size is never async actually, so it won't matter. open_async/AsyncStreamedFiles could use work!

fsspec/s3fs#742 makes it async in order to call the async _info needed for _cp_file

@martindurant
Copy link
Member

Funny, because S3 is the only one where for sure we should know the size before starting the download: the content-size header is always populated and, possibly outside compressive transcoding, will be correct.

@mukhery
Copy link
Contributor Author

mukhery commented Jun 10, 2023

Funny, because S3 is the only one where for sure we should know the size before starting the download: the content-size header is always populated and, possibly outside compressive transcoding, will be correct.

Perhaps I should change https://github.com/fsspec/filesystem_spec/blob/master/fsspec/generic.py#L277 to just remove the assumption that f1.size could be async? Or do you think this still might be useful for other implementations even though it isn't an issue for s3?

@martindurant
Copy link
Member

maybe_await should never be harmful and not add any overhead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants