Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse 0 fill value as "" for str dtype #2798

Closed
wants to merge 1 commit into from

Conversation

moradology
Copy link
Contributor

@moradology moradology commented Feb 4, 2025

This PR implements fill_value parsing for v2 metadata such that if the dtype specified is str, and 0 is the provided fill_value, this value is replaced with ""

Closes #2792

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

As a side note, I was a bit surprised at how many places there were that independently seemed to be concerned with metadata parsing. This... was slightly confusing, as values undergo multiple parse_* steps after having been parsed already (so that it is unclear without printing stack and dropping into a debugger when e.g. dtype was the provided dtype vs when it was the already parsed dtype. In this case, as the dtype of object is produced by parse_dtype in init_array, the only place where it was possible to check on the dtype specified by the user was at that point (way up the stack from calls into v2.py). Not sure I quite understand the organization (though I'm sure there are good reasons for it; perhaps historical and compatibility related) and I'd be very interested in a rough sketch of the motivation that I'm surely failing to see

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 4, 2025
Comment on lines 105 to 107
# No zarr_format available here...
# fill_value_parsed = parse_fill_value(fill_value, dtype, zarr_format=2)
fill_value_parsed = fill_value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to make of this. The fill value is parsed way up the stack from here or else the dtype we have to inspect is lost. Should I just toss this out?

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 4, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Feb 5, 2025

As a side note, I was a bit surprised at how many places there were that independently seemed to be concerned with metadata parsing. This... was slightly confusing, as values undergo multiple parse_* steps after having been parsed already (so that it is unclear without printing stack and dropping into a debugger when e.g. dtype was the provided dtype vs when it was the already parsed dtype. In this case, as the dtype of object is produced by parse_dtype in init_array, the only place where it was possible to check on the dtype specified by the user was at that point (way up the stack from calls into v2.py). Not sure I quite understand the organization (though I'm sure there are good reasons for it; perhaps historical and compatibility related) and I'd be very interested in a rough sketch of the motivation that I'm surely failing to see

Sorry that all the parsing was confusing, we should definitely avoid parsing the same things twice! Could you open an issue for the values that are getting parsed multiple times?

@@ -150,9 +150,11 @@ def parse_shapelike(data: int | Iterable[int]) -> tuple[int, ...]:
return data_tuple


def parse_fill_value(data: Any) -> Any:
def parse_fill_value(fill_value: Any, dtype: Any, zarr_format: ZarrFormat) -> Any:
if zarr_format == 2 and (dtype is str or dtype == "str") and fill_value == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs at least a comment explaining why we are doing this. what was zarr-python 2's behavior in this situation?

@moradology
Copy link
Contributor Author

Looks like this has been superseded by 2799

@moradology moradology closed this Feb 5, 2025
@moradology moradology deleted the fix/0fill-str-v2 branch February 5, 2025 14:37
@LDeakin
Copy link
Contributor

LDeakin commented Feb 6, 2025

@moradology #2799 is resolving a separate issue (None fill values) and does not supersede this PR. But there is uncertainty over how #2792 should be handled, as the behaviour has changed between zarr-python 2/3, see #2792 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in string data type handling with 0 fill value
3 participants