Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] The non-humanly-readable fields may have never existed in the OpenSearch fork. #423

Open
dblock opened this issue Jul 16, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@dblock
Copy link
Member

dblock commented Jul 16, 2024

What is the bug?

Coming from opensearch-project/opensearch-go#284 (comment), the spec contains both humanly readable and non-humanly-readable versions of many fields. Fields of type _common:BysteSize are no longer returned by the API as they got replaced by the same property suffixed with _in_bytes pointing to type number.

The core engine seems to do something clever about it. For example. These are added with humanlyReadableField. It looks like there still may be cases where you have a non-humanly-readable field, or are there?

Similar issue for component/schemas/_common:Percentage. It is also oneOf for number or string. This is used for the cluster health endpoint. Which returns the value as number for yaml and json.

What is the expected behavior?

Figure out whether non-humanly-readable fields are still possible, when they were deprecated/removed, either annotate them with x-version-* or delete them if they never existed in the OpenSearch fork.

@dblock dblock added bug Something isn't working untriaged labels Jul 16, 2024
@dblock dblock removed the untriaged label Jul 16, 2024
@dblock dblock changed the title [BUG] Remove non-humanly-readable fields. [BUG] The non-humanly-readable fields may have never existed in the OpenSearch fork. Jul 18, 2024
@Jakob3xD
Copy link
Contributor

Did some research and found out, that there is a human parameter. When this parameter is set it also returns the the memory_size field. The question now is, if the string can also be something else then a string if there is some sort of unit parameter. Therefore the fields are valid but we should look if they type can be more specific then oneOf string, number.

GET <some-index>/_stats?human
      "fielddata": {
        "memory_size": "0b",
        "memory_size_in_bytes": 0,
        "evictions": 0
      }
GET <some-index>/_stats
      "fielddata": {
        "memory_size_in_bytes": 0,
        "evictions": 0
      }

@Jakob3xD
Copy link
Contributor

If I understand the java code correctly than the human fields should be of type string all the time.
At least for fielddata.
The value and both fields are passed to the builder.humanReadableField() function.
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/fielddata/FieldDataStats.java#L121
The builder.humanReadableField() function checks if human parameter is true and converts the value into a string.
https://github.com/opensearch-project/OpenSearch/blob/main/libs/core/src/main/java/org/opensearch/core/xcontent/XContentBuilder.java#L976

What I don't understand is what is happening with the rawValue as I don't understand the java code.
https://github.com/opensearch-project/OpenSearch/blob/main/libs/core/src/main/java/org/opensearch/core/xcontent/XContentBuilder.java#L978

@Jakob3xD
Copy link
Contributor

@dblock I am looking though the spec and the ByteSize type is referenced inconsistent.
Some fields that are suffixed with _in_bytes are directly of type number or reference the ByteSize.
The same fields without the suffix reference to ByteSize.
This combination of string and number is bad.

Therefore I suggest following options:

  1. Remove the ByteSize type and set everything directly to the actual type of string or number
  2. Add ByteSizeHuman of type string and use it for all string byte representations, make ByteSize of type number and use it for byte representations as number.

Depending if we prefer reverencing a schema that is of a single type or directly using the type.

I am not 100% sure what the expected benefit of referencing a schema of a single type is. I could only thing of having one place to change the type for all but I am not sure this will ever happen?

@dblock
Copy link
Member Author

dblock commented Jul 24, 2024

I personally prefer a type like ByteSizeHuman or HumanByteSize (not sure what our preference is, would look at other types) because it tells client generators to treat the type as something semantically meaningful and add logic to it. For example, a field of ByteSize could then be converted to KB via .value.toKb() in certain languages, which would not be semantically possible if it's just a number or a string. My thinking was similar for nullable strings.

I'd defer to @nhtruong of what we should do, and I think we should update https://github.com/opensearch-project/opensearch-api-specification/blob/main/DEVELOPER_GUIDE.md to the effect of what's recommended.

@nhtruong
Copy link
Collaborator

Important to note that human is a global parameter that's accepted by all endpoints.

According to @Jakob3xD's finding, I do think we need 2 separate types

  • Bytes for number because the name already tells you what unit the number represents
  • StorageSize for string as it will have to include the unit to make sense. This datatype is a string with pattern:
type: string
pattern: '\d+(\.\d+)?\s(b|kb|mb|gb|tb|pb)'

The pattern will be ignored by the clients but our JSON Schema validator will use it to test it against the response schemas.

Like @dblock said, we should have a section for dealing with Units for the Dev's Guide (This includes time units too).

@Jakob3xD
Copy link
Contributor

* `Bytes` for number because the name already tells you what unit the number represents

Bytes is already defined as string with enum for the bytes parameter. Do we want to rename it to ByteUnit like it is for TimeUnit?

@nhtruong
Copy link
Collaborator

nhtruong commented Jul 26, 2024

I guess we're back to ByteSize. It was just a suggestion. I basically just agreed with your proposed about splitting it into more explicit number and string types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants