[Docs] inconsistency between C++ implementation and the statistics schema #45560
Comments
Good catch! The example was wrong.
"column" in the examples has duplicated values for the same column, but "column" must have only one value for each column.
I think there is another issue in the record batch statistics: arrow/cpp/src/arrow/record_batch.h, line 289 in ec3d283.
The link no longer exists.
There is another issue in a test for record batch statistics: arrow/cpp/src/arrow/record_batch_test.cc, lines 1395 to 1424 in ec3d283.
This test covers the approximate min value, but its title refers to the approximate max value.
One thing that is not clear in the statistics schema specification is what to expect from statistics for nested types. I know that nested type statistics have not been implemented yet (link), but nested types can have statistics just like simple types. Should the implementation accept all of them? For example, is it acceptable for a struct in a record batch to have max_approximate?
Good catch! Could you open a separate issue for this? Let's fix it in a separate PR.
Yes. We may need to add |
Let's imagine a type like
I think the parts tied to the C++ implementation would be better solved in this issue. I will probably pick it up after I finish another issue (one has already been assigned to me).
We have Lines 714 to 727 in c7a9100
Lines 705 to 712 in c7a9100
It seems that they can represent the max value.
The issue is about enhancing the current feature, but the test fix isn't an enhancement; it's a bug fix. I think we should use a separate issue and pull request. The bug fix can be merged soon because it will be straightforward. If we mix them, we won't be able to merge soon, because the enhancement will be complex and we can't merge only the bug fix.
Okay, I will do it. By the way, what about this?
Ah, let's work on it as another separate task too. We'll be able to merge it quickly.
Okay I will do it |
### Rationale for this change

"column" in the examples has duplicated values for the same column, but "column" must have only one value for each column.

### What changes are included in this PR?

* Use "rowspan" for "column" in logical examples.
* Remove duplicated values from "column" in physical examples.
* Add missing "offsets" to "statistics" in physical examples.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* GitHub Issue: #45560

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
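To illustrate the "one value per column" rule described above, here is a minimal sketch in plain C++ (standard library only, not the Arrow API). The names `FlatStatistic`, `ColumnStatistics`, and `GroupByColumn` are hypothetical and exist only for this illustration, and the `std::variant` value type is an assumed stand-in for the union-typed values in the real schema. It shows how a flat list of (column, key, value) triples, which repeats the column index, collapses into one entry per column:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Assumption: a simplified stand-in for the statistics value type.
using StatValue = std::variant<std::int64_t, double, std::string>;

// One flat (column, key, value) triple; the column index repeats per statistic.
struct FlatStatistic {
  std::optional<std::int32_t> column;  // nullopt = statistics for the whole record batch
  std::string key;
  StatValue value;
};

// One entry per column: "column" appears once, with its key/value pairs grouped.
struct ColumnStatistics {
  std::optional<std::int32_t> column;
  std::vector<std::pair<std::string, StatValue>> statistics;
};

// Groups flat triples so each distinct column appears exactly once,
// keeping the order in which columns are first seen.
std::vector<ColumnStatistics> GroupByColumn(const std::vector<FlatStatistic>& flat) {
  std::vector<ColumnStatistics> grouped;
  std::map<std::optional<std::int32_t>, std::size_t> index_of_column;
  for (const auto& stat : flat) {
    auto [it, inserted] = index_of_column.try_emplace(stat.column, grouped.size());
    if (inserted) {
      grouped.push_back(ColumnStatistics{stat.column, {}});
    }
    grouped[it->second].statistics.emplace_back(stat.key, stat.value);
  }
  return grouped;
}
```

With this model, a record batch that carries nine statistics spread over the batch itself and two columns yields three grouped entries, not nine, which is consistent with the three `column` values reported for the C++ implementation in this issue.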
Issue resolved by pull request 45561 |
Several things remain that have not been addressed yet.
We should work on it in #45474, not in this issue.
Could you open a new issue for this? Let's work on it as a separate task.
I've just created one: #45639
Describe the bug, including details regarding any error messages, version, and platform.
I have implemented the example in C++. As mentioned in this link, I should expect a physical layout with 9 elements in column_field, but I got only three elements in column_field.
main.zip
Component(s)
Documentation