Skip to content

Conversation

@klion26
Copy link
Member

@klion26 klion26 commented Nov 12, 2025

Which issue does this PR close?

What changes are included in this PR?

Fix the logic for VariantToNullArrowRowBuilder so that it respects the cast option

Are these changes tested?

added test

Are there any user-facing changes?

Noe

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Nov 12, 2025
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich @liamzwbao Please help review this when you're free ,thanks.

if let Some(_) = value.as_null() {
Ok(true)
} else {
// Null type only accepts nulls
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose to return err because Arrays of type Null cannot contain a null bitmask. If we return NullArray::new(valid_value_count), then the length of the result may not be the same as the input(the other types will add None in the result)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we can't add a NULL mask to a NullArray. But I thought the idea was to let the normal strict-mode checking either error out or call (our fake) append_null that actually does the same thing as append_value (namely nothing). No null masks in sight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in the issue about counting appended values/nulls was kind of unrelated: technically there's no guarantee the number of appends matches the initial capacity estimate, which in arrow API is only an estimate. So we should ideally have both append_null and append_value increment some counter, whose final value dictates the size of the final array we create.

@klion26
Copy link
Member Author

klion26 commented Nov 17, 2025

@scovich Thanks for the review and sorry for the late reply; I had to handle some unforeseen circumstances last week.

I've addressed the comments in the new commit. Please take another look.

In the last commit, I assumed that before append_null here, it can't be accepted as the target type is Null(we can't distinguish the target type and the missing element), but it seems append_null when value.as_null.is_none() in safe mode is fine(we may don't need to distinguish the two types, and there is no null mask in NullArray).

let field = Field::new("typed_value", DataType::Null, true);
let options = GetOptions::new()
.with_as_type(Some(FieldRef::from(field)))
.with_cast_options(CastOptions {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use arrow::compute::CastOptions in variant_get and crate::CastOptions in cast_to_variant_with_options, not sure if we need to unify these in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we had similar discussion before. I vote for unifying them, and we don't really need the format_options in the arrow::compute::CastOptions

Copy link
Member Author

@klion26 klion26 Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will file an issue to track this
Filed an issue #8873

Comment on lines 713 to 720
if value.as_null().is_some() {
self.item_count += 1;
Ok(true)
} else if self.cast_option.safe {
self.append_null()?;
Ok(false)
} else {
Err(ArrowError::CastError(format!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define_variant_to_primitive_builder! macro handles all this already?
Why not use it?

|value| matches!(value, Variant::Null).then_some(0)

and then

fn append_value(&mut self, _dummy: i32) {
    self.item_count += 1;
}
fn append_null(&mut self) {
    self.item_count += 1;
}

(no need to capture the cast options, the macro handles that part)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can reuse the define_variant_to_primitive_builder! macro, updated.

@klion26 klion26 requested a review from scovich November 18, 2025 12:07
@klion26
Copy link
Member Author

klion26 commented Nov 18, 2025

@scovich I've updated the code. Please take another look, thanks

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 704 to 706
fn new() -> Self {
Self { item_count: 0 }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: Consider #[derive(Default)] and FakeNullBuilder::default() at L723 below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, it's more elegant, added in the new commit

Copy link
Contributor

@liamzwbao liamzwbao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thanks for the fix

let field = Field::new("typed_value", DataType::Null, true);
let options = GetOptions::new()
.with_as_type(Some(FieldRef::from(field)))
.with_cast_options(CastOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we had similar discussion before. I vote for unifying them, and we don't really need the format_options in the arrow::compute::CastOptions

@klion26
Copy link
Member Author

klion26 commented Nov 19, 2025

@alamb Please take a final review when you're free, thanks.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- this looks good to me

@alamb alamb merged commit fea605c into apache:main Nov 19, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Variant to NullType conversion ignores strict casting

4 participants