Skip to content

Conversation

@alamb
Copy link

@alamb alamb commented Nov 7, 2025

This is a proposal for updating this PR

Merging this PR will update apache#18448

Basically the idea is to change the signature of the create_hashes so it can take any iterator of things that can turn into a &dyn Array rather than having multiple functions

I think it makes create_hashes a little more complicated, but the callsites are much easier to understand

This is also mostly backwards compatibable -- the only code I needed to change was tests

.with_precision_and_scale(20, 3)
.unwrap();
let array_ref = Arc::new(array);
let array_ref: ArrayRef = Arc::new(array);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the major user facing change -- if you explicitly create an Arc ... something to create_hashes, before this PR the type inference algorithm would automatically determine ArrayRef. Now you need to explicitly declare that you want an ArrayRef

&random_state,
hashes_buff,
)?;
let hashes = create_hashes([&left.columns()[0]], &random_state, hashes_buff)?;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes the callsites look much nicer

@martin-g
Copy link

martin-g commented Nov 7, 2025

Wrong target repo ?!
Did you mean apache/datafusion ?

@alamb
Copy link
Author

alamb commented Nov 7, 2025

Wrong target repo ?! Did you mean apache/datafusion ?

Hi @martin-g -- actually in this case I meant to make a PR in this repo (this is a PR to a branch in the pydantic fork -- so basically merging it will update the PR in the apache repo)

@martin-g
Copy link

martin-g commented Nov 7, 2025

OK! I noticed your comment at apache#18448 (comment) and thought that it might have been a mistake.

let mut hashes2 = vec![0; array.len()];
create_hashes_from_arrays(&[array.as_ref()], &random_state, &mut hashes2)
.unwrap();
create_hashes([array], &random_state, &mut hashes2).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works. The whole point is that I want to call create_hashes from a function where I only have &dyn Array, which does not implement AsRef<dyn Array>.

As far as I know there's no way to convert an &dyn Array into AsRef<dyn Arary without some sort of wrapper type:

struct AsRefWrapper<'a>(&'a dyn Array);

impl<'a> AsRef<dyn Array> for AsRefWrapper<'a> {
    fn as_ref(&self) -> &dyn Array {
        self.0
    }
}

Or something like that. Which seems... a bit annoying to have to do at each call site.
I think I tried the generics method and ended up with two functions because of this limitation.
Do you know a better way around this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

I think we should at least add a test for the specific usecase. Here is what I have:

    #[test]
    fn test_create_hashes_from_dyn_arrays() {
        let int_array: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3, 4]));
        let float_array: ArrayRef =
            Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0, 4.0]));

        // Verify that we can call create_hashes with only &dyn Array
        fn test(arr1: &dyn Array, arr2: &dyn Array) {
            let random_state = RandomState::with_seeds(0, 0, 0, 0);
            let hashes_buff = &mut vec![0; arr1.len()];
            let hashes =
                create_hashes([arr1, arr2], &random_state, hashes_buff).unwrap();
            assert_eq!(hashes.len(), 4,);
        }
        test(&*int_array, &*float_array);
    }

Maybe we can do this with a special trait -- I'll play around with it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a special trait for this -- I don't really know why it is necessary but it does seem to work 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creative solution!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"creative" could either be good or bad in this context 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think good! It's an internal implementation detail that users won't have to think about, but it allows us to make the improvements to this function for all callers without introducing a new API. It's good 👍🏻

@alamb alamb changed the title Use generics in create_hashses Use generics in create_hashes Nov 7, 2025
Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0, 4.0]));

// Verify that we can call create_hashes with only &dyn Array
fn test(arr1: &dyn Array, arr2: &dyn Array) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a test showing calling create_hashes with only &dyn Arrays

// combine hashes with `combine_hashes` for all columns besides the first
let rehash = i >= 1;
hash_single_array(array, random_state, hashes_buffer, rehash)?;
pub trait AsDynArray {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume we can't implement this for T: AsRef<dyn Array>?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is no:

conflicting implementations of trait `AsDynArray` for type `&dyn arrow::array::Array`
upstream crates may add a new impl of trait `std::convert::AsRef<(dyn arrow::array::Array + 'static)>` for type `&dyn arrow::array::Array` in future versions

🤷🏻

@adriangb adriangb merged commit e50907a into pydantic:refactor-create-hashes Nov 7, 2025
32 checks passed
@adriangb
Copy link
Member

adriangb commented Nov 7, 2025

Merged let's continue in apache#18448, thanks!

@alamb alamb deleted the alamb/create_hashes_cleaner branch November 7, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants