I am learning DataFusion and tried to do the canonical big data version of hello world, word count, using DataFusion. I have been unsuccessful, and I am wondering if word count is even currently possible with DataFusion.
Typically word count involves a flat_map where you split each string based on the white space contained within each string.
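For reference, here is a minimal sketch of the operation I am trying to reproduce, written in plain Rust over in-memory strings rather than with DataFusion. The function name `word_count` is just an illustrative choice, not something from the DataFusion codebase.

```rust
use std::collections::HashMap;

/// Count whitespace-separated words across a set of lines.
/// Plain std Rust only; this is not DataFusion code.
fn word_count(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // flat_map turns each line into zero or more words and
    // concatenates the results into a single stream of words.
    for word in lines.iter().flat_map(|line| line.split_whitespace()) {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}
```

Calling `word_count(&["hello world", "hello again"])` yields a map in which `hello` has a count of 2 and the other two words have a count of 1. The split and the flatten happen in the single `flat_map` call, which is the step I cannot express on a DataFusion column.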
There are two issues I am running into:
1. Creating a udf that goes from `&str -> Vec<&str>`. I cannot find an `arrow::array` type that maps to a collection of strings, which is preventing me from creating a udf that can perform the split.
2. Assuming I could get `1` to work, I am not aware of a method similar to flat_map that can be performed on a column. In SQL, I believe this is called `explode`, which I can't find in the codebase, which makes me think flat_map-style operations aren't possible.
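To make the two missing pieces concrete, here is a rough sketch of the behavior I have in mind, again in plain Rust. It assumes a list-of-strings column could be represented as row offsets plus a flat buffer of values. `StringListColumn`, `split_udf`, and `explode` are hypothetical names for illustration, not existing Arrow or DataFusion APIs.

```rust
/// Hypothetical stand-in for a list-of-strings column:
/// row i owns values[offsets[i]..offsets[i + 1]].
struct StringListColumn {
    offsets: Vec<usize>,
    values: Vec<String>,
}

/// Hypothetical split udf: one input string per row, one list of words per row.
fn split_udf(column: &[&str]) -> StringListColumn {
    let mut offsets = vec![0];
    let mut values = Vec::new();
    for row in column {
        values.extend(row.split_whitespace().map(str::to_string));
        // Record where this row's list ends in the flat buffer.
        offsets.push(values.len());
    }
    StringListColumn { offsets, values }
}

/// Hypothetical explode: emit one (source_row, word) pair per list element.
/// Rows with an empty list produce no output rows.
fn explode(lists: &StringListColumn) -> Vec<(usize, &str)> {
    lists
        .offsets
        .windows(2)
        .enumerate()
        .flat_map(|(row, w)| {
            lists.values[w[0]..w[1]]
                .iter()
                .map(move |v| (row, v.as_str()))
        })
        .collect()
}
```

With these two steps, word count would reduce to a group-by and count over the exploded column.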
My questions are:
Is word count currently possible in DataFusion? If so, how can I perform the split, and how can I perform the flat_map? If word count cannot be done, what would need to be implemented to make it possible?
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12293