Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assert failure for range window functions #14168

Merged
merged 2 commits into from
Sep 23, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 15 additions & 2 deletions cpp/src/rolling/grouped_rolling.cu
Original file line number Diff line number Diff line change
Expand Up @@ -357,15 +357,28 @@ template <typename T>
struct device_value_accessor {
column_device_view const col; ///< column view of column in device

/// Checks that the type used to access device values matches the rep-type
/// of the order-by column.
struct is_correct_range_rep {
template <typename U> /// Order-by type.
constexpr bool operator()() const
{
return std::is_same_v<T, cudf::detail::range_rep_type<U>>;
}
};

/**
* @brief constructor
*
* @param[in] col_ column device view of cudf column
*/
explicit __device__ device_value_accessor(column_device_view const& col_) : col{col_}
{
cudf_assert(type_id_matches_device_storage_type<T>(col.type().id()) &&
"the data type mismatch");
// For non-timestamp types, T must match the order-by column's type.
// For timestamp types, T must match the range rep type for the order-by column.
cudf_assert((type_id_matches_device_storage_type<T>(col.type().id()) or
cudf::type_dispatcher(col.type(), is_correct_range_rep{})) &&
"data type mismatch when accessing the order-by column");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with retrieving the data (via element()) using the rep-type and doing operations on said rep-type. I would rather the actual type be read (via element()) and used directly.
I feel the original error is correct and this is a flaw in the design of this code perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we're comparing the rep-type values here is that the order-by columns are timestamps, and the preceding/following values are durations (e.g. 2 days ago).

For integral order-by columns, preceding/following are of the same types. But not for timestamps.

Copy link
Contributor Author

@mythrocks mythrocks Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That said, maybe I could stand to tighten this up a little more... I'll see about a stricter check for timestamps.

I did take a look. By this point, the duration types for preceding and following are normalized with the order-by's timestamp types already, upstream.

For this release, I'd rather we didn't dismantle the logic in range_window_bounds (i.e. using rep-types) as part of fixing this assert condition. I could tackle this as a follow-on, if that's agreeable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just realized that one unintended consequence of doing order-by comparisons without switching to the rep-types is that all those templates will be instantiated for all timestamp types, duration types, fixed-point, etc. :/

}

/**
Expand Down