Skip to content

Conversation

pfultz2
Copy link
Collaborator

@pfultz2 pfultz2 commented Sep 3, 2025

Motivation

Add initial unit tests for jit gpu kernels as we dont have any unit tests. This should cover the tests for #4194.

Technical Details

This adds a test driver to run unit tests. Currently it only works with one lane or all lanes being uniform, but we could extend it in the future to check across all lanes within a wavefront and show which lanes failed. I didnt implement this yet because its not needed right now.

Changelog Category

Improve test coverage(there isnt a category for this below).

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@pfultz2 pfultz2 requested a review from causten as a code owner September 3, 2025 19:14
@pfultz2 pfultz2 self-assigned this Sep 3, 2025
@pfultz2 pfultz2 requested review from CharlieL7, Copilot, kahmed10, TedThemistokleous and shivadbhavsar and removed request for causten and CharlieL7 September 3, 2025 19:15
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds initial unit tests for JIT GPU kernels, providing a new test framework specifically designed for testing GPU kernel code. The main purpose is to improve test coverage for GPU kernel functionality, which previously had no unit testing infrastructure.

  • Creates a new GPU kernel testing framework with test driver and helper utilities
  • Adds comprehensive unit tests for array and algorithm functionality in GPU kernels
  • Implements a GPU-specific test harness that can compile and execute kernel tests on GPU

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/include/test.hpp Adds callback mechanism for test case selection and refactors test execution
test/gpu/kernels/main.cpp Main test driver for GPU kernel tests with compilation and execution logic
test/gpu/kernels/basic.cpp Basic sanity tests for the GPU testing framework
test/gpu/kernels/array.cpp Comprehensive tests for GPU array functionality
test/gpu/kernels/algorithm.cpp Extensive tests for GPU algorithm implementations
test/gpu/kernels/CMakeLists.txt Build configuration for GPU kernel tests
test/CMakeLists.txt Integration of GPU kernel tests into main test suite
src/targets/gpu/kernels/include/migraphx/kernels/test.hpp GPU-specific testing framework and macros
Multiple other files Supporting changes for compilation, kernel management, and utilities

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

template <class... Ts,
MIGRAPHX_REQUIRES(sizeof...(Ts) == N and (is_convertible<Ts, T>{} and ...))>
constexpr array(Ts... xs) : d{xs...}
constexpr array(Ts... xs) : d{static_cast<value_type>(xs)...}
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit cast to value_type in the variadic constructor could cause silent truncation or precision loss. Consider using a concept or SFINAE to ensure safe conversions, or document the potential for data loss in conversions.

Copilot uses AI. Check for mistakes.

{
auto failures = migraphx::gpu::write_to_gpu(int32_t{0}, true);
compile();
k.launch(nullptr, options.global, options.local)(test_cases.at(case_name), failures.get());
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using test_cases.at(case_name) could throw an exception if the case name is not found. Since this is in a test context, consider using find() with proper error handling or add a check to ensure the case exists before calling at().

Suggested change
k.launch(nullptr, options.global, options.local)(test_cases.at(case_name), failures.get());
auto it = test_cases.find(case_name);
if(it == test_cases.end())
{
std::cerr << "Test case not found: " << case_name << std::endl;
CHECK(false); // Fail the test
return;
}
k.launch(nullptr, options.global, options.local)(it->second, failures.get());

Copilot uses AI. Check for mistakes.

Comment on lines +111 to +113
if(suites.count(name) == 0)
continue;
run_suites.insert(suites.at(name));
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code performs two lookups in the suites map: first with count() and then with at(). Consider using find() once and checking the iterator, or use suites.find(name) to avoid the double lookup.

Suggested change
if(suites.count(name) == 0)
continue;
run_suites.insert(suites.at(name));
auto it = suites.find(name);
if(it == suites.end())
continue;
run_suites.insert(it->second);

Copilot uses AI. Check for mistakes.

auto it = content.cbegin();
while(std::regex_search(it, content.cend(), m, case_re))
{
test_names.push_back(m[1].str());
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a temporary string with m[1].str() and then copying it into the vector could be inefficient. Consider using emplace_back(m[1].str()) or directly constructing the string in place if possible.

Suggested change
test_names.push_back(m[1].str());
test_names.emplace_back(m[1].str());

Copilot uses AI. Check for mistakes.

friend constexpr Stream& operator<<(Stream& s, const lhs_expression& self)
{
const char* op = Operator::as_string();
if(op != nullptr or *op != '\0')
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition op != nullptr or *op != '\0' has a logic error. If op is nullptr, dereferencing it with *op will cause undefined behavior. The condition should use and instead of or: if(op != nullptr and *op != '\0').

Suggested change
if(op != nullptr or *op != '\0')
if(op != nullptr && *op != '\0')

Copilot uses AI. Check for mistakes.

Comment on lines +25 to +28
file(GLOB KERNELS_TESTS CONFIGURE_DEPENDS *.cpp)
list(REMOVE_ITEM KERNELS_TESTS ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)

message("KERNELS_TESTS: ${KERNELS_TESTS}")
Copy link
Preview

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug message statements should typically be removed or made conditional before merging to production. Consider removing this message or wrapping it in a conditional check for debug builds.

Suggested change
file(GLOB KERNELS_TESTS CONFIGURE_DEPENDS *.cpp)
list(REMOVE_ITEM KERNELS_TESTS ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)
message("KERNELS_TESTS: ${KERNELS_TESTS}")
option(MIGRAPHX_VERBOSE "Enable verbose output for kernel tests" OFF)
file(GLOB KERNELS_TESTS CONFIGURE_DEPENDS *.cpp)
list(REMOVE_ITEM KERNELS_TESTS ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)
if(MIGRAPHX_VERBOSE)
message("KERNELS_TESTS: ${KERNELS_TESTS}")
endif()

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@CharlieL7 CharlieL7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. How would we test larger kernels like TopK as a unit test?

{
// There is no way to easily exit with no error. We can terminate the
// current wavefront without an error, but if there is more wavefronts
// than we need to fallback to a trap which throws an error in HSA
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// than we need to fallback to a trap which throws an error in HSA
// then we need to fallback to a trap which throws an error in HSA

Copy link
Collaborator

@TedThemistokleous TedThemistokleous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to me too. Copilot begin a tad pedantic here but overall lets get this in and ensure we have test coverage

@TedThemistokleous TedThemistokleous added roadmap Tasks to finish for a release Continous Integration Pull request updates parts of continous integration pipeline labels Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Continous Integration Pull request updates parts of continous integration pipeline roadmap Tasks to finish for a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants