FR: Simplify loading with CUDA #150

Open · marius311 opened this issue Jul 13, 2022 · 4 comments

marius311 commented Jul 13, 2022

For @tullio to generate a GPU version, you currently need using KernelAbstractions, CUDAKernels, CUDA before invoking the macro. This makes it a pain to use Tullio in a library package where GPU support is optional and lives behind some @require CUDA ... code. The only solution I've found is to add an @require CUDA block to your own library package and then re-include your own code that uses @tullio, so that Tullio generates the necessary GPU versions. This is pretty hacky, breaks precompilation, and often doesn't play nicely with multiple processes for some reason.
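
To illustrate the required loading order, a minimal sketch (not from any real package; the function and sizes are illustrative):

using KernelAbstractions, CUDAKernels, CUDA   # must be loaded before @tullio expands
using Tullio

# Because the GPU packages were visible at expansion time, @tullio also
# generates a KernelAbstractions path, so this works on CuArrays too:
mymul!(C, A, B) = @tullio C[i, j] = A[i, k] * B[k, j]

A, B = CUDA.rand(32, 32), CUDA.rand(32, 32)
C = similar(A)
mymul!(C, A, B)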

It would be great if Tullio could be defined in a more standard way that didn't necessitate doing this.

mcabbott (Owner) commented

This would be nice.

At the moment it calls KernelAbstractions.@kernel at macro-expansion time, on the contents of a function which dispatches on Type{<:CuArray}. And @kernel wants, I think, to decide what code to generate based on what loops it sees.
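
Roughly the structure being described, as a sketch (not Tullio's actual generated code; names are illustrative, and the launch calls use the CUDAKernels-era API):

using KernelAbstractions, CUDAKernels, CUDA

# @kernel inspects the body it is given (index pattern, loops) to decide what
# code to generate, which is why it has to run at macro-expansion time.
@kernel function add_kernel!(C, A, B)
    I = @index(Global, Cartesian)
    @inbounds C[I] = A[I] + B[I]
end

# The surrounding function dispatches on the array type, so this GPU method
# only exists if the packages above were loaded when the outer macro expanded.
function launch!(::Type{<:CuArray}, C, A, B)
    kernel = add_kernel!(CUDADevice(), 256)
    event = kernel(C, A, B; ndrange = size(C))
    wait(event)
    return C
end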

It might be possible to do better if you instead stored what the macro sees in a type, and had a generated function write the necessary code at compile time. This is (I think) what LoopVectorization does (and what ArrayMeta did). The main reason Tullio doesn't is that this is much harder to work on & debug. And I'm not certain it would help here -- perhaps it would need a similar re-write of KernelAbstractions.

Maybe someone else has a better idea, though?

marius311 (Author) commented Aug 3, 2022

Yea, I think the general strategy is right, where the macro only stores the needed info at parse time and the kernel itself is built at compile time, but I don't have anything else valuable to add that I'm sure you don't already know. Fwiw, in case it's helpful to others, my current solution is to decorate any function that uses @tullio with

using Requires

macro uses_tullio(funcdef)
    quote
        $(esc(funcdef))
        # Once CUDA is loaded, pull in the GPU packages and evaluate the same
        # definition again, so the GPU methods get generated.
        @init @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
            using KernelAbstractions, CUDAKernels, CUDA
            $(esc(funcdef))
        end
    end
end

which will redefine the function once CUDA is loaded. This is better than re-including entire files, which leads to unnecessary invalidations / other errors.
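
A hypothetical usage, to make the pattern concrete (the function and index names are made up):

# the body is defined now for the CPU, and defined again once CUDA arrives
@uses_tullio function mymul!(C, A, B)
    @tullio C[i, j] = A[i, k] * B[k, j]
end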

mcabbott (Owner) commented

This isn't a crazy idea. IIRC at some point Tullio defined the functions it needed globally, by calling eval. The name of the function was either a gensym, or else a hash of its contents. With such a scheme, the CUDA version could live within an @require block.

The reason I switched to the current behaviour of defining functions only in local scope, within a let block, was roughly to avoid eval and be a more normal macro. The global functions are harder to inspect, and with gensym you get a new one every time the macro is re-run; with hash I think I occasionally got confused when updating the package & still getting an old definition.
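
A minimal sketch of that hash-named-global scheme (not Tullio's actual code; the name and signature are illustrative):

# eval a global worker whose name is a hash of its body: the same body always
# maps to the same name, so a later @require block could eval an extra
# CuArray method under that name.
function define_worker(mod::Module, body::Expr)
    name = Symbol("##tullio_worker#", hash(body))
    if !isdefined(mod, name)
        Core.eval(mod, :(function $name(C, A, B); $body; end))
    end
    return name
end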

mcabbott added the GPU label Apr 28, 2023

aryavorskiy commented May 9, 2024

I'd like to address this issue one year later :) The macro provided by @marius311 had to be tweaked a little bit. Apparently, we have to use eval inside the @require block; otherwise the @tullio macro in the function will be expanded before the user even has a chance to import CUDA.

This is a macro that worked for me:

using Logging
using Requires

macro uses_tullio(funcdef)
    Meta.isexpr(funcdef, :function) || error("Expected function definition")
    quote
        # Define the (CPU-only) version right away, keeping any docstring.
        Base.@__doc__ $funcdef
        @init @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
            @require KernelAbstractions="63c18a36-062a-441e-b654-da1e3ab1ce7c" begin
                using .CUDA
                using .KernelAbstractions
                CUDA.allowscalar(false)
                # eval the still-unexpanded definition, so @tullio only
                # expands now, with the GPU packages loaded.
                eval($(Meta.quot(funcdef)))
                @info "Function definition `" * string($(funcdef.args[1].args[1])) * "` updated"
            end
        end
    end |> esc
end
