
Conversation

@Fletterio
Contributor

Description

Adds a new class for 2-, 3-, and 4-dimensional Morton codes, with arithmetic and comparison operators

Testing

TODO

TODO list:

Need to make sure all operators work properly before merging

Comment on lines 102 to 104
NBL_CONSTEXPR_STATIC uint16_t Stages = mpl::log2_ceil_v<Bits>;
[[unroll]]
for (uint16_t i = Stages; i > 0; i--)


this loop will never unroll — a static const will never be constexpr as a plain variable in a function; @keptsecret got screwed over by this in the HLSL Path Tracer!

Better to unroll by hand, or use mpl::log2_ceil_v<Bits> directly as the initializer of uint16_t i
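A minimal C++ sketch of the suggested pattern (log2_ceil is a hypothetical stand-in for mpl::log2_ceil_v<Bits>; the real code is HLSL, this only illustrates feeding the constant straight into the loop counter):

```cpp
#include <cstdint>

// Hypothetical stand-in for Nabla's mpl::log2_ceil_v<Bits>:
// computes ceil(log2(n)) at compile time.
constexpr uint16_t log2_ceil(uint32_t n)
{
    uint16_t r = 0;
    for (uint32_t v = 1; v < n; v <<= 1)
        ++r;
    return r;
}

// No `static const Stages` local: the metaprogrammed constant is used
// directly as the loop initializer, so the trip count is visibly
// compile-time and an unroll hint can actually fire.
template<uint16_t Bits>
uint16_t countStages()
{
    uint16_t stages = 0;
    for (uint16_t i = log2_ceil(Bits); i > 0; i--)
        ++stages;
    return stages;
}
```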

struct MortonEncoder
{
template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


it's actually > Bits+Dim, not >= Bits, because you will be left-shifting the components, and the last one will have its MSB at Bits+Dim-1

Contributor Author


But this is for the decode_t, which immediately gets transformed into a vector of encode_t, which does have enough bits to hold the interleaved and shifted coordinates

Contributor Author


Idk what I was thinking when I wrote this tbh, maybe the check should be the other way around?

8 * sizeof(typename vector_traits<decode_t>::scalar_type) <= max(Bits, 16)

to ensure you don't get an implicit truncation

Contributor Author


or just drop that altogether idk

Comment on lines 125 to 129
encode_t encoded = _static_cast<encode_t>(uint64_t(0));
array_get<portable_vector_t<encode_t, Dim>, encode_t> getter;
[[unroll]]
for (uint16_t i = 0; i < Dim; i++)
encoded = encoded | getter(interleaveShifted, i);


I wouldn't count on the compiler noticing that | 0 is the identity and can be optimized out for emulated_uint64_t, so do

encode_t encoded = getter(interleaveShifted, 0);
[[unroll]]
for (uint32_t i = 1; i < Dim; i++)
    encoded = encoded | getter(interleaveShifted, i);

// ----------------------------------------------------------------- MORTON ENCODER ---------------------------------------------------

template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
struct MortonEncoder


morton::impl::MortonEncoder, too many mortons

Contributor


Done.

Comment on lines 133 to 139
};

// ----------------------------------------------------------------- MORTON DECODER ---------------------------------------------------

template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
struct MortonDecoder
{


why not merge Decoder and Encoder into a single Transcoder ?

Contributor


Done.

struct MortonDecoder
{
template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


same thing with >=Bits needing to be > Bits+Dim


actually same comments as for the interleaveShift function

Comment on lines 151 to 152
setter(decoded, i, encodedValue);
decoded = rightShift(decoded, _static_cast<vector<uint32_t, Dim> >(vector<uint32_t, 4>(0, 1, 2, 3)));
Member

@devshgraphicsprogramming devshgraphicsprogramming Apr 16, 2025


could just write setter(decoded, i, encodedValue>>i);

Contributor


Done.

Comment on lines 60 to 76
NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(2, 0x5555555555555555) // Groups bits by 1 on, 1 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 1, uint64_t(0x3333333333333333)) // Groups bits by 2 on, 2 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 2, uint64_t(0x0F0F0F0F0F0F0F0F)) // Groups bits by 4 on, 4 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 3, uint64_t(0x00FF00FF00FF00FF)) // Groups bits by 8 on, 8 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 4, uint64_t(0x0000FFFF0000FFFF)) // Groups bits by 16 on, 16 off

NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(3, 0x9249249249249249) // Groups bits by 1 on, 2 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 1, uint64_t(0x30C30C30C30C30C3)) // Groups bits by 2 on, 4 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 2, uint64_t(0xF00F00F00F00F00F)) // Groups bits by 4 on, 8 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 3, uint64_t(0x00FF0000FF0000FF)) // Groups bits by 8 on, 16 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 4, uint64_t(0xFFFF00000000FFFF)) // Groups bits by 16 on, 32 off

NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(4, 0x1111111111111111) // Groups bits by 1 on, 3 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 1, uint64_t(0x0303030303030303)) // Groups bits by 2 on, 6 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 2, uint64_t(0x000F000F000F000F)) // Groups bits by 4 on, 12 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 3, uint64_t(0x000000FF000000FF)) // Groups bits by 8 on, 24 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 4, uint64_t(0x000000000000FFFF)) // Groups bits by 16 on, 48 off (unused but here for completion + likely keeps compiler from complaining)


ull suffixes on the mask literals please

Contributor


Done.
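For reference, the Dim=2 ladder quoted above is the classic part1by1/compact1by1 mask sequence; a self-contained C++ sketch of the 32-bit variant (function names are illustrative, not Nabla's API):

```cpp
#include <cstdint>

// Spread the low 16 bits of x so each lands at an even position
// (the Dim=2 mask ladder above, truncated to 32 bits).
constexpr uint32_t part1by1(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu; // groups bits by 8 on, 8 off
    x = (x | (x << 4)) & 0x0F0F0F0Fu; // 4 on, 4 off
    x = (x | (x << 2)) & 0x33333333u; // 2 on, 2 off
    x = (x | (x << 1)) & 0x55555555u; // 1 on, 1 off
    return x;
}

// Inverse: gather every other bit back into the low 16 bits.
constexpr uint32_t compact1by1(uint32_t x)
{
    x &= 0x55555555u;
    x = (x | (x >> 1)) & 0x33333333u;
    x = (x | (x >> 2)) & 0x0F0F0F0Fu;
    x = (x | (x >> 4)) & 0x00FF00FFu;
    x = (x | (x >> 8)) & 0x0000FFFFu;
    return x;
}

// 2D Morton encode: x in even bit positions, y in odd ones.
constexpr uint32_t morton2d_encode(uint16_t x, uint16_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```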

Comment on lines +162 to +163
// If `Bits` is greater than half the bitwidth of the decode type, then we can avoid `&`ing against the last mask since duplicated MSB get truncated
NBL_IF_CONSTEXPR(Bits > 4 * sizeof(typename vector_traits<decode_t>::scalar_type))


I think that > should be a >=, because if you have a 16-bit Morton (e.g. Dim=2 stored in a uint32_t) getting decoded into a vector of uint16_t, you'll have a shift by 8 in the final coding round

Contributor Author

@Fletterio Fletterio Apr 22, 2025


But the comparison is against half the bitwidth. For example, if decoding to a vector of uint16_t, this decision is made based on whether we have more than 8 bits.

Say you have exactly 8 bits: the last shift is by 4. Ignore hex; let's just say the encoded number is ABCDEFGH, where each letter represents a binary value. Then in the last round you'll have decoded = 0000ABCD0000EFGH, decoded >> 4 = 00000000ABCD0000 (need 16 bits to hold two 8-bit Mortons), and the | between these looks like 0000ABCDABCDEFGH. Here, to get the correct value, I do need to mask off the highest 8 bits.

Now say you have more than half the bitwidth of the decode type, for example a 9-bit Morton ABCDEFGHI being decoded to a uint16_t. Since we have more than 8 bits, the last round is a shift by 8, so the spacing between bits is also 8. decoded will look like decoded = 000000000000000A00000000BCDEFGHI (now we need 32 bits to hold two 9-bit Mortons) and decoded >> 8 = 00000000000000000000000A00000000, so the | between them returns 000000000000000A0000000ABCDEFGHI. Now there's no need to mask, since taking only the lowest 16 bits correctly yields 0000000ABCDEFGHI (the same holds for any value from 10 to 16 bits)
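The two cases above can be checked directly; a small C++ sketch with the letters mapped to concrete bits (hypothetical values, just illustrating the argument):

```cpp
#include <cstdint>

// Exactly 8 bits decoded into a 16-bit lane: last round shifts by 4.
// ABCDEFGH = 10110110, spaced as 0000ABCD0000EFGH.
constexpr uint16_t spaced8 = 0b0000101100000110;
constexpr uint16_t merged8 = spaced8 | (spaced8 >> 4); // 0000ABCDABCDEFGH
// The high byte holds a stray copy of ABCD, so the final mask is needed:
static_assert(merged8 == 0b0000101110110110, "stray ABCD in high byte");
static_assert((merged8 & 0x00FF) == 0b10110110, "mask recovers the value");

// More than half the bitwidth (a 9-bit Morton in 32 bits of scratch):
// the last round shifts by 8, so the stray copy of A lands above bit 15
// and truncating to 16 bits already yields the correct value, mask-free.
constexpr uint32_t spaced9 = (1u << 16) | 0b10110110u; // A=1, BCDEFGHI=10110110
constexpr uint32_t merged9 = spaced9 | (spaced9 >> 8);
static_assert(uint16_t(merged9) == 0b110110110, "truncation alone suffices");
```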

template<typename I NBL_FUNC_REQUIRES(Comparable<Signed, Bits, storage_t, true, I>)
NBL_CONSTEXPR_STATIC_INLINE_FUNC vector<bool, D> __call(NBL_CONST_REF_ARG(storage_t) value, NBL_CONST_REF_ARG(portable_vector_t<I, D>) rhs)
{
NBL_CONSTEXPR portable_vector_t<storage_t, D> zeros = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(0,0,0,0)));


again, this will create a hidden variable with an initializer

Member

@devshgraphicsprogramming devshgraphicsprogramming Apr 16, 2025


you literally have to use a temporary to compare against

or declare the variable as a plain const, not a static const

Contributor Author


wait what does static const do vs using just const


read the discord thread

Comment on lines 218 to 219
NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> InterleaveMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(coding_mask_v<D, Bits, 0>, coding_mask_v<D, Bits, 0> << 1, coding_mask_v<D, Bits, 0> << 2, coding_mask_v<D, Bits, 0> << 3)));
NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> SignMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(SignMask<Bits, D>, SignMask<Bits, D> << 1, SignMask<Bits, D> << 2, SignMask<Bits, D> << 3)));


again a plain const is okay, static const is not


also, is there a prettier way to write this? Or at least format it (every component on a new line?)

Comment on lines 221 to 224
// Obtain a vector of deinterleaved coordinates and flip their sign bits
const portable_vector_t<storage_t, D> thisCoord = (InterleaveMasks & value) ^ SignMasks;
// rhs already deinterleaved, just have to cast type and flip sign
const portable_vector_t<storage_t, D> rhsCoord = _static_cast<portable_vector_t<storage_t, D> >(rhs) ^ SignMasks;


why are you always flipping signs, regardless of Signed ?

Contributor


Should be done?
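For context on why the flip exists at all: it is the standard biased-encoding trick — XORing the sign bit maps two's-complement order onto plain unsigned order, so one unsigned comparison serves signed components. A scalar C++ sketch (16-bit case, hypothetical names; the PR applies it per deinterleaved component, and only the Signed case should pay for it):

```cpp
#include <cstdint>

// XOR the sign bit: maps the int16_t range [-32768, 32767] monotonically
// onto the uint16_t range [0, 65535] (biased encoding).
constexpr uint16_t flipSign(uint16_t v) { return uint16_t(v ^ 0x8000u); }

// Unsigned comparison of flipped values matches signed comparison.
constexpr bool signedLessViaUnsigned(int16_t a, int16_t b)
{
    return flipSign(uint16_t(a)) < flipSign(uint16_t(b));
}
```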

Comment on lines 411 to 451
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> equals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::Equals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

NBL_CONSTEXPR_INLINE_FUNC bool operator!=(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
{
return value != rhs.value;
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> notEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return !equals<BitsAlreadySpread, I>(rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> less(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::LessThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> lessEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::LessEquals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greater(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::GreaterThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greaterEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC


spelling nitpick: those functions are usually called equal, without an s at the end
https://registry.khronos.org/OpenGL-Refpages/gl4/html/equal.xhtml

Contributor


Done.

`NBL_CONSTEXPR_FUNC`
Adds `OpUndef` to spirv `intrinsics.hlsl` and `cpp_compat.hlsl`
Adds an explicit `truncate` function for vectors and emulated vectors
Adds a bunch of specializations for vectorial types in `functional.hlsl`
Bugfixes and changes to Morton codes, very close to them working
properly with emulated ints
struct Promote
{
T operator()(U v)
NBL_CONSTEXPR_FUNC T operator()(NBL_CONST_REF_ARG(U) v)


vectors should not be taken by NBL_CONST_REF_ARG

Contributor


Done.

@devshgraphicsprogramming
Member

After #947 gets merged, make sure to merge master to this branch and kill the include/nbl/builtin/hlsl/math/morton.hlsl file
