
Conversation

@Fletterio
Contributor

Description

Adds a new class for 2-, 3-, and 4-dimensional Morton codes, with arithmetic and comparison operators

Testing

TODO

TODO list:

Need to make sure all operators work properly before merging

Comment on lines 102 to 104
NBL_CONSTEXPR_STATIC uint16_t Stages = mpl::log2_ceil_v<Bits>;
[[unroll]]
for (uint16_t i = Stages; i > 0; i--)


this loop will never unroll — a static const will never be constexpr as a plain variable in a function; @keptsecret got screwed over by this in the HLSL Path Tracer!

Better to unroll by hand, or use mpl::log2_ceil_v<Bits> directly as the initializer of uint16_t i
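A minimal C++ sketch of the suggested pattern (log2_ceil is a hypothetical stand-in for mpl::log2_ceil_v<Bits>; the real code is HLSL, this only illustrates feeding the constant straight into the loop counter):

```cpp
#include <cstdint>

// Hypothetical stand-in for Nabla's mpl::log2_ceil_v<Bits>:
// computes ceil(log2(n)) at compile time.
constexpr uint16_t log2_ceil(uint32_t n)
{
    uint16_t r = 0;
    for (uint32_t v = 1; v < n; v <<= 1)
        ++r;
    return r;
}

// No `static const Stages` local: the metaprogrammed constant is used
// directly as the loop initializer, so the trip count is visibly
// compile-time and an unroll hint can actually fire.
template<uint16_t Bits>
uint16_t countStages()
{
    uint16_t stages = 0;
    for (uint16_t i = log2_ceil(Bits); i > 0; i--)
        ++stages;
    return stages;
}
```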

struct MortonEncoder
{
template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


it's actually > Bits+Dim, not >= Bits, because you will be left-shifting the components, and the last one will have its MSB at Bits+Dim-1

Contributor Author


But this is for the decode_t, which immediately gets transformed into a vector of encode_t, which does have enough bits to hold the interleaved and shifted coordinates

Contributor Author


Idk what I was thinking when I wrote this tbh, maybe the check should be the other way around?

8 * sizeof(typename vector_traits<decode_t>::scalar_type) <= max(Bits, 16)

to ensure you don't get an implicit truncation

Contributor Author


or just drop that altogether idk

Comment on lines 125 to 129
encode_t encoded = _static_cast<encode_t>(uint64_t(0));
array_get<portable_vector_t<encode_t, Dim>, encode_t> getter;
[[unroll]]
for (uint16_t i = 0; i < Dim; i++)
encoded = encoded | getter(interleaveShifted, i);


I wouldn't count on the compiler noticing that | 0 is the identity and can be optimized out for emulated_uint64_t, so do

encode_t encoded = getter(interleaveShifted, 0);
[[unroll]]
for (uint32_t i = 1; i < Dim; i++)
    encoded = encoded | getter(interleaveShifted, i);

// ----------------------------------------------------------------- MORTON ENCODER ---------------------------------------------------

template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
struct MortonEncoder


morton::impl::MortonEncoder, too many mortons

Contributor


Done.

Comment on lines 133 to 139
};

// ----------------------------------------------------------------- MORTON DECODER ---------------------------------------------------

template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
struct MortonDecoder
{


why not merge Decoder and Encoder into a single Transcoder ?

Contributor


Done.

struct MortonDecoder
{
template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


same thing with >=Bits needing to be > Bits+Dim


actually same comments as for the interleaveShift function

Comment on lines 151 to 152
setter(decoded, i, encodedValue);
decoded = rightShift(decoded, _static_cast<vector<uint32_t, Dim> >(vector<uint32_t, 4>(0, 1, 2, 3)));
Member

@devshgraphicsprogramming devshgraphicsprogramming Apr 16, 2025


could just write setter(decoded, i, encodedValue>>i);

Contributor


Done.

Comment on lines 60 to 76
NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(2, 0x5555555555555555) // Groups bits by 1 on, 1 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 1, uint64_t(0x3333333333333333)) // Groups bits by 2 on, 2 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 2, uint64_t(0x0F0F0F0F0F0F0F0F)) // Groups bits by 4 on, 4 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 3, uint64_t(0x00FF00FF00FF00FF)) // Groups bits by 8 on, 8 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 4, uint64_t(0x0000FFFF0000FFFF)) // Groups bits by 16 on, 16 off

NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(3, 0x9249249249249249) // Groups bits by 1 on, 2 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 1, uint64_t(0x30C30C30C30C30C3)) // Groups bits by 2 on, 4 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 2, uint64_t(0xF00F00F00F00F00F)) // Groups bits by 4 on, 8 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 3, uint64_t(0x00FF0000FF0000FF)) // Groups bits by 8 on, 16 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 4, uint64_t(0xFFFF00000000FFFF)) // Groups bits by 16 on, 32 off

NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(4, 0x1111111111111111) // Groups bits by 1 on, 3 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 1, uint64_t(0x0303030303030303)) // Groups bits by 2 on, 6 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 2, uint64_t(0x000F000F000F000F)) // Groups bits by 4 on, 12 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 3, uint64_t(0x000000FF000000FF)) // Groups bits by 8 on, 24 off
NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 4, uint64_t(0x000000000000FFFF)) // Groups bits by 16 on, 48 off (unused but here for completion + likely keeps compiler from complaining)


ull suffixes on the mask literals please

Contributor


Done.
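For reference, the Dim=2 ladder quoted above is the classic part1by1/compact1by1 mask sequence; a self-contained C++ sketch of the 32-bit variant (function names are illustrative, not Nabla's API):

```cpp
#include <cstdint>

// Spread the low 16 bits of x so each lands at an even position
// (the Dim=2 mask ladder above, truncated to 32 bits).
constexpr uint32_t part1by1(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu; // groups bits by 8 on, 8 off
    x = (x | (x << 4)) & 0x0F0F0F0Fu; // 4 on, 4 off
    x = (x | (x << 2)) & 0x33333333u; // 2 on, 2 off
    x = (x | (x << 1)) & 0x55555555u; // 1 on, 1 off
    return x;
}

// Inverse: gather every other bit back into the low 16 bits.
constexpr uint32_t compact1by1(uint32_t x)
{
    x &= 0x55555555u;
    x = (x | (x >> 1)) & 0x33333333u;
    x = (x | (x >> 2)) & 0x0F0F0F0Fu;
    x = (x | (x >> 4)) & 0x00FF00FFu;
    x = (x | (x >> 8)) & 0x0000FFFFu;
    return x;
}

// 2D Morton encode: x in even bit positions, y in odd ones.
constexpr uint32_t morton2d_encode(uint16_t x, uint16_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```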

Comment on lines +162 to +163
// If `Bits` is greater than half the bitwidth of the decode type, then we can avoid `&`ing against the last mask since duplicated MSB get truncated
NBL_IF_CONSTEXPR(Bits > 4 * sizeof(typename vector_traits<decode_t>::scalar_type))


I think that > should be a >=, because if you have a 16-bit Morton (e.g. Dim=2 stored in a uint32_t) getting decoded into a vector of uint16_t, you'll have a shift by 8 in the final coding round

Contributor Author

@Fletterio Fletterio Apr 22, 2025


But the comparison is against half the bitwidth. For example, if decoding to a vector of uint16_t, this decision is made based on whether we have more than 8 bits.

Say you have exactly 8 bits: the last shift is by 4. Ignore hex; let's just say the encoded number is ABCDEFGH, where each letter represents a binary value. Then in the last round you'll have decoded = 0000ABCD0000EFGH, decoded >> 4 = 00000000ABCD0000 (need 16 bits to hold two 8-bit Mortons), and the | between these looks like 0000ABCDABCDEFGH. Here, to get the correct value, I do need to mask off the highest 8 bits.

Now say you have more than half the bitwidth of the decode type, for example a 9-bit Morton ABCDEFGHI being decoded to a uint16_t. Since we have more than 8 bits, the last round is a shift by 8, so the spacing between bits is also 8. decoded will look like decoded = 000000000000000A00000000BCDEFGHI (now we need 32 bits to hold two 9-bit Mortons) and decoded >> 8 = 00000000000000000000000A00000000, so the | between them returns 000000000000000A0000000ABCDEFGHI. Now there's no need to mask, since taking only the lowest 16 bits correctly yields 0000000ABCDEFGHI (the same holds for any value from 10 to 16 bits)
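The two cases above can be checked directly; a small C++ sketch with the letters mapped to concrete bits (hypothetical values, just illustrating the argument):

```cpp
#include <cstdint>

// Exactly 8 bits decoded into a 16-bit lane: last round shifts by 4.
// ABCDEFGH = 10110110, spaced as 0000ABCD0000EFGH.
constexpr uint16_t spaced8 = 0b0000101100000110;
constexpr uint16_t merged8 = spaced8 | (spaced8 >> 4); // 0000ABCDABCDEFGH
// The high byte holds a stray copy of ABCD, so the final mask is needed:
static_assert(merged8 == 0b0000101110110110, "stray ABCD in high byte");
static_assert((merged8 & 0x00FF) == 0b10110110, "mask recovers the value");

// More than half the bitwidth (a 9-bit Morton in 32 bits of scratch):
// the last round shifts by 8, so the stray copy of A lands above bit 15
// and truncating to 16 bits already yields the correct value, mask-free.
constexpr uint32_t spaced9 = (1u << 16) | 0b10110110u; // A=1, BCDEFGHI=10110110
constexpr uint32_t merged9 = spaced9 | (spaced9 >> 8);
static_assert(uint16_t(merged9) == 0b110110110, "truncation alone suffices");
```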

template<typename I NBL_FUNC_REQUIRES(Comparable<Signed, Bits, storage_t, true, I>)
NBL_CONSTEXPR_STATIC_INLINE_FUNC vector<bool, D> __call(NBL_CONST_REF_ARG(storage_t) value, NBL_CONST_REF_ARG(portable_vector_t<I, D>) rhs)
{
NBL_CONSTEXPR portable_vector_t<storage_t, D> zeros = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(0,0,0,0)));


again, this will create a hidden variable with an initializer

Member

@devshgraphicsprogramming devshgraphicsprogramming Apr 16, 2025


you literally have to use a temporary to compare against

or declare the variable as a plain const, not a static const

Contributor Author


wait what does static const do vs using just const


read the discord thread

Comment on lines 218 to 219
NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> InterleaveMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(coding_mask_v<D, Bits, 0>, coding_mask_v<D, Bits, 0> << 1, coding_mask_v<D, Bits, 0> << 2, coding_mask_v<D, Bits, 0> << 3)));
NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> SignMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(SignMask<Bits, D>, SignMask<Bits, D> << 1, SignMask<Bits, D> << 2, SignMask<Bits, D> << 3)));


again a plain const is okay, static const is not


also, is there a prettier way to write this? Or at least format it (every component on a new line?)

Comment on lines 221 to 224
// Obtain a vector of deinterleaved coordinates and flip their sign bits
const portable_vector_t<storage_t, D> thisCoord = (InterleaveMasks & value) ^ SignMasks;
// rhs already deinterleaved, just have to cast type and flip sign
const portable_vector_t<storage_t, D> rhsCoord = _static_cast<portable_vector_t<storage_t, D> >(rhs) ^ SignMasks;


why are you always flipping signs, regardless of Signed ?

Contributor


Should be done?
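For context on why the flip exists at all: it is the standard biased-encoding trick — XORing the sign bit maps two's-complement order onto plain unsigned order, so one unsigned comparison serves signed components. A scalar C++ sketch (16-bit case, hypothetical names; the PR applies it per deinterleaved component, and only the Signed case should pay for it):

```cpp
#include <cstdint>

// XOR the sign bit: maps the int16_t range [-32768, 32767] monotonically
// onto the uint16_t range [0, 65535] (biased encoding).
constexpr uint16_t flipSign(uint16_t v) { return uint16_t(v ^ 0x8000u); }

// Unsigned comparison of flipped values matches signed comparison.
constexpr bool signedLessViaUnsigned(int16_t a, int16_t b)
{
    return flipSign(uint16_t(a)) < flipSign(uint16_t(b));
}
```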

Comment on lines 411 to 451
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> equals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::Equals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

NBL_CONSTEXPR_INLINE_FUNC bool operator!=(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
{
return value != rhs.value;
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> notEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return !equals<BitsAlreadySpread, I>(rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> less(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::LessThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> lessEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::LessEquals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greater(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
{
return impl::GreaterThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
}

template<bool BitsAlreadySpread, typename I
NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greaterEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC


spelling nitpick: those functions are usually called equal, without an s at the end
https://registry.khronos.org/OpenGL-Refpages/gl4/html/equal.xhtml

Contributor


Done.

`NBL_CONSTEXPR_FUNC`
Adds `OpUndef` to spirv `intrinsics.hlsl` and `cpp_compat.hlsl`
Adds an explicit `truncate` function for vectors and emulated vectors
Adds a bunch of specializations for vectorial types in `functional.hlsl`
Bugfixes and changes to Morton codes, very close to them working
properly with emulated ints
struct Promote
{
T operator()(U v)
NBL_CONSTEXPR_FUNC T operator()(NBL_CONST_REF_ARG(U) v)


vectors should not be taken by NBL_CONST_REF_ARG

Contributor


Done.

@devshgraphicsprogramming
Member

After #947 gets merged, make sure to merge master to this branch and kill the include/nbl/builtin/hlsl/math/morton.hlsl file
