bfloat16: byteswap does not swap bytes #308

@stliibm

Calling byteswap() on an array of the bfloat16 datatype does not swap any bytes. Here is my reproducer:

import ml_dtypes
import numpy as np
import sys

print ('sys.version: {}'.format (sys.version))
print ('sys.byteorder: {}'.format (sys.byteorder))
print ('ml_dtypes.__version__: {}'.format (ml_dtypes.__version__))
print ('np.__version__: {}'.format (np.__version__))
print ('----------------')

print ('Input bytes:')
raw_data=bytes.fromhex('f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff')
print (' '.join('{:02x}'.format(x) for x in raw_data))
print ('----------------')

np_dtype = np.dtype(ml_dtypes.bfloat16)
np_arr = np.frombuffer(raw_data, dtype=np_dtype)
np_arr_swapped = np_arr.byteswap()
np_arr_uint16 = np.frombuffer(raw_data, dtype=np.uint16)
np_arr_uint16_swapped = np_arr_uint16.byteswap()

print ('np_arr:')
print (np_arr)
print ('np_arr_swapped:')
print (np_arr_swapped)
print ('----------------')

print ('np_arr.tobytes():')
np_arr_bytes = np_arr.tobytes()
print (' '.join('{:02x}'.format(x) for x in np_arr_bytes))

print ('np_arr_swapped.tobytes():')
np_arr_swapped_bytes = np_arr_swapped.tobytes()
print (' '.join('{:02x}'.format(x) for x in np_arr_swapped_bytes))

print ('np_arr_uint16.tobytes():')
print (' '.join('{:02x}'.format(x) for x in np_arr_uint16.tobytes()))

print ('np_arr_uint16_swapped.tobytes():')
print (' '.join('{:02x}'.format(x) for x in np_arr_uint16_swapped.tobytes()))
print ('----------------')

i = 0
errors = 0
while i < len(raw_data):
    if i % 2 == 0:
        j = i + 1
    else:
        j = i - 1
    if np_arr_swapped_bytes[i] == np_arr_bytes[j]:
        msg = 'OK'
    else:
        msg = 'ERROR'
        errors += 1
    print ('np_arr_swapped_bytes[{}]={:02x} == np_arr_bytes[{}]={:02x} => {}'
           .format (i, np_arr_swapped_bytes[i], j, np_arr_bytes[j], msg))
    i += 1

if errors == 0:
    print ('=> SUCCESS')
    sys.exit (0)
else:
    print ('=> ERRORS')
    sys.exit (1)

It fails on both x86_64 (little-endian) and s390x (big-endian, where sys.byteorder differs accordingly). This is the output on x86_64:

sys.version: 3.13.5 (main, Jun 12 2025, 00:00:00) [GCC 15.1.1 20250521 (Red Hat 15.1.1-2)]
sys.byteorder: little
ml_dtypes.__version__: 0.5.3
np.__version__: 2.3.2
----------------
Input bytes:
f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff
----------------
np_arr:
[0.478516 0.480469 0.5 0.820312 0.470703 0.816406 0.210938 0.722656 nan
 inf inf -inf]
np_arr_swapped:
[0.478516 0.480469 0.5 0.820312 0.470703 0.816406 0.210938 0.722656 nan
 inf inf -inf]
----------------
np_arr.tobytes():
f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff
np_arr_swapped.tobytes():
f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff
np_arr_uint16.tobytes():
f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff
np_arr_uint16_swapped.tobytes():
3e f5 3e f6 3f 00 3f 52 3e f1 3f 51 3e 58 3f 39 7f c0 7f 80 7f 80 ff 80
----------------
np_arr_swapped_bytes[0]=f5 == np_arr_bytes[1]=3e => ERROR
np_arr_swapped_bytes[1]=3e == np_arr_bytes[0]=f5 => ERROR
np_arr_swapped_bytes[2]=f6 == np_arr_bytes[3]=3e => ERROR
np_arr_swapped_bytes[3]=3e == np_arr_bytes[2]=f6 => ERROR
np_arr_swapped_bytes[4]=00 == np_arr_bytes[5]=3f => ERROR
np_arr_swapped_bytes[5]=3f == np_arr_bytes[4]=00 => ERROR
np_arr_swapped_bytes[6]=52 == np_arr_bytes[7]=3f => ERROR
np_arr_swapped_bytes[7]=3f == np_arr_bytes[6]=52 => ERROR
np_arr_swapped_bytes[8]=f1 == np_arr_bytes[9]=3e => ERROR
np_arr_swapped_bytes[9]=3e == np_arr_bytes[8]=f1 => ERROR
np_arr_swapped_bytes[10]=51 == np_arr_bytes[11]=3f => ERROR
np_arr_swapped_bytes[11]=3f == np_arr_bytes[10]=51 => ERROR
np_arr_swapped_bytes[12]=58 == np_arr_bytes[13]=3e => ERROR
np_arr_swapped_bytes[13]=3e == np_arr_bytes[12]=58 => ERROR
np_arr_swapped_bytes[14]=39 == np_arr_bytes[15]=3f => ERROR
np_arr_swapped_bytes[15]=3f == np_arr_bytes[14]=39 => ERROR
np_arr_swapped_bytes[16]=c0 == np_arr_bytes[17]=7f => ERROR
np_arr_swapped_bytes[17]=7f == np_arr_bytes[16]=c0 => ERROR
np_arr_swapped_bytes[18]=80 == np_arr_bytes[19]=7f => ERROR
np_arr_swapped_bytes[19]=7f == np_arr_bytes[18]=80 => ERROR
np_arr_swapped_bytes[20]=80 == np_arr_bytes[21]=7f => ERROR
np_arr_swapped_bytes[21]=7f == np_arr_bytes[20]=80 => ERROR
np_arr_swapped_bytes[22]=80 == np_arr_bytes[23]=ff => ERROR
np_arr_swapped_bytes[23]=ff == np_arr_bytes[22]=80 => ERROR
=> ERRORS
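
As a cross-check that does not depend on NumPy or ml_dtypes, the expected result can be computed by reversing each 2-byte pair of the input by hand. This is only a minimal sketch; the helper name `swap_pairs` is made up for illustration. Its output matches the np_arr_uint16_swapped.tobytes() line above, which is what np_arr_swapped.tobytes() should have printed as well.

```python
def swap_pairs(data: bytes, itemsize: int = 2) -> bytes:
    """Reverse the bytes of every itemsize-wide element (hypothetical helper)."""
    if len(data) % itemsize:
        raise ValueError('length must be a multiple of itemsize')
    out = bytearray()
    for i in range(0, len(data), itemsize):
        # Slice out one element and append it with its bytes reversed.
        out += data[i:i + itemsize][::-1]
    return bytes(out)


# Same input bytes as in the reproducer.
raw_data = bytes.fromhex('f5 3e f6 3e 00 3f 52 3f f1 3e 51 3f 58 3e 39 3f c0 7f 80 7f 80 7f 80 ff')
expected = swap_pairs(raw_data)
print(' '.join('{:02x}'.format(x) for x in expected))
# 3e f5 3e f6 3f 00 3f 52 3e f1 3f 51 3e 58 3f 39 7f c0 7f 80 7f 80 ff 80
```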

Debugging with gdb showed that we end up in ml_dtypes/_src/custom_float.h:NPyCustomFloat_CopySwapN() with swap=1 and src set to NULL. Because src is NULL, the function returns immediately without swapping, instead of swapping dst in place. From reading the sources, the same applies to NPyCustomFloat_CopySwap().
In theory the same would also apply to ml_dtypes/_src/intn_numpy.h, but I think it defines no datatypes of 2 bytes or more.
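
To make the suspected logic error concrete, here is a small pure-Python model of the behaviour described above. It is not the ml_dtypes code: the function name `copyswap_n` and the `fixed` flag are invented for illustration, and it assumes (as I understand NumPy's built-in types to behave) that a NULL src combined with swap=1 means "swap dst in place".

```python
def copyswap_n(dst: bytearray, src, n: int, swap: bool,
               itemsize: int = 2, fixed: bool = True) -> None:
    """Toy model of a copyswapn-style routine (hypothetical, not the ml_dtypes source)."""
    if src is None and not fixed:
        # Behaviour described in this report: return early, so dst is never swapped.
        return
    if src is not None:
        # Copy n elements from src into dst.
        dst[:n * itemsize] = src[:n * itemsize]
    if swap:
        # Reverse the bytes of each element of dst in place.
        for i in range(0, n * itemsize, itemsize):
            dst[i:i + itemsize] = dst[i:i + itemsize][::-1]


buggy = bytearray.fromhex('f5 3e f6 3e')
copyswap_n(buggy, None, 2, swap=True, fixed=False)
print(buggy.hex())  # f53ef63e (unchanged, as seen in the output above)

good = bytearray.fromhex('f5 3e f6 3e')
copyswap_n(good, None, 2, swap=True, fixed=True)
print(good.hex())   # 3ef53ef6 (swapped in place)
```

With `fixed=True` the model only skips the copy when src is None and still performs the swap, which is the behaviour I would expect from the C++ functions.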
