GH-123894: Simplify ascii decode using unaligned loads #123895
Conversation
While it's easier to read, I'm wondering whether the memcpy() calls could introduce some overhead. We are copying many small parts and checking ASCII_CHAR_MASK on each of them, instead of doing one single large memcpy() with some pointer arithmetic behind it.
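As a minimal sketch of the pattern being discussed (my own simplified illustration, not the actual diff; the ASCII_CHAR_MASK value is assumed to be the usual "high bit of every byte" mask, written here for a 64-bit size_t, whereas CPython selects it via SIZEOF_SIZE_T):

    #include <stddef.h>
    #include <string.h>

    /* Assumption: 64-bit size_t; CPython derives this mask from SIZEOF_SIZE_T. */
    #define ASCII_CHAR_MASK ((size_t)0x8080808080808080ULL)

    /* Copy ASCII bytes from [start, end) into dest one size_t word at a time,
     * stopping at the first non-ASCII byte.  Returns the number of bytes
     * copied.  memcpy() with a constant size is expected to compile to a
     * single (possibly unaligned) load/store, not a function call. */
    static size_t ascii_decode_sketch(const char *start, const char *end,
                                      unsigned char *dest)
    {
        const char *p = start;
        unsigned char *q = dest;

        while ((size_t)(end - p) >= sizeof(size_t)) {
            size_t value;
            memcpy(&value, p, sizeof(size_t));   /* one word-sized load */
            if (value & ASCII_CHAR_MASK)
                break;                           /* a byte >= 0x80 in this word */
            memcpy(q, &value, sizeof(size_t));   /* one word-sized store */
            p += sizeof(size_t);
            q += sizeof(size_t);
        }
        /* Tail: handle the remaining bytes (and the word containing the
         * first non-ASCII byte, if any) one byte at a time. */
        while (p < end && (unsigned char)*p < 0x80)
            *q++ = (unsigned char)*p++;
        return (size_t)(p - start);
    }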
break;
*q++ = *p++;
Py_UCS1 *q = dest;
const char *size_t_end = end - (SIZEOF_SIZE_T - 1);
Not sure if we already discussed it, but would _Py_ALIGN_DOWN(end, SIZEOF_SIZE_T) work?
Probably. Is using a macro here important?
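For context, a hedged sketch of what the two bounds compute (my own illustration, not from the diff; MY_ALIGN_DOWN mirrors what _Py_ALIGN_DOWN in Include/pymacro.h does, namely round a pointer down to an aligned address):

    #include <stdint.h>
    #include <stddef.h>

    /* Mirrors _Py_ALIGN_DOWN(p, a): greatest a-aligned address <= p. */
    #define MY_ALIGN_DOWN(p, a) ((void *)((uintptr_t)(p) & ~(uintptr_t)((a) - 1)))

    static void compare_bounds(const char *end)
    {
        /* Bound used in the patch: a sizeof(size_t)-byte load at p stays
         * inside the buffer exactly when p < word_end. */
        const char *word_end = end - (sizeof(size_t) - 1);

        /* Macro-based bound: greatest aligned address <= end.  With an
         * unaligned p this does not by itself keep the load in range, so
         * the two are only interchangeable if the loop keeps p aligned. */
        const char *aligned_end = MY_ALIGN_DOWN(end, sizeof(size_t));

        (void)word_end;
        (void)aligned_end;
    }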
const size_t *restrict _p = (const size_t *)p;
size_t *restrict _q = (size_t *)q;
size_t value;
memcpy(&value, _p, SIZEOF_SIZE_T);
How is performance affected by those multiple memcpy() calls?
memcpy() isn't actually called: a memcpy() with a fixed, known size gets optimized to a single load.
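A standalone illustration of that point (not from the PR): with optimizations enabled, GCC, Clang, and MSVC typically lower this to a single word-sized mov rather than a library call.

    #include <string.h>
    #include <stdint.h>

    /* The compile-time-constant size lets the compiler replace the memcpy()
     * call with one (possibly unaligned) 8-byte load. */
    static uint64_t load_word(const char *p)
    {
        uint64_t value;
        memcpy(&value, p, sizeof value);
        return value;
    }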
Ah, I see the ASM, yes. I'd suggest reformulating the comment by saying that "When supported, compilers optimize memcpy() into a single LOAD instruction if the number of bytes to copy is a literal value".
I confirmed it on GCC, but is it also the case for MSVC and Clang? (NVM, I posted this before seeing your reply.)
I wasn't aware of any precedent for using
This can be closed as the CPython maintainers prefer aligned loads: #120212 (comment)
When working on #120212 I noticed that the code for ascii_decode could be written much more simply.
The current code avoids unaligned loads, but on x86-64 unaligned loads are not necessarily slower. Using memcpy() lets the compiler emit aligned loads on platforms that require them, while still using unaligned loads on platforms that handle them well.
It simplifies the code and allows for easier manual unrolling if that is desired at a later point.
EDIT: I don't feel this warrants a NEWS entry, so I haven't written one.
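A rough before/after sketch of the load itself, paraphrasing the idea above (my own schematic, not the actual diff):

    #include <string.h>
    #include <stddef.h>

    /* Before (schematic): reading through a size_t pointer, which obliges
     * the surrounding loop to keep p size_t-aligned. */
    static size_t load_aligned(const char *p)
    {
        return *(const size_t *)p;
    }

    /* After (schematic): memcpy() with a constant size lets the compiler
     * emit an unaligned load where that is cheap (e.g. x86-64) and a safe
     * sequence where the platform requires alignment. */
    static size_t load_any(const char *p)
    {
        size_t value;
        memcpy(&value, p, sizeof value);
        return value;
    }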