Summary
A compiled and serialized regular expression may contain two bytes of uninitialized data from previous memory allocations.
Details
For certain regular expressions and compile options, two bytes of the compiled structure are left uninitialized and may be exposed by pcre2_serialize_encode(). The following code in pcre2_compile_class.c gives special attention to these two bytes for debug and valgrind build configurations:
#if defined PCRE2_DEBUG || defined SUPPORT_VALGRIND
  if ((char_lists_size & 0x2) != 0)
    {
    /* In debug the unused 16 bit value is set
       to a fixed value and marked unused. */
    ((uint16_t*)data)[-1] = 0x5555;
#ifdef SUPPORT_VALGRIND
    VALGRIND_MAKE_MEM_NOACCESS(data - 2, 2);
#endif
    }
#endif
In normal builds, however, the two bytes are left uninitialized. They may hold content from previous memory allocations and may be copied by pcre2_serialize_encode(). If the initialization to 0x5555 is enabled, the problem seems to go away.
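One way to picture a fix is to initialize the two-byte gap in every build, not only in debug and valgrind builds. The sketch below is a standalone illustration of that idea and not the actual PCRE2 patch: the function names has_padding_gap() and init_padding_gap() are hypothetical, and it assumes, as in the snippet above, that the gap sits in the two bytes immediately before data whenever bit 1 of char_lists_size is set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: non-zero when a two-byte alignment gap precedes
   the data, mirroring the (char_lists_size & 0x2) test quoted above. */
static int has_padding_gap(size_t char_lists_size)
{
  return (char_lists_size & 0x2) != 0;
}

/* Hypothetical helper: gives the gap a defined value in every build.
   Zero is an arbitrary choice here; the debug build uses 0x5555.
   memset is used so that no aligned 16-bit store is required. */
static void init_padding_gap(uint8_t *data, size_t char_lists_size)
{
  if (has_padding_gap(char_lists_size))
    memset(data - 2, 0, 2);
}
```

With the gap always written, a later copy of the compiled structure no longer carries bytes left over from earlier allocations.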
PoC
Character classes with Unicode code points seem to be necessary to provoke the bug. The code below uses "[\H]" with compile option PCRE2_UTF. Others work as well, such as "[z-\x{100}]" with PCRE2_CASELESS | PCRE2_UTF.
The following C program, when run under valgrind, detects undefined data returned by pcre2_serialize_encode_8():
/* pcre2_bug.c */
#define PCRE2_CODE_UNIT_WIDTH 8
#include <stdio.h>
#include <assert.h>
#include <pcre2.h>
#include <valgrind/memcheck.h>

int main(void)
{
    char regex[] = "[\\H]";
    int error;
    PCRE2_SIZE erroroffset;
    const pcre2_code *code;
    uint8_t *bytes;
    PCRE2_SIZE size;
    int32_t encode_res;
    uint8_t *undef;

    /* sizeof(regex) counts the terminating NUL as part of the pattern. */
    code = pcre2_compile_8((PCRE2_SPTR8)regex, sizeof(regex),
                           PCRE2_UTF,
                           &error, &erroroffset,
                           NULL);
    assert(code);

    /* Returns the number of serialized patterns, or a negative error code. */
    encode_res = pcre2_serialize_encode_8(&code, 1, &bytes, &size, NULL);
    assert(encode_res == 1);

    /* Yields the address of the first undefined byte, or 0 if none is found. */
    undef = (uint8_t*) VALGRIND_CHECK_MEM_IS_DEFINED(bytes, size);
    if (undef) {
        printf("Found undefined byte at offset %d\n", (int)(undef - bytes));
    }
    return 0;
}
It does not matter whether PCRE2 is configured with --enable-valgrind or not.
Compile and link against PCRE2:
> gcc pcre2_bug.c -I$PCRE2_REPO/src $PCRE2_REPO/.libs/libpcre2-8.a
Then run with valgrind:
> valgrind ./a.out
==395820== Memcheck, a memory error detector
==395820== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==395820== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==395820== Command: ./a.out
==395820==
==395820== Uninitialised byte(s) found during client check request
==395820== at 0x10963C: main (in a.out)
==395820== Address 0x4a9d710 is 1,280 bytes inside a block of size 1,345 alloc'd
==395820== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==395820== by 0x11D79B: pcre2_serialize_encode_8 (in a.out)
==395820== by 0x1095B6: main (in a.out)
==395820==
Found undefined byte at offset 1256
Impact
Two symptoms come to mind:
- The same regular expression may produce "randomly" different serialized output. This does not seem to be a security-related problem and, depending on API guarantee ambitions, maybe not even a PCRE2 problem at all.
- Sensitive information may be disclosed. However:
-- only two bytes are exposed at a time
-- it is probably very hard to control which two bytes are obtained from earlier allocations
-- the serialized bytes must be exposed to an untrusted actor, which by itself seems unsafe, as the end consumer of the data, pcre2_serialize_decode(), is not meant to accept untrusted data.
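The first symptom can also be looked for without valgrind, by serializing the same pattern twice and comparing the two buffers byte by byte. The sketch below shows only the comparison step. first_diff_offset() is a hypothetical helper, not part of the PCRE2 API, and it assumes both buffers have the same size. Whether two runs actually differ depends on what the heap happens to contain, so identical output does not show that the bytes were initialized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset of the first byte that differs
   between two equally sized buffers, or size if they are identical. */
static size_t first_diff_offset(const uint8_t *a, const uint8_t *b, size_t size)
{
  size_t i;
  for (i = 0; i < size; i++)
    {
    if (a[i] != b[i]) return i;
    }
  return size;
}
```

In the PoC above, the inputs would be the bytes buffers and the size value from two separate calls to pcre2_serialize_encode_8() on the same compiled pattern.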
Reported by Erlang/OTP
Even though Erlang/OTP version 28.1 offers an Erlang API to pcre2_serialize_encode(), we do not consider this a serious vulnerability for our users and will probably not report it as a CVE on Erlang/OTP. But we thought it would be proper to report it to PCRE2 for assessment before we publish a fix.