SIGSEGV in memory profiler (when memalloc_add_event calls traceback_free) #11751

Open
oranav opened this issue Dec 17, 2024 · 3 comments
Labels: Profiling Continous Profling


@oranav
Contributor

oranav commented Dec 17, 2024

We're hitting SIGSEGVs every now and then with the memory profiler.

Python version is 3.11.11. ddtrace is 2.17.3. We're using the amd64 architecture.

I've extracted a native stack traceback from the coredump:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007fb47c227f1f in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007fb47c1d8fb2 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#3  0x00007fb47a2f222f in ?? () from /app/venv/lib/python3.11/site-packages/ddtrace/internal/datadog/profiling/ddup/../libdd_wrapper-glibc-x86_64.so
#4  <signal handler called>
#5  0x00007fb47c523964 in _Py_Dealloc (op=<unknown at remote 0x7fb47ac9c230>) at Objects/object.c:2390
#6  0x00007fb4781a86e5 in traceback_free () from /app/venv/lib/python3.11/site-packages/ddtrace/profiling/collector/_memalloc.cpython-311-x86_64-linux-gnu.so
#7  0x00007fb4781a7a40 in memalloc_add_event.part () from /app/venv/lib/python3.11/site-packages/ddtrace/profiling/collector/_memalloc.cpython-311-x86_64-linux-gnu.so
#8  0x00007fb4781a7cbe in memalloc_malloc () from /app/venv/lib/python3.11/site-packages/ddtrace/profiling/collector/_memalloc.cpython-311-x86_64-linux-gnu.so
#9  0x00007fb47c545be6 in PyObject_Malloc (size=44) at Objects/obmalloc.c:712
#10 _PyBytes_FromSize (size=11, use_calloc=0) at Objects/bytesobject.c:103
#11 0x00007fb47c583a47 in PyBytes_FromStringAndSize (size=11, str=0x7fb3efd2d820 "<REDACTED>") at Objects/bytesobject.c:136

It looks like the _Py_Dealloc call made from traceback_free (frames #5 and #6) accesses invalid memory.

I believe #11460 might fix it. One possible explanation: if reservoir sampling yields the same index in two threads, both threads may decide to evict the same traceback, and unless that path is guarded by a lock, traceback_free could end up being called twice on the same pointer (a double free).
I'm not sure that's what is actually happening, but it's a plausible explanation.
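To make the hypothesized race concrete, here is a minimal Python model. It is not ddtrace's _memalloc C code: it assumes the reservoir uses classic reservoir sampling (Algorithm R) and that evicting a slot corresponds to calling traceback_free on the old entry. The Reservoir class, add_event method and freed list are hypothetical names chosen for illustration.

```python
import random
import threading


class Reservoir:
    """Toy model of a fixed-size reservoir of sampled allocation tracebacks.

    Hypothetical names: this is not ddtrace's _memalloc implementation.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.samples = []   # retained "tracebacks"
        self.seen = 0       # number of events offered so far
        self.freed = []     # every evicted entry; stands in for traceback_free()
        self._lock = threading.Lock()
        self._rng = random.Random(seed)

    def add_event(self, traceback):
        # The lock covers the whole count -> pick index -> evict -> replace
        # sequence, so two threads can never evict the same slot's entry.
        with self._lock:
            self.seen += 1
            if len(self.samples) < self.capacity:
                self.samples.append(traceback)
                return
            # Algorithm R: keep the new event with probability capacity / seen.
            idx = self._rng.randrange(self.seen)
            if idx < self.capacity:
                evicted = self.samples[idx]
                self.samples[idx] = traceback
                self.freed.append(evicted)  # the "traceback_free" step
            # Otherwise the new event is simply not retained.
```

Because every eviction happens while the lock is held, each evicted entry is recorded exactly once, even when many threads call add_event concurrently. Without the lock, two threads could both read the same samples[idx] before either replaces it, and both would "free" it, which is the double free hypothesized above.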

@sanchda
Contributor

sanchda commented Dec 17, 2024

👋 Thank you for the report, @oranav. #11460 is indeed the fix for this.

@taegyunkim taegyunkim added the Profiling Continous Profling label Dec 17, 2024
@sanchda sanchda self-assigned this Dec 17, 2024
@sanchda
Contributor

sanchda commented Dec 20, 2024

FYI, this will be released (later today, I hope) in 2.18.1. I'm also attempting to back-port to the 2.17 and 2.16 lines (🤞). It'll be part of mainline starting in the 2.19.0 release.

@sanchda
Contributor

sanchda commented Dec 20, 2024

Confirming that 2.18.1 shipped. Would love to hear some folks weigh in on whether or not it solved this problem for them.
