Crashing BEAM with SIGBUS without use of unsafe code #294

Open
elbow-jason opened this issue Dec 30, 2019 · 7 comments

@elbow-jason
Contributor

I implemented Decoder on a type, and decoding into that type crashes the VM with a SIGBUS.

I made a minimal reproducible example in this repo: https://github.com/elbow-jason/why_sig_bus

@elbow-jason
Contributor Author

After studying this for a little bit, I think I've found the crux of the problem.

Inside the decode function of Args there is:

if let Ok(Args::One(arg)) = term.decode() {
    return Ok(Args::One(arg));
}

This is self-referential: the pattern Ok(Args::One(arg)) makes term.decode() decode into Args, so the decode function of Args calls itself in order to decode a term into an Args.

Changing it to decode a one-element tuple instead does not cause a SIGBUS:

if let Ok((arg,)) = term.decode() {
    return Ok(Args::One(arg));
}
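To make the difference between the two variants concrete, here is a minimal sketch that does not depend on rustler. The `Term` struct and `Decoder` trait below are hypothetical stand-ins that only mimic the shape of the real types (the real 1-tuple decoder expects an actual tuple term, which this mock skips). The point is only how the pattern on the left-hand side picks the type that `term.decode()` decodes into.

```rust
// Hypothetical stand-ins for rustler's Term and Decoder, reduced to the
// minimum needed to show the inference behaviour. Not the real rustler API.
#[derive(Clone, Copy)]
struct Term(i64);

trait Decoder: Sized {
    fn decode(term: Term) -> Result<Self, String>;
}

impl Term {
    // The target type T is chosen by type inference at the call site.
    fn decode<T: Decoder>(self) -> Result<T, String> {
        T::decode(self)
    }
}

impl Decoder for i64 {
    fn decode(term: Term) -> Result<Self, String> {
        Ok(term.0)
    }
}

// Simplified: wraps the inner value without checking for a real tuple.
impl<T: Decoder> Decoder for (T,) {
    fn decode(term: Term) -> Result<Self, String> {
        Ok((T::decode(term)?,))
    }
}

#[derive(Debug, PartialEq)]
enum Args {
    One(i64),
}

impl Decoder for Args {
    fn decode(term: Term) -> Result<Self, String> {
        // Recursive variant (do not use): the pattern `Ok(Args::One(arg))`
        // forces T = Args, so the call below would re-enter Args::decode
        // with the same term and never terminate.
        //     if let Ok(Args::One(arg)) = term.decode() { ... }

        // Non-recursive variant: the tuple pattern selects the (T,) impl,
        // and `Args::One(arg)` then fixes T = i64.
        if let Ok((arg,)) = term.decode() {
            return Ok(Args::One(arg));
        }
        Err("badarg".to_string())
    }
}
```

Writing the target type explicitly, for example `term.decode::<(i64,)>()`, would make the chosen impl visible at the call site instead of leaving it to inference.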

@elbow-jason
Contributor Author

The question now is: How do we keep others from having this issue?

@evnu
Member

evnu commented Jan 6, 2020

@elbow-jason Thanks for finding this! Here is some additional info from running this with a debug build of ERTS under valgrind (rustc 1.39.0 (4560ea788 2019-11-04), Elixir 1.9.4, Erlang/OTP 22 [erts-10.6]):

/tmp/why_sig_bus(master ✔) "$ERL_TOP/bin/cerl" -valgrind --track-origins=yes \
  -pa $(find /usr/lib -name "ebin" -type d | grep elixir) \
  -elixir ansi_enabled true -noshell -s elixir start_cli \
  -- -extra /usr/bin/mix run -e "WhySigBus.decode_args(1)"
==109032== Memcheck, a memory error detector
==109032== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==109032== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==109032== Command: /home/mo/tools/erlang/otp/bin/x86_64-unknown-linux-gnu/beam.valgrind.smp -S 4:4 -SDcpu 4:4 -- -root /home/mo/tools/erlang/otp -progname /home/mo/tools/erlang/otp/bin/cerl\ -valgrind -- -home /home/mo -- -kernel shell_history enabled -- --track-origins=yes -pa /usr/lib/elixir/lib/ex_unit/ebin /usr/lib/elixir/lib/elixir/ebin /usr/lib/elixir/lib/iex/ebin /usr/lib/elixir/lib/eex/ebin /usr/lib/elixir/lib/logger/ebin /usr/lib/elixir/lib/mix/ebin -elixir ansi_enabled true -noshell -s elixir start_cli -- -- -extra /usr/bin/mix run -e WhySigBus.decode_args(1)
==109032==
==109032== Warning: set address range perms: large range [0x5058000, 0x45058000) (noaccess)
Compiling NIF crate :whysigbus_native (native/whysigbus_native)...
    Finished release [optimized] target(s) in 0.01s
==109032==
==109032== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==109032==  Bad permissions for mapped region at address 0x46D9AFF8
==109032==    at 0x49D26300: rustler::types::tuple::<impl rustler::types::Decoder for ()>::decode (in /tmp/why_sig_bus/priv/native/libwhysigbus_native.so)
==109032==
==109032== Process terminating with default action of signal 11 (SIGSEGV)
==109032==  Bad permissions for mapped region at address 0x46D9AFF0
==109032==    at 0x482E120: _vgnU_freeres (vg_preloaded.c:59)
==109032==
==109032== HEAP SUMMARY:
==109032==     in use at exit: 21,582,795 bytes in 29,565 blocks
==109032==   total heap usage: 212,546 allocs, 182,981 frees, 166,718,244 bytes allocated
==109032==
==109032== LEAK SUMMARY:
==109032==    definitely lost: 0 bytes in 0 blocks
==109032==    indirectly lost: 0 bytes in 0 blocks
==109032==      possibly lost: 875,424 bytes in 4,194 blocks
==109032==    still reachable: 20,560,426 bytes in 25,342 blocks
==109032==         suppressed: 146,945 bytes in 29 blocks
==109032== Rerun with --leak-check=full to see details of leaked memory
==109032==
==109032== For lists of detected and suppressed errors, rerun with: -s
==109032== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]    109032 segmentation fault (core dumped)  "$ERL_TOP/bin/cerl" -valgrind --track-origins=yes -pa  -elixir ansi_enabled

@evnu
Member

evnu commented Jan 6, 2020

GDB on the core dump indeed indicates endless recursion:

(gdb) bt
#0  0x00007f69fbf454ab in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#1  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#2  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#3  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#4  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#5  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#6  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#7  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so

...

#9388 0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9389 0x00007f69fbf463c2 in std::panicking::try::do_call () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9390 0x00007f69fbf8437a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:80
#9391 0x00007f69fbf45a30 in <whysigbus_native::decode_args as rustler::nif::Nif>::RAW_FUNC::nif_func () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9392 0x0000557cd69bae34 in process_main ()
#9393 0x0000557cd69c66ec in ?? ()
#9394 0x0000557cd6c31bd0 in ?? ()
#9395 0x00007f6a43a604cf in start_thread () from /usr/lib/libpthread.so.0
#9396 0x00007f6a4398f2d3 in clone () from /usr/lib/libc.so.6

@evnu
Member

evnu commented Jan 6, 2020

For simple cases, Rust will warn about endless recursion, but apparently it cannot detect the recursion in this case. See this playground example for a simple case that is detectable.

EDIT: When running the playground example, the overflow is detected properly:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

I assume we do not see such a message because the NIF does not set up its own stack. See here for a discussion regarding this. The calling environment is responsible for setting up the running thread and its stack.
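As an illustration of "the calling environment sets up the thread and its stack", here is a small sketch in plain Rust. It uses `std::thread::Builder::stack_size` so that the spawning side decides how much stack the recursive work gets. The function names and the sizes are assumptions chosen for the example; inside a NIF the thread belongs to ERTS, so this only shows the general mechanism and is not something rustler does.

```rust
use std::thread;

// Finite recursion: one stack frame per level, n levels deep.
fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

// Hypothetical helper: the caller, not the recursive code, picks the stack
// size of the thread that runs the recursion.
fn run_with_stack(stack_bytes: usize, n: u64) -> u64 {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || depth(n))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

For example, `run_with_stack(64 * 1024 * 1024, 100_000)` gives the recursion a 64 MiB stack. An unbounded recursion would still overflow any stack, whatever size the caller picks.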

@evnu
Member

evnu commented Jan 6, 2020

With some digging, I found the place where ERTS sets a guard page for spawned threads (ethread.c):

#ifdef ETHR_STACK_GUARD_SIZE
    (void) pthread_attr_setguardsize(&attr, ETHR_STACK_GUARD_SIZE);
#endif

In your example, we should run into this guard. I believe we cannot do anything further, as we cannot know at compile time whether a recursive call is finite. But it would probably be helpful to at least mention this in the README. Maybe a "Pitfalls" section would be good?
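Since termination cannot be checked at compile time, one generic mitigation for hand-written recursive code is an explicit depth budget that turns runaway recursion into an ordinary error. The sketch below is an assumption-laden illustration in plain Rust: `Value`, `MAX_DEPTH`, `leaf_value` and `nest` are hypothetical names, and rustler's `Decoder::decode` signature has no depth parameter, so this is not a drop-in fix for the example in this issue.

```rust
// Hypothetical nested value standing in for some recursively processed input.
#[derive(Debug)]
enum Value {
    Leaf(i64),
    Nested(Box<Value>),
}

// Assumed budget; a real limit would depend on frame size and stack size.
const MAX_DEPTH: usize = 64;

// Walks down to the leaf, failing with an error instead of overflowing the
// stack once the nesting exceeds MAX_DEPTH.
fn leaf_value(v: &Value, depth: usize) -> Result<i64, String> {
    if depth > MAX_DEPTH {
        return Err(format!("recursion deeper than {} levels", MAX_DEPTH));
    }
    match v {
        Value::Leaf(n) => Ok(*n),
        Value::Nested(inner) => leaf_value(inner, depth + 1),
    }
}

// Builds a leaf wrapped in `levels` layers of nesting.
fn nest(levels: usize) -> Value {
    let mut v = Value::Leaf(1);
    for _ in 0..levels {
        v = Value::Nested(Box::new(v));
    }
    v
}
```

With these definitions, `leaf_value(&nest(10), 0)` succeeds, while `leaf_value(&nest(1000), 0)` returns an error rather than recursing 1000 levels.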

@evnu added the needs docs label May 5, 2020

@hansihe
Member

hansihe commented Sep 8, 2021

This could fit well as an example of safety caveats at https://rustler-web.onrender.com/docs.
