[WIP] Thread Collider: Race detection / Formal Verification of Nim concurrent programs #127

Open: mratsim wants to merge 5 commits into master

@mratsim mratsim commented May 9, 2020

For now, this is just a shell to help me solve my Heisenbug woes (deadlocks, livelocks) and random failures in CI.

The goal: create a model checker that exhaustively checks all possible states (i.e. all interleavings of threads and of load/store/lock/condvar/futex operations) directly in Nim.

RFC:

Research:

The idea is to

  • use vector clocks to model causal relations: "happens-before" and "concurrent", https://en.wikipedia.org/wiki/Vector_clock
    image
    The events in the middle on thread B and C can happen in any order.
    A data structure can be verified to uphold its invariants, or a result to be correct, with simple asserts, provided Thread Collider can generate all possible thread interleavings.
  • use fibers to simulate threading in a way that we can freely control: suspend, resume and backtrack.
  • Overload Nim threads, Atomics, Lock and Condvar with shadow threads, shadow atomics, ... which add suspension points for the fibers and the metadata needed to carry out the model checking
  • Implement a DPOR algorithm that supports the C11/C++11 memory model, in particular relaxed atomics, sequentially consistent atomics and fences (DPOR = Dynamic Partial Order Reduction, a depth-first search algorithm that prunes already visited/redundant paths)
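
The "happens-before" and "concurrent" relations from the first bullet can be sketched with a small vector clock. This is only an illustration and not the data structure implemented in this PR: `VectorClock`, `Causality`, `tick`, `merge` and `compare` are hypothetical names, and the clock is assumed to hold one counter per thread.

```nim
type
  VectorClock = seq[int]      # assumption: one logical counter per thread
  Causality = enum
    Before, After, Equal, Concurrent

proc tick(vc: var VectorClock, tid: int) =
  ## Record a local event on thread `tid`.
  inc vc[tid]

proc merge(vc: var VectorClock, other: VectorClock) =
  ## Element-wise maximum, applied when a thread observes another thread's event.
  for i in 0 ..< vc.len:
    vc[i] = max(vc[i], other[i])

proc compare(a, b: VectorClock): Causality =
  ## `a` happens-before `b` if every counter of `a` is <= the one in `b`
  ## and at least one is strictly smaller. If each clock is ahead of the
  ## other somewhere, the two events are concurrent.
  var aLess = false
  var bLess = false
  for i in 0 ..< a.len:
    if a[i] < b[i]: aLess = true
    elif a[i] > b[i]: bLess = true
  if aLess and bLess: Concurrent
  elif aLess: Before
  elif bLess: After
  else: Equal

when isMainModule:
  var a = @[0, 0, 0]
  var b = @[0, 0, 0]
  var c = @[0, 0, 0]
  a.tick(0)                 # event on thread A: [1, 0, 0]
  b.merge(a); b.tick(1)     # thread B observes A: [1, 1, 0]
  c.merge(a); c.tick(2)     # thread C observes A: [1, 0, 1]
  doAssert compare(a, b) == Before       # A's event happens-before B's
  doAssert compare(b, c) == Concurrent   # B and C can run in any order
```

The `Concurrent` case is the interesting one for a model checker: those are the pairs of events whose order it has to explore both ways.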

Difference with ThreadSanitizer:

ThreadSanitizer only observes a single execution of the program. A model checker proves that there is no race by brute-forcing all possible execution paths.
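
To illustrate the difference, here is a minimal sketch of brute-force exploration, assuming a toy program in which two threads each perform a non-atomic increment (a load followed by a store). All names (`State`, `step`, `explore`, `allFinalCounters`) are hypothetical, and there is no DPOR pruning here, only a plain depth-first search over every schedule.

```nim
import std/sets

type
  State = object
    counter: int            # shared variable
    reg: array[2, int]      # per-thread register
    pc: array[2, int]       # per-thread program counter

const StepsPerThread = 2    # step 0: load, step 1: store

proc step(s: State, tid: int): State =
  ## Execute the next instruction of thread `tid` on a copy of the state.
  result = s
  case s.pc[tid]
  of 0: result.reg[tid] = s.counter          # reg = counter
  of 1: result.counter = s.reg[tid] + 1      # counter = reg + 1
  else: discard
  inc result.pc[tid]

proc explore(s: State, finals: var HashSet[int]) =
  ## Depth-first search: at each state, try every thread that can still run.
  var done = true
  for tid in 0 ..< 2:
    if s.pc[tid] < StepsPerThread:
      done = false
      explore(step(s, tid), finals)
  if done:
    finals.incl s.counter

proc allFinalCounters(): HashSet[int] =
  result = initHashSet[int]()
  explore(State(), result)

when isMainModule:
  # A single run that happens to schedule the threads one after the other
  # sees 2. The exhaustive search also finds the lost update.
  doAssert allFinalCounters() == toHashSet([1, 2])
```

A single observed execution may only ever produce 2, while the exhaustive search also reaches the schedule where both loads run before either store.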

Status:

The vector clock data structure is implemented.

However, architecting the library from just the papers seems quite complex: everything seems to impact everything else in non-standard ways, and I need more mastery of the C++11 memory model.

For example, to handle compiler reordering of relaxed atomic statements:

 var x {.noInit.}, y {.noInit.}: Atomic[int32]
 x.store(0, moRelaxed)
 y.store(0, moRelaxed)

 proc threadA() =
   let r1 = y.load(moRelaxed)
   x.store(1, moRelaxed)
   echo "r1 = ", r1

 proc threadB() =
   let r2 = x.load(moRelaxed)
   y.store(1, moRelaxed)
   echo "r2 = ", r2

 spawn threadA()
 spawn threadB()

It is possible to have r1 = 1 and r2 = 1 for this program,
contrary to first intuition.

Intuitively, at least one load must have happened before either x or y
is set to 1, so at least one of those loads should return 0.

However, under a relaxed memory model, given that each thread's load and store
are on different variables, the compiler can reorder the program to:

 var x {.noInit.}, y {.noInit.}: Atomic[int32]
 x.store(0, moRelaxed)
 y.store(0, moRelaxed)

 proc threadA() =
   x.store(1, moRelaxed)
   let r1 = y.load(moRelaxed)
   echo "r1 = ", r1

 proc threadB() =
   y.store(1, moRelaxed)
   let r2 = x.load(moRelaxed)
   echo "r2 = ", r2

 spawn threadA()
 spawn threadB()
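
The two listings above can be compared by enumerating every interleaving of each one. The sketch below is a simplification: it runs each program order under plain sequentially consistent interleaving, rather than modelling the C11 relaxed semantics that Thread Collider would need. `Op`, `Program`, `explore`, `outcomesOf`, `original` and `reordered` are hypothetical names.

```nim
import std/[sets, hashes]

type
  OpKind = enum
    Load, Store
  Op = object
    kind: OpKind
    loc: int                     # 0 = x, 1 = y
  Program = array[2, seq[Op]]    # one instruction list per thread
  State = object
    mem: array[2, int]           # x and y, both start at 0
    r: array[2, int]             # r1 (thread A) and r2 (thread B)
    pc: array[2, int]

proc step(p: Program, s: State, tid: int): State =
  ## Execute the next instruction of thread `tid`.
  result = s
  let op = p[tid][s.pc[tid]]
  case op.kind
  of Load: result.r[tid] = s.mem[op.loc]
  of Store: result.mem[op.loc] = 1
  inc result.pc[tid]

proc explore(p: Program, s: State, seen: var HashSet[(int, int)]) =
  ## Depth-first enumeration of all interleavings, recording (r1, r2).
  var done = true
  for tid in 0 ..< 2:
    if s.pc[tid] < p[tid].len:
      done = false
      explore(p, step(p, s, tid), seen)
  if done:
    seen.incl((s.r[0], s.r[1]))

proc outcomesOf(p: Program): HashSet[(int, int)] =
  result = initHashSet[(int, int)]()
  explore(p, State(), result)

let original: Program = [
  @[Op(kind: Load, loc: 1), Op(kind: Store, loc: 0)],   # r1 = y; x = 1
  @[Op(kind: Load, loc: 0), Op(kind: Store, loc: 1)]]   # r2 = x; y = 1
let reordered: Program = [
  @[Op(kind: Store, loc: 0), Op(kind: Load, loc: 1)],   # x = 1; r1 = y
  @[Op(kind: Store, loc: 1), Op(kind: Load, loc: 0)]]   # y = 1; r2 = x

when isMainModule:
  doAssert (1, 1) notin outcomesOf(original)   # the "first intuition"
  doAssert (1, 1) in outcomesOf(reordered)     # reachable once reordered
```

In this simplified model, r1 = 1 and r2 = 1 only becomes reachable after the reordering, which is why a checker for relaxed atomics cannot limit itself to interleavings of the program as written.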
