If there is any one salient technological aspect of the Burst world, it is the single-pass-scan and the various attendant challenges of getting it right. The single-pass-scan is the critical inner loop of Burst Behavioral Analysis, and the way we approach that game is where many of its performance wins lie.
Burst analytics are significant calculations across high-cardinality sets of behavioral entities, where each of those entities is an object-tree with generally high-cardinality collections of behavioral 'events'. All of that data needs to be filtered, measured, and categorized as fast and as efficiently as is technologically practical.
The bad news is that Burst needs to provide high-transaction-rate, low-latency calculations day in and day out, on very large entity sets where each entity can be quite large, with the basic algorithms so efficient as to be limited by the simple reading and writing of memory. Very simple changes to how memory is read, or how an instruction is turned into bytecode, can make dramatic differences.
This means:
- We need to strenuously limit the number of VM objects created and carefully manage non-VM memory as well.
- We need to carefully optimize how multiple CPUs and cores and their cache lines interact with the various cache levels of the memory architecture.
- We need to be sure we are using best practices with our multicore thread usage especially as regards synchronization.
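As a concrete illustration of these constraints, here is a minimal, hypothetical JVM sketch (not Burst's actual Tesla code; the fixed-width record layout and the filter are invented for illustration) that scans 'event' records held in off-heap direct memory, moving strictly forward and creating no per-record objects:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class OffHeapScanSketch {
    public static void main(String[] args) {
        // allocate off-heap (direct) memory -- no per-record JVM objects, no GC pressure
        ByteBuffer region = ByteBuffer.allocateDirect(1 << 20).order(ByteOrder.LITTLE_ENDIAN);
        // write 1000 fixed-width "event" records: (timestamp: long, code: int)
        for (int i = 0; i < 1000; i++) {
            region.putLong(1_000_000L + i);
            region.putInt(i % 7);
        }
        region.flip();
        // scan strictly forward: sequential access keeps hardware prefetch
        // and cache lines working for us rather than against us
        long count = 0;
        while (region.remaining() >= 12) { // 12 bytes per record
            long ts = region.getLong();    // decode in place, no object created
            int code = region.getInt();
            if (code == 3) count++;        // filter/measure without allocating
        }
        System.out.println("matched events: " + count);
    }
}
```

The same scan over a heap of per-event objects would pay for allocation, pointer chasing, and eventual collection on every record; here the only cost is the memory read itself.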
The good news comes in two forms:
- There is no need to calculate direct inter-entity relationships, i.e. most of the calculus ends up with a high degree of locality within the entity object-tree. This allows us to divide our processing across multiple cores and multiple nodes.
- Processing is inherently ordered by causality/time, i.e. there is a high degree of directionality in our algorithms. This allows us to take advantage of modern hardware's innate forward-moving path optimizations.
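These two properties combine naturally: because entities never reference one another, batches of entities can be scanned completely independently, one task per batch, each moving strictly forward. The sketch below is hypothetical (the batch layout and the divisible-by-3 "measure" are invented), but shows the shape of that embarrassingly parallel division of work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public final class PartitionedScanSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical setup: 4 batches of 10_000 "entities" (a long stands in for an entity)
        int batchCount = 4;
        long[][] batches = new long[batchCount][];
        for (int b = 0; b < batchCount; b++) {
            long base = b * 10_000L;
            batches[b] = LongStream.range(base, base + 10_000).toArray();
        }
        // one scan task per batch; entities never reference each other,
        // so each task runs to completion with no cross-thread coordination
        ExecutorService pool = Executors.newFixedThreadPool(batchCount);
        List<Callable<Long>> tasks = new ArrayList<>();
        for (long[] batch : batches) {
            tasks.add(() -> {
                long matches = 0;
                for (long entity : batch) {          // strictly forward traversal
                    if (entity % 3 == 0) matches++;  // per-entity filter/measure
                }
                return matches;
            });
        }
        long total = 0;
        for (Future<Long> f : pool.invokeAll(tasks)) total += f.get(); // merge results
        pool.shutdown();
        System.out.println("total matches: " + total);
    }
}
```

The only synchronization point is the final merge of per-batch results, which is exactly the property that lets the same division of labor scale from cores to nodes.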
All this translates to a finite number of design practices:
- rigorously (no exceptions) translate all analytic processing of the entity object-model into a single-pass, depth-first traversal (easier said than done)
- don't create any VM objects during the scan. Even small amounts of GC are death at Burst operation rates.
- on a given worker node, batch entities into contiguous memory 'regions' and bind all operations to a single thread/core.
- place all significant data structures into off-heap memory 'parts'
- manage parts using lock-free, off-heap queues (thanks JCL)
- always move forward in a byte order sense when accessing large chunks of off heap memory (such as the Brio Blob)
- carefully divide threads into finite sized 'cpu bound' and cached 'async request' pools.
- be mindful of concurrency levels and transaction rates on queues
- have the OS do what it is best at, e.g. `mmap` files
- generate the final analysis algorithm into reusable, maximally 'efficient' bytecode and allow that bytecode to JIT optimize
- use highly specialized data structures such as Felt Cubes and Routes
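The first practice above, a single-pass depth-first traversal with no object creation, can be sketched as follows. The encoding is invented for illustration (pre-order `(value, childCount)` pairs in a flat array; Burst's actual Brio format is far richer), as is the depth-weighted measure, but the mechanics are the point: an explicit stack of counters replaces recursion, and the cursor only ever moves forward through the blob.

```java
public final class SinglePassTraversalSketch {
    // hypothetical encoding: the object-tree is serialized pre-order as
    // (value, childCount) pairs, so depth-first order IS byte order
    static long depthWeightedSum(int[] blob) {
        int[] remaining = new int[64]; // pending-child counts per level (assumed max depth)
        int top = 0;                   // stack size; depth of current node == top + 1
        long sum = 0;
        int i = 0;
        while (i < blob.length) {      // single forward pass, no recursion, no objects
            int value = blob[i], childCount = blob[i + 1];
            i += 2;
            sum += (long) value * (top + 1);   // visit: weight node by its depth (root = 1)
            if (top > 0) remaining[top - 1]--; // consume one pending child of the parent
            if (childCount > 0) {
                remaining[top++] = childCount; // descend into this node's children
            } else {
                while (top > 0 && remaining[top - 1] == 0) top--; // ascend past finished nodes
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // root(1) -> [ a(2) -> [ c(4) ], b(3) ]  encoded pre-order
        int[] blob = { 1, 2,  2, 1,  4, 0,  3, 0 };
        System.out.println("weighted sum: " + depthWeightedSum(blob));
    }
}
```

Note what is absent: no node objects, no pointer chasing, no backward seeks. Every analytic question has to be rephrased so it can be answered by exactly this kind of forward march, which is why the practice is "easier said than done."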
The single-pass-scan was an enormous architectural bet made in the very early stages of the Burst architecture. It was not even always clear that it could be done, i.e. that all the questions we would want to ask could be answered that way. Fortunately, it did in fact turn out to be a successful bet. This decision permeates the architecture. However, take an especially close look at these relevant modules for a deeper dive:
- Brio -- single pass scan encoded binary data format
- Tesla -- thread and memory management
- EQL -- declarative language with single pass scan semantic output
- Felt -- an execution semantic object model for single pass scans
- Fabric -- multi-node / multi-core distributed processing
- Zap -- high performance off heap data structures for single pass calculus