"Burst data is optimized at all levels: from distributed node partitioning/slicing, to region-based multicore parallelism, to a fixed-byte-order binary encoding laid out as a depth-first traversal of the object tree."
Burst is not a database. It does not store data authoritatively, though it does intelligently cache data locally, both in memory and on disk. Instead, it imports (samples) data on demand from your external data source or feed through a data-loading pipeline built on its sample store subsystem. That subsystem is customized for your data via a bespoke sample source implementation that you write, or an existing one that you modify and/or configure to meet your needs.
The exact topology and implementation of your on-demand data-loading pipeline depends heavily on the type of data and the kind of datasource or datastream you are pulling from. The Burst samplestore architecture and its specialized parallel data protocols support a wide range of systems, from a single SQL server database to an HBase installation with many hundreds of data nodes.
In order for Burst to understand and process your data, your entity data model needs to be defined in a suitable Brio schema. This schema must be available anywhere your data is used. It is written in a specialized source language and placed as a resource into a schema/presser provider framework that is discoverable on both the samplesource and Burst classpaths.
To go along with your schema, you need to implement a Brio presser using the Burst libraries. This presser, driven by the schema, efficiently encodes (with low GC churn, across multiple cores) generic entity trees from some external data model into the specialized internal Brio blob binary format. This internal form is used all the way through the rest of the pipeline: over the network (compressed), on disk, and in memory.
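To make the idea of pressing concrete, here is a toy sketch (in Java) of a depth-first, fixed-byte-order encoding of a generic entity tree into one contiguous byte array. The names here (`Entity`, `ToyPresser`, `press`) are hypothetical illustrations, not the actual Brio presser API, and the wire layout is invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// a generic external-model entity: a name, some scalar fields, child entities
final class Entity {
    final String name;
    final TreeMap<String, Long> fields; // sorted keys => deterministic output
    final List<Entity> children;
    Entity(String name, Map<String, Long> fields, List<Entity> children) {
        this.name = name;
        this.fields = new TreeMap<>(fields);
        this.children = children;
    }
}

final class ToyPresser {
    // encode the tree depth-first so each subtree lands in one contiguous span
    static byte[] press(Entity root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes); // big-endian
            visit(root, out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen with in-memory streams
        }
    }

    private static void visit(Entity e, DataOutputStream out) throws IOException {
        byte[] name = e.name.getBytes(StandardCharsets.UTF_8);
        out.writeShort(name.length);
        out.write(name);
        out.writeShort(e.fields.size());
        for (Map.Entry<String, Long> f : e.fields.entrySet()) {
            byte[] k = f.getKey().getBytes(StandardCharsets.UTF_8);
            out.writeShort(k.length);
            out.write(k);
            out.writeLong(f.getValue());
        }
        out.writeShort(e.children.size());
        for (Entity child : e.children) visit(child, out); // depth-first descent
    }
}
```

Because the traversal order and byte order are fixed, pressing the same tree twice yields byte-identical output, which is what lets the blob be cached, compared, and shipped over the network without further interpretation.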
Burst provides libraries that support writing schemas and pressers and making them available on your classpath. This provider framework supports dynamic loading of schemas and pressers placed on the classpath. The rest of the samplesource implementation is up to you; you need only include a few Burst libraries.
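Conceptually, a classpath provider framework like this pairs a provider interface with a registry that discovers implementations at runtime (in the spirit of `java.util.ServiceLoader`). The sketch below is a minimal in-process stand-in; the `SchemaProvider` interface and `SchemaRegistry` names are assumptions for illustration, not Burst's actual API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// hypothetical provider contract: each provider contributes one named schema
interface SchemaProvider {
    String schemaName();   // e.g. "unity"
    String schemaSource(); // the schema source text carried as a resource
}

// hypothetical registry: real frameworks would populate this by scanning the
// classpath rather than by explicit register() calls
final class SchemaRegistry {
    private static final Map<String, SchemaProvider> providers = new ConcurrentHashMap<>();

    static void register(SchemaProvider p) {
        providers.put(p.schemaName(), p);
    }

    static Optional<SchemaProvider> lookup(String name) {
        return Optional.ofNullable(providers.get(name));
    }
}
```

The key property is that consumers look schemas up by name and never hard-code a concrete implementation, so dropping a new provider jar onto the classpath is enough to make its schema available.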
When talking about Burst data, one must start with the data about that data. For Burst to load and analyze your data, you must give it a minimum of information so it can load and manage that data.
Burst has a subsystem called the Catalog that manages various salient metadata types using an external SQL database. The Domain and the View are the two managed metadata types associated with the management of data.
Domains are a generalized specification for a dataset, e.g. the set of entities representing all end users of a specific mobile application. Burst users need to create a Domain metadata object in the Catalog for each type of dataset they want to do behavioral analysis on.
Views are a specification for a particular 'view' of a Domain, e.g. the last 30 days of all European users of that mobile application. Burst users need to create a View metadata object within a Domain for each specific dataset they want to do behavioral analysis on.
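The relationship between the two metadata types can be sketched as a pair of records: a View always belongs to exactly one Domain and adds the slicing criteria. The field names and the time-window shape below are assumptions for illustration, not the Catalog's actual SQL schema or API.

```java
import java.time.Duration;

// a Domain: a generalized dataset specification, e.g. all users of one app
record Domain(long pk, String moniker) {}

// a View: one particular slice of a Domain, e.g. the last 30 days of EU users;
// domainPk ties the View back to its owning Domain
record View(long pk, long domainPk, String moniker, Duration window) {}
```

One Domain typically carries many Views, each describing a differently filtered or time-bounded load of the same underlying dataset.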
The samplestore
The speed and resiliency of data import in Burst is, of course, highly dependent on the external data store.