This gem provides SCD (Slowly Changing Dimension) behaviour for ActiveRecord models. The kind of SCD implemented is “Type 2” acording to en.wikipedia.org/wiki/Slowly_changing_dimension
A model which includes this extension keeps the history of entities that change ‘slowly’ (here slowly meaning not more often than once a day). Any such entity is identified by an identity
column (computed automatically from some natural key(s)) which is not a primary key of the table. For a single entity multipe records may exist in the table that represent the entity at different periods of time. We’ll use this terminology here:
-
Identity: is the key that identifies an entity of a SCD through time. Here we’ll often use identity in a loose sense to refer to the entity which it identifies.
-
An Iteration is each of the revisions or variations over time that an Identity may go through. Each iteration of an identity is identified by a surrogate primary key (id by default) has an effective period in which the iteration is the valid representation of the identity. Here this period is specified by start and end dates (so that variations which have any frequency higher than daily cannot be handled by this method)
Models that use this plugin must provide an identity
column that establishes the identity of the entities in addition to the table’s primary key that identifies each iteration (revision) of the entity. An iteration is active at an effective period of time which is is defined, with day-granularity, by two integer columns, effective_from
and effective_to
, using YYYYMMDD format.
By default effective_from has value 0 and effective_to 99999999; these special values meaning unlimited periods of time (0 represents the ‘start of time’ and 99999999 the ‘end of time’).
SCD models must also provide a compute_identity
method to compute the identity attribute from other attributes of the record.
Assume that we have a model @Country@ with attributes @code@, @name@, @area@, etc. We can attarch ‘acts-as-scd’ behaviour to this model using either the @acts_as_scd@ or @has_identity@ methods. [FIXME: to be determined if they are going to be synonyms or not]
We’ll use the latter. The parameter passed to it is the type of the @identity@ column. Notice that we must define how identity is to be computed.
class Country < ActiveRecord::Base has_identity :string, limit: 3 def compute_identity self.identity = code end end
We would define the schema for this table:
create_table :countries do |t| t.string :code, limit: 3 t.string :name t.float :area t.string :identity, limit: 3 t.integer :effective_from, default: 0 t.integer :effective_to, default: 99999999 end
Imagine you’re in a country named Germany at a date such as 1930-01-01 (Disclaimer: no historic accuracy intented). We can create a record to represent it like so:
Country.create_identity code: 'DEU', name: 'Germany', area: 357_02
Now we have a single record that represents an iteration of the country with an indefinite life span (even though we probably know the country hasn’t been around forever, we’re not interested now in such details).
We can query for the current iteration of Germany with:
Country.current.where(identity: 'DEU').first
Note that the current
term, when applied to identities, is used to mean the last iteration if it extends indefinitely to the future (effective_to == 99999999). The last iteration may have a effective end date, meaning that it has disappeared at that date, and in that case it would not be current. Also, if iterations exist with effective dates in the future, the current iterations may not be active at the current date. To get the iteration which is active at the current or any other date, the at
methods should be used.
And also we can request the iteration that was active at some specific date:
Country.at(Date.new(1930,1,1)).where(identity: 'DEU').first
Now, imagine a war comes and goes and the country is split in two. The original country can be modified as:
Country.create_iteration 'DEU', {area: 248_688}, Date.new(1949,10,7)
Note that we identify the country by its identity and must only supply here the attributes that change from the ‘current’ existing iteration. Let’s compute the area that Germany has lost:
germany_v1 = Country.initial.where(identity: 'DEU').first germany_v2 = Country.current.where(identity: 'DEU').first area = germany_v2.area - germany_v1.area
A new country is established in the lost area; let’s call it East Germany:
Country.create_identity code: 'DDR', name: 'East Germany', area: area
To make things more interesting, let’s assume some more time passes and someone breaks down the division between the countries at, say, 1990-10-03:
date = Date.new(1990,10,3) deu = Country.current.where(identity: 'DEU').first ddr = Country.current.where(identity: 'DDR').first merged_area = deu.area + ddr.area Country.current.where(code: 'DEU').first.terminate_identity(date) Country.create_iteration 'DEU', {area: merged_area}, date ....
You can tae a look at the tests while this is being developed…
We’ll use anoter model, City, to show how to work with associations between SCD models.
…
In some cases, modification of SCD tables is performed not on individual records, but on the whole table at a given date. A typical example of this is an industrial classification system such as SIC for the US (or NAICS for North America, UKSIC, CNAE –Spain– GICS, ISIC, .…)
…
-
Query methods returning (further-refinable) queries
-
Query methods that return objects
-
Methods with limitations
Related iterations, …
-
Write Tests
-
Write Documentation
-
Require modal_fields or make it optional?
-
Create generator to add identity to a model and generate migration
-
Test with both Rails 3 & 4
-
Release gem 1.0.0