The task and target infrastructure will need to be reimplemented to be easier to understand, more robust and more compatible with additional targets (like remote files).
It turned out to be quite useful to detect changes in the dependency graph through a checksum mechanism:
Each initial target (i.e. not created by a task) computes a checksum of its contents
Each task computes a checksum calculated from the checksums of its inputs and its code
Each created target gets passed a deterministic key by its task
Each intermediate or final target computes its checksum from its task's checksum and its deterministic key
All checksums are saved in a database, and changes in a target's checksum force the target to be recreated
Old checksums need to be cleaned up properly; otherwise, switching back and forth won't trigger a rebuild
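A minimal sketch of the checksum scheme above, assuming SHA1 throughout and a plain dict in place of the database. All function names are hypothetical, not datapipe's API. Keeping only the latest checksum per target name is one way to handle the cleanup point: an older checksum is overwritten instead of kept around, so switching back to it still counts as a change.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 hex digest of some bytes."""
    return hashlib.sha1(data).hexdigest()


def initial_checksum(contents: bytes) -> str:
    # An initial target (not created by a task) hashes its own contents.
    return sha1_hex(contents)


def task_checksum(input_checksums: list[str], code: str) -> str:
    # A task combines the checksums of its inputs with its code.
    # SHA1 hex digests are always 40 characters, so concatenating the
    # input checksums followed by the code is unambiguous.
    return sha1_hex(("".join(input_checksums) + code).encode())


def created_checksum(task_sum: str, key: str) -> str:
    # A created target combines its task's checksum with the
    # deterministic key the task assigned to it.
    return sha1_hex((task_sum + key).encode())


def needs_rebuild(db: dict[str, str], name: str, checksum: str) -> bool:
    # Only the latest checksum per target is kept. Overwriting the old value
    # means that switching back to an earlier configuration still differs
    # from what is stored, so it triggers a rebuild.
    # (This records the checksum immediately; a real implementation would
    # probably record it only after the target was recreated successfully.)
    changed = db.get(name) != checksum
    db[name] = checksum
    return changed


db: dict[str, str] = {}
raw = initial_checksum(b"1,2,3\n")
v1 = created_checksum(task_checksum([raw], "sum(x)"), "total")
v2 = created_checksum(task_checksum([raw], "max(x)"), "total")
print(needs_rebuild(db, "total", v1))  # True: first run
print(needs_rebuild(db, "total", v1))  # False: nothing changed
print(needs_rebuild(db, "total", v2))  # True: task code changed
print(needs_rebuild(db, "total", v1))  # True: switched back
```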
How do we compute the initial checksums?
A relatively nice way is to use pickle to dump the object into a string, which can be hashed with SHA1.
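A sketch of the pickle-plus-SHA1 idea using only the standard library (`pickle.dumps` and `hashlib.sha1`); `content_checksum` is a hypothetical name. Pinning the protocol removes one source of variation, since the default pickle protocol has changed between Python releases. Note that pickle reflects an object's representation, not just its value: two equal dicts built in a different insertion order produce different bytes, and sets of strings can iterate in a different order between interpreter runs because of hash randomization.

```python
import hashlib
import pickle


def content_checksum(obj) -> str:
    # Dump the object to bytes with a fixed pickle protocol, then hash with SHA1.
    data = pickle.dumps(obj, protocol=4)
    return hashlib.sha1(data).hexdigest()


print(content_checksum([1, 2, 3]) == content_checksum([1, 2, 3]))  # True
print(content_checksum([1, 2, 3]) == content_checksum([1, 2, 4]))  # False
# Equal dicts with a different insertion order pickle differently:
print(content_checksum({"a": 1, "b": 2}) == content_checksum({"b": 2, "a": 1}))  # False
```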
How do timestamps (and possibly other info) fit in with this?
They could be treated as a fallback, as in
IF a target's checksum hasn't changed AND it offers timestamps THEN check if the timestamp has changed compared to the last saved state
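The timestamp fallback could look roughly like this. `SavedState` and `is_outdated` are hypothetical names; the saved state is assumed to come from the checksum database, and a file target could supply its timestamp via `os.path.getmtime`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedState:
    """State recorded for a target after the last successful run."""
    checksum: str
    timestamp: Optional[float] = None


def is_outdated(saved: Optional[SavedState], checksum: str,
                timestamp: Optional[float] = None) -> bool:
    if saved is None:
        return True  # never built before
    if saved.checksum != checksum:
        return True  # the checksum decides first
    # Checksum unchanged: fall back to the timestamp if the target offers one.
    if timestamp is not None:
        return saved.timestamp != timestamp
    return False


saved = SavedState(checksum="abc", timestamp=100.0)
print(is_outdated(saved, "abc", 100.0))  # False
print(is_outdated(saved, "abc", 250.0))  # True: timestamp changed
print(is_outdated(saved, "abc"))         # False: no timestamp offered
print(is_outdated(saved, "xyz", 100.0))  # True: checksum changed
```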
The Target and Task classes have been rewritten to make them more flexible:
Targets have access to their saved state from the last run (stored in LevelDB) and have full control over deciding whether they are up to date
Initial targets (those without a creating task) no longer compute their hash from their contents (this led to many problems). Instead, every target must now have a unique identifier (what this means depends on the Target, e.g. a filepath). This also allows us to scrap the unique_key field set on a task's outputs.
To make datapipe a lot more efficient, the only thing saved in LevelDB now is the memory dict that contains the relevant data for checking the state of a target.
The memory is also serialized through simplejson instead of pickle, which should make things more robust.
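A rough sketch of the revised design, under several assumptions: a hypothetical `FileTarget` uses its absolute path as its unique identifier, keeps a small memory dict describing its state, and decides for itself whether it is up to date. The standard `json` module stands in for simplejson (both provide `dumps`/`loads`), a plain dict with bytes keys and values stands in for the LevelDB database, and using the modification time as the memory content is chosen only for illustration.

```python
import json  # stand-in for simplejson, which offers the same dumps/loads
import os
import tempfile


class FileTarget:
    """Hypothetical target whose unique identifier is its absolute path."""

    def __init__(self, path: str):
        self.path = path

    def identifier(self) -> str:
        return "file:" + os.path.abspath(self.path)

    def current_memory(self) -> dict:
        # The small dict that holds everything needed to judge freshness.
        if not os.path.exists(self.path):
            return {"exists": False}
        return {"exists": True, "mtime": os.path.getmtime(self.path)}

    def is_up_to_date(self, store: dict) -> bool:
        # The target alone decides, by comparing its memory from the last
        # run with its current memory.
        raw = store.get(self.identifier().encode())
        if raw is None:
            return False
        current = self.current_memory()
        return bool(current["exists"]) and json.loads(raw.decode()) == current

    def save(self, store: dict) -> None:
        # Only the serialized memory dict is persisted, keyed by identifier.
        memory = json.dumps(self.current_memory(), sort_keys=True)
        store[self.identifier().encode()] = memory.encode()


with tempfile.TemporaryDirectory() as tmp:
    store: dict = {}  # stand-in for a LevelDB database (bytes -> bytes)
    target = FileTarget(os.path.join(tmp, "result.csv"))
    with open(target.path, "w") as f:
        f.write("a,b\n")
    print(target.is_up_to_date(store))  # False: nothing saved yet
    target.save(store)
    print(target.is_up_to_date(store))  # True
    os.utime(target.path, (0, 0))       # simulate the file being modified
    print(target.is_up_to_date(store))  # False
```

Because each target owns both its identifier and its memory format, other kinds of targets (such as remote files) could store different fields without changing the store itself.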