Skip to content

Store the history of a browser's visited paths

License

Notifications You must be signed in to change notification settings

atlas-engineer/history-tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Global History Tree

This data structure can be used to store the history of visited paths or URLs with a file or web browser, in a way that no “forward” element is ever forgotten.

The history tree is “global” in the sense that multiple owners (e.g. tabs) can have overlapping histories. On top of that, an owner can spawn another one, starting from one of its nodes (typically when you open a URL in a new tab).

This global history tree structure reifies all this.

Goals

History tree

In many popular web browsers and file managers, the history is linear. This is unfortunate because it loses information such as the branches that are created when going back, then forward.

Take this linear history:

A -> B

We are on B. If we go back to A, then to C, the linear history becomes

A -> C

We lost the information that we visited B.

A tree history solves this issue by encoding the history as a tree-like data structure instead of a list. With a tree history, the above example would yield

A ->  C
  \-> B

Inter-owner relationships

In web browsers and file managers, content is browsed by what this library calls an owner (e.g. a tab). It’s common practice to create a new owner visiting content coming from an existing owner (e.g. opening a link in a new tab). This induces a relationship between owners.

In particular, this type of owner relationships means that the history of related owners can be related.

This library allows us to have such trees:

(X-A
  (X-B1
     (X-C1 X-C2))
  (X-B2
     (X-D1 Y-D2)))

The X-prefixed nodes belong to owner X, while the Y-ones belong to Y. X current node may be X-B2, while Y current node may be Y-D2.

X is said to be the creator of Y, and Y-D2 is the origin node of Y. Y owns only 1 node, while X owns 6 nodes. None of them overlaps, but wait.

With current owner being X, if we go forward to Y-D2, then we own it and it becomes the new current node of X. Now Y-D2 is owned both by X and Y.

Similarly, Y can “go back” to X-A, which becomes its current node, while X-B2 becomes the forward child of X-A for Y. If Y now visits X-B1, it becomes the forward child of X-A for Y (while X-B2 is still the forward child of X-A for X).

A node may have multiple children. For each owner, the “forward child” is the default choice of which child to visit when calling the forward function. Observe that each node may have different forward children for each of their owners.

Vocabulary notes

Data
Whenever we refer to data, we mean the arbitrary content the user stores in the tree (such as URLs or paths). The data is automatically deduplicated.
Branch
since “history tree” refers to the whole data structure, we avoid using the term “tree” otherwise to avoid any confusion. Instead we talk about “branches”. Whenever we refer to a branch, we mean to whole set of inter-connected nodes up to the root (a node without parent).

Note that the history tree may have multiple, non connected branches.

For entry, node, binding, owner, see the documentation of the respective classes.

Integrity and garbage collection

While the history tree is not immutable per se, it tries to retain as much information as possible. But keeping nodes forever would lead to an ever-growing tree, which is usually not desirable. So the policy is to delete all the nodes of a branch only when they become owner-less. So this happens only on owner deletion.

Everything else remains:

  • Bindings (ownership) cannot be removed as long as the owner is not deleted.
  • Nodes cannot be deleted other than by the aforementioned mechanism.

History data deletion

Entries can only be deleted with delete-data if no node refers to the entry. This can be inconvenient if there are many nodes used by many owners which refer to the entries we would like to delete.

A few options:

  • delete-owner removes an owner. If all the owners are removed from a branch, the branch is garbage-collected. If the entries that were pointed to by the branch nodes are not referenced in any other branch, the entries effectively become node-less and thus available for deletion.
  • reset-owner disowns all the nodes of a given owner and creates a new root node pointing to the current entry of the owner. This makes it possible to free nodes and entries without deleting an owner.

Concurrency

This library is not thread-safe. The user is expected to use a mechanism such as mutexes to guarantee the integrity of the global history tree.

Rationale: It’s only too common that the user wants to persist the global history tree to disk. In this case, some form of thread-safety should already be used for the persisted file. This safety can be trivially used to guarantee the integrity of the global history tree in memory as well.

Customizable entry uniqueness

The entries must be unique in a sense that’s defined by the user. For instance, if the user wants to store

(defclass web-page ()
  ((url :accessor url)
   (title)))

entries, the title might be irrelevant for uniqueness.

Thus, to store web-page’s by unique URL, you can create a history with the url accessor as a key:

(htree:make :key 'url)

When adding an entry with the same URL but with a different title, the existing entry’s title is automatically updated to the new one, but the entry object stored in the tree remains the same.

Change log

0.1.2

  • Remove NASDF as a dependency.

0.1.1

  • Switch from hu.dwim.defclass-star to nclasses.

Future work

Hash tables vs. lists

Initially it was decided to encode the set of unique entries as a hash-table for performance reasons. The reasoning was that hash-tables have constant-time access to their elements as opposed to the more practical Lisp lists, for which access is in linear time.

It turns out that element access in a list is extremely fast with SBCL, and a quick benchmark shows that it’s only when exceeding about 10.000.000 entries that hash tables start becoming more interesting. So maybe hash tables were not the best choice for a set that’s unlikely to have more than 100.000–1.000.000 entries.

Previously we explained how the uniqueness is customizable. In standard Common Lisp, hash tables accept only eq, eql, equal or equalp as test function. So to allow full customizability as in the previous example, we resort to the cl-custom-hash-table library.

Custom hash tables have restricted the design somewhat. For instance, the entries hash table values are the entries themselves, so that we have a way to access the stored keys in constant time. (Indeed, when you call (gethash my-web-page entries), there is no guarantee that the matched key is identical to my-web-page.)

Immutability

The global history tree strives to be as immutable as possible, as we explain in the sections on integrity and deletion. This helps both the developers and the users understand what’s going on, which is essential for such a complex data structure.

It could have been better to have a fully immutable data structure (in the functional programming sense), e.g. using the FSet library. It’s unclear whether the performance penalty would be too important. We would need some benchmark here.

One benefit of full immutability is that we can know precisely when the global history tree was modified (e.g. when my-history is reassigned to the new history value). This allows us, for instance, to serialize only on modification and thus avoid useless serializations, which may be expensive when the history grows big.