
Lazy Collections

Simon Brooke edited this page Sep 1, 2021 · 1 revision

If we're serious about a massively parallel architecture, and especially if we're serious about passing argument evaluation off to peer processors, then it would be madness not to implement lazy collections.

Architecturally, lazy collections violate the 'everything is immutable' rule, but only in a special case: you can do the equivalent of `rplacd` on a cell that forms part of a lazy collection, but only as you create it, before you release it to another process to be consumed; and you may only do it once, to replace the NIL list-termination value with a pointer to a new lazy cell.

From the consumer's point of view, assuming for the moment that the consumer is a peer processor, it looks just like a list, except that when you request an element which is, as it were, at the write cursor -- in other words, the producing process hasn't yet generated the next element in the collection -- the request will block.
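The producer/consumer contract above can be sketched in a few lines of Python. This is a minimal sketch, not the actual implementation; the names `LazyCell`, `extend`, `close` and the use of a `threading.Event` are all my assumptions, chosen to make the two rules concrete: the producer may replace the NIL tail exactly once, and the consumer blocks at the write cursor until the producer has committed the next cell (or closed the sequence).

```python
import threading

NIL = object()  # sentinel standing in for the NIL list terminator

class LazyCell:
    """Hypothetical lazy cons cell: car is fixed at creation; cdr starts
    as NIL and may be replaced exactly once, by the producer, before the
    cell is visible to a consumer."""

    def __init__(self, car):
        self.car = car
        self._cdr = NIL
        self._done = threading.Event()  # set once the cdr is final

    def extend(self, cell):
        """Producer side: the one permitted mutation -- replace the NIL
        termination value with a pointer to the next lazy cell."""
        if self._done.is_set():
            raise RuntimeError("cdr may only be replaced once")
        self._cdr = cell
        self._done.set()

    def close(self):
        """Producer side: signal that the sequence really ends here."""
        self._done.set()

    def cdr(self):
        """Consumer side: looks like ordinary list access, but blocks
        until the producer has either extended or closed the cell."""
        self._done.wait()
        return self._cdr
```

A consumer walking such a chain with `cdr()` sees an ordinary list whenever it is behind the producer, and simply blocks when it catches up.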

## How are lazy sequences created?

Essentially, lazy collections are created by a very few primitive functions; indeed, it may boil down to just two: `read`, which reads characters from a stream, and `mapc`, which applies a function to successive values from a sequence. `reduce` may be a third lazy-generating function, but I suspect that it may be possible to write `reduce` using `mapc`.
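To make the claim concrete, here is a sketch of the two primitives as Python generators, plus a `reduce` written over `mapc` to test the suspicion that a third primitive isn't needed. The function names and the generator encoding are my assumptions; the real primitives would produce lazy cells, not Python generators.

```python
import io

def lazy_read(stream):
    """Sketch of `read` as a lazy character producer: pulls one
    character from the stream only when the consumer demands it."""
    while True:
        ch = stream.read(1)
        if ch == "":   # end of stream
            return
        yield ch

def lazy_mapc(fn, seq):
    """Sketch of `mapc`: applies fn to successive values of a
    (possibly lazy) sequence, itself yielding lazily."""
    for x in seq:
        yield fn(x)

def lazy_reduce(fn, seq, init):
    """A `reduce` built on top of mapc: thread an accumulator through
    the sequence by mapping a closure that updates it."""
    acc = init
    def step(x):
        nonlocal acc
        acc = fn(acc, x)
        return acc
    for _ in lazy_mapc(step, seq):
        pass  # drive the lazy sequence to exhaustion
    return acc
```

The `lazy_reduce` sketch at least suggests the suspicion is plausible: all `reduce` adds over `mapc` is an accumulator threaded through the applications.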

## What does a lazy sequence look like?

Essentially you can have a lazy sequence of objects, which looks exactly like a list except that it's lazy, or a lazy sequence of characters, which looks exactly like a string except that it's lazy. For practical purposes it would be possible for `mapc` to generate perfectly normal CONS cells and for `read` to generate perfectly normal STRG cells, but we've actually no shortage of tags, and I think it would be useful for debugging, and also for process scheduling, to know whether one is dealing with a lazy construct or not. So I propose two new tags:

  • LZYC -- like a cons cell, but lazy;
  • LZYS -- like a string cell, but lazy.

I acknowledge that, given that keywords and symbols are also sequences of characters, one might also have lazy symbols and lazy keywords, but I'm struggling to think of situations in which these would be useful.
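The benefit to the debugger and scheduler is just that laziness becomes a cheap tag test rather than something that must be tracked out of band. A minimal sketch, with the tag names from above and an assumed dictionary cell representation:

```python
# The existing eager cell tags alongside the two proposed lazy tags.
CONS, STRG, LZYC, LZYS = "CONS", "STRG", "LZYC", "LZYS"

def make_cell(tag, payload, tail):
    """Hypothetical tagged cell; the real representation would be a
    fixed-layout machine word, not a dict."""
    return {"tag": tag, "payload": payload, "tail": tail}

def is_lazy(cell):
    """True when a scheduler or debugger should treat this cell as
    potentially blocking: its tail may not have been produced yet."""
    return cell["tag"] in (LZYC, LZYS)
```

A scheduler could use `is_lazy` to decide whether dereferencing a tail risks blocking the current process.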

## How do we compute with lazy sequences in practice?

Consider the note on parallelism. Briefly, this proposes that a compile-time judgement is made as to the probable cost of evaluating each argument; that the one deemed most expensive to evaluate is reserved to be evaluated on the local node; and that, for the rest, a judgement is made as to whether it would be cheaper to hand them off to peers or to evaluate them locally. Well, for functions which return lazies -- and the compiler should certainly be able to infer whether a function will return a lazy -- it will always make sense to hand them off, if there is an available idle peer to hand them off to. In fact, lazy-producers are probably the most beneficial class of function calls to hand off, since, if handed off to a peer, the output of the function can be consumed without any fancy scheduling on the local node. Indeed, if all lazy-producers can be reliably handed off, we probably don't need a scheduler at all.
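The judgement described above can be sketched as a small planning function. Everything here is assumed for illustration: the argument representation, the cost threshold, and the peer count are placeholders; the real decision would be made by the compiler against real cost estimates.

```python
def plan_evaluation(args, idle_peers):
    """Decide where each argument of a function call is evaluated.
    args: list of (name, estimated_cost, returns_lazy) tuples.
    Returns (local, handed_off) lists of argument names."""
    HANDOFF_THRESHOLD = 10  # assumed cost above which hand-off pays

    if not args:
        return [], []

    # The argument deemed most expensive stays on the local node.
    ordered = sorted(args, key=lambda a: a[1], reverse=True)
    local, handed_off = [ordered[0][0]], []

    peers = idle_peers
    for name, cost, returns_lazy in ordered[1:]:
        # Lazy-producers always pay off remotely, if a peer is idle;
        # otherwise hand off only when the cost estimate warrants it.
        if peers > 0 and (returns_lazy or cost > HANDOFF_THRESHOLD):
            handed_off.append(name)
            peers -= 1
        else:
            local.append(name)
    return local, handed_off
```

Note how the `returns_lazy` flag bypasses the cost comparison entirely, reflecting the argument above that lazy-producers are always worth handing off when an idle peer exists.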
