scratchwork.txt


Goal: Small scripting language that can be embedded anywhere, though not necessarily with the compiler attached.
Features:
	Lisp syntax
	No dependency on any library.
	C/C++
Drawbacks:
	Lisp syntax
	Redundant code due to lack of libraries.

Types
	constant
	name
	function
Composite Types
	enum
	array
	struct
	union

Objects
	int
	bool
	enum
	array
	string
	struct
	union
	function

There will be a parent table called _. This table is the parent of all global variables.
There is no type difference between a string and an identifier. A string constant can contain any character. An identifier must start with a letter or identifier_symbol and the rest of the characters must be letters, identifier_symbols or numbers.
If an identifier is passed to an expression or subexpression, its value will be used in the operation if possible. If the identifier cannot be found in the variable list, then the identifier will be converted into a string literal.
Every expression is a table. When executing a table, the first value must be a function.
("a" "b" "c") is {"a", "b", "c"}.
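A minimal C sketch of the identifier rule above. The notes never define which characters are identifier_symbols, so the set in `is_identifier_symbol` is a hypothetical placeholder. Any string that fails `is_identifier` would stay a string constant.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical set of identifier_symbols; the notes leave this undefined. */
static bool is_identifier_symbol(char c) {
    return c != '\0' && strchr("-_?!*+<>=", c) != NULL;
}

/* True if `s` starts with a letter or identifier_symbol and every later
   character is a letter, identifier_symbol, or digit. */
static bool is_identifier(const char *s) {
    if (s == NULL || *s == '\0') return false;
    if (!isalpha((unsigned char)s[0]) && !is_identifier_symbol(s[0])) return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!isalnum((unsigned char)s[i]) && !is_identifier_symbol(s[i])) return false;
    }
    return true;
}
```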

((a ('a' 'b' 'c')) 'b' 'c')

a/b/c
a/b:

When calling a function, there are two tables. The first is the table that is passed to the function. The second is the table that defines how the function is called.

[function f [a b c] [
	[return [+ a b c]]
]]
[f 1 2 3]

The calling list is f [1 2 3]
The definition list is f [a b c]
This is the same call expressed in a different way: [f f.b:2 f.c:3 f.a:1], or [f .b:2 .c:3 .a:1] (shorthand). This can be represented in the AST, but it will not necessarily execute successfully if the members are defined outside the function call.

evaluateSubexpression -> Returns a bool, an int, a float, a string, or an identifier.
identifierToValue
stringToIdentifier


"ABI"
	before
		stack base
		args length (always > 0) (DuckLisp FP points here)
		function name (string)
		arg1
		arg2
		arg3
		arg4
		...
		loc0
		loc1
		loc2
		loc3
		...
	after
		stack base
		args length (always > 0) (DuckLisp FP points here)
		function name (string)
		arg1
		arg2
		arg3
		arg4
		...
		loc0
		loc1
		loc2
		loc3
		...
		ret0
		ret1
		ret2
		ret3
		...
		rets length

Data tree
	Access by name is fast.
	The hash of a given string must always be the same no matter the scope.
	An object's parent object is its namespace.
	An object may have children.

Can have two methods of execution
	Tree walk
	Virtual machine

Function types
	Bytecode generation
		Appends or inserts bytecode into the block.
	C callback
		Inserts C function call into block in bytecode wrapper.
		This class of functions is called by a bytecode generation function.
	Heterogeneous
		Calls bytecode and callback functions.


nop
add8
add16
add32
add64
sub8
sub16
sub32
sub64
push
pop
return


#call bytecode
#call *bytecode
#call C
#call *C
#add *float *float
#add *float float
#add *int *int
#add *int int
#sub *float *float
#sub *float float
#sub *int *int
#sub *int int


It turns out that this was not a bug in the allocator. Hurrah!
                (dl_memoryBlock_t) { /* 90 */
                        .block = 4A690AF, /* offset = 111 */
                        .block_size = 2,
                        .allocated = true,
                        .unlinked = false,
                        .previousBlock = 44,
                        .nextBlock = 94,
                },

                (dl_memoryBlock_t) { /* 44 */
                        .block = 4A690AE, /* offset = 110 */
                        .block_size = 1,
                        .allocated = true,
                        .unlinked = false,
                        .previousBlock = 42,
                        .nextBlock = 90,
                },


I'm going to split the compiler and VM to a certain extent. DuckLisp will always have the VM, but it won't always have the compiler. This will allow me to use as little memory as possible on small devices. For example, I doubt MicroComp will be able to compile DuckLisp for a while, even after a C compiler is ported to it.

Constant propagation
	All the function has to do is be marked as pure. If it is pure and all arguments are constants, then the function may be pre-calculated in that instance.
Tail call
	Is a recursive call the last function called? Replace the arguments and jump to the beginning.
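A minimal sketch of the constant-propagation rule, assuming a hypothetical AST where each node is either an integer constant or a call, and each function carries a `pure` flag plus a C evaluator. The names (`node_t`, `function_t`, `fold_constants`) are illustrative, not DuckLisp's, and the eight-argument cap exists only to keep the sketch allocation-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FOLD_ARGS 8

struct function;

/* Hypothetical AST node: an integer constant or a call. */
typedef struct node {
    bool is_constant;
    long value;                     /* valid when is_constant */
    const struct function *fn;      /* valid when !is_constant */
    struct node *args;
    size_t args_length;
} node_t;

typedef struct function {
    bool pure;
    long (*eval)(const long *args, size_t n);  /* used only for folding */
} function_t;

/* Fold children first; then, if the function is pure and every argument
   is constant, replace the call with its value. Returns true if `n` is
   now a constant. */
static bool fold_constants(node_t *n) {
    if (n->is_constant) return true;
    bool all_constant = true;
    for (size_t i = 0; i < n->args_length; i++) {
        if (!fold_constants(&n->args[i])) all_constant = false;
    }
    if (!n->fn->pure || !all_constant || n->args_length > MAX_FOLD_ARGS) return false;
    long values[MAX_FOLD_ARGS];
    for (size_t i = 0; i < n->args_length; i++) values[i] = n->args[i].value;
    n->value = n->fn->eval(values, n->args_length);
    n->is_constant = true;
    return true;
}

/* Example pure function: sum of all arguments. */
static long add_eval(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}
```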


Bookmark:
	Figure out how bytecode chunks are structured in relation to each other before they are merged.
		Function definitions can all be placed at the end of the program.
		Function calls will only be generated after the function has been defined.
	Make `compile` generate function call bytecode and call generators.
	I have encountered a need for temporary variables. The question is, how do I allocate and free them? Once I've solved that, is there an easy way to do it more efficiently?
	The problem with recursion is lack of context. With a stack, I can just pop the last value to see the context. With a function call, I can't see anything above the current function.

--- Two months later ---

^^^ Helpful, but I'm still confused.
Stack is for placing objects on during program execution, right?
Scope stack is for placing symbols on during program compilation.
	These symbols are variables and generators.
My code does not entirely reflect the above.

I seem to have VM functions that manipulate the stack. If I am to completely separate the VM from the compiler, then I need to move them to the VM module. C callbacks should not be passed to the compiler. The compiler should return a mapping of names to callback slots, and the callbacks should be placed in those slots during VM initialization. These slots will likely be stored on the stack.
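One way the name-to-slot mapping could look, assuming the compiler outputs its callback names in slot order and the host supplies name/function pairs at VM initialization. `callback_map_t`, `callback_binding_t`, and `bind_callbacks` are hypothetical, and the slots here are a plain array rather than stack objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*callback_t)(void *vm);

/* Produced by the compiler: slot i is reserved for callback names[i]. */
typedef struct {
    const char **names;
    size_t length;
} callback_map_t;

/* Supplied by the host program. */
typedef struct {
    const char *name;
    callback_t function;
} callback_binding_t;

/* Fill `slots` (map->length entries) during VM initialization.
   Returns how many slots were left unbound. */
static size_t bind_callbacks(const callback_map_t *map,
                             const callback_binding_t *bindings, size_t bindings_length,
                             callback_t *slots) {
    size_t unbound = 0;
    for (size_t i = 0; i < map->length; i++) {
        slots[i] = NULL;
        for (size_t j = 0; j < bindings_length; j++) {
            if (strcmp(map->names[i], bindings[j].name) == 0) {
                slots[i] = bindings[j].function;
                break;
            }
        }
        if (slots[i] == NULL) unbound++;
    }
    return unbound;
}

/* Example host callback. */
static int print_callback(void *vm) { (void)vm; return 0; }
```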

Generators will generate high-level bytecode. This bytecode is inserted into a linked list that doubles as an array. Branch targets will be stored as indexes to the array, which will allow instructions to be inserted between the branch and the target with no side effects.
Functions are compiled to bytecode, and placed after the `defun` instruction. `defun` then pushes the address of the function on the stack. `call` will take the stack address and extract the address of the function. The function will then be called.
C functions are not stored in bytecode. They are stored on a separate stack dedicated to C callbacks. The `ccall` instruction will take the callback stack address and then run the function in that position. Forward references to a label are dealt with by inserting a pseudo-instruction to act as the target. When the bytecode emitter comes across this pseudo-instruction, it immediately replaces it with the first of its own instructions.
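A minimal sketch of the "linked list that doubles as an array" described above: links are appended to an array, order comes from `next` indices, and inserting a link never moves an existing one, so a branch target stored as an array index stays valid. The fixed capacity and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_CAPACITY 64
#define NO_LINK ((ptrdiff_t)-1)

/* Instructions live in an array; list order is given by `next`. */
typedef struct {
    int opcode;
    ptrdiff_t next;
} link_t;

typedef struct {
    link_t links[LIST_CAPACITY];
    ptrdiff_t length;   /* array slots used */
    ptrdiff_t head;
    ptrdiff_t tail;
} instruction_list_t;

static void list_init(instruction_list_t *l) {
    l->length = 0;
    l->head = l->tail = NO_LINK;
}

/* Append to the end of the list; returns the array index of the new link. */
static ptrdiff_t list_append(instruction_list_t *l, int opcode) {
    if (l->length >= LIST_CAPACITY) return NO_LINK;
    ptrdiff_t i = l->length++;
    l->links[i] = (link_t){opcode, NO_LINK};
    if (l->tail == NO_LINK) l->head = i; else l->links[l->tail].next = i;
    l->tail = i;
    return i;
}

/* Insert after link `after`; no existing index changes. */
static ptrdiff_t list_insert_after(instruction_list_t *l, ptrdiff_t after, int opcode) {
    if (l->length >= LIST_CAPACITY) return NO_LINK;
    ptrdiff_t i = l->length++;
    l->links[i] = (link_t){opcode, l->links[after].next};
    l->links[after].next = i;
    if (l->tail == after) l->tail = i;
    return i;
}
```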

To ease the load on my brain, I think I will add an intermediate representation. This is solely for the purpose of dealing with branch targets.

Compiler sees expression. Compiler looks up function name in the compiled function list. If it is there, a function call is generated. Compiler looks up function name in C function list. If it is there, a C function call is generated. Compiler looks up name in generator function list. If it is there, the compiler passes the expression to the generator. The generator emits bytecode. If a generator does not exist, a compile error is thrown.
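The lookup order described above, sketched as a classifier. The flat `name_table_t` is a hypothetical stand-in for the compiler's real function, C-function, and generator tables. Because C functions are checked before generators, a name present in both resolves to the C function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    CALL_COMPILED,   /* function compiled from DuckLisp source */
    CALL_C,          /* C callback */
    CALL_GENERATOR,  /* generator receives the whole expression */
    CALL_ERROR       /* no match: compile error */
} call_kind_t;

/* Hypothetical flat name table. */
typedef struct {
    const char **names;
    size_t length;
} name_table_t;

static int table_contains(const name_table_t *t, const char *name) {
    for (size_t i = 0; i < t->length; i++) {
        if (strcmp(t->names[i], name) == 0) return 1;
    }
    return 0;
}

/* Compiled functions first, then C functions, then generators. */
static call_kind_t classify_call(const char *name,
                                 const name_table_t *functions,
                                 const name_table_t *c_functions,
                                 const name_table_t *generators) {
    if (table_contains(functions, name)) return CALL_COMPILED;
    if (table_contains(c_functions, name)) return CALL_C;
    if (table_contains(generators, name)) return CALL_GENERATOR;
    return CALL_ERROR;
}
```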

Recursive descent could work for optimization. A generator cannot see anything above it, but it can see below it. The generator can traverse down the tree as far as it desires, and if it sees an optimization, it can rearrange the tree. The generator will then return and the compiler will advance through the tree by one node. It will then call the generator for that function. This will continue until the whole tree has been traversed.

Each high-level bytecode instruction will contain an opcode class and arguments. An opcode is an enum. Arguments are unions that can have the types integer, float, string, or label. Labels are `ptrdiff_t`s that point to an element of the bytecode array.

High-level bytecode is assembled into raw bytecode. The opcode is determined from the opcode class and argument sizes, and branch targets are calculated from the final instruction lengths.
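A sketch of the high-level instruction layout and of how the assembler might pick a size variant, assuming 1-, 2-, and 4-byte signed operands (the .8 and .16 suffixes appear in the bytecode listings; the 32-bit variant is assumed). All type and field names are hypothetical. For example, 1000 needs two bytes, which agrees with the `push-integer.16 E803` lines in the Fibonacci listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode classes are size-independent; the assembler picks the variant. */
typedef enum { CLASS_PUSH_INTEGER, CLASS_JUMP } opcode_class_t;

typedef enum { ARG_INTEGER, ARG_FLOAT, ARG_STRING, ARG_LABEL } arg_type_t;

typedef struct {
    arg_type_t type;
    union {
        int64_t integer;
        double floatingPoint;
        const char *string;
        ptrdiff_t label;   /* index into the high-level bytecode array */
    } value;
} arg_t;

typedef struct {
    opcode_class_t opcodeClass;
    arg_t args[2];
    size_t args_length;
} high_level_instruction_t;

/* Smallest signed operand width (1, 2, or 4 bytes) that holds `n`;
   returns 0 if none fit. */
static int operand_width(int64_t n) {
    if (n >= INT8_MIN && n <= INT8_MAX) return 1;
    if (n >= INT16_MIN && n <= INT16_MAX) return 2;
    if (n >= INT32_MIN && n <= INT32_MAX) return 4;
    return 0;
}
```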

Perhaps I should create generators that emit a single instruction so that it is possible to write VM assembly directly in the program? I could also add an assembler that accepts text assembly files. I doubt they would be too hard to parse.

If I go with all of the above, we will have these modules:
	DuckLib.so
	duckLisp.so
	duckVM.so
	duckAsm.so

We traverse the tree top-down because we want to allow the generators to optimize the tree if they wish.

There are three stacks. "The Stack" is the runtime VM stack. The generator stack contains generators. The function stack may contain C functions.

All functions are anonymous in the VM.

Tree → list strategy:
	Generator:
		Check arguments.
		Reorder arguments in tree.
		Create list of instruction list fragments. These lists will be joined with the compiled arguments, which may themselves have similar trees of instruction list fragments. It looks sort of like this:
			ast = [a, b, c]
			generator:
				instructionList = [i0, i1, i2]
				order = [2, 1, 3]
				reorder(ast, order)
			for i in range(len(ast)):
				newInstructionList.append(instructionList[i])
				newInstructionList.append(ast[i])
		Expand tree into instruction list. This is easy, since we just traverse the tree and append each leaf to the end of the list.

bytecode file format:
	((ascii8[2] DL) (uint16 <callbacks length>) (uint32 <bytecode length>) (uint8[<bytecode length>] <bytecode>))
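A sketch of writing and reading that layout. The notes don't specify byte order; little-endian is assumed here because the bytecode listings write 1000 as `E803`. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "DL", uint16 callbacks length, uint32 bytecode length, then bytecode.
   Returns bytes written, or 0 if `out_size` is too small. */
static size_t write_bytecode_file(uint8_t *out, size_t out_size,
                                  uint16_t callbacks_length,
                                  const uint8_t *bytecode, uint32_t bytecode_length) {
    size_t total = 2 + 2 + 4 + (size_t)bytecode_length;
    if (out_size < total) return 0;
    out[0] = 'D';
    out[1] = 'L';
    out[2] = (uint8_t)(callbacks_length & 0xFF);
    out[3] = (uint8_t)(callbacks_length >> 8);
    for (int i = 0; i < 4; i++) out[4 + i] = (uint8_t)(bytecode_length >> (8 * i));
    memcpy(out + 8, bytecode, bytecode_length);
    return total;
}

/* Parse the header; returns 0 on success, -1 on bad magic or truncation. */
static int read_bytecode_header(const uint8_t *in, size_t in_size,
                                uint16_t *callbacks_length, uint32_t *bytecode_length) {
    if (in_size < 8 || in[0] != 'D' || in[1] != 'L') return -1;
    *callbacks_length = (uint16_t)(in[2] | (in[3] << 8));
    *bytecode_length = 0;
    for (int i = 0; i < 4; i++) *bytecode_length |= (uint32_t)in[4 + i] << (8 * i);
    if (in_size - 8 < *bytecode_length) return -1;
    return 0;
}
```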

[[i0 i1 i2] [i3 i4 i5] [i6 i7 i8] [
	[[i9 i10 i11] [i12 i13 i14] [i15 i16 i17] [
		[[i18 i19 i20] [i21 i22 i23]]
	]]
	[[i24 i25 i26] [i27 i28]]
]]

node = [[instruction*]* [node*]]

instructions = [instruction*]
instructionsList = [instructions*]
nodes = [node*]
node = [instructionsList nodes]

node = dl_array_t:dl_array_t
nodes = dl_array_t:dl_array_t
instructionsList = dl_array_t:dl_array_t
instructions = dl_array_t:duckLisp_instructionObject_t
instruction = duckLisp_instructionObject_t

Almost done with compilation. I have decided on the strategy of giving each generator its own piece of the instruction list. The problem is that arguments must be evaluated left-to-right to allow the program to execute top-down.

(7 (3 (1) (2)) (6 (4) (5)))

(
  (defun copy-function ((f pointer:function) (size size_t))
    (var g pointer:function)
    (setq g (malloc size))
    (copy-bytes (cast g pointer:byte) (cast f pointer:byte) (* size (size-of byte)))
    (return g))

  (defun f () 0)
  (var p-g pointer:function)
  (copy-function p-g (addr-of f))
  (defmacro g ()
    #(call p-g))
  (g))

(
  (defun # #
    (var # #)
    (setq g (malloc size))
    (copy-bytes (cast g #) (cast f #) (* size (size-of byte)))
    (return g))

  (defun f # #)
  (var # #)
  (copy-function p-g (addr-of f))
  (defmacro # #
    '(call p-g))
  (g))

(
  (defun
    (var)
    (setq g (malloc size))
    (copy-bytes (cast g) (cast f) (* size (size-of byte)))
    (return g))

  (defun f)
  (var)
  (copy-function p-g (addr-of f))
  (defmacro
    '(call p-g))
  (g))

(() ()):    Left-right - 
(f () ()):  Undefined  - Whatever
(f (g ())): Outside-in - Tree, top-down, right-left

Outer expressions compile before inner.
Inner expressions' assembly comes before outer's.
Left expressions' assembly comes before right's.

Outer expressions compile before inner.
Inner expressions' assembly comes after outer's.
Left expressions' assembly comes before right's.

(7 (3 (1) (2)) (6 (4) (5)))
(1 (5 (7) (6)) (2 (4) (3)))

(1 (2 (3) (4)) (5 (6) (7)))
(1 (5 (6) (7)) (2 (3) (4)))
(1 (5 (7) (6)) (2 (4) (3)))

(5(2 (*1) (*1)) (4 (*3) (*3)))

(7 (3 (1) (2)) (6 (4) (5)))
(7 (3 (1) (2)) (6 (4) (5)))


(7 (3 (1) (2)) (6 (4) (5)))

7 +
 3 +
  1 +
  2 -
 6 -
  4 +
  5 -

Required objects:
	Return stack
	Index? May be built into the array.
	Current node

+ Create array. Push array in current node. Push array on stack. Set array as current node. Push instruction list in current node.
0 Push instruction list in current node.
- Push instruction list in current node. Pop array from stack. Set current node to popped array?

I will need a local struct for this.

typedef struct node_s {
	/* Anonymous union (C11): isNode selects the valid member. */
	union {
		dl_array_t *node;
		dl_array_t *instructions;
	};
	dl_bool_t isNode;
} node_t;


rrealloc: Instead of adding memory to the end of the block, it adds it to the beginning.
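A minimal `rrealloc` sketch built from standard `realloc` plus `memmove`: the block grows, then the old contents are shifted to the end so the new space appears at the beginning. The signature is an assumption, and shrinking is rejected rather than implemented. On failure it returns NULL and the original block is still valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Grow `block` from old_size to new_size bytes, adding the new space at
   the front. Returns the new block, or NULL (original untouched). */
static void *rrealloc(void *block, size_t old_size, size_t new_size) {
    if (new_size < old_size) return NULL;
    unsigned char *p = realloc(block, new_size);
    if (p == NULL) return NULL;
    /* Regions may overlap, so memmove rather than memcpy. */
    memmove(p + (new_size - old_size), p, old_size);
    return p;
}
```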


Move pusher to top of loop. Done.
Generate tree in pusher.

node
  instructions
  node
  node
  node

First element in node is always an instruction array.
Store node addresses on stack, not nodes.

We have a dedicated node struct now.
All nodes will be kept in a dl_array.
Nodes will keep an array of nodes by storing the index in the master array.
Indices will be pushed on the stack.

Needed universal identifier for each node → index to a single array.
Needed each node to contain a list of nodes → indices to a subset of the array elements.
Desired a way to keep track of all nodes in use → single array containing nodes.
Desired a way to use dl_array instead of raw memory allocation.
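A sketch of the node-index scheme, combined with the `+` / `0` / `-` markers from the earlier pusher notes: nodes live in one master array, children are stored as indices into it, and the stack holds indices rather than nodes. Fixed-size arrays stand in for dl_array, an instruction-list count stands in for the instruction arrays, and closing a node makes the new stack top current (the earlier notes left that step as a question). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 32
#define MAX_CHILDREN 8
#define MAX_DEPTH 16

/* A node's children are indices into the master array. */
typedef struct {
    int instruction_lists;           /* stands in for the instruction arrays */
    size_t children[MAX_CHILDREN];
    size_t children_length;
} tree_node_t;

typedef struct {
    tree_node_t nodes[MAX_NODES];    /* master array; node 0 is the root */
    size_t nodes_length;
    size_t stack[MAX_DEPTH];         /* node indices, not nodes */
    size_t stack_length;
} tree_builder_t;

static void builder_init(tree_builder_t *b) {
    b->nodes[0] = (tree_node_t){0};
    b->nodes_length = 1;
    b->stack[0] = 0;
    b->stack_length = 1;
}

/* '+' opens a child and gives it an instruction list, '0' adds an
   instruction list to the current node, '-' adds one and closes the node.
   Returns -1 on overflow, underflow, or an unknown marker. */
static int builder_step(tree_builder_t *b, char marker) {
    size_t current = b->stack[b->stack_length - 1];
    if (marker == '+') {
        if (b->nodes_length >= MAX_NODES || b->stack_length >= MAX_DEPTH
            || b->nodes[current].children_length >= MAX_CHILDREN) return -1;
        size_t child = b->nodes_length++;
        b->nodes[child] = (tree_node_t){0};
        b->nodes[current].children[b->nodes[current].children_length++] = child;
        b->stack[b->stack_length++] = child;
        b->nodes[child].instruction_lists++;
    } else if (marker == '0') {
        b->nodes[current].instruction_lists++;
    } else if (marker == '-') {
        if (b->stack_length <= 1) return -1;
        b->nodes[current].instruction_lists++;
        b->stack_length--;
    } else {
        return -1;
    }
    return 0;
}
```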


Whew! That's done.


The VM is a modified stack machine. It might be more helpful to think of it as a Harvard architecture machine with a data memory that can grow infinitely.
Each function call creates its own environment. This is mainly a feature of the compiler. When a function is called, local variables are pushed onto the stack. At the end of the function they are popped. DL functions are stored in program memory. Functions are DL objects that contain either a pointer to a DL function or a pointer to a C callback. Since all objects are stored on the stack, all functions are local variables that can be pushed and popped. Variables are referenced by index relative to the current frame pointer. The reason for this is that the stack pointer changes too often to easily keep track of indexes using it, and absolute addressing takes up a lot of space. In most cases, the frame pointer will remain constant for the entire duration of a function. If another function is called, the callee will need to use an offset relative to the frame pointer of the scope that contains the function.
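A minimal sketch of frame-pointer-relative addressing, assuming a stack of plain ints instead of DL objects and hypothetical names (`vm_t`, `vm_enter`, `vm_leave`). Arguments and locals keep the same index relative to `fp` while `sp` moves.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 256

typedef struct {
    int stack[STACK_SIZE];
    size_t sp;   /* next free slot */
    size_t fp;   /* base of the current frame */
} vm_t;

static int vm_push(vm_t *vm, int value) {
    if (vm->sp >= STACK_SIZE) return -1;
    vm->stack[vm->sp++] = value;
    return 0;
}

/* Variable `index` of the current frame, or NULL if out of range. */
static int *vm_local(vm_t *vm, size_t index) {
    return (vm->fp + index < vm->sp) ? &vm->stack[vm->fp + index] : NULL;
}

/* Start a frame at the `args_length` arguments on top of the stack.
   Returns the caller's fp so it can be restored. */
static size_t vm_enter(vm_t *vm, size_t args_length) {
    assert(args_length <= vm->sp);
    size_t saved = vm->fp;
    vm->fp = vm->sp - args_length;
    return saved;
}

/* Drop the frame's arguments and locals and restore the caller's fp. */
static void vm_leave(vm_t *vm, size_t saved_fp) {
    vm->sp = vm->fp;
    vm->fp = saved_fp;
}
```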

stack — variables
variable — generic data

"ABI"
	before — Push, call, push
		stack base
		args length (always > 0) (DuckLisp FP points here)
		function name (string)
		arg1
		arg2
		arg3
		arg4
		...
		loc0
		loc1
		loc2
		loc3
		...
	after — Push
		stack base
		args length (always > 0) (DuckLisp FP points here)
		function name (string)
		arg1
		arg2
		arg3
		arg4
		...
		loc0
		loc1
		loc2
		loc3
		...
		ret0
		ret1
		ret2
		ret3
		...
		rets length
	cleanup — Return, copy
		ret0 ← Rets are now locs owned by the caller
		ret1 ← Rets are now locs owned by the caller
		ret2 ← Rets are now locs owned by the caller
		ret3 ← Rets are now locs owned by the caller
		arg3 ← Stack top
		arg4
		...
		loc0
		loc1
		loc2
		loc3
		...
		ret0
		ret1
		ret2
		ret3
		...
		rets length

Calls to DL from C are done by pushing objects on the stack and then calling the function. Return values are placed on the top of the stack for C to pop.
The stack is persistent between calls.
Bytecode is not persistent between calls. Bytecode can be run straight by the VM, or a function that is currently on the stack can be called. A bytecode function pointed to by the stack will point to a random block of memory. C will be given no indication of when it is freed.

This is a one-shot run of some bytecode. This may define functions that can be used by other chunks of bytecode. Dangling pointers may result if `bytecode` is freed.
e = duckVM_execute(&duckVM, bytecode, bytecode_length);

This is a function call. DVM knows nothing about function names, so an index must be given instead.
e = duckVM_pushObject(&duckVM, duckLispObject);
e = duckVM_call(&duckVM, function_index); // Index is an absolute address on the stack.
e = duckVM_popObject(&duckVM, &duckLispObject);

Solved:
	C → DL calling conventions.
	DL → C/DL calling conventions?
	Local variables.
Unsolved:
	Non-local scope addressing. Real CPUs deal with this by storing functions in memory, not on the stack.


A label may only have one destination address.
A label may have many source addresses.

A label will have its index placed in a trie right after it is assembled to bytecode.
A backward goto will calculate its jump distance when it is converted to bytecode.
A forward goto will use 32 bits and enter its address into a trie (with label as key) that points to an array with pointers to all the gotos to that label.

(goto <label>) — Create label and goto array if not already created. Emit high-level jump instruction.
(label <label>) — Create label if not already created. Emit label pseudo instruction.
label trie points to an array of links?
Jump — Insert current address in car of link array.
Label — Insert current address in cdr of link array.

Goto trie will contain pointers to arrays of source addresses.
Label trie will contain destination addresses.
After bytecode list is generated and goto and label tries are populated, create jump link array.
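A sketch of linking gotos to labels after generation. It assumes labels are already resolved to indices (so plain arrays replace the tries), that every goto is a 5-byte jump32 as in the forward-goto rule above, and that offsets are little-endian and measured from the end of the jump. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOURCES 16
#define JUMP32_LENGTH 5   /* opcode + 4-byte offset */

typedef struct {
    ptrdiff_t destination;            /* -1 until the label is placed */
    ptrdiff_t sources[MAX_SOURCES];   /* addresses of gotos to this label */
    size_t sources_length;
} label_t;

static void label_init(label_t *l) {
    l->destination = -1;
    l->sources_length = 0;
}

static int record_goto(label_t *l, ptrdiff_t address) {
    if (l->sources_length >= MAX_SOURCES) return -1;
    l->sources[l->sources_length++] = address;
    return 0;
}

/* Patch every goto's operand. Returns -1 if a label was never placed. */
static int link_jumps(uint8_t *bytecode, const label_t *labels, size_t labels_length) {
    for (size_t i = 0; i < labels_length; i++) {
        if (labels[i].destination < 0) return -1;
        for (size_t j = 0; j < labels[i].sources_length; j++) {
            ptrdiff_t source = labels[i].sources[j];
            int32_t offset = (int32_t)(labels[i].destination - (source + JUMP32_LENGTH));
            uint32_t bits = (uint32_t)offset;
            for (int k = 0; k < 4; k++) bytecode[source + 1 + k] = (uint8_t)(bits >> (8 * k));
        }
    }
    return 0;
}
```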

I did the label scoping wrong. Scoping should always be done in the generators, but I did it in bytecode generation. What I am currently doing is giving the label name (a string) as the instruction argument. What I should be doing is looking up the label index in the label trie (which may have been constructed in the same generator). The index, not the name, should be passed as the instruction argument. During code generation, the index is used to populate the element of the label array that contains the jump structure.

I dealt with the label scoping problem. It works better. The problem I'm having now is, again, forward references. Apparently forward references are a problem for scoping in addition to bytecode generation. If a goto or label (they are essentially the same) is placed inside a scope and another goto or label with the same name is placed after the scope, two labels will be created, one inside and one outside. If a goto or label precedes the scope, only one label is created.

Rules:
	A label returns no value. It is a pseudo-instruction.
	A label may only appear in the top-level array of a closed scope expression.
		Allowed:
			(
			  (label "cleanup"))
		Disallowed:
			(+ a b (label "cleanup"))
			(push-scope (label "cleanup"))
		This rule is partially an extension of the first rule. This allows the () expression to create the label before its children search for it.

Escape sequences are now expanded in CST → AST.


Discord Hanabi chat:
<discord user=an_origamian>
I've been working on branching.
Scoping for it is done and wasn't too hard, though the current implementation only allows labels in the top level of an expression with an expression as its function part.
```lisp
;; Labels allowed here.
(
  (label l)
  (nop)
  (goto l))

;; Labels disallowed here.
(+ (label l) 5 (goto l))

;; Labels disallowed here.
(+ (push-scope)
   (label l)
   (nop)
   (goto l)
   (pop-scope))

;; Labels allowed here.
(+ (
     (label l)
     (nop)
     (goto l))
   5
   6)
```
.
Labels are pseudo-instructions that have no bytecode representation.
Gotos are jump instructions.
At the moment, labels and gotos return no value.
```lisp
;; Jump8 uses a one-byte relative address.
(jump8 offset) ; Two bytes long.
;; Jump16 uses a two-byte relative address.
(jump16 offset-LSB offset-MSB) ; Three bytes long.
;; Jump32 uses a four-byte relative address.
(jump32 offset-LSB offset-ML offset-MH offset-MSB) ; Five bytes long.
```
When a jump instruction executes, the instruction pointer (IP) points first to the opcode and then to each of the bytes in the operand. After the relative address has been extracted from the instruction, the IP points to the instruction after the jump. The relative address is then added to this value of the IP, and the VM continues executing code from the new address.
.
The compiler portion of the `compile` function generates high-level assembly. Instead of emitting a jump8 instruction, it would simply emit the jump high-level instruction. This then gets converted to one of the above three variants of jump by the assembler portion of `compile`. The question is, how does the assembler choose which instruction to emit? It seems simple at first. Just subtract the source of the jump from the target label. If all jump instructions are the five-byte form, this works just fine, but if the jump instructions vary in length, then it becomes more difficult to calculate the real target address. Take this for example:
```lisp
(
  (label l)
  <130 nops>
  (goto l))
```
This is compiled to (using a representation of the internal high-level assembly)
```lisp
(
  (label 0)
  <130 nops>
  (jump 0))
```
`0` is the index of the label in the label array. It does not correspond to the actual address in any way.
And now down to bytecode:
```lisp
0   <130 nops>
130 <jump8, 16, or 32?>
```
To determine the size of the instruction, we need to know the jump distance. This is easy. Distance ≈ 0 - 130 = -130. |-130| > 127, so we need to use the 16-bit version. The distance is measured from the end of the jump instruction (the address just past its last byte), so we also have to subtract the instruction's length from the naive total. -130 - 3 = -133.
```lisp
0   <130 nops>
130 jump16 16'd-133
```
`16'd<number>` is Verilog's notation for 16-bit decimal numbers. It's quite convenient for this sort of thing.
Easy! It's just a bit of arithmetic.
Here's where it gets complicated.
```lisp
(
  (label l)
  (label m)
  <124 nops>
  (goto l)
  (goto m))
```
↓
```lisp
(
  (label 0)
  (label 1)
  <124 nops>
  jump 0
  jump 1)
```
↓
```lisp
0     <124 nops>
124   jump?? ??
124+? jump?? ??
```
Now how do we calculate this? It's not that much more difficult to figure out, but focusing on the second jump illustrates the point. If we assume the first jump is jump8, then the whole instruction will be two bytes long. If we assume the second jump is jump8, it will also be two bytes. Label `m` will be 0-(124+2+2) = -128 bytes away (a backward jump). Since the offset fits in eight bits using two's complement negative numbers, jump8 is sufficient to reach the target.
```lisp
0   <124 nops>
124 jump8 8'd-126
126 jump8 8'd-128
```
And it works out! This only happened because of the assumption that each offset fit in a single byte. If we use jump16 instead, the second offset grows to -130, which no longer fits in a single byte.
```lisp
0   <124 nops>
124 jump16 16'd-127
127 jump16 16'd-130
```
It would be possible for the assembler to optimize this to use jump8s, but assuming the smallest size instruction and working up to largest is much easier to program.
.
Needless to say, most programs are going to have a ton of branches, and determining the size will not necessarily be easy.
One approach is to iterate over all the branches in the bytecode and gradually expand each one until its offset fits into the instruction. This requires that extra bytes are inserted into the middle of the bytecode. To do this with an array, the bytecode memory block would have to be reallocated and then a portion of the bytecode would have to be copied to make room for the inserted instruction. This is inefficient. Another approach is to make the bytecode an array containing links in a list. This has the advantage that links can be inserted into the middle of the list and each link can still be accessed by its index in the array. New list links are pushed onto the end of the array. The problem is that since incrementing the array index does not necessarily mean incrementing the list index, distance can no longer be determined by subtracting the source index from the target index. A third approach is to create a system of equations describing the relationships between each branch. Once the system is solved and addresses are mathematically calculated, distance is no longer required, so all that needs to be done is rewrite the jump opcodes and insert links after them with the correct address. I don't think creating the equations will be hard, but solving them will be another matter, since they will contain either if statements or logarithms.
I've read that optimizing the code size of addressing is not something that programmers worried about even in the 70s, but I still think it would be nice to have.
.
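`asize(a)` = `ceiling(log128(a + 1))` for `a >= 0`, and `max(1, ceiling(log128(-a)))` for `a < 0` (operand bytes needed for offset `a`; this matches the table below but only approximates the larger jump16/jump32 sizes)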
```c
asize(0) = 0
asize(1) = 1
asize(-1) = 1
asize(127) = 1
asize(128) = 2
asize(-128) = 1
asize(-129) = 2
```
`index` = Index of byte in bytecode array before addresses are inserted.

Source:
```lisp
(
  (label l)
  (label m)
  <124 nops>
  (goto l)
  (goto m))
```
Assembled bytecode:
```txt
<124 nop opcodes>
; No operand yet. Just opcode.
jump8
jump8
```
Annotations:
```txt
l0
l1
    <124 nops>
b0  jump l0 a0
b1  jump l1 a1
```
Left field is an address.
Second field is the opcode.
Third field is the target label, so you know where the jump goes.
Right field is the operand (relative address) associated with the opcode. It is not actually in the bytecode array yet.
This is the system of equations that results. Since the offset is measured from the end of the whole jump instruction, each offset equation counts the opcode byte as well as the operand.
```c
l0 = index
l1 = index
b0 = index
a0 = l0 - (b0 + 1 + asize(a0))
b1 = b0 + 1 + asize(a0)
a1 = l1 - (b1 + 1 + asize(a1))
```
This expands to
```c
l0 = 0
l1 = 0
b0 = 124
a0 = l0 - (b0 + 1 + asize(a0))
b1 = b0 + 1 + asize(a0)
a1 = l1 - (b1 + 1 + asize(a1))
```
```c
a0 = 0 - (125 + asize(a0))
b1 = 125 + asize(a0)
a1 = 0 - (b1 + 1 + asize(a1))
```
```c
a0 = -asize(a0)      - 125
b1 =  asize(a0)      + 125
a1 = -asize(a1) - b1 -   1
```
```
Wheeeee!!!! Logs and ceilings, and maybe even if statements! I have no idea how to solve this without a trial and error approach. Trial and error may be an acceptable solution.
</discord>

1. Start with best estimate using 2-byte jump instructions.
2. Clear the "not done" flag. Set the cumulative offset to zero. For each link… (in order)
  a. Add cumulative offset.
  b. Calculate required instruction size.
  c. Set "not done" flag if current instruction size is too small.
  d. Increase instruction size to required instruction size if necessary.
3. If "not done" flag is set, goto 2. Else, goto 4.
4. Reallocate the bytecode to account for the new size of the branches.
5. Write the branch instructions and relocate the incumbent bytecode.
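A sketch of steps 1 to 3 in C. It models assembled output as items (fixed-size code, or a jump to another item's address), starts every jump at 2 bytes, and grows jumps until each offset fits in 1, 2, or 4 operand bytes. Unlike the running cumulative offset in step 2a, it recomputes all addresses at the start of each pass; sizes only grow, so the loop terminates. Steps 4 and 5 are not shown, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size code, or a jump to the item at index `target`
   (a label is modeled as a 0-byte item). */
typedef struct {
    int is_jump;
    size_t length;   /* total bytes, opcode included */
    size_t target;   /* jumps only */
} item_t;

/* Operand bytes needed for a signed offset. */
static size_t operand_size(int64_t offset) {
    if (offset >= INT8_MIN && offset <= INT8_MAX) return 1;
    if (offset >= INT16_MIN && offset <= INT16_MAX) return 2;
    return 4;
}

/* Grow jumps until every offset fits; fills `addresses` and returns the
   total code length. */
static size_t relax_jumps(item_t *items, size_t n, size_t *addresses) {
    for (size_t i = 0; i < n; i++)
        if (items[i].is_jump) items[i].length = 2;          /* step 1: jump8 */
    int not_done;
    size_t address;
    do {
        not_done = 0;                                        /* step 2 */
        address = 0;
        for (size_t i = 0; i < n; i++) {                     /* addresses */
            addresses[i] = address;
            address += items[i].length;
        }
        for (size_t i = 0; i < n; i++) {
            if (!items[i].is_jump) continue;
            /* Offset is relative to the end of the jump instruction. */
            int64_t offset = (int64_t)addresses[items[i].target]
                           - (int64_t)(addresses[i] + items[i].length);
            size_t required = 1 + operand_size(offset);
            if (required > items[i].length) {                /* grow, retry */
                items[i].length = required;
                not_done = 1;
            }
        }
    } while (not_done);                                      /* step 3 */
    return address;
}
```

With the `(label l)`, `(label m)`, 124 nops, `(goto l)`, `(goto m)` example from the chat, both jumps stay 2 bytes (offsets -126 and -128). With 130 nops and one backward goto, the jump grows to 3 bytes (offset -133).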

Sort:
	One array is sorted.
	One array is unsorted.
	The destination array is double the size of the other two.

Array is filled and sorted.


I'm writing generators now. They are a pain, simply because dynamic typing requires detecting many types.
Solutions:
1. Replace the type tree with a single type struct. Will require lots of work, but will reduce the code complexity throughout duck-lisp.
2. Make a list of DL API functions. Refer to the chart when writing generators.

Did #1.
Writing generators should be a bit easier now.

The iterative tree traversal is a pain. Let's try making it recursive.

The compiler will change.
The assembler may change.
The generators will change.
The parser will not change.
The disassembler will not change.
The emitters should not change.

Steps:
source → reader ⇒ AST
(AST → generator ⇒ AST) & (AST → emitter ⇒ assembly)
assembly → assembler ⇒ bytecode
bytecode → disassembler ⇒ disassembly

Code structure:
loader
	reader
	compiler
		generator
			generator
			emitter
	assembler
	optimizer
disassembler

Fibonacci: {

push-integer.8  00        ; (var a 0)
push-integer.8  01        ; (var b 1)
push-integer.8  00        ; (var c 0)
                          ; (label loop)
push-index.8    00        ; (print a)
c-call.8        00
pop.8           01
push-string.8   01 "\n"   ; (print "\n")
c-call.8        00
pop.8           01

add.8           00 01     ; (+ a b)
move.8          03 02     ; (setq c %)
pop.8           01
move.8          00 01     ; (setq b a)
move.8          02 00     ; (setq a c)

push-index.8    00        ; (print a)
c-call.8        00
pop.8           01
push-string.8   01 "\n"   ; (print "\n")
c-call.8        00
pop.8           01
push-integer.16 E803      ; (print 1000)
c-call.8        00
pop.8           01
push-string.8   01 "\n"   ; (print "\n")
c-call.8        00
pop.8           01

push-integer.16 E803      ; (< 1000 a)
less.8          03 00
c-call.8        00        ; (print %)
brnz.8          C0        ; (brnz % loop)
pop.8           02

nop                       ; (nop)

jump.8          BB        ; (goto loop)

return
}

I ran into a problem I didn't expect. In order to calculate the condition for a branch, I need to push objects on the stack. In order to keep the stack balanced, I need to pop those items off the stack. So I guess I will do something similar to what I did to `progn` and pop all arguments required in the calculation before the branch. The number of arguments to pop will be given to `br??` as an additional integer argument. This argument will probably have to come after the jump offset in order to remain compatible with the current jump size optimization.
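A sketch of how a VM could execute the proposed branch form, assuming the operand order described (8-bit offset, then pop count), that the condition is the top stack object, and that the pop happens whether or not the branch is taken so both paths leave the stack balanced. The opcode value and `mini_vm_t` are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_BRNZ8 = 0x10 };   /* hypothetical opcode value */

typedef struct {
    int stack[64];
    size_t sp;
} mini_vm_t;

/* Execute brnz.8 at `ip`: read the condition, pop `pop_count` objects,
   then branch if the condition was nonzero. Returns the next ip, or
   (size_t)-1 on stack underflow. */
static size_t exec_brnz8(mini_vm_t *vm, const uint8_t *code, size_t ip) {
    int8_t offset = (int8_t)code[ip + 1];
    uint8_t pop_count = code[ip + 2];
    if (vm->sp == 0 || pop_count > vm->sp) return (size_t)-1;
    int condition = vm->stack[vm->sp - 1];
    vm->sp -= pop_count;           /* balanced on both paths */
    size_t next = ip + 3;          /* just past the operands */
    return condition != 0 ? (size_t)((ptrdiff_t)next + offset) : next;
}
```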

What the final version will look like:


push-integer.16 E803      ; (< 1000 a)
less.8          03 00
c-call.8        00        ; (print %)
+brnz.8          C0 02     ; (brnz % loop)
-brnz.8          C0        ; (brnz % loop)
-pop.8           02
-
-nop                       ; (nop)
-
-jump.8          BB        ; (goto loop)

The top scope seems to always be unused. This should not be deleted without investigation because it may be possible for the top-level expression to declare a local.

Woohoo! Branching works!

label 0 multiply-loop
goto  1 multiply-end
label 2 add-loop
goto  3 add-end
label 3 add-end
goto  2 add-loop
goto  0 multiply-loop
label 1 multiply-end

label multiply-loop
goto  multiply-end
label add-loop
goto  add-end
goto  add-loop
label add-end
goto  multiply-loop
label multiply-end

Problem fixed by making labels compare greater than gotos when their addresses tie during sorting.
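That tie-break as a `qsort` comparator, assuming each label and goto is recorded with its bytecode address. The record type is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t address;
    int is_label;      /* 1 for label, 0 for goto */
    int label_index;
} link_event_t;

/* Sort by address; on a tie, labels compare greater than gotos. */
static int compare_events(const void *a, const void *b) {
    const link_event_t *x = a, *y = b;
    if (x->address != y->address) return x->address < y->address ? -1 : 1;
    return x->is_label - y->is_label;
}
```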

Macros:
	Create VM.
	Push arguments as locals.
	Run macro.
	Retrieve and paste result.
	Destroy VM.

Functions:
	Save PC on call stack.
	Push arguments as locals.
	Jump to function.
	Run function.
	Set PC to top of call stack, pop all function arguments and locals, and push the result.

Functions will be placed in the bytecode in relation to where they were defined in the source. The first instruction of a function will be a jump to the end of the function. This is so that program flow can pass through a function without running it. It *is* a hack, but right now I'm probably too lazy to do anything more complicated like relocating the function body.

(defun 1+ (a)
  (print a)
  (print (+ a 1)))
(print (1+ 5))

(goto --g###1)
  label 1+
  push-integer 1

  (print -1)

  add -2 -1
  
  (print -1)
  
  return 2
(label --g###1)
push-integer 5
call 1+
ccall print


All stack addressing is relative to the frame pointer.

Bookmark:
	Add implicit progn to function body.
	Return instruction
		pop
		reset frame pointer
		return
	call instruction
		save frame pointer
		jump

Bookmark:
	Use frame pointer.
	Find error messages that don't actually throw an error and force them to.

FUNCTIONS WORK!!!

REPL time!
I need to free my memory. Here's the list of all resources:

main::duckLispMemory freed
duckLisp_init::duckLisp->source Needs to be reallocated in duckLisp_loadString and freed in duckLisp_quit
duckLisp_init::duckLisp->errors Needs to be reallocated in duckLisp_loadString and freed in duckLisp_quit
duckLisp_init::duckLisp->scope_stack Needs to be reallocated in duckLisp_loadString and freed in duckLisp_quit
duckLisp_init::duckLisp->generators_stack Needs to be freed in duckLisp_quit
duckLisp_init::duckLisp->labels_stack Needs to be reallocated in duckLisp_loadString and freed in duckLisp_quit
I think cst_append is fine.

duckLisp_quit needs to clean up a bunch of stuff.

Labels will need to be able to return absolute program addresses.

(defun length (list)
  (var i 0)
  (while (not (null? list))
	(setq list (cdr list))
    (setq i (1+ i)))
  i)

(defun nreverse (list)
  (var reversed-list (list))
  (while (not (null? list))
	(setq reversed-list (cons (car list)
							  reversed-list))
	(setq list (cdr list)))
  reversed-list)

(defun append (list1 list2)
  (var appended-list (list))
  (while (not (null? list1))
	(setq appended-list (cons (car list1)
							  appended-list))
	(setq list1 (cdr list1)))
  (while (not (null? list2))
	(setq appended-list (cons (car list2)
							  appended-list))
	(setq list2 (cdr list2)))
  
  (nreverse appended-list))

;; Edit: I don't believe this will work. It can capture variables, but it will create a copy instead of pointing to it.
(defmacro lambda (caps args &rest body)
  (var name (gensym))
  `(no-scope
	(defun ,name ,args
	  ,body)
	(list (get-label ,name)
		  ,(append args caps))))

;; This is complicated for a simple function call…
(defun funcall (function arguments)
  (var label (car function))
  (var args (car (cdr function)))
  (var args-length (length args))

  (while (not (and (null? args) (null? arguments)))
	(push (if (null? (car args))
			  (
			    (var result (car arguments))
			    (setq arguments (cdr arguments))
			   result)
			  (car args)))
	
	(setq args (cdr args)))

  (call label))

Required keywords/functions
    lambda
        defmacro
	    gensym — partially implemented
		quote — Implemented
		list — Implemented
		    cons — Implemented
		get-label — Implemented
	funcall
	    car — Implemented
		cdr — Implemented
		null? — Implemented
		push — partially implemented
		call — Implemented

(setq a-closure (list func a b c))

move.8 is throwing an error. This is resolved and was probably caused by bad balancing.

It would be *nice* to intern symbols, but I don't know how easy that would be. If I omit symbols, string literals could be a problem, but then again, I could just put them inside strings. It shouldn't be too hard to detect a string literal. The only problem this causes is when algorithmically creating string literals, but in this case, just wrap quotes around the string to distinguish it.

So here are the disadvantages of not having a dedicated symbol type:
    Symbol comparison is slow.
	The difference between symbols and strings is that symbols have a specific string format.

There is now a new symbol data type. It does not yet have a master package. Edit: Now it does.
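Interning could be as simple as a table that hands out a permanent integer id per distinct name, after which comparing two symbols is an integer comparison instead of a string comparison. A sketch, with a hypothetical `symbol_table_t` and a linear search standing in for a trie or hash table:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: each distinct name gets a permanent id.
   Mappings are never deleted, so ids stay valid for the program's life. */
typedef struct {
	char **names;
	size_t length;
	size_t capacity;
} symbol_table_t;

/* Return the id for `name`, adding it if it is new. Returns -1 on
   allocation failure. */
static long symbol_intern(symbol_table_t *table, const char *name) {
	assert(table != NULL && name != NULL);
	for (size_t i = 0; i < table->length; i++) {
		if (strcmp(table->names[i], name) == 0) return (long) i;
	}
	if (table->length == table->capacity) {
		size_t capacity = table->capacity ? table->capacity * 2 : 8;
		char **names = realloc(table->names, capacity * sizeof(char *));
		if (names == NULL) return -1;
		table->names = names;
		table->capacity = capacity;
	}
	char *copy = malloc(strlen(name) + 1);
	if (copy == NULL) return -1;
	strcpy(copy, name);
	table->names[table->length] = copy;
	return (long) table->length++;
}

static void symbol_table_free(symbol_table_t *table) {
	for (size_t i = 0; i < table->length; i++) free(table->names[i]);
	free(table->names);
	table->names = NULL;
	table->length = table->capacity = 0;
}
```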

get-label prerequisites:
    Fetch absolute address of labels. I may have to do more fixups. On the other hand, these addresses are absolute, so that means four bytes for every one of them.
	Fetch labels by name, ID, or pointer. Labels lose all of their identity when they are copied into the links array, and multiple target duplicates are created.

I need to keep track of
    Absolute address of target
	Absolute address of push-integer.32

number of targets != number of push-integers
number of links > number of targets
number of links != number of push-integers

Easy solution: Add an array to certain links, the elements of which point to the push-integer.32s.
Slightly more difficult solution(?): Create a target-reference link array.

Create a second set of links which point from the `push-integer.32`s to the targets?
I think all I have to do is set `.size` to 4 and the optimizer will figure everything out for me.

I could do something really hacky. 😀 Subtract 4 from "push-integer.32" and then set `.size` to 4 before optimization. 4 will be added to the instruction during optimization, which will change the instruction back to "push-integer.32".
The proper way would be to add another boolean to `jumpLink_t` that keeps it from being optimized and treats it as if there's a size of 4. This should be slightly slower.

Absolute addressing routed using boolean.

Callbacks implemented.

Macros will be a special case of generators. There will be a dedicated generator macro that will look at its name and then call the macro script associated with it.

Lambdas are about twice as complicated as I expected. Apparently I need to capture *the original variable*, not a copy.

defun f x
        [λ y
          (funcall y x)
       	 λ z
       	   ← x z]
(funcall f 5)

compiles to

    f: jump e0				  defun
   λ0: jump e1				  defun→lambda0
       funcall y			  defun→lambda0→funcall→expression
	   return 1				  defun→lambda0
  $c0: push-closure λ0 1 x 	  defun→lambda0
e1,λ1: jump e2				  defun→lambda1
       set-uv ux z			  defun→lambda1→expression
	   return 1				  defun→lambda1
  $c1: push-closure λ1 1 x	  defun→lambda1
   e2: push-list 2 $c0 $c1	  defun
       make-uv 1 x			  defun
	   return 3				  defun
e0,$x: push 5				  funcall
       call f				  funcall

New functions:
    push-closure λ0 1 x — stack.append('(λ0 &x))
	call c — (let ((a (cdr c)))
	           (while a
	             push (car a)
				 ← a (cdr a))
			   jump (car c))
	set-uv i x — ← uv[i] x
	push-uv i — stack.append(uv[i])
	make-uv 1 x — uv.append(x)

Maybe I should make all functions anonymous. That would certainly help with consistency. On the other hand, it would be nice if pure functions could be called normally so that a ton of near-empty closures aren't stored on the stack.

Let's try both.
`defun` will return a closure on the stack. It will then be popped unless it is explicitly saved by a `var` or `setq`. `lambda` will do the exact same thing, but it will not be given a name. This makes `defun` into a labeled lambda.

Added function scoping. Function scoping is different from normal scoping because it indicates the boundary of local variables. If a form breaks that boundary and references variables in a parent function, a closure must be formed to keep addressing from breaking.

`generator_expression` and `generator_compoundExpression` will need to know how to register an upvalue, and possibly allocate space on the stack for it.
`generator_setq` will need to know how to register and set an upvalue.

Compiler: Stack space needs to be allocated for an upvalue *before* the upvalue is actually used so that it can be treated as a function argument. The most direct way is to traverse the code tree beforehand to see if any upvalues are used. I'd rather not do this. A potential alternative is to create a VM instruction, `push-uv`, that will push the specified upvalue on the stack. This is less efficient if the upvalue is used multiple times since the original plan would do one push at the start of the function while this plan will push the upvalue every time it is used.

`push-closure` pushes a closure on the stack. A closure contains the function address and upvalue addresses.
`funcall` saves the current instruction pointer and upvalues, jumps to the new function, and makes the new upvalues the current ones.
`set-uv` sets an upvalue to the value of a stack object. The address of the upvalue is found by pointing to the correct upvalue address in the current upvalue array.
`push-uv` pushes an upvalue onto the stack. The address of the upvalue is found by pointing to the correct upvalue address in the current upvalue array.
`make-uv` allocates a new upvalue.

Why did I clear the locals_length in functions?
It is preventing me from accessing variables in parent scopes.
I suppose that is how it is supposed to work. All addressing is supposed to be done with an upvalue.
The problem is that an upvalue needs an index to point to. With multiple nested functions, it is impossible to determine the proper index of the free local.
Fixed. locals_length is no longer cleared in functions.

Statics can be added in a very similar way to upvalues. push-static, get-static, set-static

There will actually be multiple upvalue lists, one for each scope. When the scope is popped, so is its list. When the scope is initialized, its list is empty.
Each scope will have an upvalue trie. If a free variable is used in an inner scope, the current scope's trie will be searched for the upvalue index. If the index is nonexistent, a new upvalue will be created. The index will be inserted into the instruction that references the free variable.

These multiple arrays of upvalues will exist in both the compiler and the VM.

defun f x
        [λ y
          (funcall y x)
       	 λ z
       	   ← x z]
(funcall f 5)

compiles to

    f: jump e0
   λ0: jump e1
   $y: push-uv uy
  $x1: push-index $x
       funcall $y $x1
	   return 1
  $c0: push-closure λ0 1 x
e1,λ1: jump e2
       set-uv ux z
	   return 1
  $c1: push-closure λ1 1 x
   e2: push-list 2 $c0 $c1
       make-uv 1 x
	   return 3
e0,$x: push 5
       call f

Made a big mistake. Variables are allocated and freed on the stack based on scope. What I should be doing is allocating and freeing locals only during function call and return.

Should I be creating closures for a variable in any parent scope?
I think the answer is "yes". Let's do that.
From now on, every scope may have upvalues. This may take more VM memory and increase program size, but it should make it more regular.

The first instance of an upvalue should register itself with the nearest function body above it if the free variable exists and is defined outside the function. Each scope can do some bookkeeping if needed to check if it has encountered that variable before.

Only create an upvalue if the free variable is defined outside of the current function.
Locals are kept track of per scope; upvalues are kept track of per function.
If the search for a variable remains inside the current function, do what you normally would do.
If the search for a variable crosses the function boundary, then create an upvalue.
If you find yourself creating an upvalue, first check to see if one already exists in the current function.
The upvalue is stored in the scope where the original variable was defined, but all containing functions will push closures referencing it.
When the scope containing the upvalue exits, it should move the local into the upvalue heap.
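The lookup rules above, sketched for the compiler side with hypothetical `scope_t` and `resolve` names. Upvalues are keyed by the stack index of the captured local, and an existing upvalue in the function is reused before a new one is registered. For brevity it only registers the upvalue in the innermost function crossed; with several nested functions, each enclosing function would also need to capture the variable.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 32

/* Hypothetical compiler scope. Locals are tracked per scope; upvalues only
   on scopes that are a function body (is_function set). */
typedef struct {
	const char *locals[MAX_NAMES];
	int local_index[MAX_NAMES];   /* stack index of each local */
	int locals_length;
	int is_function;
	int upvalues[MAX_NAMES];      /* stack indices of captured locals */
	int upvalues_length;
} scope_t;

typedef enum { VAR_NOT_FOUND, VAR_LOCAL, VAR_UPVALUE } var_kind_t;

/* Search from the innermost scope outward. Inside the current function the
   result is a local stack index; across a function boundary it is an index
   into that function's upvalue list. */
static var_kind_t resolve(scope_t *scopes, int depth, const char *name, int *index) {
	int crossed = -1;  /* innermost function scope left behind so far */
	for (int s = depth - 1; s >= 0; --s) {
		for (int i = scopes[s].locals_length - 1; i >= 0; --i) {
			if (strcmp(scopes[s].locals[i], name) != 0) continue;
			int stack_index = scopes[s].local_index[i];
			if (crossed < 0) {
				*index = stack_index;
				return VAR_LOCAL;
			}
			scope_t *fn = &scopes[crossed];
			for (int u = 0; u < fn->upvalues_length; u++) {
				if (fn->upvalues[u] == stack_index) {
					*index = u;  /* already captured: reuse it */
					return VAR_UPVALUE;
				}
			}
			assert(fn->upvalues_length < MAX_NAMES);
			fn->upvalues[fn->upvalues_length] = stack_index;
			*index = fn->upvalues_length++;
			return VAR_UPVALUE;
		}
		if (scopes[s].is_function && crossed < 0) crossed = s;
	}
	return VAR_NOT_FOUND;
}
```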

Parent scopes possess and manage upvalues.
Child functions create and capture upvalues.

`push-closure` is executed at the end of a function definition.
`funcall` is executed to trigger a function call.
`set-uv` is executed during function execution. Does not affect the stack.
`push-uv` is executed during function execution.
`make-uv` is executed at the end of a scope. Does not affect the stack.

Referencing non-local values will have zero overhead if the variable and the reference are in the same function.
Function definition will have overhead of one extra instruction.
Referencing an upvalue may add overhead.
Setting an upvalue should have zero overhead.
Exiting a scope will have overhead of one extra instruction if an upvalue is created.

I *think* I can implement this in the compiler now. It actually doesn't sound that hard.

`make-uv` doesn't actually make an upvalue. It might, but most of the time `push-closure` will do that. `make-uv` will just copy the local into the heap, so maybe it should be called something like `transfer-uv`?

`make-uv` and `push-closure` can both create upvalues.

`set-uv` and `push-uv` will both index upvalues by upvalue array address. This is an index into the function's array of upvalues.
`make-uv` and `push-closure` will both index upvalues by the index of the local variable. The index will be used as the key to access the upvalue address (not index), which will later be used during function calls to create the upvalue array, which is an array of pointers to the upvalues.

Creating a closure in the VM:
    There exists a stack of the same length as the data stack that contains pointers to upvalues.
    push-closure:
	    Arguments are indices of the free variables.
		Do a lookup of the variables in the upvalue stack. If an element corresponding to a free variable is null, create an upvalue and link it in the stack. Upvalues should point to the stack element that they were created for.
		Create the closure, storing the upvalue addresses.
	make-uv:
	    Arguments are indices of the free variables.
		Do a lookup of the variables in the upvalue stack. Copy the stack elements they correspond to into the upvalue heap (I think this means another implementation of GC).
	return:
	    Pop upvalues stack same number of times as data stack.
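A sketch of the VM side, similar in spirit to Lua's open and closed upvalues. Everything here (`upvalue_t`, `closure_t`, `op_push_closure`, `op_make_uv`, `op_pop`) is hypothetical, the data stack only holds ints, and upvalues are never freed since the plan is for the GC to handle them. Two closures capturing the same local share one upvalue, so a write through one is visible through the other, which is the "capture the original variable, not a copy" requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define STACK_MAX 64

/* Open: points at a data stack slot. After make-uv: points at its own copy. */
typedef struct {
	int *location;
	int closed_value;
} upvalue_t;

/* Function address plus the captured upvalue addresses. */
typedef struct {
	long function;
	size_t upvalues_length;
	upvalue_t **upvalues;
} closure_t;

/* Data stack with a parallel stack of upvalue pointers. */
typedef struct {
	int stack[STACK_MAX];
	upvalue_t *upvalue_stack[STACK_MAX];
	size_t sp;
} vm_t;

/* push-closure: reuse the upvalue linked to each captured stack index, or
   create one pointing at that slot. The closure is written to `closure`
   rather than pushed. Partial allocations are not cleaned up on failure. */
static int op_push_closure(vm_t *vm, closure_t *closure, long function,
                           const size_t *indices, size_t count) {
	closure->function = function;
	closure->upvalues_length = count;
	closure->upvalues = malloc((count ? count : 1) * sizeof(upvalue_t *));
	if (closure->upvalues == NULL) return -1;
	for (size_t i = 0; i < count; i++) {
		size_t index = indices[i];
		assert(index < vm->sp);
		if (vm->upvalue_stack[index] == NULL) {
			upvalue_t *upvalue = malloc(sizeof *upvalue);
			if (upvalue == NULL) return -1;
			upvalue->location = &vm->stack[index];
			upvalue->closed_value = 0;
			vm->upvalue_stack[index] = upvalue;
		}
		closure->upvalues[i] = vm->upvalue_stack[index];
	}
	return 0;
}

/* make-uv: copy each captured local off the stack into its upvalue. */
static void op_make_uv(vm_t *vm, const size_t *indices, size_t count) {
	for (size_t i = 0; i < count; i++) {
		upvalue_t *upvalue = vm->upvalue_stack[indices[i]];
		if (upvalue == NULL) continue;
		upvalue->closed_value = *upvalue->location;
		upvalue->location = &upvalue->closed_value;
	}
}

/* return: pop the upvalue stack in step with the data stack. */
static void op_pop(vm_t *vm, size_t count) {
	assert(count <= vm->sp);
	while (count-- > 0) {
		vm->sp--;
		vm->upvalue_stack[vm->sp] = NULL;
	}
}

/* push-uv and set-uv go through the closure's upvalue array. */
static int upvalue_get(const closure_t *closure, size_t i) {
	return *closure->upvalues[i]->location;
}

static void upvalue_set(closure_t *closure, size_t i, int value) {
	*closure->upvalues[i]->location = value;
}
```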

Compiler should be emitting all code required to read free variables.

`push-closure` must be inside the function scope so it can access the upvalues list. The function bypass label must come before `push-closure`, which means it and its jump must reside in the same scope. The function name label must come after the function bypass jump, so it must be in the same scope too. But the function name label must also be outside the scope so that other functions can reference it.

And the answer? Temporarily break the scope abstraction. We can fix it later by emitting a `lambda` and a `var`.
Everything is kept in the local scope except the function name label.

Containers: objects, conses, upvalues
Closures are not on the list because they are contained in objects.

Functions must all become closures. The only valid call to a function is through a closure. The function label will become a gensym that is wholly contained by the function scope. The function name will be associated with an object instead of a label. Recursion might still be possible by sheer luck.

(keyword get-env ())
(keyword eval (exp env))
(keyword set-env ())
(
 (var x 5)
 (eval (quote (
			   (var y 3)
			   (+ x y)))
	   (get-env)))

(lambda (n)
  (if (= n 1)
	  1
	  (* n (self (1- n)))))

(defun let ((quote name) value (quote body))
  (eval (list (list (quote var) name value)
			  body)
		(get-env)))

(let x 5
  (
   (setq x (1+ x))
   x))

(
 (var name (quote x))
 (var value 5)
 (var body (quote (
				   (setq x (1+ x))
				   x)))
 (eval (list (list (quote var) name value)
			 body)
	   (get-env)))

(
 (eval (list (list (quote var) (quote x) 5)
			 (quote (
					 (setq x (1+ x))
					 x)))
	   (get-env)))

Two types of macros?
(comptime (defmacro ...))
(runtime (defmacro ...))

Ideal duck-lisp reader: All steps happen at the same time. A single char is read at a time, and if a form is ready, it is compiled. Reader macros can call any function that is fully defined when that parser function is run. Macros can call any function that is fully defined by the time it is called.

duck-lisp v1.1: Functions are compiled as soon

Macros capture something. What do they capture?

(let ((x 5))
  (defexpander (lambda (body)
				 `(progn
					,@(mapcar (lambda () body) (number-sequence 1 x))))))

Macros themselves aren't a problem. Scoped macros that call functions aren't even that bad. The problem is that while functions can potentially exist at compile time, variables only have values at runtime.

Here's the root of the problem. What does the following code do? Perhaps `x' is a symbol at compiletime?

(runtime (
		  (var x 5)
		  (defun *2 (v) (* v 2))
		  (funcall (compiletime ((defun naughty () (*2 x)))))))

`*2' exists at both runtime and compiletime, primarily because it doesn't capture anything. If you want it to only exist at runtime, then change it to `(var *2 (lambda ...))`. `defun' is special.
`naughty' exists only at compiletime. It seems to me that the current environment should consist of the symbol names bound to themselves.

First step is to redo how functions are compiled. Currently, when a function definition is found, high-level assembly is generated. In the new system, each function will be fully compiled to bytecode as soon as it is found. Function signatures will be added as well since those are nice. Maybe I'll add variadic functions too.
Second step is to implement macros, which should be easy once the above is done. A macro will be a normal function with a flag set that says it is a macro. It exists only at compile time. When a macro instance is encountered, just call the compiled functions on the code in the arguments and compile the result. Macros will be able to call existing functions, but only if they are not closures.


Every upvalue in the upvalue array call stack should always have an object on the stack associated with it.

Every transfer of a closure must result in an increment of the upvalue array reference count.
Every deletion of a closure must result in a decrement of the upvalue array reference count.
On a reference count of 0, the upvalue array must be deleted.
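The three rules above amount to retain/release on a shared array. A sketch with a hypothetical `upvalue_array_t`; later on, this reference counting gets replaced by garbage collecting the upvalue arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted upvalue array shared by closures. */
typedef struct {
	size_t ref_count;
	size_t length;
	void **upvalues;
} upvalue_array_t;

static upvalue_array_t *upvalue_array_new(size_t length) {
	upvalue_array_t *array = malloc(sizeof *array);
	if (array == NULL) return NULL;
	array->upvalues = calloc(length ? length : 1, sizeof(void *));
	if (array->upvalues == NULL) {
		free(array);
		return NULL;
	}
	array->ref_count = 1;
	array->length = length;
	return array;
}

/* Every transfer (copy) of a closure increments the count. */
static upvalue_array_t *upvalue_array_retain(upvalue_array_t *array) {
	array->ref_count++;
	return array;
}

/* Every deletion of a closure decrements it; at 0 the array is freed.
   Returns 1 if the array was freed. */
static int upvalue_array_release(upvalue_array_t *array) {
	assert(array->ref_count > 0);
	if (--array->ref_count > 0) return 0;
	free(array->upvalues);
	free(array);
	return 1;
}
```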


Function arity might not actually be that hard. If I do it in the VM, all I need is to add an extra field in the instruction for the number of arguments. It's almost as easy in the compiler. I just copy the length of the bindings expression and pass it to the assembler.
I already have a partial function signature. It's the one that I'm struggling to GC. Once I solve that, this addition will require little effort.
Funcall could pose a problem, but I think all I have to do is tell it the expected arity. If the arities match, then there's no problem. If they don't match, then an error is thrown.
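A sketch of the `funcall` side of the check, assuming the compiler stores the arity (the length of the bindings expression) in the closure. `closure_t` and `check_arity` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical closure carrying the arity copied from the bindings. */
typedef struct {
	long function;
	unsigned arity;
} closure_t;

/* Compare the supplied argument count with the closure's arity and report
   an error instead of jumping when they differ. */
static int check_arity(const closure_t *closure, unsigned argument_count) {
	assert(closure != NULL);
	if (closure->arity != argument_count) {
		fprintf(stderr, "funcall: expected %u arguments, got %u\n",
		        closure->arity, argument_count);
		return -1;
	}
	return 0;
}
```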


I could assign every function an index, and then pass the number of functions to the VM. The VM would create an array of function signatures. These would then be used by function declarations without needing to ever free them.
Another option is to prepend all function signatures to the beginning of the bytecode. When the VM starts up, it creates a static (not C `static`) array of signatures. It then starts executing the bytecode. Function declarations will then point to the function signature instead of directly to the bytecode.
Arity could also be placed at the beginning of the function definition. A function call would jump to a portion in bytecode and then read the arity. This would not work as well for closures since the signature is also needed for `push-uv` and `set-uv`, so they would also need access to the definition.
Instead of doing a deep copy or shallow copy of closures, I could just keep references to them on the stack.
I could add garbage collection for upvalue arrays and ignore all this reference counting junk. It also allows me to limit the number of upvalue arrays that can be created. When I add string GC, I can probably copy most of the upvalue array GC since a string is just an array of chars.

GCing the upvalue arrays worked perfectly.


Free on compiled:
	duckLisp_register_label:duckLisp_t.labels
Free on quit:
    duckLisp_generator_quote:duckLisp_t.symbols_array
	duckLisp_generator_quote:duckLisp_t.symbols_trie


The slowest part of the language is the memory allocator.

Most memory freed. Some is still lost when the compiler throws an error.

Simple pattern matching macros shouldn't be too hard to add. Hygienic macros are not practical though.

(setq list (cdr list))

(defpmacro to (var op &rest args)
  (setq var (apply op var args)))

Need runtime arity for &rest to be practical.
Need a trie local to pmacros for variable expansion? Could do simple search-and-replace instead.

Unfortunately, this requires the same exact machinery as normal macros since the expansion happens apart from the definition.

If a function is pure (meaning it doesn't use any free variables and doesn't call any impure functions), then it could be compiled and saved for use by macros.

These can all fully define a bytecode stream:
source code
CST
AST
assembly + labels — This one is annoying. Maybe I can merge these two? Labels are currently global, but could easily be generator and emitter parameters.
bytecode

These fully defined forms are useful to store and manipulate:
source code — User input
AST — True form of the language
assembly + labels — Optional, but can be useful to interact with.
bytecode — Executable code


Macros are easy to call in other macros because macros are defined in their own environment. Functions are not easy to call in macros because they are defined in a messy environment that doesn't actually exist at compile time.
Recursion in macros is easy because they are normal lambdas.


List types offer a layer of indirection between objects on the stack and conses on the heap. A cons should never appear on the stack, even though it is an object.
A non-null list should always point to a cons. It should never point to any other type.
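These two invariants are easy to check mechanically. A sketch with a hypothetical tagged `object_t`; the real duck-lisp object layout is not shown in these notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: a list object sits on the stack and points to a
   cons on the heap, or is NULL for the empty list. */
typedef enum { TYPE_INTEGER, TYPE_LIST, TYPE_CONS } object_type_t;

typedef struct object {
	object_type_t type;
	union {
		int integer;
		struct object *list;                        /* TYPE_LIST */
		struct { struct object *car, *cdr; } cons;  /* TYPE_CONS */
	} value;
} object_t;

/* A non-null list must point to a cons and nothing else. */
static bool list_is_valid(const object_t *object) {
	return object->type == TYPE_LIST
		&& (object->value.list == NULL || object->value.list->type == TYPE_CONS);
}

/* A cons must never appear directly on the stack. */
static bool stack_object_is_valid(const object_t *object) {
	assert(object != NULL);
	if (object->type == TYPE_CONS) return false;
	if (object->type == TYPE_LIST) return list_is_valid(object);
	return true;
}
```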


Once everything is using one single heap, it should be a lot easier to make improvements, such as switching to a copying collector.

I spent a day debugging the VM and compiler since I thought there was a bug in them. I think I did fix a few potential bugs, some of which may have caused my original problems, but I think the final bug was caused by forcing a garbage collection from a running script. So that makes me think that there is a problem with the FFI.

Scratch that. The problem still remains. I'm guessing it's still GC related, and that maybe it has to do with the upvalues or the upvalue array stack not being traced?

The bug disappeared when I went from this

(
 (defun 1 …)
 (defun 2 …)
 (defun 3 …))

(
 (defun 1 …)
 (defun 2 …)
 (defun 3 …))

to

(defun 1 …)
(defun 2 …)
(defun 3 …)

(
 (defun 1 …)
 (defun 2 …)
 (defun 3 …))


These problem numbers are buggy when written the way I want them: 9, 11


Definition: Pure functions are functions whose only free variables are C callbacks and pure functions. Recursion is permitted.


The bug is caused by creation of a corrupt upvalue array.
An object is created and collected multiple times. The first time it was allocated as an upvalue in "9.dl".
A closure is created whose upvalue array points to that upvalue, even though it was supposed to point to another upvalue.

I should try single-stepping when that closure is created.

Upvalue stack is corrupted.

Upvalue stack element 91 is a corrupted upvalue.

Watchpoint did not notice any change to `duckVM->upvalue_stack.elements[91]`. This makes me think that the element it's pointing to is what is getting corrupted, not the pointer itself.

Just to make sure, I checked the pointer value before and after the corruption, and the pointer didn't change.

The upvalue ended up on the free list.

An upvalue on the upvalue stack may not have any reference to it other than the upvalue stack itself.
On the other hand, the arrays on the upvalue array stack already have closures referencing them on the stack.

Bug is fixed. "9.dl" runs smoothly.
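A minimal mark-and-sweep sketch of why the upvalue stack has to be a GC root, which looks like the fix implied by the note about upvalues referenced only by the upvalue stack. The `object_t` and `gc_t` types are hypothetical, and each object references at most one child. If the loop over `upvalue_stack` is removed, an upvalue referenced only from there gets swept, which matches the "upvalue ended up on the free list" symptom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_MAX 16

/* Hypothetical heap object with at most one outgoing reference. */
typedef struct object {
	bool in_use;
	bool marked;
	struct object *child;
} object_t;

typedef struct {
	object_t heap[HEAP_MAX];
	object_t *stack[HEAP_MAX];          /* data stack roots */
	size_t stack_length;
	object_t *upvalue_stack[HEAP_MAX];  /* upvalues; entries may be NULL */
	size_t upvalue_stack_length;
} gc_t;

static void mark(object_t *object) {
	while (object != NULL && !object->marked) {
		object->marked = true;
		object = object->child;
	}
}

/* Mark from every root, including the upvalue stack, then free anything
   unmarked. Returns the number of objects freed. */
static size_t collect(gc_t *gc) {
	for (size_t i = 0; i < HEAP_MAX; i++) gc->heap[i].marked = false;
	for (size_t i = 0; i < gc->stack_length; i++) mark(gc->stack[i]);
	for (size_t i = 0; i < gc->upvalue_stack_length; i++) mark(gc->upvalue_stack[i]);
	size_t freed = 0;
	for (size_t i = 0; i < HEAP_MAX; i++) {
		if (gc->heap[i].in_use && !gc->heap[i].marked) {
			gc->heap[i].in_use = false;
			gc->heap[i].child = NULL;
			freed++;
		}
	}
	return freed;
}
```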


When the VM exits, the stack should be cleared, but all objects on the heap should remain until the next `_pushObject`.


Allowing pure function calls in macros is the final major feature I want in DL.


Using an arena allocator may have been a much better choice than the style of allocation I'm using now.


Each scope has a field that indicates if the first function encountered in a lower scope is pure (meaning it does not capture free variables). The only function that needs to set that field to a useful value is `defun`. The problem is by the time the generator for `defun` is called, the scope containing the "pure" annotation is gone. Somehow I need to thread that information from `lambda` through `var` to `defun`. One solution is to take the bodies of `var` and `lambda` out of their functions and turn them into new functions with an extra `*pure` argument. It doesn't feel right, but I suppose it doesn't *really* break the "generator-emitter" model.

We leave `defun` alone, except we call a different function for `var`.
We relocate the entire body of `var` to another function that has `*pure`.
We relocate the entire body of `lambda` to another function that has `*pure`.

And it turns out that I didn't need another field in `duckLisp_scope_t`. All I needed to do was check if the current closure captured anything, though I should somehow check to see if it captured itself.

Compilation of locals is greater than O(n). Every time a pure function is called, the body would have to be included in the binary, resulting in a ton of duplication. An alternative that is O(n) is to use static variables. All pure functions will have to be compiled twice, once for the final binary and once for use in macros.


Statics can be changed to be addressed either by name or by index. Any variable that has no definition will be treated as a named static variable. The compiler will issue a warning if this happens. If a static variable is registered with the compiler or declared beforehand the warning will not be issued and the static will be addressed by index instead of by name.

Statics may be created and destroyed at any time. Statics deleted during execution are simply marked deleted without freeing any memory.

Named and unnamed statics are almost one and the same. All unnamed statics must have a name.
Statics are objects that are pointed to by the statics array. Each entry has a string associated with it that acts as the name.
"Unnamed static" refers only to how the static is addressed. If the static is registered during compilation, it gets an entry in the unnamed statics array. References to statics that the compiler is explicitly told about are treated as unnamed.
When a static is deleted, it is only deleted from the statics array. Statics are never deleted from the unnamed statics array.

The statics array is traced by the garbage collector.
The unnamed statics array is traced by the garbage collector.

Unnamed references to statics are statically scoped. Named references to statics are dynamically scoped.


Macros use named references to statics for pure function calls.


Macros & pure functions and lexical scope are annoying.

Create a single VM instance to be used for all macro calls.
Pass a vector to the VM to trace for GC.
For each pure function:
    Compile each pure function to bytecode.
    Attach the bytecode to the associated symbol in the current scope.
    Execute the bytecode and store the result as a global variable.
    Put the result on the GC-traced vector.
For each macro call:
    If worried about memory: Delete all named global variables.
    For each pure function in the current scope:
        Add the closure object as a global variable.
    Call the macro function.


Lambdas stored in dynamic variables crash when called by bytecode they did not originate from.

1. The entire bytecode can be copied onto the heap.
2. Only the small pieces of bytecode that the function uses can be copied onto the heap.

The former is probably better since function definitions may contain other function definitions, so copying everything at once avoids bytecode duplication.


Perhaps I should remove static access and only keep dynamic access? Deleting statics doesn't really work well with global closures. Global closures don't work well with dynamics either, but at least I can delete a dynamic and not have the closure address the wrong global. The VM halts if the requested global doesn't exist instead of silently using it as it would if the addressed global was a deleted static.

Final decision: Statics are fast, but must go.

How to improve space efficiency of dynamic variables? Maybe use symbols? Symbol mappings are never deleted, so I think this would actually work. I would just do it with an associative array.
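A sketch of keying globals by interned symbol id in a small associative array. `globals_t` and its functions are hypothetical. A missing global is reported to the caller, which would halt the VM rather than read a stale slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical global keyed by symbol id instead of by name string. */
typedef struct {
	size_t symbol_id;
	int value;
} global_t;

typedef struct {
	global_t *entries;
	size_t length;
	size_t capacity;
} globals_t;

/* Define or overwrite a global. Returns false on allocation failure. */
static bool global_set(globals_t *globals, size_t symbol_id, int value) {
	assert(globals != NULL);
	for (size_t i = 0; i < globals->length; i++) {
		if (globals->entries[i].symbol_id == symbol_id) {
			globals->entries[i].value = value;
			return true;
		}
	}
	if (globals->length == globals->capacity) {
		size_t capacity = globals->capacity ? globals->capacity * 2 : 8;
		global_t *entries = realloc(globals->entries, capacity * sizeof *entries);
		if (entries == NULL) return false;
		globals->entries = entries;
		globals->capacity = capacity;
	}
	globals->entries[globals->length++] = (global_t) {symbol_id, value};
	return true;
}

/* Look up a global; false means it does not exist. */
static bool global_get(const globals_t *globals, size_t symbol_id, int *value) {
	for (size_t i = 0; i < globals->length; i++) {
		if (globals->entries[i].symbol_id == symbol_id) {
			*value = globals->entries[i].value;
			return true;
		}
	}
	return false;
}

/* Delete a global by moving the last entry into its slot. */
static void global_delete(globals_t *globals, size_t symbol_id) {
	for (size_t i = 0; i < globals->length; i++) {
		if (globals->entries[i].symbol_id == symbol_id) {
			globals->entries[i] = globals->entries[--globals->length];
			return;
		}
	}
}
```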


Macros & pure functions should be a little less annoying.

Create a single VM instance to be used for all macro calls.
Pass a vector to the VM to trace for GC.
For each pure function:
    Compile each pure function to bytecode.
    Execute the bytecode.
    Put the result on the GC-traced vector.
    Attach the returned closure to the associated symbol in the current scope.
    Free the bytecode since the VM holds a copy.
For each macro call:
    For each pure function in the current scope from the root to the leaf:
        Add the closure object as a dynamic variable, overwriting any dynamics with the same name.
    Call the macro function.


New features:
    No more static variables
    Dynamic variables
    Garbage collected bytecode
    Tracing of user-added objects.

Static variables have been removed and dynamic/global variables have replaced them.


Bytecode is copied to the heap once on VM start.
The call stack may require references to the bytecode currently in use so that the VM knows which set of bytecode to execute when a function is called and returns.
New closures save a reference to the currently executing bytecode object.
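A sketch of how closures and call frames could carry a reference to their bytecode, so a closure created while running one chunk can be called from another. `bytecode_t`, `closure_t`, `frame_t`, and `vm_t` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES_MAX 16

typedef struct {
	const unsigned char *code;
	size_t length;
} bytecode_t;

/* A closure remembers which bytecode its address belongs to. */
typedef struct {
	const bytecode_t *bytecode;
	size_t address;
} closure_t;

/* A call frame remembers where to return, including in which bytecode. */
typedef struct {
	const bytecode_t *bytecode;
	size_t return_address;
} frame_t;

typedef struct {
	const bytecode_t *bytecode;  /* currently executing bytecode */
	size_t ip;
	frame_t frames[FRAMES_MAX];
	size_t depth;
} vm_t;

/* push-closure: capture the bytecode that is executing right now. */
static closure_t vm_make_closure(const vm_t *vm, size_t address) {
	closure_t closure = {vm->bytecode, address};
	return closure;
}

/* funcall: save the caller's bytecode and return address, then switch to
   the bytecode the closure was created in. */
static void vm_funcall(vm_t *vm, const closure_t *closure) {
	assert(vm->depth < FRAMES_MAX);
	vm->frames[vm->depth].bytecode = vm->bytecode;
	vm->frames[vm->depth].return_address = vm->ip;
	vm->depth++;
	vm->bytecode = closure->bytecode;
	vm->ip = closure->address;
}

/* return: restore both the bytecode and the instruction pointer. */
static void vm_return(vm_t *vm) {
	assert(vm->depth > 0);
	vm->depth--;
	vm->bytecode = vm->frames[vm->depth].bytecode;
	vm->ip = vm->frames[vm->depth].return_address;
}
```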

If I were to create a standard duck-lisp, I would not include globals in the specification. This means that a compliant implementation would only need a minimal call stack like is currently implemented.

Bytecode copying and GC implemented.


User object tracing should be easy enough.


An alternative to adding objects to trace is to create a global vector that stores objects I want to keep alive. That would be a very Emacsy way of doing it. Since the GC isn't compacting, objects always stay in the same spot once they are allocated. Because of that, I can gensym the name and hold references to the global vector and all pure functions that I shove in it.

Create a single VM instance to be used for all macro calls.
Pass a vector to the VM to trace for GC.
For each pure function:
    Compile each pure function to bytecode.
    Execute the bytecode.
    Put the result on the GC-traced vector.
    Attach the returned closure to the associated symbol in the current scope.
    Free the bytecode since the VM holds a copy.
For each macro call:
    For each pure function in the current scope from the root to the leaf:
        Add the closure object as a dynamic variable, overwriting any dynamics with the same name.
    Call the macro function.
Destroy the VM.

The GC-traced vector is actually a nested list. A new cons is added to the list each time a new scope is created. Each element holds a list of the pure closures.

It looks like there is already an array of DL functions in each scope. Currently they hold a copy of the bytecode. They will need to be changed to hold a reference to the closure.
Macros are the same way, but they actually work, so I'm going to leave them alone for now.

The DL scope stack only exists to prevent garbage collection of the closures.


Nope. It's still hard.

The problem is that I want to create closures, which means I can't simply use dynamic scope for all the pure functions. Using dynamic scope for lexically scoped functions won't work.
On the other hand, I now have the ability to hold closures on the scope stack, so maybe there's a way I can use actual lexical scope?


I think running callbacks in macros is easy. I should just need to copy over the symbol table and then delete it before I call duckLisp_quit.


The duckVM scope stack along with its closures will stay, but I will use genuine lexical scoping. I will copy over the compiler scope stack to the sub-compiler and then have it compile a single pure function that may reference other functions. This will create a small chunk of bytecode that is then run to create a closure. The closure is placed on the VM scope stack. When a macro is called, the entirety of the VM scope stack is manually placed on the VM's stack and the macro is run.
The only problem I can see right now is that the upvalues might need to reference objects with pointers, but if I recall correctly, they instead use indices which shouldn't cause a problem.

NOPE. The fact that variables are missing from the stack means that I either have to push dummy variables on (which seems impractical) or compile pure functions entirely separately from the main program.
The ideal solution would be some sort of incremental compilation and execution that would allow me to compile and evaluate every function definition once.

The macro itself couldn't care less whether the context it is run in is lexical or global. However, considering my plan with pure functions it might be better if they are compiled lexically.

I don't *want* two scope stacks.

Fexprs are tempting, but I'm trying to keep a sharp line between compiler and interpreter. Fexprs aren't useful when compiling for a microcontroller.

Pure functions *must* be lexically scoped. Pure functions will ideally be converted to closures *first*, and then functions that reference them will be compiled and store them as upvalues.

I *can* add a separate scope stack for pure functions. I just don't want to. I may or may not be able to do incremental compilation and execution.

This is literally the last major feature until the language is complete. It should be OK if it gets messy. I never have to look at that part again. If DL is too slow and I actually do need to fix it, then I can switch to Common Lisp or Scheme and add a DL compatibility layer.

This would be so much easier if C had garbage collection.

I can generate modular high-level assembly.
Functions are *always* closures, and labels never escape closures (that I am aware of). This means that the only linkage to other functions is through stack indices. I might be able to recreate the labels array from just the high-level assembly, in which case, why do I have a global labels array at all?!?
I might be able to modularize bytecode generation and optimization.

I think the assembler as a whole can be moved to a separate function.

It turns out the name field of duckLisp_label_t was set but never read, so it's gone now. Now labels are just a list of instruction indices, which means I can almost certainly generate them right before they are needed. The labels trie will stay as it is though and I will probably add another counter to give each label a unique whole number identifier.


Label reference — Pushes a label.

	duckLisp_label_t label;

	/**/ dl_array_init(&label.sources,
	                   duckLisp->memoryAllocation,
	                   sizeof(duckLisp_label_source_t),
	                   dl_array_strategy_double);
	label.target = -1;
	e = dl_array_pushElement(&labels, &label);
	if (e) goto l_cleanup;

Label creation lookahead — Simply initializes the label. Does not populate it at all.

	duckLisp_label_t label;

	// Declare the label.
	/**/ dl_array_init(&label.sources,
	                   duckLisp->memoryAllocation,
	                   sizeof(duckLisp_label_source_t),
	                   dl_array_strategy_double);
	label.target = -1;
	e = dl_array_pushElement(&labels, &label);
	if (e) goto l_cleanup;
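With the name field gone, a label is just a target index plus the indices of the instructions that reference it, so resolving it is a short patch loop. A minimal sketch, assuming a hypothetical fixed-capacity `label_t` and a flat `instr_t` array standing in for the real `duckLisp_label_t` and `dl_array_t`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified instruction: an opcode plus one operand. */
typedef enum { OP_NOP, OP_JUMP, OP_LABEL } opcode_t;

typedef struct {
	opcode_t op;
	ptrdiff_t operand; /* For OP_JUMP: index of the target instruction. */
} instr_t;

/* Hypothetical stand-in for duckLisp_label_t: the instructions that jump to
   this label (sources) and the instruction index the label marks (target). */
#define MAX_SOURCES 8
typedef struct {
	ptrdiff_t sources[MAX_SOURCES];
	size_t sources_length;
	ptrdiff_t target; /* -1 until the label instruction is emitted. */
} label_t;

/* Lookahead: declare the label without populating it. */
static void label_init(label_t *label) {
	label->sources_length = 0;
	label->target = -1;
}

/* Reference: record that instruction `index` jumps to `label`. */
static int label_addSource(label_t *label, ptrdiff_t index) {
	if (label->sources_length >= MAX_SOURCES) return -1;
	label->sources[label->sources_length++] = index;
	return 0;
}

/* Patch every source jump with the label's target. Fails if the label was
   referenced but never placed. */
static int label_resolve(const label_t *label, instr_t *code) {
	if (label->target < 0) return -1;
	for (size_t i = 0; i < label->sources_length; i++) {
		code[label->sources[i]].operand = label->target;
	}
	return 0;
}
```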


Time to split.


AST can be composed.
High-level assembly can be composed in certain cases.
Bytecode can be composed in certain cases.


All interactions between modules to be composed should be done with the stack or globals. Labels should *always* be private to the module. It would be best if modules are always local functions since they are stored on the stack and labels are always local to them.


AST functions can be composed.
HLA functions can be composed.
Bytecode functions can be composed.


I want to incrementally generate assembly and compile it multiple times during that process. Only complete functions exist in the assembly when it is assembled.


The VM remains live in between runs, but the stack is reset. Before each run, assembly

I just remembered! The whole reason I want to use HLA instead of bytecode is that I can incrementally generate HLA but not bytecode. Though I might be able to generate bytecode incrementally and concatenate them together if I feed the function an empty assembly array.

There is a bytecode array dedicated to pure functions. It starts empty and grows over the course of the compilation.

defun (pure): The function is compiled to bytecode in the current scope. *Only* this function is compiled to bytecode. This segment of bytecode references other functions, but does not have the actual referenced function definitions in this bytecode. It is appended to the global pure function bytecode array once it is compiled.
defmacro: The macro is compiled to bytecode in the exact same way that pure functions are. Does it get appended to the pure functions bytecode as well?
macro: Call the macro function.

I don't think separating the assembler from the compiler actually helped much considering the only change I'm making is to feed the compiler an empty HLA array and concatenate all the bytecode together.

Now the problem is where to store the pure function tries. Do I do it in duckLisp_scope_t? But then I would pretty much need two compilers.

Pure functions are compiled in their own compiler instance. The pure function scope stack is copied to the new compiler as the main scope stack.

What needs to be copied into the new compiler:
    pure function bytecode
    scope
        pure functions
        macros
        "function scope?"
        upvalues?

Instead of copying everything, maybe I can just make the scope stack an extra parameter to generators? In practice, this would probably be called something like duckLisp_compilerState_t.

On compiler exit:
    global vm
    global gensym_number
    global generators_stack
    global symbols_trie
    global symbols_array
    local locals_length
    local scope_stack

I think I want to duplicate pretty much all of *_scope_t

typedef struct {
	dl_trie_t locals_trie;
	dl_trie_t functions_trie;
	dl_size_t functions_length;
	dl_array_t pure_functions;
	dl_trie_t generators_trie;
	dl_size_t generators_length;
	dl_trie_t macros_trie;
	dl_size_t macros_length;
	dl_array_t macros;
	dl_trie_t labels_trie;
	dl_bool_t function_scope;
	dl_ptrdiff_t *scope_uvs;
	dl_size_t scope_uvs_length;
	dl_ptrdiff_t *function_uvs;
	dl_size_t function_uvs_length;
} duckLisp_scope_t;
typedef struct {
	dl_memoryAllocation_t *memoryAllocation;
	dl_array_t errors;
	duplicate dl_array_t scope_stack;
	duplicate dl_size_t locals_length;
	dl_array_t generators_stack;
	dl_size_t label_number;
	duplicate dl_size_t label_numberForCompile;
	dl_size_t gensym_number;
	dl_trie_t symbols_trie;
	dl_array_t symbols_array;
	duckVM_t vm;
} duckLisp_t;
typedef struct {
	dl_array_t scope_stack;
	dl_size_t locals_length;
	dl_size_t label_numberForCompile;
} duckLisp_compileState_t;


Globals and generators should not be stored in the top-most scope. They should be stored in a separate global field in duckLisp_t.


Now that I can compile multiple pieces of code at the same time, I think I can remove the `subLisp` instances.

I think the next step is to start compiling pure functions into a global _compileState_t.

It might be possible to use one bytecode array for all macros without doing any copying during macro invocation. All pure functions and macros would be incrementally compiled to bytecode and then when a macro is called, it just runs the script and calls the function at the correct stack index.

`var` will not write assembly when compiled in "pure" mode.
`defmacro` will not write assembly when compiled in "pure" mode.


Pure functions and macros are the same from a compiletime standpoint.
Pure functions are compiled in both the runtime and compiletime states.
Macros are only compiled in the compiletime state.
Macros are only executed in the runtime state.

The compiletime state is never compiled by normal functions. Instead, when it is desired to compile to the compiletime state, the compiletime state is set as the runtime state and a new compiletime state is created.

var runtime
+ runtime
defmacro comptime
defun runtime comptime
macro runtime

What does this code do?

(var x 4)
(if x
    ((defun f ()))
    ((defun g ())))

Both are compiled regardless of the value of `x' and the functions are in their own scope, so it's all good.
Functions are independent modules that can be thrown anywhere without worry about whether the labels will cross the boundary of a function. As long as the names of all labels are unique, all is well.

If `defmacro' were run both at compile time and run time like `defun' I think it would work without problems. Both versions would exist in their respective binaries, but only the compile time version would be called by the runtime. So oddly enough, `defmacro' is a special case of `defun'. `var` is normal like other generators. Macro calls are weird since they also transfer control to comptime.

defun runtime→runtime comptime→comptime
defmacro runtime→comptime &optional runtime→comptime
generator runtime→runtime
funcall runtime→runtime
macro runtime→comptime

So the real hard parts are `defun` and macro calls.


Closure addresses are absolute. Maybe I could generate the jump target at runtime when a closure is created? The `name' field would become the relative address to the start of the function instead of the absolute address.
Closure instructions are now relative, but closure objects are absolute and calculated on object creation.
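A minimal sketch of the relative scheme, assuming a hypothetical `push-closure` whose operand is the body's offset from the `push-closure` instruction itself. The absolute address is only computed when the closure object is created, so a bytecode snippet can be appended anywhere without patching:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical closure object: holds an absolute bytecode address. */
typedef struct {
	size_t entry; /* Absolute index into the bytecode array. */
} closure_t;

/* Executing push-closure at absolute address `pc` with the relative operand
   `offset` produces a closure whose entry point is absolute. */
static closure_t make_closure(size_t pc, ptrdiff_t offset) {
	closure_t c;
	c.entry = (size_t)((ptrdiff_t)pc + offset);
	return c;
}
```

If the same snippet is moved 100 bytes later, both the `push-closure` and the body move by 100; `pc` changes but the operand does not, so the closure still lands on the body (for example 5 becomes 105).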


This is my greatest nightmare:

(defun f (a) ())
(defun g (a)
  (defun h (a) (f))
  (defmacro m (a) (h))
  (m))

`f' and `h' are defined when `m' is called. How do I incrementally compile that snippet when `g' has unresolved labels?

Runtime is never called during compilation.
Comptime is

I think `defun' has exponential complexity.

r0 f
c1 f
r0 g
 r0 h
 c1 h
 c1 m
c1 g
 c1 h
 c2 h
 c2 m

(defun f (a)
  (defun g (a)))

 r0 g ← Normal runtime compilation.
 c1 g ← For use by macros defined in `f' but after `g'.
r0 f ← Final runtime
 c1 g ← Normal comptime compilation.
 c2 g ← For use by macros defined in `f' but after `g'.
c1 f ← Final comptime

The problem is that I have to fully compile a function definition inside the body of another function that is still being compiled. Currently, the inner function and all forms that come before it are compiled into one block of assembly.

The problem is unresolved labels caused by the fact that I would be only partially compiling functions. There's no way to jump over the function because the function isn't yet complete.

;; Source
(defun f (a)
  (defun g () ())
  (defmacro m () (g))
  (m))

;; Comptime pre-macro
(
 (defun g () ())
 (defmacro m () (g))
 (here))

;; Expanded
(defun f (a)
  (defun g () ())
  ())
(here)

;; Comptime post-macro
(
 (defun g (a))
 (defun m (a)))
(defun f (a)
  (defun g (a))
  ())
(here)


Runtime:

jump skip-g
  label g
  nil
  return
label skip-g

jump skip-f
  label f
  push-closure g 0 0
  nil
  return
label skip-f
push-closure g

(
 (defun g ()
   ()))
(defun f ()
  (declare g)
  ())


Comptime:

jump skip-g
  label g
  nil
  return
label skip-g
push-closure g 0 0

jump skip-m
  label m
  push-upvalue 1
  call
  return
label skip-m
push-closure m 0 0

jump skip-f
  label f
  push-closure g 0 0
  nil
  return
label skip-f
push-closure g

(
 (define g ()
   ())
 (define m ()
   (g)))
(defun f ()
  (declare g)
  (declare m)
  ())


It looks to me like I only need a runtime state and a comptime state. That's much better than the potentially infinite depth I was thinking I would need. The bad news: I may need to relocate function bodies so I can execute partially compiled functions.

Recursion is hard to think about.


Maybe I can detect impure functions by checking the runtime stack during comptime. If the referenced object is a variable, the function is impure.
This feels like static typing.


Declaration: Put the function name in the current scope.
Definition: The location of the function body.

If I can define inner functions before outer functions, then I should be able to get O(n) compilation of nested functions. THIS WILL MAKE FUNCTIONS NON-COMPOSABLE DUE TO LABELS CROSSING FUNCTION BOUNDARIES. Functions are declared as usual. Function definitions will each be compiled to a brand new assembly array. After the body is compiled to the new assembly array, the push-closure is written to the parent function.


(defun f ()
  (defun g ()
    (defun h ()
      ())))

Comptime:
(
 (
  (
   (defun h ()
     ()))
  (defun g ()
    (declare h)))
 (defun f ()
   (declare g)))
(declare f)

Runtime:
(define h ()
  ())
(define g ()
  (declare h))
(define f ()
  (declare g))
(declare f)


(defun f ()
  (defun g ()
    ())
  (defun h ()
    ()))

Comptime:
(
 (defun g ()
   ())
 (defun h ()
   ()))
(defun f ()
  (declare g)
  (declare h))


defun: Define the function. Push a closure.
define: Define the function. Do not push a closure. The name is not in the global scope, but a gensymed label links it to its parent function.
declare: Declare the function in the current scope. Push a closure.

`push-closure` needs to be written twice for comptime because the inner functions may be called before and after the outer function is defined. So one `push-closure` goes right after the inner function is declared and the other `push-closure` goes inside the outer function's body.
Right before the lambda generator exits, the new assembly array is appended to the main assembly array. The new assembly array is discarded.

Do I finally have it? 🥺 This almost feels robust.

The new assembly array is alive for the duration of the lambda generator.
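The relocation can be sketched with a toy function tree. This is a minimal sketch, assuming hypothetical `fn_t` and `assembly_t` types with textual instructions; it omits the jump-around instructions and the second comptime `push-closure`, and only shows each body being built in its own local array and appended to the global array once complete, so inner bodies land before outer ones:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function tree: each function may define nested functions. */
typedef struct fn {
	const char *name;
	const struct fn *children[4];
	size_t children_length;
} fn_t;

/* Hypothetical assembly array: one textual instruction per entry. */
#define MAX_INSTRS 64
typedef struct {
	char instrs[MAX_INSTRS][32];
	size_t length;
} assembly_t;

static void emit(assembly_t *a, const char *op, const char *arg) {
	if (a->length >= MAX_INSTRS) return;
	snprintf(a->instrs[a->length], sizeof a->instrs[0], "%s%s%s",
	         op, arg[0] ? " " : "", arg);
	a->length++;
}

static void append(assembly_t *dst, const assembly_t *src) {
	for (size_t i = 0; i < src->length && dst->length < MAX_INSTRS; i++) {
		strcpy(dst->instrs[dst->length++], src->instrs[i]);
	}
}

/* Compile `f` into a fresh local array. Each inner definition is written to
   the global array first; the parent body only receives a push-closure for
   it. The finished body is appended to the global array, and the local
   array is discarded when this call returns. */
static void compile_function(const fn_t *f, assembly_t *global) {
	assembly_t local;
	local.length = 0;
	emit(&local, "label", f->name);
	for (size_t i = 0; i < f->children_length; i++) {
		compile_function(f->children[i], global);
		emit(&local, "push-closure", f->children[i]->name);
	}
	emit(&local, "return", "");
	append(global, &local);
}
```

For the nested `f`/`g`/`h` example, the global array ends up holding `h`, then `g` (with `push-closure h`), then `f` (with `push-closure g`).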

First step is to do the definition relocation. After that's done we'll come back to comptime functions.


Labels peephole optimization:

jump a
…
label b
…
label a
jump b

can be converted to

jump c
…
label c

This should work nicely with the new function definitions.
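This transformation is usually called jump threading. A minimal sketch, assuming a hypothetical HLA record that only distinguishes jumps, labels, and everything else. It only retargets jumps; deleting a `label a`/`jump b` pair that is no longer referenced (and not reached by fallthrough) would be a separate pass:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { I_OTHER, I_JUMP, I_LABEL } kind_t;

typedef struct {
	kind_t kind;
	int label; /* Label id for I_JUMP and I_LABEL. */
} hla_t;

/* Index of the instruction that defines `label`, or -1. */
static ptrdiff_t find_label(const hla_t *code, size_t length, int label) {
	for (size_t i = 0; i < length; i++) {
		if (code[i].kind == I_LABEL && code[i].label == label) return (ptrdiff_t)i;
	}
	return -1;
}

/* If a jump lands on a label that is immediately followed by another
   unconditional jump, retarget the first jump to the final destination.
   The hop count is bounded by `length` so jump cycles cannot loop forever. */
static void thread_jumps(hla_t *code, size_t length) {
	for (size_t i = 0; i < length; i++) {
		if (code[i].kind != I_JUMP) continue;
		for (size_t hops = 0; hops < length; hops++) {
			ptrdiff_t target = find_label(code, length, code[i].label);
			if (target < 0 || (size_t)target + 1 >= length) break;
			const hla_t *next = &code[target + 1];
			if (next->kind != I_JUMP || next->label == code[i].label) break;
			code[i].label = next->label;
		}
	}
}
```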


No top-level forms other than functions and macros should be compiled to comptime assembly.
Top-level functions initiate writing to comptime assembly by compiling their bodies twice instead of once. They set a flag that tells nested functions to only compile their bodies once to the currently selected compile state.


I think I need to do compilation for both runtime and comptime at the same time, even in normal generators like add, if, list, etc. Maybe emitters can write to both assembly arrays at the same time if a flag is set?


In the comptime assembly, the only difference between pure and impure functions is whether definitions have a `push-closure` immediately after them. Pure functions do. Impure functions do not. Pure definitions have two `push-closure`s that point to them while impure functions only have one.
Top-level impure functions are ideally not compiled to bytecode.


(comptime
 (global *evil* true))

(defmacro wicked ()
  (setq *evil* (not *evil*))
  *evil*)

(defun perverted ()
  (wicked))

This results in `perverted` returning `false` when used at compile time and `true` when used at runtime. Or perhaps the opposite.
Let's say `wicked` expands into one of two functions.

(comptime (defun comptime? () …))

(defmacro wicked ()
  (if (comptime?)
      (quote (defun f ()))
      (quote (defun g ()))))

Should both of the functions be compiled to comptime HLA? Or should `f` be compiled to comptime HLA but `g` be compiled to runtime HLA?

Answer: I give up. I don't think there's a way to do this that won't result in more complicated code doing something unexpected. All code with the exception of macros is only compiled into the runtime HLA by default. Comptime code must be explicitly wrapped by the `comptime` keyword.


Ideally `comptime' would incrementally *run* the program. A new instruction, "yield", can be used as the last instruction of the bytecode snippet. It will pause execution of the VM with the stack still populated. This will allow locals declared with `var' to be usable for the duration of the entire compilation.

I think I can just append new bytecode onto old bytecode without causing any problems with labels.

I think I can run new bytecode as is and discard old bytecode since closures should already be on the stack and link to copies of old bytecode.

Yielding: The VM halts but the stack remains. Nothing is popped. When the VM starts again, the PC initializes to the beginning of the new bytecode as it usually does.
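The yield behavior can be sketched with a toy VM. This assumes hypothetical opcodes (`OP_PUSH`, `OP_ADD`, `OP_YIELD`) rather than the real duckVM instruction set; the point is only that the stack survives between runs while the PC restarts at 0 for each new snippet:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack VM. */
enum { OP_PUSH, OP_ADD, OP_YIELD };

#define STACK_SIZE 64

typedef struct {
	long stack[STACK_SIZE];
	size_t top; /* Number of live entries; survives across runs. */
} vm_t;

/* Run one bytecode snippet from its first instruction until `yield`. The
   stack is never reset, so values left by earlier snippets stay visible.
   Returns 0 on a clean yield, -1 on a malformed snippet. */
static int vm_run(vm_t *vm, const long *code, size_t length) {
	size_t pc = 0; /* Always starts at the beginning of the new bytecode. */
	while (pc < length) {
		switch (code[pc]) {
		case OP_PUSH:
			if (pc + 1 >= length || vm->top >= STACK_SIZE) return -1;
			vm->stack[vm->top++] = code[pc + 1];
			pc += 2;
			break;
		case OP_ADD:
			if (vm->top < 2) return -1;
			vm->top--;
			vm->stack[vm->top - 1] += vm->stack[vm->top];
			pc++;
			break;
		case OP_YIELD:
			return 0; /* Halt without popping anything. */
		default:
			return -1;
		}
	}
	return -1; /* Every snippet must end with yield. */
}
```

Running `push 4; yield` and then `push 3; add; yield` leaves a single 7 on the stack, the same as a `var` from the first snippet being used by the second.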


Identifier precedence:
local (bytecode)
free (bytecode)
all

Desired identifier precedence:
macro
local (bytecode)
free (bytecode)
all

(make-type)::Type
(make-instance type::Type value::Any closure::Any)::type
(type-of object::Any)::Type
(value object::Any)::Any
(function object::Any)::Any
(set-value object::Composite value::Any)
(set-function object::Composite function::Any)

(defun __make-type () 4)
(defun __make-instance (type value closure) (vector type value closure))
(defun __type-of (object) (elt object 0))
(defun __composite-value (object) (elt object 1))
(defun __composite-function (object) (elt object 2))
(defun __set-composite-value (object value) (set-vector-element object 1 value))
(defun __set-composite-function (object function) (set-vector-element object 2 function))

The new types are called "type" and "composite".

Matrix {
I've had quite a bit of trouble with stack balance, both this past week and before, so I changed the rules slightly. The old ones were like below, but a little more ad-hoc.

1.  All "normal" generators emit a sequence of instructions to push one object on the stack. This one object may be popped by the calling generator without problem. As many generators as possible should be normal.
    + is a normal generator. Scope (which is invoked by wrapping parentheses around several statements) is a normal generator. `if` and `setq` are normal generators.
    `noscope` is not a normal generator. `funcall` is not a normal generator. Macros are not normal generators.
2.  Function calls pop their arguments and push one object on the stack.
3.  `var` and `defun` push two objects on the stack. The new top object can safely be popped by the calling generator.
4.  `defmacro` acts like `defun` in the compile time environment and `nil` in the run time environment.
5.  `noscope` may push any number of objects on the stack, but it always pushes one extra that can safely be popped by the calling generator. This allows `var`, `defun`, and `defmacro` to create static variables that remain after `noscope` exits.
}
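Rules 1 through 3 can be checked mechanically by summing the stack effect of whatever a generator emits. A minimal sketch, assuming a hypothetical four-instruction set where `CALL` pops `argc` arguments and pushes one result:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds. */
typedef enum { PUSH_CONST, POP, ADD, CALL } op_t;

typedef struct {
	op_t op;
	int argc; /* For CALL: number of arguments popped. */
} inst_t;

/* Net change in stack depth caused by one instruction. */
static int stack_effect(inst_t i) {
	switch (i.op) {
	case PUSH_CONST: return 1;
	case POP: return -1;
	case ADD: return -1;          /* Pops two operands, pushes one result. */
	case CALL: return 1 - i.argc; /* Rule 2: pop the arguments, push one. */
	}
	return 0;
}

/* Net depth change of a generator's output. A normal generator (rule 1)
   must come out at exactly +1; `var` and `defun` (rule 3) at +2. */
static int net_effect(const inst_t *code, size_t length) {
	int depth = 0;
	for (size_t k = 0; k < length; k++) depth += stack_effect(code[k]);
	return depth;
}
```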


Characters are dl_uint8_t. Assignment of an integer to a string element will truncate the integer to a character.
Strings are immutable.

(make-string sequence)
(get-vector-element string index)
(car string)
(cdr string)
(null? string)
(= left-string right-string)
(length string)
(symbol-value symbol)
(concatenate left-string right-string)
(substring string start-index end-index)
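A minimal sketch of the truncation rule and of `substring` returning a fresh string, assuming a hypothetical `toy_string_t` and assuming `end-index` is exclusive (the list above doesn't say):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string: operations return new strings. */
typedef struct {
	uint8_t *chars;
	size_t length;
} toy_string_t;

/* make-string: each integer is truncated to a uint8_t (modulo 256). */
static int string_make(toy_string_t *s, const long *ints, size_t length) {
	s->chars = malloc(length ? length : 1);
	if (!s->chars) return -1;
	for (size_t i = 0; i < length; i++) s->chars[i] = (uint8_t)ints[i];
	s->length = length;
	return 0;
}

/* substring: copies [start, end) into a new string; the source is untouched. */
static int string_substring(toy_string_t *out, const toy_string_t *s,
                            size_t start, size_t end) {
	if (start > end || end > s->length) return -1;
	size_t n = end - start;
	out->chars = malloc(n ? n : 1);
	if (!out->chars) return -1;
	memcpy(out->chars, s->chars + start, n);
	out->length = n;
	return 0;
}
```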


And that was the last planned feature.


API improvements:
    Don't require the user to mess with the internal object representations.
    Provide a function to make working with error messages nicer?


Each test is stored as "tests/*.dl". Each test in the directory will be run in a new instance of the language.


Label instructions in the HLA actually help with peephole optimizations. If I'm looking for a pattern such as {move.8 pop.8 move.8 pop.8} and the duplicate instructions are caused by the second `move.8` being the target of a jump, then the pattern won't match. The *actual* HLA is {move.8 pop.8 label move.8 pop.8}.


If I wanted a `save-lisp-and-die`, then I would need to dump all the stacks, the heap, and the bytecodes. All pointers would need to be converted to indices.


The new FFI provides some more consistent error handling.
The new FFI is inconsistent. There are only a few small inconsistencies though, and at least half of those are useful functions.
Lua has only a few data types and functions.
I can make up for some of this mess through documentation.


I could give each AST node a range to the source code. Handling macros is the hard part.


Release
    Fix bugs
    Free memory
    Improve compiler error reporting
    Fix and document C API


The parenthesis inferrer needs a DL instance to be able to declare identifiers while executing user-defined identifiers. The VM should be self-contained in the inferrer for simplicity, but it should still be extensible with user-defined generators and callbacks. I think the solution to this is to add a tree-walk interpreter similar to what the compiler has. `__declare` declares an identifier. `__inftime` executes DL code at inference-time. Both of these types of forms are deleted before inference is completed.

__declare __lambda (L &rest I)
__declare ` (I)
__declare , (I)
__declare ,@ (I)

(__declare declare (L L &rest I 1)
           (__lambda (identifier &rest tokens)
                     (__var name (__elt tokens 0))
                     (__var type (__elt tokens 1))
                     (__var forms (__infer identifier tokens))
                     (__var lambda (__elt forms 2))
                     (__list name
                             type
                             lambda
                             __quote c-scoped)))
(__declare __var (L I)
           (__lambda (identifier &rest tokens)
                     (__var name (__elt tokens 0))
                     (__list name
                             __quote L
                             __quote c-scoped)))
(__declare __defun (L L L &rest I 1)
           (__lambda (identifier &rest tokens)
                     (__var name (__elt tokens 0))
                     (__var type (__elt tokens 2))
                     (__list name
                             type
                             __quote c-scoped)))
(__declare __defmacro (L L L &rest I 1)
           (__lambda (identifier &rest tokens)
                     (__var name (__elt tokens 0))
                     (__var type (__elt tokens 2))
                     (__list name
                             type
                             __quote c-scoped)))
(defmacro vars (&rest names) (&rest L 1)
          ` (__noscope
             ,@ (mapcar (__lambda (name) ` (__var , name ())) names)))
(declare vars (L)
         (__lambda (identifier &rest tokens)
                   (mapcar (__lambda (node)
                                     (__list node
                                             __quote L
                                             __quote c-scoped))
                           tokens)))

First lambda: The returned alist contains the new entries to add to the scope.
Second lambda: The returned alist contains the names to delete from the scope.

Subsequent declarations may overwrite earlier definitions.
Nested declarations are simply shadowed.
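Both rules fall out of a single declaration stack searched newest-first, with scope markers recording where each scope began. A minimal sketch, assuming hypothetical `decl_stack_t`, `scope_push`, and `scope_pop` in place of whatever `__declaration-scope` actually uses:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DECLS 32
#define MAX_SCOPES 8

typedef struct {
	const char *name;
	const char *type; /* e.g. "L" or "I" */
} decl_t;

typedef struct {
	decl_t decls[MAX_DECLS];
	size_t length;
	size_t scope_starts[MAX_SCOPES]; /* decls index where each scope began */
	size_t scopes_length;
} decl_stack_t;

static int scope_push(decl_stack_t *d) {
	if (d->scopes_length >= MAX_SCOPES) return -1;
	d->scope_starts[d->scopes_length++] = d->length;
	return 0;
}

/* Leaving a scope deletes every declaration made inside it. */
static void scope_pop(decl_stack_t *d) {
	if (d->scopes_length > 0) d->length = d->scope_starts[--d->scopes_length];
}

static int declare(decl_stack_t *d, const char *name, const char *type) {
	if (d->length >= MAX_DECLS) return -1;
	d->decls[d->length].name = name;
	d->decls[d->length].type = type;
	d->length++;
	return 0;
}

/* Newest-first search: later declarations overwrite earlier ones and inner
   declarations shadow outer ones until their scope is popped. */
static const char *lookup(const decl_stack_t *d, const char *name) {
	for (size_t i = d->length; i-- > 0;) {
		if (strcmp(d->decls[i].name, name) == 0) return d->decls[i].type;
	}
	return NULL;
}
```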

__declare L I
__infer (__quote c-scoped) (__quote lisp-scoped)

`__infer` is a little weird as a sort of recursion can occur.

Two problems: How to handle forms like `let' and how to handle forms that declare forms.


declare let (lambda (identifier &rest tokens)
              (var bindings (first tokens))
              (var body (rest tokens))
              (var identifier-names (mapcar* (lambda (binding)
                                               (var identifier (first binding))
                                               (var body (rest binding))
                                               (infer … body)
                                               (push-declaration identifier 'L)
                                               identifier)
                                             bindings))
              (infer … body)
              (mapcar* (lambda (identifier)
                         (pop-declaration identifier))
                       identifier-names)
              `(,identifier ,bindings ,@body))

(let ((a 1)
      (b + 2 3))
  var c a
  + c b)

(declare let ((&rest (L I) 1) &rest I 1)
         (lambda ()
           ;; The bindings of the `let' statement are the first argument. It will be inferred using the type above.
           (var bindings (__infer-and-get-next-argument))
           ;; Create a new scope. All declarations in the scope will be deleted when the scope exits.
           (__declaration-scope
            (dolist binding bindings
                    ;; Declare each new identifier in the `let' as a variable.
                    (__declare-identifier (first binding) 'L))
            ;; Infer body in scope.
            (__infer-and-get-next-argument))))

(declare defun (L L L &rest I 1)
         (lambda ()
           (var name (__infer-and-get-next-argument))
           (var parameters (__infer-and-get-next-argument))
           (var type (__infer-and-get-next-argument))
           ;; `defun' defines `self' in order to allow recursion.
           (__declaration-scope
            (__declare-identifier 'self type)
            (var body (__infer-and-get-next-argument)))
           (__declare-identifier name type)))

__declare L I
__infer-and-get-next-argument __declaration-scope __declare-identifier


"__declare" : '((L I &rest 1 I)
                (__declare-identifier (__infer-and-get-next-argument) (__infer-and-get-next-argument)))


(__declare include (L)
           (__infer-file (__infer-and-get-next-argument)))


`duckLisp_read` is quite convenient. Next I should implement `astToInferrerType` so I can allocate an entire type tree in three or so lines.


What every data structure should have:

init(data)
quit(data)
string = serialize(data)

And optionally:

data = parse(string)
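A minimal sketch of the convention on a hypothetical `intlist_t`, assuming the serialized form is a parenthesized, space-separated list:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable int array following init/quit/serialize/parse. */
typedef struct {
	int *items;
	size_t length;
} intlist_t;

static void intlist_init(intlist_t *l) { l->items = NULL; l->length = 0; }

static void intlist_quit(intlist_t *l) { free(l->items); intlist_init(l); }

static int intlist_push(intlist_t *l, int value) {
	int *items = realloc(l->items, (l->length + 1) * sizeof *items);
	if (!items) return -1;
	items[l->length++] = value;
	l->items = items;
	return 0;
}

/* serialize: returns a newly allocated string such as "(1 -2 30)". Each int
   needs at most 11 characters plus a separator; 3 covers "()" and NUL. */
static char *intlist_serialize(const intlist_t *l) {
	size_t cap = 3 + l->length * 12;
	char *s = malloc(cap);
	if (!s) return NULL;
	size_t n = (size_t)snprintf(s, cap, "(");
	for (size_t i = 0; i < l->length; i++) {
		n += (size_t)snprintf(s + n, cap - n, i ? " %d" : "%d", l->items[i]);
	}
	snprintf(s + n, cap - n, ")");
	return s;
}

/* parse: the inverse of serialize. Leaves `l` empty on failure. */
static int intlist_parse(intlist_t *l, const char *s) {
	intlist_init(l);
	if (*s++ != '(') return -1;
	for (;;) {
		while (*s == ' ') s++;
		if (*s == ')') return 0;
		char *end;
		long v = strtol(s, &end, 10);
		if (end == s || intlist_push(l, (int)v)) { intlist_quit(l); return -1; }
		s = end;
	}
}
```

Keeping `parse` the inverse of `serialize` makes a round trip a cheap test for every data structure.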


Perhaps I can do inference first and a type check after?

infer_expression
    run VM
    loop: inferArguments
    typecheck
inferArgument
    infer/run VM
    loop: infer_compoundExpression
    typecheck


Script calls are recursive. The implicit state passed to the script must be stored on a stack.

We will need to end all scripts with a `yield`.


I think it would have been better to use relative indices in generators by default.


Inferrer VM context is recursive.
Inferrer is incremental.
Inferrer is split between main function and VM.


I think the REPL situation is bad. I need to make it so that the top-level compile state sticks around after compilation is finished.


Inference is good enough for now. The inferrer internals don't need to be documented quite yet.


This language is made for Hidey-Chess and microcontrollers.
Hidey-Chess needs to run compiled functions. It also needs a REPL.
Microcontrollers need to run compiled functions. Even REPLs have to run compiled functions.

Hidey-Chess needs named variables. This is what globals are for. Unfortunately, the compiler has no knowledge of these. There are no global macros other than C generators. Comptime global variables aren't the same as runtime global variables.

Hidey-Chess could have completion, but it would have to depend on defined symbols.

Ideally I could do a partial compilation. A continuation? Or perhaps I shouldn't bother? I *do* need to make inference work with `include' though.


Decision: The REPL situation will stay as is, but a new library will be created that places functions in the global scope.


Either I can allow C to extend the inferrer, or I can add C reader macros.

`include' is moving into the reader as a user-defined function. This has the benefit that the include generator doesn't have to worry about compiling the included file. It just has to produce the AST.

To avoid parsing, maybe add a hook to be run after each expression or symbol is parsed?

The reader could have "reader actions" since the C callbacks don't act like reader macros.


`set' setters could be scoped. How would that be implemented?


(cvar setter-scopes ())

(cdefun setter-scope (&rest -1 body)
        `((push-setter-scope)
          ,@ body
          (pop-setter-scope)))

(cvar scope-extensions #(list setter-scope
                              ...))
(defmacro scope (&rest body) (&rest -1 I)
          var body body
          (dosequence extension scope-extensions
                      setq body (apply (car extension) body))
          ;; Final scope
          (cons nil body))

(defmacro set (pattern value)
  (if list? pattern
      (scope
       lvar setter (search-setter-scopes setter-scopes (car pattern))
       (apply setter value (cdr pattern)))
      `(setq ,pattern ,value)))

(scope
 (cdefun set-set-car (value cons)
         `__set-car cons ,value)
 define-setter car #set-set-car

 var a cons 1 2
 set (car a) 4
 a)  ; ⇒ (4 . 2)


{b1; b2 b3; b4} → (() (__noscope b1) (__noscope b2 b3) (__noscope b4))


Macros aren't working quite right *still*. Specifically, everything seems to work if I forbid `defmacro` from running at compile-time, but if I enable it stuff that abuses it tends to break.
I think I need to understand the model from scratch since there's so many places I just guess what statements to add or modify.


The way it used to work is functions could be nested. Now that is not the case. Inner functions are written to the global assembly array before outer functions. It works splendidly. The purpose of this was compile-time macros.
The runtime assembly is one large array that is grown as it passes through the compiler. The comptime assembly is a small array that only ever contains one `__comptime`, `__defmacro`, or macro at one time. Or else it fails.
I think the problem I'm having is that I'm trying to use two instruction generation paradigms in the same generators. I want to chop the comptime assembly into little bits spread out over time, but I want to keep the runtime assembly all together.
I think I should be able to stream assembly to both runtime and comptime but simply not delete the runtime assembly.
Function bodies are written sequentially in chunks. They are held in a separate assembly array until they are ready to be written.
There are typically a max of three assembly variables at any one time. There is the global assembly, the `assembly` parameter, and the local assembly array. The global assembly is for writing top-level instructions and complete function bodies to. The local assembly array is for holding the function body until it is complete and ready to be written to the global array. The `assembly` parameter seems to be for writing to the body of the current function.
`__defmacro` should write its body to a local array then the global array. I believe it does do this since it calls the `__defun` generator and that's what `__defun` does.
The next problem is that `__defmacro` creates both a function and a macro.
Currently, macros are pushed onto the top of the stack. In other words, they are assumed to be top-level functions. Clearly this is bad when the macro is defined inside another macro. It must be possible for the macro to reference free variables. Oh dear. That's a fexpr. I must forbid fexprs.
I suppose I *could* add fexprs if I add `eval'.
The current system is limited to performing macro definitions in the top-level scope. :( I think macros could even capture free variables in the top-level scope even when those variables are defined in the same `__comptime' form. Another potential macro configuration I could try to make work is a `__defmacro' inside `__defmacro' that doesn't capture any local variables.
An improved system would allow using macros anywhere. This would add fexprs.

(comptime
 (var w 5)
 (defmacro f (w)
   (var x 3)
   (defmacro g (y)
     `(list ,x ,y))
   (g w))
 (f w))

(comptime
 (var w 5)
 (defmacro f (w)
   (var x 3)
   (defun g (y)
     `(list ,x ,y))
   (eval (g (quote w))))
 (f w))

(comptime
 (var w 5)
 (defmacro f (w)
   (var x 3)
   (defun g (y)
     `(list ,x ,y))
   ;; Eval must be lexically scoped. 😬
   (eval `(list 3 w)))
 (f w))

(comptime
 (var w 5)
 (defmacro f (w)
   (var x 3)
   (defun g (y)
     `(list ,x ,y))
   (list 3 (quote w)))
 (f w))

(comptime
 (var w 5)
 (defmacro f (w)
   (var x 3)
   (defun g (y)
     `(list ,x ,y))
   (list 3 (quote w)))
 (3 w))

5

Adding this would be more work than I'm willing to put in. I think I will allow macros anywhere in the runtime environment but forbid them in the comptime environment.


A small library takes almost no time to compile, so including the library every time code is entered on the Hidey-Chess REPL should be acceptable.


Callbacks now work nicely with the type system.


Hidey-Chess REPL will work OK now due to illusion. Scripts will be much less wordy as well. The only thing it truly needs now is a stable C API. Fortunately, I think I know how to do this now. It might end up being a lot of work, but it will take little thought.


I don't think upvalues should be easily accessible since you can't refer to them by name.


Lisp-2 is a well-tested method to control namespace collisions. How would I sanely implement that?

Add a function namespace to the scope stack. `defun` would write to both that and the variable namespace. `funcall` would call variables. Functions would be called like normal functions. It doesn't seem like that big of a deal to change, but let's defer this until it actually becomes a problem.

Duck-lisp function call environment search order:
scope_getFunctionFromName
if !macro:
    duckLisp_scope_getLocalIndexFromName
    On fail duckLisp_scope_getFreeLocalIndexFromName
if !function:
    scope_getFunctionFromName
    On fail assume global.

What I want it to be:
scope_getFunctionFromName
if !(macro or function):
    duckLisp_scope_getLocalIndexFromName
    if !found:
        duckLisp_scope_getFreeLocalIndexFromName
        if !found:
            assume global.
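The desired order above can be sketched with name lists standing in for the real lookups. `resolve_call` and `scope_tables_t` are hypothetical; the comments map each list to the function it stands in for, and the single `scope_getFunctionFromName` step is split into separate macro and function checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where a name in head position resolved to. */
typedef enum {
	RESOLVED_MACRO,
	RESOLVED_FUNCTION,
	RESOLVED_LOCAL,
	RESOLVED_FREE_LOCAL,
	RESOLVED_GLOBAL
} resolution_t;

/* Stand-in for one scope table: a NULL-terminated list of names. */
typedef const char *const *name_list_t;

static int contains(name_list_t names, const char *name) {
	assert(names != NULL);
	for (size_t i = 0; names[i] != NULL; i++) {
		if (strcmp(names[i], name) == 0) return 1;
	}
	return 0;
}

typedef struct {
	name_list_t macros;      /* scope_getFunctionFromName, macro hit */
	name_list_t functions;   /* scope_getFunctionFromName, function hit */
	name_list_t locals;      /* duckLisp_scope_getLocalIndexFromName */
	name_list_t freeLocals;  /* duckLisp_scope_getFreeLocalIndexFromName */
} scope_tables_t;

/* Macro or function first, then local, then free local; if all fail, assume global. */
static resolution_t resolve_call(const scope_tables_t *scope, const char *name) {
	if (contains(scope->macros, name)) return RESOLVED_MACRO;
	if (contains(scope->functions, name)) return RESOLVED_FUNCTION;
	if (contains(scope->locals, name)) return RESOLVED_LOCAL;
	if (contains(scope->freeLocals, name)) return RESOLVED_FREE_LOCAL;
	return RESOLVED_GLOBAL;
}
```

Note that a name bound both as a function and as a local resolves to the function here, which is exactly the Lisp-1/Lisp-2 mix discussed next.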

And so we have a weird mix of Lisp-1 and Lisp-2. Macros are in their own namespace. Yaaaaay

Functions and macros should reside in the same namespace. Normal variables will reside in their own namespace (along with functions). This may present a challenge since functions and variables can both be captured in closures.


It seems that garbage collecting a full heap causes a stack overflow. This is bad because, unlike a full heap, a stack overflow cannot be recovered from. The best solution would be to not allocate memory during traces, but the next best would be to allocate the trace's working memory on the heap instead of the stack.
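A sketch of a mark phase that neither recurses on the C stack nor allocates during collection, assuming two-child (cons-like) objects and hypothetical `gc_t`/`gc_markFromRoots` names. The mark stack is allocated once with one slot per heap object; since an object is marked before it is pushed, it is pushed at most once, so that capacity is always enough. This is the standard explicit-worklist technique; pointer reversal (Deutsch-Schorr-Waite) would avoid even the preallocated stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap object: a cons-like cell with up to two children. */
typedef struct object {
	bool marked;
	struct object *car;
	struct object *cdr;
} object_t;

/* The heap and a mark stack sized to match it, both allocated up front. */
typedef struct {
	object_t *objects;       /* the heap; swept after marking (sweep not shown) */
	size_t objects_length;
	object_t **markStack;    /* capacity == objects_length */
	size_t markStack_top;
} gc_t;

/* Mark an object and push it if it is non-NULL and not yet marked.
   Marking before pushing bounds the stack depth by the heap size. */
static void gc_push(gc_t *gc, object_t *object) {
	if (object == NULL || object->marked) return;
	assert(gc->markStack_top < gc->objects_length);
	object->marked = true;
	gc->markStack[gc->markStack_top++] = object;
}

/* Mark everything reachable from the roots with an explicit worklist,
   so arbitrarily long or cyclic lists cannot overflow the C stack. */
static void gc_markFromRoots(gc_t *gc, object_t **roots, size_t roots_length) {
	gc->markStack_top = 0;
	for (size_t i = 0; i < roots_length; i++) gc_push(gc, roots[i]);
	while (gc->markStack_top > 0) {
		object_t *object = gc->markStack[--gc->markStack_top];
		gc_push(gc, object->car);
		gc_push(gc, object->cdr);
	}
}
```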


The C FFI sucks. There's no easy way I can make it not suck. It would probably suck even if I tried my best to fix it. I think Lua was able to do it well because it was designed from the start to interoperate with other languages.

Lua passes every value between C and Lua through a stack, not only to provide a nice FFI to non-C languages, but also so the user doesn't have to worry about an object getting collected while he's working with it.


Relying on garbage collection to close files doesn't work well, since a file opened for writing can stay locked against use by other programs until the collector happens to run.


There is a commonly recognized set of primitive types that all decent languages have: integers (within a certain range), floats, and strings.


dispatch(duckLisp, compileState, generator_name, expression) {
    return scope.lookup(generator_name)(duckLisp, compileState, expression);
}
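The `dispatch` pseudocode above, sketched as compilable C under two assumptions: the generators live in a flat name-to-function-pointer table rather than being looked up through the scope stack, and `compiler_t`, `compileState_t`, `expression_t`, and the two generators are hypothetical stand-ins. The point it shows is that every generator shares one signature, so dispatch reduces to a lookup plus an indirect call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real compiler types. */
typedef struct { int unused; } compiler_t;
typedef struct { int instructionsEmitted; } compileState_t;
typedef struct { const char *head; } expression_t;

/* Every generator shares one signature so they can live in one table. */
typedef int (*generator_t)(compiler_t *duckLisp, compileState_t *compileState,
                           expression_t *expression);

typedef struct {
	const char *name;
	generator_t generator;
} generatorBinding_t;

/* Toy generators that only record how many instructions they "emitted". */
static int generator_var(compiler_t *duckLisp, compileState_t *compileState,
                         expression_t *expression) {
	(void) duckLisp;
	(void) expression;
	compileState->instructionsEmitted += 1;
	return 0;
}

static int generator_defun(compiler_t *duckLisp, compileState_t *compileState,
                           expression_t *expression) {
	(void) duckLisp;
	(void) expression;
	compileState->instructionsEmitted += 2;
	return 0;
}

/* Look up the generator by name and forward the call; -1 if nothing matches. */
static int dispatch(compiler_t *duckLisp, compileState_t *compileState,
                    const generatorBinding_t *bindings, size_t bindings_length,
                    const char *generator_name, expression_t *expression) {
	assert(generator_name != NULL);
	for (size_t i = 0; i < bindings_length; i++) {
		if (strcmp(bindings[i].name, generator_name) == 0) {
			return bindings[i].generator(duckLisp, compileState, expression);
		}
	}
	return -1;
}
```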


`var`, `defun`, and `defmacro` in block-scoped languages should not be treated as global keywords. They should be treated as if the macroexpansion of the enclosing block transforms them into global keywords such as `let`: (block (var x 4) x) → (let ((x 4)) x), but (block (+ 4 (var x 3)) x) → (let () (+ 4 (var x 3)) x). Since `var` is not a keyword in that last form, a compile error is raised.

Have `noscope` perform dispatch on `var`, `defun`, `noscope`, and maybe `defmacro`.


Need to copy the expression instead of simply passing it to `noscope`, since the AST is no longer immutable.


I should probably consider making `__apply' work on vectors.


Requirements:
  Consistent API usage for most objects. User objects may be an exception.
  Objects never ever leave the stack/heap.
Arbitrary requirements to help with the user's mental model:
  Only objects on the stack may be operated on. No setting the nth element of a nested list without first directly linking it to the stack with a "list" object.
  Only the top object of the stack may be operated on by type-specific functions.
  Immutable types will have their value set only when pushed on the stack.
  Mutable types will have their value set to a default value when pushed on the stack. The user may then set that object to a different value.
  Using a function on an object that does not have a matching type will result in an error.
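A minimal sketch of how these rules might look in C, assuming a toy VM with a fixed-size stack holding only integers and strings. `vm_t`, `vm_pushInteger`, `vm_setInteger`, `vm_copyInteger`, and `vm_pushString` are hypothetical names, not the real duckVM API. Integers follow the mutable rule (pushed as 0, then set), strings follow the immutable rule (value given at push time, no setter), and every type-specific call sees only the top slot and returns a type error on a mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 64
#define STRING_SIZE 32

/* Hypothetical toy types; not duckVM's real representation. */
typedef enum { TYPE_INTEGER, TYPE_STRING } type_t;

typedef struct {
	type_t type;
	long integer;              /* valid when type == TYPE_INTEGER */
	char string[STRING_SIZE];  /* valid when type == TYPE_STRING */
} slot_t;

typedef struct {
	slot_t stack[STACK_SIZE];
	size_t top;                /* number of slots in use */
} vm_t;

typedef enum { VM_OK, VM_ERROR_TYPE, VM_ERROR_STACK, VM_ERROR_SIZE } vm_error_t;

/* Mutable type: pushed holding a default value (0), to be set afterward. */
static vm_error_t vm_pushInteger(vm_t *vm) {
	if (vm->top >= STACK_SIZE) return VM_ERROR_STACK;
	vm->stack[vm->top].type = TYPE_INTEGER;
	vm->stack[vm->top].integer = 0;
	vm->top++;
	return VM_OK;
}

/* Type-specific functions only see the top slot and reject mismatched types. */
static vm_error_t vm_top(vm_t *vm, type_t type, slot_t **slot) {
	if (vm->top == 0) return VM_ERROR_STACK;
	*slot = &vm->stack[vm->top - 1];
	return (*slot)->type == type ? VM_OK : VM_ERROR_TYPE;
}

static vm_error_t vm_setInteger(vm_t *vm, long integer) {
	slot_t *slot;
	vm_error_t e = vm_top(vm, TYPE_INTEGER, &slot);
	if (e == VM_OK) slot->integer = integer;
	return e;
}

static vm_error_t vm_copyInteger(vm_t *vm, long *integer) {
	slot_t *slot;
	vm_error_t e = vm_top(vm, TYPE_INTEGER, &slot);
	if (e == VM_OK) *integer = slot->integer;
	return e;
}

/* Immutable type: the value is fixed at push time and there is no setter. */
static vm_error_t vm_pushString(vm_t *vm, const char *string, size_t length) {
	assert(string != NULL || length == 0);
	if (vm->top >= STACK_SIZE) return VM_ERROR_STACK;
	if (length >= STRING_SIZE) return VM_ERROR_SIZE;
	slot_t *slot = &vm->stack[vm->top++];
	slot->type = TYPE_STRING;
	memcpy(slot->string, string, length);
	slot->string[length] = '\0';
	return VM_OK;
}
```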


Standard prototype for everything:
  *_get_*(duckVM_t *duckVM, CurrentType *in/out, ...other_parameters);

general
  pop(number_to_pop)
  push(source_stack_index)
  copy(destination_stack_index, source_stack_index)
  typeOf()
  call(stack_index)
bool
  pushBool()
  setBool(boolean)
  copyBool(&boolean)
integer  value
  pushInteger()
  setInteger(integer)
  copyInteger(&integer)
float  value
  pushFloat()
  setFloat(floatingPoint)
  copyFloat(&floatingPoint)
string
  pushString(string, length)
  copyString(&string, &length)
list
  pushNil()
  pushCons()
symbol
  pushSymbol(id, name)
  pushSymbolName()
  pushSymbolId()
closure
  copyClosureName(&name)
  pushClosureBytecode()
  copyClosureArity(&arity)
  copyClosureIsVariadic(&is_variadic)
vector
  pushVector()
type
  pushType()
  pushExistingType(integer)
  copyType(&type)
composite
  pushComposite()
  pushCompositeValue()
  pushCompositeFunction()
  setCompositeValue(stack_index)
  setCompositeFunction(stack_index)
user

sequence
  pushCar()
  pushCdr()
  setCar(stack_index)
  setCdr(stack_index)
  pushElement(sequence_index)
  setElement(stack_index, sequence_index)
  length()

upvalue
upvalueArray
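The `sequence` group can be sketched the same way. This assumes toy heap objects where a NULL pointer stands for nil, and hypothetical `vm_pushCar`, `vm_pushCdr`, and `vm_length` functions; here `length` reports through an out-parameter, which the list above leaves unspecified. In keeping with the rule that only stack objects may be operated on, reaching a nested element means pushing it first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap objects: a NULL pointer is nil. */
typedef enum { OBJECT_INTEGER, OBJECT_CONS } objectType_t;

typedef struct object {
	objectType_t type;
	long integer;                /* valid when type == OBJECT_INTEGER */
	struct object *car, *cdr;    /* valid when type == OBJECT_CONS */
} object_t;

#define STACK_SIZE 32

typedef struct {
	object_t *stack[STACK_SIZE]; /* handles into the heap; NULL is nil */
	size_t top;
} vm_t;

typedef enum { VM_OK, VM_ERROR_TYPE, VM_ERROR_STACK } vm_error_t;

static vm_error_t vm_push(vm_t *vm, object_t *object) {
	if (vm->top >= STACK_SIZE) return VM_ERROR_STACK;
	vm->stack[vm->top++] = object;
	return VM_OK;
}

/* Fetch the list on top of the stack; nil and conses are accepted. */
static vm_error_t vm_topList(vm_t *vm, object_t **list) {
	if (vm->top == 0) return VM_ERROR_STACK;
	*list = vm->stack[vm->top - 1];
	return (*list == NULL || (*list)->type == OBJECT_CONS) ? VM_OK : VM_ERROR_TYPE;
}

/* Push the car of the top list. As in Common Lisp, the car of nil is nil. */
static vm_error_t vm_pushCar(vm_t *vm) {
	object_t *list;
	vm_error_t e = vm_topList(vm, &list);
	if (e != VM_OK) return e;
	return vm_push(vm, list == NULL ? NULL : list->car);
}

/* Push the cdr of the top list; the cdr of nil is nil. */
static vm_error_t vm_pushCdr(vm_t *vm) {
	object_t *list;
	vm_error_t e = vm_topList(vm, &list);
	if (e != VM_OK) return e;
	return vm_push(vm, list == NULL ? NULL : list->cdr);
}

/* Count the conses of the proper list on top of the stack. */
static vm_error_t vm_length(vm_t *vm, size_t *length) {
	assert(length != NULL);
	object_t *list;
	vm_error_t e = vm_topList(vm, &list);
	if (e != VM_OK) return e;
	size_t count = 0;
	for (; list != NULL; list = list->cdr) {
		if (list->type != OBJECT_CONS) return VM_ERROR_TYPE;  /* improper list */
		count++;
	}
	*length = count;
	return VM_OK;
}
```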


All pretty printing functions should include value, type, and name.


There are several ways to achieve infinite recursion.
 Nest too many parentheses. — Cap at 1000.
 Do infinite recursion in the VM. I'll have to check, but this may result in an OOM error, not a stack overflow.
 Do infinite macro expansion in the compiler. — I think I will instead cap absolute recursion depth. So every time compilation of an expression begins the counter will increment, and it will decrement when that compilation finishes. Cap at 1000.
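A sketch of the nesting cap, reading the counter as a depth counter (increment on entry, decrement on exit). The hypothetical `parse_list` below understands only parentheses and fails with `PARSE_ERROR_DEPTH` rather than recursing past 1000 levels; the same counter pattern would apply to macro expansion and compilation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 1000

typedef struct {
	const char *source;
	size_t index;
	unsigned depth;     /* current nesting depth */
	unsigned maxDepth;  /* deepest nesting seen so far */
} parser_t;

typedef enum { PARSE_OK, PARSE_ERROR_DEPTH, PARSE_ERROR_SYNTAX } parse_error_t;

/* Parse one "(...)" form starting at an opening parenthesis. Anything other
   than parentheses is skipped. On error the parser is abandoned, so the depth
   counter is not restored. */
static parse_error_t parse_list(parser_t *p) {
	assert(p->source[p->index] == '(');
	if (++p->depth > MAX_DEPTH) return PARSE_ERROR_DEPTH;
	if (p->depth > p->maxDepth) p->maxDepth = p->depth;
	p->index++;  /* consume '(' */
	for (;;) {
		char c = p->source[p->index];
		if (c == '\0') return PARSE_ERROR_SYNTAX;  /* unclosed list */
		if (c == ')') {
			p->index++;
			break;
		}
		if (c == '(') {
			parse_error_t e = parse_list(p);
			if (e != PARSE_OK) return e;
		} else {
			p->index++;
		}
	}
	p->depth--;
	return PARSE_OK;
}
```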


To prevent the user from touching `duckVM_object_t` I have to keep `duckVM_execute` from passing back a return value.

My current code expects a return value to be popped off. Now that will have to be done manually.
Yields don't pop anything off. I think that remains the same.

Doesn't this make yields and halts exactly the same?


Found an annoying and funny bug. I found it in one of the Hidey-Chess DL instances. It wasn't reproducible in duckLisp-dev. I noticed that when I defined a global function and then called an innocent C callback like `print`, the VM would segfault. I went through the typical process of finding the minimum code that would crash and found that the names of the parameters determined whether it crashed or not. "(x y)" was fine. "(r g)" and "(p q)" were not.

The disassembly looked OK. A closure was created, then a global to contain it, then the global was pushed, then `print` was called. The `push-*` and `pop` instructions were balanced. Finally, I printed out the name of the global (as a symbol) just to make sure it looked right. "f→260". Looks fine. I looked back at the disassembly. `global.8 04`. Globals are indexed by their ID, so the argument to `global` should also be 260. But 260 is greater than 255, so it doesn't fit in 8 bits. Its hex representation is 0x104, or 0x04 when truncated to an 8-bit integer.

It turns out there's no support in the compiler for globals with symbol IDs over 8 bits. There isn't even an instruction for that. I guess I assumed that it was unlikely that there would be more than 256 globals, with the plan of indexing them by the order they were created in, but later I changed my mind and indexed them by symbol ID, and there definitely can be more than 256 symbols. The ID of `print` was 0x03. The ID of `f` when I wasn't printing it was 0x103. So `print` would get overwritten with a closure, and when that closure was called as a C function it naturally segfaulted. I think I couldn't reproduce it in duckLisp-dev because it has more C callbacks than Hidey-Chess does, raising the ID of `f` higher than in Hidey-Chess.
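The arithmetic: 260 = 0x104, and truncating to 8 bits keeps only the low byte, 0x04. Likewise 0x103 truncates to 0x03, the ID of `print`. Below is a sketch of that truncation and of one possible fix, choosing the smallest operand width that holds the symbol ID. `emit_global_operand` and the little-endian byte order are assumptions for illustration, not duck-lisp's actual encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What effectively happened: the ID was squeezed into an 8-bit operand. */
static uint8_t truncate_to_u8(uint32_t symbolId) {
	return (uint8_t) symbolId;  /* keeps only the low byte */
}

/* Hypothetical fix: pick the smallest operand width (1, 2, or 4 bytes) that
   holds the ID and write it little-endian. Returns the bytes written. */
static size_t emit_global_operand(uint32_t symbolId, uint8_t out[4]) {
	assert(out != NULL);
	size_t width = symbolId <= UINT8_MAX ? 1 : symbolId <= UINT16_MAX ? 2 : 4;
	for (size_t i = 0; i < width; i++) {
		out[i] = (uint8_t) (symbolId >> (8 * i));
	}
	return width;
}
```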


Todo:
    Finish parenthesis inference?
        C: Add support for nested types — L
            Alternatively I could rewrite the parser as some sort of dynamic Pratt parser.
    Fix bugs — L
        C/DL: Write lots of tests — M
    Fix error handling? — L
        C: Fix generator error reporting. — L
        C: Fix VM error reporting? — M?
    Optimization? Should probably save for hypothetical DL2.
        C: Hash tables/Tries? — M
        C: Switch to stack VM? — M
        C: String interning? — M
        C: Rewrite memory allocator? — L
    Document — L
        SQUISH: Document C API — M
        SQUISH: Write overview of VM?
    Clean up. — M
    Test on ARM — S
    Test on Windows — M
    Clean up. — S
    Release 1!!!
    Create Hidey-Chess branch. — M

Duck-lisp architecture:
Parse source text into an AST.
Run parenthesis inference.
Expand macros — Most complicated step as it recursively performs other steps here as well.
Compile AST into HLA.
Peephole optimization: Remove redundant stack operations.
Assemble HLA into bytecode.
Peephole optimization: Minimize the size of branch instructions.
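As an illustration of the first peephole step, here is one possible rule over a toy instruction list: a `pop` cancels against an immediately preceding push, and adjacent pops merge. The opcodes and `peephole_pushPop` are hypothetical, not duck-lisp's HLA, and the rule assumes pushes have no side effects. Labels act as barriers because cancellation only ever looks at the directly preceding kept instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy HLA. OP_POP's operand is the number of slots to pop. */
typedef enum { OP_PUSH_INTEGER, OP_POP, OP_LABEL } opcode_t;

typedef struct {
	opcode_t op;
	long operand;
} instruction_t;

/* Rewrite `code` in place and return the new length. The kept prefix acts
   like a stack, so nested push/push/pop/pop sequences cancel fully. */
static size_t peephole_pushPop(instruction_t *code, size_t length) {
	assert(code != NULL || length == 0);
	size_t out = 0;
	for (size_t i = 0; i < length; i++) {
		instruction_t ins = code[i];  /* copy first: out <= i, so we may overwrite */
		if (ins.op == OP_POP) {
			/* Cancel against immediately preceding side-effect-free pushes. */
			while (ins.operand > 0 && out > 0 && code[out - 1].op == OP_PUSH_INTEGER) {
				out--;
				ins.operand--;
			}
			if (ins.operand == 0) continue;
			/* Merge with an immediately preceding pop. */
			if (out > 0 && code[out - 1].op == OP_POP) {
				code[out - 1].operand += ins.operand;
				continue;
			}
		}
		code[out++] = ins;
	}
	return out;
}
```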

Duck-lisp goals:
Be a useful embeddable language.
Make compiler and VM run on nearly any computer architecture or operating system.
Make compiler and VM optionally independent of the standard library.
Minimize bytecode binary sizes.
Maximize extensibility when practical.
Allow the VM to run on a different machine than the compiler.
Demonstrate a free-form version of parenthesis inference.
The C API must not be mind-bogglingly difficult to use.