A virtual architecture, with its own instruction set, and a dedicated assembler with enhanced features.
Obi-Wan/vARCH
Index
1. COMPILE / INSTALL
2. WHAT IS IT
3. ASSEMBLY (two examples)
3.1 BRIEF DESCRIPTION
3.2 SUBROUTINES
3.3 CONSTANTS
3.4 PREPROCESSOR
4. LATEST DEVELOPMENTS
4.1 TEMPORARIES / Registers Auto-allocation
4.2 CALLING CONVENTIONS
4.3 ELF OBJECT FILES
4.4 SAMPLE FILES
--------------------------------------------------------------------------------
1. COMPILE / INSTALL
Before trying to compile it, remember to run autoreconf.
It will then be enough to type:
./configure
make
I'm not providing a predefined way to install it right now. It's more of a toy
than a tool.
--------------------------------------------------------------------------------
2. WHAT IS IT
vARCH is a virtual machine / interpreter of bytecode for a virtual architecture
that I invented for learning purposes.
It is not meant for performance or flexibility: it's just a simple,
easy-to-learn architecture.
To ease development, I also created a simple assembler.
The practical examples below explain how to write the asm code.
If you want to contribute to vARCH, please ask me directly for documentation or
answers, because it is a spare-time project and I don't have much time for
documentation.
The Asm samples reported here need to be assembled, and the resulting binary
must then either be renamed to "bios.bin" or soft-linked to that name.
--------------------------------------------------------------------------------
3. ASSEMBLY
The first example is a simple program for calculating the first n factorials.
The "main" function is special in the sense that it is always placed at the
beginning of the generated executable.
----------------
biosFactorial.s
----------------
; calculate by definition the factorials of the first .maxnum (12) numbers
.function "main"
.init:
MOV, $1 , %R8
.start:
; the counter is post incremented after the copy
MOV, %R8+ , %R1
MOV, $1 , %R2
.iter:
MULT, %R1- , %R2
MO, %R1 , $1
IFNJ, @save
JMP, @iter
.save:
PUSH, %R2
MO, %R8 , .maxnum
IFJ, @end
JMP, @start
.end:
HALT
.end
.global
.maxnum:
.i32_t $12
.end
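For comparison, here is a rough C rendering of what biosFactorial.s computes. It is a sketch resting on assumptions the text above does not spell out: two-operand instructions store their result in the last operand, MO means "more than", IFNJ/IFJ jump when the last comparison was false/true, and PUSH is modeled with a small array instead of the real stack. The names factorials, stack and sp are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C rendering of biosFactorial.s (see assumptions above). */
#define MAXNUM 12                    /* mirrors .maxnum in the .global block */

static int32_t stack[MAXNUM];        /* stands in for the real stack */
static int sp = 0;                   /* number of values pushed so far */

static void push(int32_t value)
{
    assert(sp < MAXNUM);             /* guard the array-based stack */
    stack[sp++] = value;
}

void factorials(void)
{
    int32_t r8 = 1;                  /* .init: MOV, $1, %R8 */
    for (;;) {
        int32_t r1 = r8++;           /* .start: MOV, %R8+, %R1 (post-increment) */
        int32_t r2 = 1;              /* MOV, $1, %R2 */
        for (;;) {
            r2 *= r1--;              /* .iter: MULT, %R1-, %R2 (post-decrement) */
            if (!(r1 > 1))           /* MO, %R1, $1 ; IFNJ, @save */
                break;
        }                            /* JMP, @iter */
        push(r2);                    /* .save: PUSH, %R2 */
        if (r8 > MAXNUM)             /* MO, %R8, .maxnum ; IFJ, @end */
            break;                   /* .end: HALT */
    }                                /* JMP, @start */
}
```

Under these assumptions the program pushes 1!, 2!, ..., 12!. 12! = 479001600 is also the largest factorial that fits in a signed 32-bit integer.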
--------------------------------
Another simple program, showing how to structure subroutines.
It also exploits a missing feature to simplify text output: a correct driver
should signal the peripheral and then wait for its ready state before sending
the next character to display, but this code simply pretends that the terminal
works synchronously with the CPU.
----------------
biosTestAssembler.s
----------------
#include "std_conversion.s"
.function "main"
.init:
MOV, @string1 , %T001
MOV, 4 , %R6
; call the printing subroutine
JSR, @print , %T001
; call recursive subroutine
MOV, 4 , %T002
MOV, %T002 , %T012
MOV, %T012 , %T013
MOV, %T013 , %T014
JSR, @recursive , %T014
; call conversion subroutine
MOV, @bufferEnd , %T003
SUB, @buffer , %T003
MOV, 15234 , %T004
MOV, @buffer , %T005
JSR, @integerToString , %T004 , %T005 , %T003
; Verify result
MOV, %R0 , %T008
EQ, %T008 , 0
IFJ, @error
MOV, @buffer , %T007
JSR, @print , %T007
HALT
.error:
MOV, @stringError , %T001
JSR, @print , %T001
HALT
.end
.global
; string to write
.string1:
.string "test: yeah"
.i8_t 10 ; '\n'
.i8_t 0
.stringError:
.string "Error"
.i8_t 0
.end
.function "recursive"
.param %R1 %T001
.local
.decrement: .const
.i32_t 2
.numberOfCalls: .static
.i32_t 0
.lowerBound:
.i32_t 0
.stateRegister:
.i32_t 0
.localString:
.string "local_string"
.i8_t 0
.end
SUB, .decrement , %T001
INCR, .numberOfCalls
LO, %T001 , .lowerBound
IFJ, @exit
JSR, @recursive , %T001
.exit:
MOV, %SR , .stateRegister
RET
.end
.global
.uselessGlobal:
.i32_t $4
.end
----------------
std_io.s
----------------
; subroutine to call
.function "print"
.param %R1 %T001 ; address of the string to print
.local
.printCmd: .const
.i32_t 131072
.printCmd1: .const
.i32_t 131073
.endChar: .const
.i16_t 0
.end
MOV, 0 , %T002
.test:
EQ, (%T001) : .i8_t, .endChar : .i16_t
IFJ, @exit
PUT, (%T001)+ : .i8_t, .printCmd1
INCR, %T002
JMP, @test
.exit:
RET, %T002
.end
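The print subroutine in std_io.s boils down to a loop over a NUL-terminated string that returns the number of characters written. A hedged C sketch follows: put_char is a hypothetical callback standing in for the PUT to the .printCmd1 device command (synchronous, as the assembly assumes), and buffer_put_char is an illustrative sink that collects the output so it can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sending one character to the terminal device. */
typedef void (*put_char_fn)(char c);

/* C model of "print": returns the number of characters written. */
int print(const char *str, put_char_fn put_char)
{
    int count = 0;                   /* MOV, 0, %T002 */
    while (*str != '\0') {           /* .test: EQ, (%T001) : .i8_t, .endChar ; IFJ, @exit */
        put_char(*str++);            /* PUT, (%T001)+ : .i8_t, .printCmd1 */
        count++;                     /* INCR, %T002 */
    }                                /* JMP, @test */
    return count;                    /* .exit: RET, %T002 (value returned in R1) */
}

/* Example sink: collects characters into a buffer instead of a terminal. */
static char out_buf[64];
static size_t out_len = 0;

static void buffer_put_char(char c)
{
    assert(out_len + 1 < sizeof out_buf);   /* keep room for the terminator */
    out_buf[out_len++] = c;
    out_buf[out_len] = '\0';
}
```

Calling print("test: yeah\n", buffer_put_char) mirrors printing .string1 from biosTestAssembler.s and returns 11.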
-----------------------------------
BRIEF DESCRIPTION OF EXAMPLES:
There are 8 data registers R1, R2, .. , R8, and 8 general purpose address
registers A1, A2, .. , A8. There is also a dedicated stack pointer register, SP,
plus USP, which lets privileged code access the unprivileged stack pointer.
(SP refers to two different stack pointers, depending on whether it is issued
in privileged or in unprivileged mode.)
There is another kind of register, Tn, where n is any number: the set of
temporaries, which are later assigned to real registers and optimized by the
register allocation logic.
As noted before, there are two kinds of execution: privileged and unprivileged.
The bit identifying the mode is in the status register (SR). This design was
inspired by the Motorola 68k family.
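The SP/USP behavior can be modeled roughly as follows. This is only a sketch in the spirit of the 68k: SR_PRIVILEGED is an assumed bit position (the real layout of the status register is not documented here), and treating a USP access from unprivileged code as an error is an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: hypothetical position of the privilege bit in SR. */
#define SR_PRIVILEGED (1u << 13)

typedef struct {
    uint32_t sr;          /* status register */
    uint32_t priv_sp;     /* stack pointer seen by privileged code */
    uint32_t user_sp;     /* stack pointer seen by unprivileged code */
} cpu_state;

static bool is_privileged(const cpu_state *cpu)
{
    return (cpu->sr & SR_PRIVILEGED) != 0;
}

/* %SP: the physical stack pointer depends on the current mode. */
uint32_t *resolve_sp(cpu_state *cpu)
{
    return is_privileged(cpu) ? &cpu->priv_sp : &cpu->user_sp;
}

/* %USP: lets privileged code reach the unprivileged stack pointer.
 * ASSUMPTION: using it from unprivileged code is treated as an error. */
uint32_t *resolve_usp(cpu_state *cpu)
{
    assert(is_privileged(cpu));
    return &cpu->user_sp;
}
```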
Data and registers can be accessed in several ways, each with a different
meaning. Addressing is again very similar to M68k asm, so the arguments of the
instructions can be:
- IMMEDIATE: constants prefixed with $ are used as numerical values, while
labels prefixed with @ (at) yield the address of the label.
- REGISTER: registers prefixed with % (e.g. %R1, %A3, %SP ..) access the
content of the register.
- DIRECT: an explicit memory address is used, either through a label prefixed
with . (dot), which accesses the content of the memory location marked by the
label, or through a constant prefixed with >, which is used as a memory
address.
- REGISTER INDIRECT: registers surrounded by round parentheses ( and ) (e.g.
(A1), (SP) ..) access the memory location whose address is the content of the
register.
- REGISTER MEMORY INDIRECT: accesses memory using the address stored in the
memory location pointed to by a register; this is written with double
parentheses: ((A1)), ((SP)), etc.
- DISPLACED: adds a displacement to the address contained in a register and
accesses the resulting location, e.g. -20(SP), 3(A1), etc. The displacement is
a 24-bit signed integer.
- INDEXED: uses one register as the base address of a location and another
register as the index into an array, e.g. (A1)[R1]
- DISPLACED+INDEXED: combines the two, but the displacement in this case is a
16-bit signed integer.
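To make the effective-address rules above concrete, here is a small C model of the memory-based modes (IMMEDIATE and REGISTER need no memory access). It simplifies heavily and is not the real interpreter: memory is a small array of 32-bit words, addresses are word indices, and the index in the INDEXED modes is assumed to be unscaled. The machine type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256

typedef struct {
    int32_t mem[MEM_WORDS];  /* ASSUMPTION: word-addressed toy memory */
    int32_t r[9];            /* data registers R1..R8 (r[0] unused) */
    int32_t a[9];            /* address registers A1..A8 (a[0] unused) */
} machine;

static int32_t *at(machine *m, int32_t addr)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    return &m->mem[addr];
}

/* DIRECT: >addr (or .label) reads the word stored at that address */
int32_t direct(machine *m, int32_t addr) { return *at(m, addr); }

/* REGISTER INDIRECT: (An) reads the word whose address is in An */
int32_t reg_indirect(machine *m, int an) { return *at(m, m->a[an]); }

/* REGISTER MEMORY INDIRECT: ((An)) follows one more pointer stored in memory */
int32_t reg_mem_indirect(machine *m, int an) { return *at(m, *at(m, m->a[an])); }

/* DISPLACED: d(An), with d a 24-bit signed displacement */
int32_t displaced(machine *m, int32_t d, int an)
{
    assert(d >= -(1 << 23) && d < (1 << 23));
    return *at(m, m->a[an] + d);
}

/* INDEXED: (An)[Rk], base address from An plus index from Rk */
int32_t indexed(machine *m, int an, int rk) { return *at(m, m->a[an] + m->r[rk]); }

/* DISPLACED+INDEXED: base + index + d, with d a 16-bit signed displacement */
int32_t displaced_indexed(machine *m, int32_t d, int an, int rk)
{
    assert(d >= -(1 << 15) && d < (1 << 15));
    return *at(m, m->a[an] + d + m->r[rk]);
}
```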
There are also shorthands that reduce code size and speed up execution with
compact code:
- prefixing (or postfixing) an operand with + or - pre-(post-)increments or
decrements it:
+ on %Reg#, it modifies the content of the register
+ on (Reg#), it still modifies the register content (not the pointed data)
+ on (address), it modifies the data pointed to by the address
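The increment shorthands can be sketched in C as follows, using %R8+ and (%T001)+ from the examples as the post-increment cases. The step of one unit is an assumption, since the step size is not stated above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* %R8+ style: use the register value, then increment the register */
int32_t reg_post_inc(int32_t *reg) { return (*reg)++; }

/* pre-increment form on a register: increment first, then use it */
int32_t reg_pre_inc(int32_t *reg) { return ++(*reg); }

/* (A1)+ style: read memory at the address held in A1, then increment A1;
 * the pointed data itself is left unchanged */
int32_t ind_post_inc(const int32_t *mem, int32_t *areg)
{
    assert(*areg >= 0);
    return mem[(*areg)++];
}

/* pre-increment applied to (address): the stored data is incremented */
int32_t mem_pre_inc(int32_t *mem, int32_t addr)
{
    assert(addr >= 0);
    return ++mem[addr];
}
```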
------------------
Some words about SUBROUTINES:
Subroutines have a simple but strict syntax. You have to specify the subroutine
name after the .function marker, and mark the end of the subroutine with the
.end marker.
The name of the subroutine needs to be enclosed in double quotes, like a normal
string.
Code cannot be written outside of subroutines, since the "main" subroutine
serves as the entry point of the binary.
When assembled, subroutines export their name as a global label, which can be
accessed through the . and @ semantics.
To call a subroutine, just issue a
JSR, @name_of_subroutine [, arguments...]
(JSR stands for "Jump to SubRoutine").
The assembler manages the subroutine calling conventions. This lets the
programmer specify the arguments (expressed as temporaries) on the very same
line as the call.
Functions and subroutines can also be recursive and have local variables, which
are allocated on the stack each time the function is called.
The stack allocation is managed automatically by the assembler.
------------------
Some words about VARIABLES, CONSTANTS and their attributes:
The .global marker opens the area where global/public variables and constants
should be located. This area is terminated by an .end marker.
Variables and constants declared there are visible from everywhere.
Local/private constants for a function can only be declared after the .local
marker, which is also terminated by the .end marker, just like functions and
globals.
To control access to these memory locations, it is also possible to use the
.static specifier, which behaves exactly like in C.
As an optimization, memory locations declared .const are allocated like static
variables, to avoid a stack allocation at every function call.
Local variables on the stack are supported only together with the register
auto-allocation logic, because they require some code transformation.
Labels make data visible and easily accessible from outside.
Finally, labels can also receive attributes like .size and .num.
.size gives the size, in bytes, of the data reached through the label, while
.num tells how many times .size is repeated; .num is not fully used yet, but it
will be used extensively for arrays.
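As a rough C counterpart, the "recursive" subroutine from biosTestAssembler.s maps its .local block onto C storage classes: .const and .static become static locals, while .lowerBound becomes an ordinary local. This is a hand-written sketch, not assembler output: returning the counter is an addition that makes the effect of .static observable (the original RET returns nothing), and .stateRegister and .localString are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C counterpart of the "recursive" subroutine. */
int32_t recursive(int32_t t001)
{
    static const int32_t decrement = 2;  /* .decrement: .const (allocated once) */
    static int32_t number_of_calls = 0;  /* .numberOfCalls: .static (persists) */
    int32_t lower_bound = 0;             /* .lowerBound: fresh on every call */

    t001 -= decrement;                   /* SUB, .decrement, %T001 */
    number_of_calls++;                   /* INCR, .numberOfCalls */
    if (!(t001 < lower_bound))           /* LO, %T001, .lowerBound ; IFJ, @exit */
        recursive(t001);                 /* JSR, @recursive, %T001 */
    return number_of_calls;              /* ADDED: expose the counter */
}
```

Called as recursive(4), it recurses with 2 and then 0, so the counter reaches 3; a second call continues counting from there, which is the C-like .static behavior.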
------------------
Some words about PREPROCESSOR:
As can be seen from the example, it is possible to include code from other files
and compile it as if it were part of the current source.
If an error occurs in an included file, the assembler reports the file and the
line where it took place.
------------------
4. ADDITIONAL FEATURES
4.1 - TEMPORARIES
As noted before, there is a new kind of register, T[0-9]*, which serves as
temporaries for later processing by the register allocation logic.
Using temporaries instead of explicit registers frees the programmer from one of
the most tedious tasks, letting the assembler decide the register allocation.
For maximum performance, the programmer can still use direct register assignment
when needed, while letting the assembler manage all the rest.
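The register allocation algorithm used by the assembler is not described here. Purely to illustrate the general idea, the sketch below assigns temporaries to the eight data registers with classic linear-scan allocation, a swapped-in textbook technique that is not necessarily what vARCH does. Live intervals are assumed to be already computed and sorted by start, and spilling is omitted (allocation simply fails).

```c
#include <assert.h>

#define NUM_REGS 8

typedef struct {
    int start, end;   /* first and last instruction index where Tn is live */
    int reg;          /* assigned register 1..8 (R1..R8), 0 if unassigned */
} interval;

/* Intervals must be sorted by start. Returns 0 on success, or -1 when more
 * than NUM_REGS temporaries are live at the same time (no spilling here). */
int linear_scan(interval *iv, int n)
{
    int active[NUM_REGS];           /* intervals currently holding a register */
    int n_active = 0;
    int reg_free[NUM_REGS + 1];     /* reg_free[r] != 0 if Rr is available */
    for (int r = 1; r <= NUM_REGS; r++)
        reg_free[r] = 1;

    for (int i = 0; i < n; i++) {
        /* expire intervals that ended before this one starts */
        int k = 0;
        for (int j = 0; j < n_active; j++) {
            if (iv[active[j]].end < iv[i].start)
                reg_free[iv[active[j]].reg] = 1;
            else
                active[k++] = active[j];
        }
        n_active = k;

        /* take the lowest-numbered free register */
        int chosen = 0;
        for (int r = 1; r <= NUM_REGS && !chosen; r++)
            if (reg_free[r])
                chosen = r;
        if (!chosen)
            return -1;
        reg_free[chosen] = 0;
        iv[i].reg = chosen;
        active[n_active++] = i;
    }
    return 0;
}
```

For the chain of MOVs through %T002, %T012, %T013 and %T014 in biosTestAssembler.s, each temporary dies as the next one is defined; with intervals [0,1], [1,2], [2,3], [3,4] this allocator alternates between R1 and R2.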
4.2 - CALLING CONVENTIONS
The arguments of a function are explicitly defined using the ".param" keyword,
while the return value (if any) is always in register R1.
Caller-save registers are R[1-5] and A[1-5], while callee-save registers are
R[6-8] and A[6-8].
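On a simulated register file, the convention can be illustrated like this. The function bodies are made up for the example, passing the argument in R1 follows the ".param %R1 ..." usage in the samples, and address registers, which follow the same split, are left out.

```c
#include <assert.h>
#include <stdint.h>

static int32_t R[9];              /* simulated R1..R8; R[0] unused */

/* Callee: argument in R1, clobbers R3 (caller-save) and R6 (callee-save),
 * returns its result in R1. */
static void square_plus_one(void)
{
    int32_t saved_r6 = R[6];      /* R6 is callee-save: preserve it */
    R[3] = R[1];                  /* R3 is caller-save: free to overwrite */
    R[6] = R[3] * R[1];
    R[1] = R[6] + 1;              /* return value in R1 */
    R[6] = saved_r6;              /* restore before returning */
}

/* Caller: keeps live values in R3 (caller-save) and R7 (callee-save). */
int32_t caller(void)
{
    R[3] = 100;
    R[7] = 7;
    int32_t saved_r3 = R[3];      /* R3 may be clobbered: the caller saves it */
    R[1] = 5;                     /* argument */
    square_plus_one();
    R[3] = saved_r3;              /* caller restores its own value */
    return R[1] + R[3] + R[7];    /* R7 survived without any caller action */
}
```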
4.3 - WRITING ELF OBJECT FILES
Compiling with the "-c" flag now outputs an object file in the ELF file format.
This object file can then be linked against other object files to generate
proper executables, using the same command-line syntax as many other well-known
compilers.
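One way to confirm that "-c" output really is a relocatable ELF object is to inspect its header. The sketch below checks the ELF magic bytes and the e_type field (ET_REL = 1), following the ELF specification; is_elf_relocatable is a hypothetical helper, and reading the file into memory is left out. Standard tools such as readelf -h report the same information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EI_NIDENT 16   /* size of e_ident, per the ELF specification */
#define EI_DATA   5    /* index of the byte-order byte in e_ident */
#define ET_REL    1    /* e_type value for relocatable object files */

/* Returns 1 if buf starts with the header of an ELF relocatable object. */
int is_elf_relocatable(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT + 2)
        return 0;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    /* e_type is the 16-bit field right after e_ident; its byte order is
     * given by e_ident[EI_DATA]: 1 = little endian, 2 = big endian */
    uint16_t e_type;
    if (buf[EI_DATA] == 1)
        e_type = (uint16_t)(buf[16] | (buf[17] << 8));
    else if (buf[EI_DATA] == 2)
        e_type = (uint16_t)((buf[16] << 8) | buf[17]);
    else
        return 0;
    return e_type == ET_REL;
}
```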
4.4 - SAMPLE FILES
Some sample .s files are in the main folder:
testNewAssembler.s
std_io.s
std_conversion.s
Have a look at these examples.
------------------
If you find this description incomplete, let me know.
I hope you will enjoy this playground for learning.