Assembly language is a low-level programming language that is closely related to machine code. Each assembly instruction corresponds closely to a single machine instruction, and an assembler translates the source text into code that the CPU can execute directly. This guide covers the basic syntax and grammar of assembly language, focusing on registers, instructions, and a simple example.
An assembly program typically consists of three sections:
- Data Section: Used to declare initialized data and constants.
- BSS Section: Used to reserve space for uninitialized variables.
- Text Section: Contains the actual code.
Here is a simple example, written in NASM syntax for 32-bit x86 Linux, that writes a message to stdout and then exits:
section .data
msg db 'Hello, world!', 0xA ; The message to print
section .text
global _start
_start:
; Write the message to stdout
mov eax, 4 ; syscall number for sys_write
mov ebx, 1 ; file descriptor 1 is stdout
mov ecx, msg ; pointer to the message
mov edx, 14 ; message length: 13 characters plus the newline
int 0x80 ; call kernel
; Exit the program
mov eax, 1 ; syscall number for sys_exit
xor ebx, ebx ; exit code 0
int 0x80 ; call kernel
Registers are small, fast storage locations within the CPU used to hold data temporarily during execution. Common general-purpose registers include:
- eax: Accumulator register
- ebx: Base register
- ecx: Counter register
- edx: Data register
Other types of registers include:
- esi and edi: Source and destination index registers
- ebp and esp: Base pointer and stack pointer registers
- eip: Instruction pointer register
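The register names reflect roles that some instructions use implicitly. As a minimal sketch, assuming NASM syntax on 32-bit Linux like the other examples in this guide, the program below copies a string with `rep movsb`, which reads from the address in esi, writes to the address in edi, and repeats ecx times. The labels `src` and `dst`, and the `cld` and `rep movsb` instructions, are illustrative additions that the guide does not otherwise cover.

```nasm
; Hypothetical file name: copy.asm
section .data
src db 'copy me', 0xA   ; 7 characters plus a newline = 8 bytes

section .bss
dst resb 8              ; destination buffer, same size as src

section .text
global _start
_start:
    cld                 ; clear the direction flag so esi and edi count upward
    mov esi, src        ; source index: address to copy from
    mov edi, dst        ; destination index: address to copy to
    mov ecx, 8          ; counter: number of bytes to copy
    rep movsb           ; copy ecx bytes from [esi] to [edi]

    mov eax, 4          ; sys_write
    mov ebx, 1          ; stdout
    mov ecx, dst        ; print the copy, not the original
    mov edx, 8          ; length in bytes
    int 0x80

    mov eax, 1          ; sys_exit
    xor ebx, ebx        ; exit code 0
    int 0x80
```

If the copy works, the program prints the text from `dst`, which shows that esi, edi, and ecx were each used in the role their names suggest.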
Instructions are the commands that tell the CPU what operations to perform. Each instruction typically consists of an opcode (operation code) and operands (data or addresses). Common instructions include:
- mov: Move data from one location to another
- add: Add two values
- sub: Subtract one value from another
- int: Interrupt, used to make system calls
The general format of an assembly instruction, in the Intel-style syntax used in this guide, is:
opcode destination, source
For example:
mov eax, 1 ; Move the value 1 into the eax register
add eax, 2 ; Add the value 2 to the eax register
sub eax, 1 ; Subtract the value 1 from the eax register
int 0x80 ; Make a system call (int takes one operand, the interrupt number)
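To see the arithmetic instructions produce an observable result, the following sketch turns them into a complete program that returns the computed value as its exit status. It assumes NASM on 32-bit Linux and a build along the lines of `nasm -f elf32 arith.asm -o arith.o` followed by `ld -m elf_i386 arith.o -o arith`; the file name `arith.asm` is hypothetical.

```nasm
section .text
global _start
_start:
    mov eax, 1      ; eax = 1
    add eax, 2      ; eax = 1 + 2 = 3
    sub eax, 1      ; eax = 3 - 1 = 2

    mov ebx, eax    ; use the result as the exit status
    mov eax, 1      ; syscall number for sys_exit
    int 0x80        ; exit with status 2
```

Running the program and then `echo $?` in the shell should report 2. The result is copied into ebx first because sys_exit expects its call number in eax, which would otherwise overwrite the computed value.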
Addressing modes specify where an instruction finds its operands: in the instruction itself, in a register, or in memory. Common addressing modes include:
- Immediate Addressing: The operand is a constant value.
mov eax, 10 ; Move the constant value 10 into the eax register
- Register Addressing: The operand is a register.
mov eax, ebx ; Move the value in the ebx register into the eax register
- Direct Addressing: The operand is a memory address.
mov eax, [0x1234] ; Move the value at memory address 0x1234 into the eax register
- Indirect Addressing: The operand is a memory address stored in a register.
mov eax, [ebx] ; Move the value at the memory address stored in the ebx register into the eax register
- Indexed Addressing: The operand is a memory address calculated using a base address and an index.
mov eax, [ebx + esi] ; Move the value at the memory address (ebx + esi) into the eax register
- Base-Indexed Addressing: The operand is a memory address calculated using a base address, an index, and an optional displacement.
mov eax, [ebx + esi + 4] ; Move the value at the memory address (ebx + esi + 4) into the eax register
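The modes above can be combined in one small program. This is a sketch under the same assumptions as the guide's other examples (NASM syntax, 32-bit Linux, `int 0x80` system calls); the array `values` and its contents are hypothetical data chosen so the result is easy to check by hand.

```nasm
section .data
values dd 10, 20, 30, 40      ; four 32-bit values, 4 bytes apart

section .text
global _start
_start:
    mov esi, 4                ; immediate: byte offset of the second element
    mov ebx, values           ; immediate: the address of the array
    mov eax, [values]         ; direct: eax = 10
    mov edx, [ebx]            ; indirect: edx = 10
    add eax, edx              ; register: eax = 20
    add eax, [ebx + esi]      ; base + index: adds 20, eax = 40
    add eax, [ebx + esi + 4]  ; base + index + displacement: adds 30, eax = 70

    mov ebx, eax              ; exit status = 70
    mov eax, 1                ; syscall number for sys_exit
    int 0x80
```

After running it, `echo $?` should report 70. Note that `mov ebx, values` loads the address of the array, while `mov eax, [values]` loads the data stored there; the square brackets are what request a memory access.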
Control flow instructions in assembly language determine the order in which instructions are executed. Common control flow instructions include:
- Jump Instructions: Used to transfer control to another part of the program. Conditional jumps test the CPU flags, which are typically set by a preceding cmp or arithmetic instruction.
  - jmp: Unconditional jump.
  - je / jz: Jump if equal / zero.
  - jne / jnz: Jump if not equal / not zero.
  - jg / jnle: Jump if greater / not less or equal (signed comparison).
  - jl / jnge: Jump if less / not greater or equal (signed comparison).
jmp label ; Unconditional jump to 'label'
je equal_label ; Jump to 'equal_label' if zero flag is set
jne not_equal_label ; Jump to 'not_equal_label' if zero flag is not set
- Loop Instructions: Used to repeat a block of code a certain number of times.
  - loop: Decrement ecx and jump if ecx is not zero.
mov ecx, 10 ; Set loop counter to 10
loop_start:
; Code to repeat
loop loop_start ; Decrement ecx and jump to 'loop_start' if ecx is not zero
- Call and Return Instructions: Used to call and return from procedures.
  - call: Call a procedure.
  - ret: Return from a procedure.
call procedure ; Call 'procedure'
; ...
procedure:
; Procedure code
ret ; Return from 'procedure'
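The three kinds of control flow can be seen working together in a short program. The sketch below assumes NASM syntax on 32-bit Linux; the procedure name `sum_to_ten` and the labels `ok`, `done`, and `.next` are hypothetical, and `cmp` is used here to set the flags that `je` tests.

```nasm
section .text
global _start
_start:
    call sum_to_ten     ; on return, eax holds the sum
    cmp eax, 55         ; sets the zero flag when eax equals 55
    je ok               ; jump if the sum is the expected value
    mov ebx, 1          ; exit status 1: unexpected sum
    jmp done            ; skip over the success path
ok:
    xor ebx, ebx        ; exit status 0: sum was 55
done:
    mov eax, 1          ; syscall number for sys_exit
    int 0x80

; sum_to_ten: returns 10 + 9 + ... + 1 in eax (uses ecx as the counter)
sum_to_ten:
    xor eax, eax        ; running total starts at 0
    mov ecx, 10         ; loop counter
.next:
    add eax, ecx        ; add the current counter value
    loop .next          ; decrement ecx, repeat while ecx is not zero
    ret                 ; return to the instruction after the call
```

The program should exit with status 0, because the loop adds the counter values 10 down to 1 for a total of 55. The unconditional `jmp done` matters: without it, execution would fall through from the failure path into `ok` and overwrite the status.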
Assembly language provides directives that define data of a given size. Common data-definition directives include:
- db (Define Byte): Defines a byte (8 bits) of data.
- dw (Define Word): Defines a word (16 bits) of data.
- dd (Define Double Word): Defines a double word (32 bits) of data.
- dq (Define Quad Word): Defines a quad word (64 bits) of data.
section .data
byteVar db 0x1 ; Define a byte variable
wordVar dw 0x1234 ; Define a word variable
dwordVar dd 0x12345678 ; Define a double word variable
qwordVar dq 0x123456789ABCDEF0 ; Define a quad word variable
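The size chosen when defining data also matters when accessing it. The following sketch, assuming NASM syntax on 32-bit Linux, reads the byte, word, and double word variables back into 32-bit registers. The `movzx` instruction (move with zero extension) and the `byte` and `word` size keywords are not covered elsewhere in this guide; the quad word is left out because it does not fit in a single 32-bit register.

```nasm
section .data
byteVar  db 0x1             ; 8 bits
wordVar  dw 0x1234          ; 16 bits
dwordVar dd 0x12345678      ; 32 bits

section .text
global _start
_start:
    movzx eax, byte [byteVar]   ; eax = 0x00000001 (upper bits zeroed)
    movzx ecx, word [wordVar]   ; ecx = 0x00001234 (upper bits zeroed)
    mov edx, [dwordVar]         ; edx = 0x12345678 (size implied by edx)

    mov byte [byteVar], 7       ; size keyword needed: no register implies it
    movzx ebx, byte [byteVar]   ; ebx = 7, used as the exit status

    mov eax, 1                  ; syscall number for sys_exit
    int 0x80
```

Running the program and checking `echo $?` should report 7. When one operand is a register, the assembler infers the access size from it; when storing a constant to memory there is no register to infer from, so the size has to be written out.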
Directives are commands that provide instructions to the assembler. Common directives include:
- section: Defines a section of the program (e.g., .data, .bss, .text).
- global: Makes a symbol available to the linker.
- extern: Declares an external symbol.
- equ: Defines a constant value.
section .data
msg db 'Hello, world!', 0xA ; Define a message
section .bss
buffer resb 64 ; Reserve 64 bytes for a buffer
section .text
global _start
_start:
; Code to execute
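A common use of equ is to let the assembler compute a string's length instead of counting characters by hand. This sketch assumes NASM, where `$` stands for the address of the current position in the section; the name `msg_len` is a hypothetical label chosen for the example.

```nasm
section .data
msg     db 'Hello, world!', 0xA ; 13 characters plus a newline
msg_len equ $ - msg             ; current address minus start of msg = 14

section .text
global _start
_start:
    mov eax, 4          ; syscall number for sys_write
    mov ebx, 1          ; file descriptor 1 is stdout
    mov ecx, msg        ; pointer to the message
    mov edx, msg_len    ; length computed at assembly time
    int 0x80

    mov eax, 1          ; syscall number for sys_exit
    xor ebx, ebx        ; exit code 0
    int 0x80
```

The equ line must come directly after the data it measures, because `$ - msg` counts every byte between the `msg` label and that point. With this approach, editing the message text does not require updating a hard-coded length.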
Input and output operations in assembly language are typically performed using system calls. These system calls interact with the operating system to read from input devices or write to output devices.
To write to standard output (stdout), you can use the sys_write system call. The following example writes a message to stdout:
section .data
msg db 'Hello, world!', 0xA ; The message to print
section .text
global _start
_start:
mov eax, 4 ; syscall number for sys_write
mov ebx, 1 ; file descriptor 1 is stdout
mov ecx, msg ; pointer to the message
mov edx, 14 ; message length: 13 characters plus the newline
int 0x80 ; call kernel
mov eax, 1 ; syscall number for sys_exit
xor ebx, ebx ; exit code 0
int 0x80 ; call kernel
To read from standard input (stdin), you can use the sys_read system call. The following example reads input from stdin into a buffer:
section .bss
buffer resb 64 ; Reserve 64 bytes for the buffer
section .text
global _start
_start:
mov eax, 3 ; syscall number for sys_read
mov ebx, 0 ; file descriptor 0 is stdin
mov ecx, buffer ; pointer to the buffer
mov edx, 64 ; number of bytes to read
int 0x80 ; call kernel
mov eax, 1 ; syscall number for sys_exit
xor ebx, ebx ; exit code 0
int 0x80 ; call kernel
These examples demonstrate basic input and output operations using system calls in assembly language.
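The two system calls can be combined into a small echo program. The sketch below assumes the 32-bit Linux `int 0x80` convention, in which sys_read returns the number of bytes read in eax (zero at end of input, negative on error); the label `done` is hypothetical, and `cmp` with `jle` is used to skip the write when nothing was read.

```nasm
section .bss
buffer resb 64          ; reserve 64 bytes for the buffer

section .text
global _start
_start:
    mov eax, 3          ; syscall number for sys_read
    mov ebx, 0          ; file descriptor 0 is stdin
    mov ecx, buffer     ; pointer to the buffer
    mov edx, 64         ; read at most 64 bytes
    int 0x80            ; eax = bytes read, 0 at end of input, negative on error

    cmp eax, 0
    jle done            ; nothing to echo on end of input or error

    mov edx, eax        ; write exactly as many bytes as were read
    mov eax, 4          ; syscall number for sys_write
    mov ebx, 1          ; file descriptor 1 is stdout
    mov ecx, buffer     ; pointer to the data just read
    int 0x80

done:
    mov eax, 1          ; syscall number for sys_exit
    xor ebx, ebx        ; exit code 0
    int 0x80
```

Saving the return value into edx before loading the next call number is the key step, since eax is reused for every system call.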
Comments in assembly language annotate the code and make it easier to understand. The assembler ignores them, so they do not affect the execution of the program. In many assemblers, including the syntax used in this guide, a comment starts with a semicolon (;).
Single-line comments start with a semicolon and continue to the end of the line.
mov eax, 1 ; Move the value 1 into the eax register
Assemblers that use semicolon comments generally have no dedicated syntax for multi-line comments, but you can achieve the same effect with consecutive single-line comments.
; This is a multi-line comment
; explaining the following block
; of code.
mov eax, 1
mov ebx, 2
add eax, ebx
Using comments effectively can greatly improve the readability and maintainability of your assembly code.