Skip to content

Implementation

Muqsit Rayyan edited this page Sep 27, 2022 · 15 revisions

Tokens

A token is a substring within that has an influence on the expression it occurs within. For example, in the expression x + y * 2, the tokens are x, +, y, *, and 2. Whitespace is not considered a token as it has no impact on the expression. This method of composing tokens gives the writer the liberty to format an expression as they desire. Below is a table of all tokens within arithmexp.

Token Name Token Symbols
Binary Operator + / ** % * -
Numeric Literal [0-9] .
Parenthesis Left (
Parenthesis Right )
Unary Operator + -
Variable [A-Z] [a-z] [0-9] _

Tokenization

The Scanner class takes the role of constructing tokens from a given expression string (in other words, the Scanner performs tokenization). It consists of an array of token builders such as a numeric literal token builder (which scans for 1, 2.34, etc.), a parenthesis token builder (which scans for (, and )), etc. Tokenization is the very first stage of parsing.

Expression Notation

The parser spends most of its work in converting a given expression from infix notation (x + y) to postfix notation (x y +).

In infix notation, binary operators occur in-between their operands (this method of writing expressions is the one most of us are familiar with). For example, in the infix expression 2 + 3, the operator + occurs in between the left operand 2 and the right operand 3. In postfix notation, binary operators occur after their operands. For example, in the expression 2 3 +, the operator + is placed after the left operand 2 and the right operand 3. This notation is commonly used internally in computing for a few reasons.

In the postfix expression 2 3 add, the operand 2 is evaluated first, followed by the operand 3, finally followed by the operator add which executes the operation of addition. In contrast to the infix notation 2 add 3, or even the prefix notation add(2, 3), the postfix notation accurately describes the order of operations being performed, leaving no room for uncertainty. Furthermore, parentheses, operator precedence, and operator associativity are not required during the evaluation of a postfix expression. This shifts the computational cost of performing such operations to the parser rather than the evaluator. The expression 2 + 3 * 4 in postfix notation with * taking precedence over + can be written as 2 3 4 * +. The expression (2 + 3) * 4 would be 2 3 + 4 * in postfix notation. Postfix converts a binary tree structure such as ((2 * 3) / (4 + 2)) - 7 to a linear array structure 2 3 * 4 2 + / 7 -, allowing low-memory and high-performant evaluation during runtime using stack operations (see stack-oriented programming).

1. Quick Start

1.1. Installation

2. Documentation Notes

2.1. Constant
2.2. Function
2.3. Macro
2.4. Operator
2.5. Variable
2.6. Expression Optimization
2.7. Implementation Internals

Clone this wiki locally