Find here:
https://github.com/vladcc/shawk/tree/main/shawk/awk/rdpg
# rdpg
Compiler-compiler in awk
rdpg is a compiler-compiler in awk. It takes a context free grammar
representation and translates that to a LL(1) recursive descent parser for a
target language. This translation consists of two or three stages, each of
which is represented as a separate awk script: naive translation, optimization,
and target language generation. The naive translation and optimization steps
output an intermediate representation. In the final stage this ir is translated
to the target language. The output of each stage is plain text which becomes the
input of the next stage. Note that you can write a backend for any language in
any language, as long as that backend understands the ir, and you'd still be
able to use rdpg and the optimizer at the front of the pipeline.
As a very short example:
$ cd rdpg/
$ head examples/infix_calc_grammar.txt
# start symbol
rule parse
defn statements
end
rule statements
defn statement eoi
defn statement statements
end
$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | head -n 25 | cat -n
1 comment generated by rdpg.awk 1.11
2 func parse
3 block_open parse_1
4 comment rule parse
5 comment defn statements
6 call tok_next
7 if call statements
8 block_open parse_2
9 return true
10 block_close parse_2
11 else
12 block_open parse_2
13 return false
14 block_close parse_2
15 block_close parse_1
16 func_end
17 func statements
18 block_open statements_1
19 comment rule statements
20 comment defn statement eoi
21 comment defn statement statements
22 if call statement
23 block_open statements_2
24 if call eoi
25 block_open statements_3
A complete run with maximum optimization and translation to awk will look like:
$ awk -f rdpg_ir.awk -f rdpg.awk <input-file> | \
> awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | \
> awk -f rdpg_ir.awk -f rdpg-to-awk.awk
E.g.:
$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n
1 # <definitions>
2 # translated by rdpg-to-awk.awk 1.01
3 # generated by rdpg.awk 1.11
4 # optimized by rdpg-opt.awk 1.1 Olvl=5
5 function parse( _arr) {
6 # rule parse
7 # defn statements
8 tok_next()
9 while (1) {
10 if (statement()) {
11 if (eoi()) {
12 return 1
13 } else {
14 continue
15 }
16 }
17 return 0
18 }
19 }
20 function eoi( _arr) {
21 # rule eoi?
22 # defn EOI
23 if (tok_match(EOI())) {
24 tok_next()
25 return 1
Each stage needs to know about the ir which is defined in rdpg_ir.awk, so it
becomes somewhat cumbersome to invoke. The optimization step can be skipped, but
then you'd end up with a naive parser, which will use a lot of stack space
because of needless recursion.
$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n
1 # <definitions>
2 # translated by rdpg-to-awk.awk 1.01
3 # generated by rdpg.awk 1.11
4 function parse( _arr) {
5 # rule parse
6 # defn statements
7 tok_next()
8 if (statements()) {
9 return 1
10 } else {
11 return 0
12 }
13 }
14 function statements( _arr) {
15 # rule statements
16 # defn statement eoi
17 # defn statement statements
18 if (statement()) {
19 if (eoi()) {
20 return 1
21 } else if (statements()) {
22 return 1
23 } else {
24 return 0
25 }
Any of the rdpg*.awk scripts except for rdpg_ir.awk can be invoked with the
-vHelp=1 flag for more information on what they do.
Below is a breakdown of the project structure.
1. rdpg:
rdpg.awk - the main translator
rdpg_ir.awk - the ir definition, i.e. string constants
rdpg-opt.awk - the optimizer
rdpg-to-awk.awk - translates ir to awk
rdpg-to-c.awk - translates ir to C
2. The rdpg parser:
rdpg.awk generates ir from a CFG, but still needs to parse the CFG
representation. To keep things simple, I've chosen to use a line oriented state
machine parser. Gets the job done, it's trivial to implement in awk and is easy
to generate. Also, it maps pretty well to how CFGs are usually written down.
gen-rdpg/scriptscript.awk - this script generates the parser part of rdpg.awk
gen-rdpg/rdpg-rules.txt - the input to scriptscript.awk
3. The interesting parts of the examples directory:
examples/infix_calc_grammar.txt - represents a context free grammar in rdpg
syntax for the venerable infix expression calculator example. Supports
exponentiation, in order to demonstrate right associativity, and a basic
on-error token synchronization.
examples/test_input.txt - this file is given to the generated parsers during
their test runs. The output of each parser must match examples/accept_result.txt
examples/awk - here you can find one awk source file per rdpg optimization level
examples/c - same as examples/awk but in C
4. Testing:
tests/run-tests.sh - the entry point for all tests. Exercises the whole
pipeline. Starts with basic rdpg.awk functionality, then moves to optimization,
and ends with the generation of one parser per optimization level per target
language. These parsers are then run with examples/test_input.txt as input and
are actually the files from the examples/awk and examples/c directories.
Correctness is confirmed by diff-ing the current output with the expected one.
run-tests.sh can be run from any directory. By default, if all tests pass it
doesn't say anything and prints information only when something goes wrong. If
you give it an argument, however, it will print the commands for each step it
executes. E.g.:
$ bash tests/run-tests.sh
$
$ bash tests/run-tests.sh x | head
diff <(awk -f ../rdpg.awk -vVersion=1) <(echo 'rdpg.awk 1.11')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-opt.awk -vVersion=1) <(echo 'rdpg-opt.awk 1.1')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-to-c.awk -vVersion=1) <(echo 'rdpg-to-c.awk 1.01')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-to-awk.awk -vVersion=1) <(echo 'rdpg-to-awk.awk 1.01')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg_ir.awk -f ../rdpg.awk ./test_rdpg/test_rdpg_opt_olvl3_inf_loop.txt | awk -f ../rdpg_ir.awk -f ../rdpg-opt.awk -vOlvl=3) ./test_rdpg/accept_rdpg_opt_olvl3_inf_loop.txt
[ 0 -eq 0 ]