Conversation
|
Hold on, let me force push formatting pushes to the second commit. Forgot that |
src/lparser.c
Outdated
| * `PrintCode()` in `luac.c`. Note that we have to deduct one because luac | ||
| * prints bytecode pcs starting from 1. | ||
| */ | ||
| int loop_pc = INT_MAX; |
There was a problem hiding this comment.
How about something like:
int loop_pc;
if ((GET_OPCODE(i) == OP_JMP) && (GETARG_sJ(i) < 0)) {
loop_pc = GETARG_sJ(i) + pc + 2 - 1;
} else if (GET_OPCODE(i) == OP_FORLOOP) {
loop_pc = pc - GETARG_Bx(i) + 2 - 1;
} else { return; }
f->yklocs[loop_pc] = yk_location_new();to avoid the INT_MAX edge case.
There was a problem hiding this comment.
Actually why not just:
if ((GET_OPCODE(i) == OP_JMP) && (GETARG_sJ(i) < 0)) {
f->kylocs[GETARG_sJ(i) + pc + 2 - 1] = yklocation_new();
} else if (GET_OPCODE(i) == OP_FORLOOP) {
f->yklocs[pc - GETARG_Bx(i) + 2 - 1 = yk_location_new();
}There was a problem hiding this comment.
That's fine if we don't want the assert.
There was a problem hiding this comment.
The INT_MAX aspect is probably incorrect on some platforms, so foregoing the assert seems OK.
There was a problem hiding this comment.
Hopefully f0b504b is a decent compromise.
I considered doing 2 loops:
- one to initialise every pc with a null location
- a second to overwrite nulls with non-null locations where necessary
This would be more fool-proof, but less performant.
|
I am not too worried by the slowdowns: that was kind of inevitable. List has always been on the edge of timeout. |
I see. FWIW, the benchmark does a lot of recursion. My guess is that we don't handle that well, but that's one for another day. |
|
Agreed. Let's get this in. |
|
Works for me! Please squash. |
Example output:
```
$ ./src/luac -l while_loop.lua
main <while_loop.lua:0,0> (11 instructions at 0x62c51d0)
0+ params, 3 slots, 1 upvalue, 1 local, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 300
ykloc: 3 [2] EQI 0 0 1
4 [2] JMP 6 ; to 11
5 [3] ADDI 0 0 -1
6 [3] MMBINI 0 1 7 0 ; __sub
7 [4] GETTABUP 1 0 0 ; _ENV "print"
8 [4] MOVE 2 0
ykloc: 9 [4] CALL 1 2 1 ; 1 in 0 out
10 [4] JMP -8 ; to 3
11 [5] RETURN 1 1 1 ; 0 out
```
(note that the locations have been added in the wrong place, and this
change is to help me fix that)
The placement of YkLocations was based upon incomplete bytecode
instructions, leading to nonsense placement of locations.
The first commit adjusts `luac` to make it show location placement in
its output. The second commit fixes the placement.
Examples follow. Pay attendtion to the "to X" annotations that luac
prints on the ends of lines that may jump.
for loops
=========
```lua
for i = 0, 300 do
print(i)
end
```
before
------
```
main <for_loop.lua:0,0> (10 instructions at 0x10b181d0)
0+ params, 6 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 0
3 [1] LOADI 1 300
4 [1] LOADI 2 1
5 [1] FORPREP 0 3 ; exit to 10
6 [2] GETTABUP 4 0 0 ; _ENV "print"
ykloc: 7 [2] MOVE 5 3
8 [2] CALL 4 2 1 ; 1 in 0 out
9 [1] FORLOOP 0 4 ; to 6
10 [3] RETURN 0 1 1 ; 0 out
```
after
-----
```
main <for_loop.lua:0,0> (10 instructions at 0x1995d1d0)
0+ params, 6 slots, 1 upvalue, 4 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 0
3 [1] LOADI 1 300
4 [1] LOADI 2 1
5 [1] FORPREP 0 3 ; exit to 10
ykloc: 6 [2] GETTABUP 4 0 0 ; _ENV "print"
7 [2] MOVE 5 3
8 [2] CALL 4 2 1 ; 1 in 0 out
9 [1] FORLOOP 0 4 ; to 6
10 [3] RETURN 0 1 1 ; 0 out
```
while loops
===========
```lua
local i = 300
while i ~= 0 do
i = i - 1
print(i)
end
```
before
------
```
main <while_loop.lua:0,0> (11 instructions at 0x3837c1d0)
0+ params, 3 slots, 1 upvalue, 1 local, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 300
ykloc: 3 [2] EQI 0 0 1
4 [2] JMP 6 ; to 11
5 [3] ADDI 0 0 -1
6 [3] MMBINI 0 1 7 0 ; __sub
7 [4] GETTABUP 1 0 0 ; _ENV "print"
8 [4] MOVE 2 0
ykloc: 9 [4] CALL 1 2 1 ; 1 in 0 out
10 [4] JMP -8 ; to 3
11 [5] RETURN 1 1 1 ; 0 out
```
after
-----
```
main <while_loop.lua:0,0> (11 instructions at 0x3bd111d0)
0+ params, 3 slots, 1 upvalue, 1 local, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 300
ykloc: 3 [2] EQI 0 0 1
4 [2] JMP 6 ; to 11
5 [3] ADDI 0 0 -1
6 [3] MMBINI 0 1 7 0 ; __sub
7 [4] GETTABUP 1 0 0 ; _ENV "print"
8 [4] MOVE 2 0
9 [4] CALL 1 2 1 ; 1 in 0 out
10 [4] JMP -8 ; to 3
11 [5] RETURN 1 1 1 ; 0 out
```
nested loops
============
```lua
local i = 300
while i ~= 0 do
i = i - 1
for i = 0, 10 do
print(i)
end
end
```
before
------
```
main <nested_loops.lua:0,0> (16 instructions at 0x206921a0)
0+ params, 7 slots, 1 upvalue, 5 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 300
ykloc: 3 [2] EQI 0 0 1
4 [2] JMP 11 ; to 16
5 [3] ADDI 0 0 -1
6 [3] MMBINI 0 1 7 0 ; __sub
7 [4] LOADI 1 0
8 [4] LOADI 2 10
9 [4] LOADI 3 1
10 [4] FORPREP 1 3 ; exit to 15
11 [5] GETTABUP 5 0 0 ; _ENV "print"
ykloc: 12 [5] MOVE 6 4
13 [5] CALL 5 2 1 ; 1 in 0 out
ykloc: 14 [4] FORLOOP 1 4 ; to 11
15 [6] JMP -13 ; to 3
16 [7] RETURN 1 1 1 ; 0 out
```
after
-----
```
main <nested_loops.lua:0,0> (16 instructions at 0x132f21a0)
0+ params, 7 slots, 1 upvalue, 5 locals, 1 constant, 0 functions
1 [1] VARARGPREP 0
2 [1] LOADI 0 300
ykloc: 3 [2] EQI 0 0 1
4 [2] JMP 11 ; to 16
5 [3] ADDI 0 0 -1
6 [3] MMBINI 0 1 7 0 ; __sub
7 [4] LOADI 1 0
8 [4] LOADI 2 10
9 [4] LOADI 3 1
10 [4] FORPREP 1 3 ; exit to 15
ykloc: 11 [5] GETTABUP 5 0 0 ; _ENV "print"
12 [5] MOVE 6 4
13 [5] CALL 5 2 1 ; 1 in 0 out
14 [4] FORLOOP 1 4 ; to 11
15 [6] JMP -13 ; to 3
16 [7] RETURN 1 1 1 ; 0 out
```
Most benchmarks speed up, in many cases by a lot. For example:
- Richards is about 4x faster than before.
- Bounce, CD, Json and Permute are about 2x faster than before.
Outliers that slow down as a result of this change are:
- NBody is about 85% slower.
- List times out when run under rebench.
Raw data follows:
```
BigLoop Lua yk 1000000000 6 6628
BigLoop YkLuaAfter yk 1000000000 6 2566
BigLoop YkLuaBefore yk 1000000000 6 2567
YkLuaAfter/YkLuaBefore 1.0
Bounce Lua awfy 1500 6 1397
Bounce YkLuaAfter awfy 1500 6 2607
Bounce YkLuaBefore awfy 1500 6 4663
YkLuaAfter/YkLuaBefore 0.56
CD Lua awfy 250 6 3391
CD YkLuaAfter awfy 250 6 18852
CD YkLuaBefore awfy 250 6 41803
YkLuaAfter/YkLuaBefore 0.45
DeltaBlue Lua awfy 12000 6 935
DeltaBlue YkLuaAfter awfy 12000 6 5379
DeltaBlue YkLuaBefore awfy 12000 6 6906
YkLuaAfter/YkLuaBefore 0.78
Havlak Lua awfy 1500 6 7693
Havlak YkLuaAfter awfy 1500 6 26934
Havlak YkLuaBefore awfy 1500 6 32157
YkLuaAfter/YkLuaBefore 0.83
Json Lua awfy 100 6 1379
Json YkLuaAfter awfy 100 6 7049
Json YkLuaBefore awfy 100 6 12444
YkLuaAfter/YkLuaBefore 0.56
List Lua awfy 1500 6 1090
List YkLuaAfter awfy 1500 0 Failed
List YkLuaBefore awfy 1500 6 10970
YkLuaAfter/YkLuaBefore timeout
Mandelbrot Lua awfy 500 6 446
Mandelbrot YkLuaAfter awfy 500 6 242
Mandelbrot YkLuaBefore awfy 500 6 312
YkLuaAfter/YkLuaBefore 0.77
NBody Lua awfy 250000 6 1045
NBody YkLuaAfter awfy 250000 6 2203
NBody YkLuaBefore awfy 250000 6 1185
YkLuaAfter/YkLuaBefore 1.85
Permute Lua awfy 1000 6 1302
Permute YkLuaAfter awfy 1000 6 1630
Permute YkLuaBefore awfy 1000 6 2947
YkLuaAfter/YkLuaBefore 0.55
Queens Lua awfy 1000 6 925
Queens YkLuaAfter awfy 1000 6 1466
Queens YkLuaBefore awfy 1000 6 5858
YkLuaAfter/YkLuaBefore 0.25
Richards Lua awfy 100 6 5276
Richards YkLuaAfter awfy 100 6 11747
Richards YkLuaBefore awfy 100 6 49269
YkLuaAfter/YkLuaBefore 0.24
Sieve Lua awfy 3000 6 1056
Sieve YkLuaAfter awfy 3000 6 3959
Sieve YkLuaBefore awfy 3000 6 6108
YkLuaAfter/YkLuaBefore 0.64
Storage Lua awfy 1000 6 2462
Storage YkLuaAfter awfy 1000 6 4341
Storage YkLuaBefore awfy 1000 6 4681
YkLuaAfter/YkLuaBefore 0.93
Towers Lua awfy 600 6 1487
Towers YkLuaAfter awfy 600 6 15125
Towers YkLuaBefore awfy 600 6 15133
YkLuaAfter/YkLuaBefore 1.0
```
|
Squashed, but note this needs ykjit/yk#1670 first. |
See commits.
Requires: ykjit/yk#1670