Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: corruption after wasm stack overflow RangeError #70829

Open
rsc opened this issue Dec 13, 2024 · 8 comments · May be fixed by #70856
Open

runtime: corruption after wasm stack overflow RangeError #70829

rsc opened this issue Dec 13, 2024 · 8 comments · May be fixed by #70856
Labels
arch-wasm WebAssembly issues compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@rsc
Copy link
Contributor

rsc commented Dec 13, 2024

Open https://swtch.com/tmp/wasmbug/ in Chrome.
Open the developer console (right click on page, Inspect, then click the Console tab).
It should say "Go starting".
Now click Run.
You should see an "Uncaught RangeError: Maximum call stack size exceeded":

0133597e:0x6304a Uncaught RangeError: Maximum call stack size exceeded
    at runtime.__mspan_.init (0133597e:0x6304a)
    at runtime.__mheap_.initSpan (0133597e:0x612a5)
    at runtime.__mheap_.allocSpan (0133597e:0x60bf7)
    at runtime.__mheap_.alloc.func1 (0133597e:0x5f504)
    at runtime.systemstack (0133597e:0xfe0bd)
    at runtime.__mheap_.alloc (0133597e:0x5f361)
    at runtime.__mcentral_.grow (0133597e:0x28d09)
    at runtime.__mcentral_.cacheSpan (0133597e:0x28161)
    at runtime.__mcache_.refill (0133597e:0x26ca7)
    at runtime.__mcache_.nextFree (0133597e:0x1c94b)

I'm a little surprised that it shows this stack and not a stack inside user code if the stack is too big. Perhaps this is the "m" stack and it is not supposed to be overflowing?

I am not sure whether Go is expected to continue at this point, but it seems to still be running. Click Run again.

Now you should see a fatal error like this:

fatal error: bad sweepgen in refill
wasm_exec.js:22 
runtime stack:
wasm_exec.js:22 runtime.throw({0x46795, 0x16})
wasm_exec.js:22 	/Users/rsc/go/src/runtime/panic.go:1099 +0x3 fp=0x24eec8 sp=0x24eea0 pc=0x143d0003
wasm_exec.js:22 runtime.(*mcache).refill(0x250108, 0x5)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/mcache.go:158 +0x29 fp=0x24ef18 sp=0x24eec8 pc=0x10b50029
wasm_exec.js:22 runtime.(*mcache).nextFree(0x250108, 0x5)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/malloc.go:964 +0xa fp=0x24ef50 sp=0x24ef18 pc=0x1088000a
wasm_exec.js:22 runtime.mallocgcTiny(0x8, 0x19840, 0x1)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/malloc.go:1175 +0x48 fp=0x24efa8 sp=0x24ef50 pc=0x10890048
wasm_exec.js:22 runtime.mallocgc(0x8, 0x19840, 0x1)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/malloc.go:1053 +0x15 fp=0x24efe0 sp=0x24efa8 pc=0x141e0015
wasm_exec.js:22 runtime.newobject(0x19840)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/malloc.go:1714 +0x4 fp=0x24f008 sp=0x24efe0 pc=0x10920004
wasm_exec.js:22 syscall/js.makeValue(0x7ff8000100000014)
wasm_exec.js:22 	/Users/rsc/go/src/syscall/js/js.go:51 +0x5 fp=0x24f038 sp=0x24f008 pc=0x166d0005
wasm_exec.js:22 syscall/js.Value.Get({{}, 0x7ff8000100000006, 0x0}, {0x442f4, 0xd})
wasm_exec.js:22 	/Users/rsc/go/src/syscall/js/js.go:298 +0x9 fp=0x24f078 sp=0x24f038 pc=0x16740009
wasm_exec.js:22 syscall/js.handleEvent()
wasm_exec.js:22 	/Users/rsc/go/src/syscall/js/func.go:90 +0x2 fp=0x24f158 sp=0x24f078 pc=0x166c0002
wasm_exec.js:22 runtime.handleEvent()
wasm_exec.js:22 	/Users/rsc/go/src/runtime/lock_js.go:287 +0x16 fp=0x24f1c8 sp=0x24f158 pc=0x10820016
wasm_exec.js:22 runtime.(*mheap).initSpan(0x241860, 0x372078, 0x0, 0x5, 0x5e4000, 0x1)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/mheap.go:1393 +0x2 fp=0x24f208 sp=0x24f1c8 pc=0x117b0002
wasm_exec.js:22 runtime.(*mheap).allocSpan(0x241860, 0x1, 0x0, 0x5)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/mheap.go:1346 +0x8e fp=0x24f2d0 sp=0x24f208 pc=0x117a008e
wasm_exec.js:22 runtime.(*mheap).alloc.func1()
wasm_exec.js:22 	/Users/rsc/go/src/runtime/mheap.go:970 +0xb fp=0x24f318 sp=0x24f2d0 pc=0x1175000b
wasm_exec.js:22 runtime.systemstack(0x24f328)
wasm_exec.js:22 	/Users/rsc/go/src/runtime/asm_wasm.s:172 +0x3 fp=0x24f320 sp=0x24f318 pc=0x14760003
wasm_exec.js:22 runtime.mstart()
wasm_exec.js:22 	/Users/rsc/go/src/runtime/asm_wasm.s:29 fp=0x24f328 sp=0x24f320 pc=0x14720000

As I said, it is unclear to me whether the bug is the original RangeError or that it can't continue after the RangeError.

To reproduce:

git clone https://github.com/rsc/tmp
cd tmp/wasmshell
go generate # builds main.wasm, copies wasm_exec.js into local directory
go run serve.go
open http://localhost:8001/
@rsc rsc added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 13, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 13, 2024
@rsc
Copy link
Contributor Author

rsc commented Dec 13, 2024

/cc @golang/wasm

@rsc

This comment was marked as outdated.

@dmitshur dmitshur added the arch-wasm WebAssembly issues label Dec 13, 2024
@cherrymui
Copy link
Member

#70829 (comment) looks like there is indeed a deep recursion. At systemstack we just change the SP to point to the M stack in the linear memory, without unwinding the user frames on the Wasm call stack, so from Wasm's perspective it is adding more frames. Perhaps the deep recursion on the user stack is already close to the limit.

@rsc
Copy link
Contributor Author

rsc commented Dec 13, 2024

In non-web Ivy I measured the call depth at 25,573, corresponding to 4X that in Go frames. Playing with Ivy's )maxstack, I found that

op a gcd b = a == b: a; a > b: b gcd a-b; a gcd b-a
)maxstack 1800
1562 gcd !11

is right around the line where it stops working reliably.
Using 1700 makes Ivy consistently stop before hitting the limit.
Using 1800, it works about half the time; the other half have RangeError; but there is no crash.
Using 1900, you have to click Run a handful of times to get a crash.
Using 2000, the first time works but the second usually crashes.

Sometimes I see things like "fatal: sched holding locks", which makes me think that the problem is that when the stack overflow happens, the wasm runtime just unwinds the stack, but the Go code has not had a chance to clean up. So it might be holding locks, didn't run defers, and so on. And perhaps the goroutine gets reused for the next call.

I'm starting to believe more strongly that the RangeError needs to be handled by wasm_exec and have it just tear down Go entirely.

@cherrymui
Copy link
Member

I'm starting to believe more strongly that the RangeError needs to be handled by wasm_exec and have it just tear down Go entirely.

It looks like _makeFuncWrapper (the counter part of js.FuncOf in wasm_exec.js) perhaps should try to catch error and terminate the Go instance? I'm not good at JavaScript to tell for sure.

@Zxilly
Copy link
Member

Zxilly commented Dec 15, 2024

_resume() {
	if (this.exited) {
		throw new Error("Go program has already exited");
	}
	try {
		this._inst.exports.resume();
	} catch (err) {
		if (err instanceof RangeError) {
			// runtime.throw
			this._inst.exports.throw(err.message);
			this.exited = true;
			this._resolveExitPromise();
			throw err;
		}
	}
	if (this.exited) {
		this._resolveExitPromise();
	}
}

I was able to catch RangeError correctly after modifying _resume in wasm_exec.js as above.

@Zxilly Zxilly linked a pull request Dec 15, 2024 that will close this issue
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/636315 mentions this issue: wasm: terminate instance while met stackoverflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-wasm WebAssembly issues compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants