
Basic templated JIT? #659

Open
richarddd opened this issue Nov 7, 2024 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@richarddd
Contributor

Hello team,

This is a substantial proposal, and I recognize that @bnoordhuis is already exploring similar optimizations with QuickJS. However, I believe it may be valuable to consider implementing JIT-optimized fast paths for "simple" operations—such as array length checks, equality comparisons, and other common cases.

By using a templated JIT approach that directly translates bytecode to machine code, we could avoid introducing additional dependencies. Initially, we could limit the implementation to x86 and ARM64 architectures.

While Ben's approach of converting bytecode to C and using a complete compiler (TCC) achieves high performance, it introduces some compilation overhead and additional indirection. In contrast, a templated JIT might offer a leaner path to optimized execution for frequently encountered operations.

For inspiration, Andreas Kling’s recent implementation of a JIT compiler for LibJS in SerenityOS is a great example; he documents the whole process in a YouTube playlist.

Looking forward to your thoughts.

@bnoordhuis
Contributor

I've been waiting for someone to open this issue :-)

So, I've been thinking about this a lot obviously, and I have several ideas for how to tackle it. Let me start off with the observation that template JITs eliminate interpreter dispatch overhead but not much else.

quickjs has "fat" opcodes - meaning most opcodes do a lot of work - and that helps keep dispatch overhead down. It's usually within 5-25%. That's not nothing but it means a dumb JIT isn't going to move the needle much.

My quickjit experiment is basically a template JIT, because tcc is[1]; it's somewhere between a little slower and maybe 50% faster than the bytecode interpreter[2]. I consider it a dead end.

There are three prongs of attack that I'm hopeful will give a significant boost:

  1. Leaning into inline caches and type feedback way more than we do now. Something like r * Math.sin(d) should ideally get lowered to a single type-guarded opcode.

  2. Eliminate VM stack shuffling as much as possible, maybe by switching to a register VM. A decent JIT needs to deal with register allocation anyway so we might as well do the work upfront in the interpreter.

  3. Be smarter about managing memory and reference counts. In some benchmarks quickjs spends an extreme amount of time adjusting objects' refcounts up and down, often to end up with the exact same reference count it started out with. Smarter analysis (like deferring refcounting to the end of basic blocks, or even better, until it's observable) should help a great deal.

Once all that is in place, I'm confident a more-than-decent method JIT or tracing JIT falls out almost naturally.

Of course that all takes a lot of time to implement and we're working on this in our spare time so no ETA.


[1] tcc is like the MVP of compilers. Fancy register allocation, instruction selection, code motion, constant propagation, loop unrolling, &c? tcc doesn't do any of it, it just translates C input to ASM output in the most straightforward way possible. The quality of its generated code would get you a D- in Compilers 301 ;-)

[2] I wrote another proof of concept (not open source) where quickjit shells out to clang, then dlopens the result. It's around 2-4x faster due to clang's massively better optimizer but has several CPU/memory drawbacks (clang is resource hungry) and it's still not remotely in the same ballpark as the big JS engines.

@bnoordhuis bnoordhuis added enhancement New feature or request help wanted Extra attention is needed labels Nov 7, 2024
@KaruroChori
Contributor

KaruroChori commented Dec 5, 2024

Just a quick suggestion: have you considered QBE? (There is a C11 compiler that uses it as a backend, but emitting QBE intermediate code directly is quite feasible in my opinion, and it removes an unnecessary intermediate step.)

On paper it is much better than tcc at optimizing code, and it does not come close to the complexity of LLVM.
The main limitation is the small number of supported architectures.

@bnoordhuis
Contributor

I did look at it at the time but, IIRC (memory is a little fuzzy here), it wasn't viable because it's not designed to be used as a library, only as a standalone program.

A quick check of my local qbe checkout seems to confirm that: no library build, lots of global variables, etc.
