aphrodite-engine/aphrodite-jax

Aphrodite-JAX

A small, GPU-only JAX inference engine. It currently supports only single-GPU inference with Qwen3 models.

Example

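from aphrodite_jax import LLM, SamplingParams

# Load a Qwen3 checkpoint by its model ID.
llm = LLM("Qwen/Qwen3-0.6B", max_model_len=4096)

# Generate a completion for a single prompt.
outputs = llm.generate(
    ["Hello from Aphrodite-JAX"],
    SamplingParams(temperature=0.6, max_tokens=32),
)

# Outputs are returned one per prompt; the generated text is under "text".
print(outputs[0]["text"])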

Benchmarking

python -m aphrodite_jax.bench_perf -m Qwen/Qwen3-0.6B

There is currently no compile cache or AOT (ahead-of-time) compilation, so each new shape triggers its own compile run.
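
To illustrate the per-shape compile cost in general JAX terms, here is a minimal sketch that does not use Aphrodite-JAX itself. The function `scaled_sum` and the counter `trace_count` are hypothetical names made up for this example. It relies only on standard `jax.jit` behavior: a jitted function is traced and compiled again whenever it sees a new input shape, while repeated shapes reuse the in-process result.

```python
import jax
import jax.numpy as jnp

# Hypothetical counter: incremented only while JAX traces the function,
# which happens once per new input shape/dtype combination.
trace_count = 0


@jax.jit
def scaled_sum(x):
    global trace_count
    trace_count += 1  # Python side effect: runs at trace time only
    return jnp.sum(x * 2.0)


scaled_sum(jnp.ones((1, 8)))   # first shape: traced and compiled
scaled_sum(jnp.ones((1, 8)))   # same shape: reuses the compiled function
scaled_sum(jnp.ones((1, 16)))  # new shape: traced and compiled again

print(trace_count)  # prints 2
```

In practice this suggests that a benchmark's first run for any given shape likely includes compile time, so warm-up runs are worth separating from measured runs.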
