Aphrodite-JAX

A small, GPU-only JAX inference engine. Currently, only single-GPU inference with Qwen3 models is supported.

Example

from aphrodite_jax import LLM, SamplingParams

llm = LLM("Qwen/Qwen3-0.6B", max_model_len=4096)
outputs = llm.generate(
    ["Hello from Aphrodite-JAX"],
    SamplingParams(temperature=0.6, max_tokens=32),
)
print(outputs[0]["text"])

Benchmarking

python -m aphrodite_jax.bench_perf -m Qwen/Qwen3-0.6B

There is currently no compile cache or AOT compilation, so each distinct input shape triggers its own compile run.
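Because compiles are tied to shapes, it can help to estimate how many distinct shapes a workload contains before benchmarking. The sketch below is a standalone illustration, not part of Aphrodite-JAX: `bucket_length` and `count_shapes` are hypothetical helpers, and power-of-two padding is an assumed strategy. Whether the engine pads inputs internally is not stated here.

```python
from collections import Counter


def bucket_length(n_tokens: int, min_bucket: int = 16, max_len: int = 4096) -> int:
    """Round a token count up to the next power-of-two bucket (hypothetical helper)."""
    if n_tokens < 1:
        raise ValueError("n_tokens must be positive")
    if n_tokens > max_len:
        raise ValueError(f"{n_tokens} tokens exceeds max_len={max_len}")
    bucket = min_bucket
    while bucket < n_tokens:
        bucket *= 2
    # Cap at max_len in case max_len is not itself a bucket boundary.
    return min(bucket, max_len)


def count_shapes(prompt_lengths):
    """Return (number of raw lengths, number of padded lengths, padded histogram)."""
    raw = set(prompt_lengths)
    bucketed = Counter(bucket_length(n) for n in prompt_lengths)
    return len(raw), len(bucketed), dict(bucketed)


# Example workload: eight prompts, each with a different token count.
lengths = [5, 9, 17, 30, 31, 100, 120, 500]
n_raw, n_bucketed, histogram = count_shapes(lengths)
print(f"{n_raw} distinct raw lengths -> {n_bucketed} padded shapes: {histogram}")
```

In this example, eight distinct prompt lengths collapse to four padded lengths. If sequence length were the only dimension that varied, that would mean four compile runs instead of eight, at the cost of some wasted computation on padding tokens.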
