A lightweight, academic-friendly profiler for analyzing model size, prefilling latency (TTFT), generation latency (TPOT), and end-to-end latency (TTLT) of Large Language Models on multi-GPU and edge GPU platforms. ELANA provides a simple command-line interface and optional energy consumption logging, making it ideal for research on efficient LLMs.
- 🔍 Profile model memory footprint, supporting Transformers, Mamba, and hybrid models
- ⚡ TTFT (Time-To-First-Token) measurement for prefilling stage
- 🎯 TPOT (Time-Per-Output-Token) measurement for autoregressive decoding
- 🧩 TTLT (Time-To-Last-Token) profiling for full end-to-end inference
- 🔌 GPU energy logging support for both multi-GPU servers and edge GPUs in the Jetson series
- 🔥 Optional Torch Profiler integration for kernel-level insights, similar to Nvidia Nsight Compute
- 🧱 Compatible with any HuggingFace `AutoModelForCausalLM` model and self-developed model classes
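The three latency metrics are related: for N generated tokens, TTLT ≈ TTFT + (N − 1) · TPOT. A minimal sketch of deriving all three from per-token timestamps (an illustrative helper, not ELANA's internal implementation; `generate_one_token` is a hypothetical callable standing in for one decode step):

```python
import time

def measure_latencies(generate_one_token, n_tokens):
    """Time a token generator and derive TTFT, TPOT, and TTLT (seconds)."""
    start = time.perf_counter()
    stamps = []
    for _ in range(n_tokens):
        generate_one_token()              # produce one token
        stamps.append(time.perf_counter())
    ttft = stamps[0] - start              # prefill + first token
    ttlt = stamps[-1] - start             # full end-to-end latency
    # average time per output token over the decode phase only
    tpot = (ttlt - ttft) / (n_tokens - 1) if n_tokens > 1 else 0.0
    return ttft, tpot, ttlt
```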
You can use either conda or virtualenv.

```shell
git clone https://github.com/hychiang-git/Elana.git
cd Elana
```

With conda:

```shell
conda create -n elana-env python==3.12
conda activate elana-env
```

Or with virtualenv:

```shell
python3 -m venv elana-env
source elana-env/bin/activate
```

Then install ELANA:

```shell
pip install --upgrade pip
pip install .
```

Profile TTFT, TPOT, TTLT, or model size from the command line:

```shell
elana meta-llama/Llama-3.2-3B-Instruct --ttft --energy
elana meta-llama/Llama-3.2-3B-Instruct --tpot --energy --cache_graph
elana meta-llama/Llama-3.2-3B-Instruct --ttlt --energy --cache_graph
elana meta-llama/Llama-3.2-3B-Instruct --size
elana --help
```

We use the Torch Profiler to generate a JSON trace file and visualize it with Perfetto. For example, run
```shell
# use --torch_profile
elana meta-llama/Llama-3.2-3B-Instruct --tpot --energy --cache_graph --torch_profile
```

Adding `--torch_profile` creates a folder for the trace files at
```
torch_profile/
  {model_name}/
    kernel_metrics.{profile_task}.{timestamp}.csv             # all kernel profiling results
    kernel_type_metrics.{profile_task}.{timestamp}.csv        # kernels grouped by user-defined tags
    {profile_task}.{machine name}.{timestamp}.pt.trace.json   # trace file that can be visualized in Perfetto
```

To open a trace file in Perfetto, download it first, then open it on the website.
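The `.pt.trace.json` file follows the Chrome trace event format that the Torch Profiler exports, so it can also be post-processed without Perfetto. A hedged sketch of summing GPU kernel time per kernel name (field names are from the Chrome trace format; this helper is not part of the ELANA CLI):

```python
import json
from collections import defaultdict

def total_kernel_time_us(trace_path):
    """Sum GPU kernel durations (microseconds) per kernel name.

    Assumes a Chrome-trace-format file: a top-level "traceEvents" list
    whose entries carry "name", "cat", and "dur" fields.
    """
    with open(trace_path) as f:
        events = json.load(f).get("traceEvents", [])
    totals = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "kernel":      # keep only GPU kernel events
            totals[ev["name"]] += ev.get("dur", 0)
    return dict(totals)
```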

Here is an example of visualizing the details of a kernel

- The tool has only been tested with multiple GPUs on a single node; multi-node environments have not been tested yet.
- CUDA graph capture may not work for models with CPU operations, dynamic shapes, or unconventional cache layouts.
- Energy logging relies on NVIDIA GPU APIs exposed by nvidia-ml-py.
- The Torch Profiler introduces significant overhead; it is recommended only for microbenchmarking, i.e., reduce the number of repeats.
- Some architectures (e.g., Nemotron-H, Mamba2) use custom KV-cache formats; ELANA includes compatibility paths, but behavior may vary.
- Assumes the HuggingFace `AutoModelForCausalLM`-style forward API.
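Energy logging of this kind typically integrates discrete power samples over time. A minimal sketch of that integration step under the assumption that samples come from nvidia-ml-py's `nvmlDeviceGetPowerUsage`, which reports milliwatts (this is not ELANA's exact code):

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_mW) samples into joules.

    Uses the trapezoid rule between consecutive samples;
    mW * s = mJ, so divide by 1000 at the end.
    """
    total_mj = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total_mj += (p0 + p1) / 2.0 * (t1 - t0)
    return total_mj / 1000.0
```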
```bibtex
@article{chiang2025elana,
  title   = {ELANA: A Simple Energy and Latency Analyzer for LLMs},
  author  = {Chiang, Hung-Yueh and Wang, Bokun and Marculescu, Diana},
  journal = {arXiv preprint arXiv:2512.09946},
  year    = {2025},
}
```
