AeroRL Agent Instructions

This repository is built for high-velocity AI coding agents (Claude 4 Sonnet, Cursor, GPT-4o, etc.).

Rules every agent must follow:

Never change the public API in aerorl/init.py unless you bump the major version.
All kernels must be in aerorl/kernels/ as pure Triton Python (no raw CUDA unless absolutely required for IPC).
Every new feature must ship with a benchmark delta vs. vanilla verl/TRL on Qwen2.5-VL-7B.
Vision token masking is mandatory for any loss function — use the vision_mask tensor passed from the VLM processor.
Zero-copy KV must survive torch.compile and FSDP2.
All benchmarks must run on a single RTX PRO 6000 96 GB card and report exact VRAM delta and throughput.

Preferred workflow:

Current priority queue:

When in doubt: make it faster and lower-VRAM than verl on VLMs. Ship numbers, not promises.

Provide feedback