What is in PanoLlama:
- New Paradigm: We define a new paradigm for panorama image generation (PIG), modeling it as a next-token prediction task to better address the multilevel coherence challenge.
- New Strategy: Based on token redirection, we develop a training-free next-crop prediction strategy that enables endless PIG with existing visual autoregressive (VAR) models. Compared with current methods that rely on complex designs, PanoLlama offers a simpler and more efficient framework and achieves state-of-the-art (SOTA) performance in coherence (47.50%), fidelity and diversity (28.16%), and aesthetics (15%).
- Additional Applications: Beyond basic panorama generation, PanoLlama supports applications that other PIG methods cannot achieve, including multi-scale generation, mask-free layout control, and multi-guidance synthesis.
- New Benchmark: Prior PIG works lack a standardized set of test prompts and typically rely on a handful of specific ones (5–20), so we construct a dataset of 1,000 detailed prompts across 100+ themes. Together with a comprehensive set of baselines and metrics, this dataset establishes a new benchmark for panorama generation.
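The next-crop prediction strategy described above can be pictured with a toy sketch: a fixed-width autoregressive window slides to the right across a token grid, and each new crop reuses the overlapping tokens of the previous one as its causal prefix, so only the new columns are predicted. This is an illustrative reading of the idea under stated assumptions, not PanoLlama's implementation: raster-order generation within each crop is assumed, and `generate_crop`, `expand_horizontally`, and `toy_next_token` are hypothetical names, with `toy_next_token` standing in for a real next-token model.

```python
from typing import Callable, List

Grid = List[List[int]]  # one list of token ids per row


def generate_crop(context: Grid, new_cols: int, width: int,
                  next_token: Callable[[List[int]], int]) -> Grid:
    """Fill one crop of `width` columns in raster (row-major) order.

    Each row of `context` holds the `width - new_cols` tokens carried over
    from the right edge of the panorama. They are written into the crop's
    sequence unchanged; only the last `new_cols` positions of each row are
    predicted by `next_token`.
    """
    seq: List[int] = []   # causal token sequence of this crop
    crop: Grid = []
    for known in context:
        assert len(known) == width - new_cols, "context width mismatch"
        seq.extend(known)             # reuse overlapping tokens as prefix
        row = list(known)
        for _ in range(new_cols):
            tok = next_token(seq)     # condition on everything so far in the crop
            seq.append(tok)
            row.append(tok)
        crop.append(row)
    return crop


def expand_horizontally(height: int, width: int, new_cols: int, steps: int,
                        next_token: Callable[[List[int]], int]) -> Grid:
    """Grow a height x width token grid to the right by `new_cols` per step."""
    if not 0 < new_cols <= width:
        raise ValueError("new_cols must be in (0, width]")
    # First crop: nothing to reuse, so every position is predicted.
    pano = generate_crop([[] for _ in range(height)], width, width, next_token)
    overlap = width - new_cols
    for _ in range(steps):
        # Next crop = last `overlap` columns of the panorama + new columns.
        context = [row[len(row) - overlap:] for row in pano]
        crop = generate_crop(context, new_cols, width, next_token)
        for row, crop_row in zip(pano, crop):
            row.extend(crop_row[overlap:])   # append only the new columns
    return pano


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical stand-in for an autoregressive model (deterministic)."""
    last = prefix[-1] if prefix else 0
    return (31 * len(prefix) + last) % 1024


if __name__ == "__main__":
    pano = expand_horizontally(height=4, width=8, new_cols=2, steps=3,
                               next_token=toy_next_token)
    print(len(pano), len(pano[0]))  # 4 rows, 8 + 3 * 2 = 14 columns
```

Because the window never grows, the cost of each step stays constant no matter how long the panorama becomes, which is what makes endless expansion possible in this toy setting.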
For more details, please visit our paper page.
Configuration

Set up the environment by installing the required packages:

```shell
pip install -r requirements.txt
```

Pre-trained Models

Download the pre-trained models into /models under the corresponding modules:

| module | model | params | tokens | weight |
|---|---|---|---|---|
| text encoder | FLAN-T5-XL | 3B | / | flan-t5-xl |
| image tokenizer | VQVAE | 72M | 16x16 | vq_ds16_t2i.pt |
| token generator | Llama-XL | 775M | 32x32 | t2i_XL_stage2_512.pt |
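The table implies a simple relation between token grids and image size. A minimal sketch, assuming the 16x16 entry for the VQVAE (and the `ds16` in its checkpoint name) means each token covers a 16x16 pixel patch; `image_side_px` and `panorama_size_px` are hypothetical helpers, and the 80-column grid in the example is an arbitrary illustration, not a setting from this repository.

```python
def image_side_px(tokens_per_side: int, downsample: int = 16) -> int:
    """Pixel length of one image side decoded from `tokens_per_side` tokens.

    Assumes each VQ token covers a `downsample` x `downsample` pixel patch.
    """
    return tokens_per_side * downsample


def panorama_size_px(token_rows: int, token_cols: int,
                     downsample: int = 16) -> tuple:
    """(height, width) in pixels of a token grid of the given shape."""
    return (token_rows * downsample, token_cols * downsample)


# Llama-XL generates a 32x32 token grid per window, per the table above.
side = image_side_px(32)
print(f"{side}x{side}")            # 512x512
# A hypothetical horizontal panorama 32 tokens tall and 80 tokens wide:
print(panorama_size_px(32, 80))    # (512, 1280)
```

Under this assumption, the 32x32 token window matches the 512 in the `t2i_XL_stage2_512.pt` checkpoint name.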
Generation

We support panorama expansion in the vertical direction, the horizontal direction, or both. Try the following command to generate a horizontal panorama:
```shell
python -m token_generator.sample \
  --seed -1 \
  --times 12 \
  --addit-cols 24 \
  --lam 1 \
  --gen-mode h \
  --n 1
```

If you find our work helpful, please consider citing:
```bibtex
@inproceedings{zhou2025panollama,
  title={{PanoLlama}: Generating Endless and Coherent Panoramas with Next-Token-Prediction {LLMs}},
  author={Zhou, Teng and Zhang, Xiaoyu and Tang, Yongchuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15340--15349},
  year={2025}
}
```