This repository contains the official implementation for pre-training, fine-tuning, and running inference with the In-Context AutoEncoder (ICAE) model. It also supports standard Large Language Models (LLMs) for comparison.
The original ICAE idea is presented in a paper here, and the improvements using Positional Identifiers are described here.
Our work is described in a paper here.
Our trained models are here.
- Models: Supports both ICAE and standard SimpleLLM architectures.
- Base Models: Compatible with the Qwen and Mistral series of models.
- Tasks: Pre-training, fine-tuning, and inference on SQuAD, RepoQA, and trajectory datasets.
- Frameworks: Built on top of Hugging Face's transformers, datasets, and accelerate.
- Experiment Tracking: Integrated with Weights & Biases (wandb) for easy monitoring of experiments.
.
├── configs/ # Configuration files for experiments
├── data/ # Data for training and evaluation
├── models/ # Model definitions (ICAE and SimpleLLM)
├── scripts/ # Scripts for pre-training, fine-tuning, and inference
├── README.md # This file
- Clone the repository:

      git clone https://github.com/.../icae.git
      cd icae

- Install dependencies:

      pip install -r requirements.txt

All scripts are configured using YAML files located in the configs/ directory. Each script requires a --config_path argument.
1. Prepare pre-training data:

       CUDA_VISIBLE_DEVICES=X python -m icae.data.prepare_data_for_pretraining --config_path icae/configs/pretrain_config.yaml

2. Run pre-training:

       CUDA_VISIBLE_DEVICES=X python -m icae.scripts.pretrain --config_path icae/configs/pretrain_config.yaml

Fine-tune models on the SQuAD dataset:

    CUDA_VISIBLE_DEVICES=X python -m icae.scripts.finetune_SQuAD --config_path icae/configs/finetune_config_icae.yaml

Evaluate models on SQuAD with two task types:
- ae (autoencoding): Reconstruct input text
- qa (question answering): Answer questions based on context

Set task: "ae" or task: "qa" in the config file to specify the task type, then run:

    CUDA_VISIBLE_DEVICES=X python -m icae.scripts.inference_SQuAD --config_path icae/configs/inference_config.yaml

Key configuration parameters:
- model_type: "icae" or "llm"
- do_compress: Enable compression for ICAE (set to True for inference)
- fixed_mem_size: Size of memory tokens (must be a power of 2)
- mean_compression_rate: Expected compression ratio
- use_position_identifiers: Use positional identifiers for compression
- task: Inference task type ("ae", "qa", "repoqa", or "trajectories")
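
The constraints listed above can be checked before a run starts. The following is a minimal sketch, not the repository's own config loader: the InferenceConfig class, its default values, and the validate helper are hypothetical names introduced for illustration, and the numeric type of mean_compression_rate is an assumption.

```python
from dataclasses import dataclass

# Allowed values taken from the parameter list above.
VALID_MODEL_TYPES = {"icae", "llm"}
VALID_TASKS = {"ae", "qa", "repoqa", "trajectories"}


@dataclass
class InferenceConfig:
    # Hypothetical container; the defaults are illustrative, not the repo's.
    model_type: str = "icae"
    do_compress: bool = True
    fixed_mem_size: int = 128
    mean_compression_rate: float = 4.0
    use_position_identifiers: bool = False
    task: str = "qa"


def is_power_of_two(n: int) -> bool:
    """Return True for 1, 2, 4, 8, ... using the single-set-bit trick."""
    return n > 0 and (n & (n - 1)) == 0


def validate(cfg: InferenceConfig) -> InferenceConfig:
    """Raise ValueError if a field violates a documented constraint."""
    if cfg.model_type not in VALID_MODEL_TYPES:
        raise ValueError(f"model_type must be one of {sorted(VALID_MODEL_TYPES)}")
    if cfg.task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}")
    if not is_power_of_two(cfg.fixed_mem_size):
        raise ValueError("fixed_mem_size must be a power of 2")
    if cfg.mean_compression_rate <= 0:
        raise ValueError("mean_compression_rate must be positive")
    return cfg


cfg = validate(InferenceConfig(task="ae", fixed_mem_size=64))
print(cfg.task, cfg.fixed_mem_size)  # ae 64
```

Failing early on a malformed value such as fixed_mem_size=100 is cheaper than discovering it after the model has been loaded.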
The inference scripts automatically compute and save metrics:
- SQuAD: BLEU-1, Exact Match (EM), F1 score
- RepoQA: BLEU-1, RepoQA similarity score, Pass@0.8 threshold
- Trajectories: BLEU-1, token-level accuracy, exact match (per turn and per trajectory), compression rates, timing statistics
Metrics are saved to icae/data/metrics/ and predictions to icae/data/predictions/.
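
For readers unfamiliar with the SQuAD metrics named above, here is a minimal sketch of Exact Match and token-level F1 in the style of the standard SQuAD evaluation. It is an illustration under assumptions: the function names are hypothetical, and the repository's own normalization and tokenization may differ.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(f1_score("in Paris France", "Paris"))              # 0.5
```

F1 gives partial credit when a prediction contains the reference answer plus extra tokens, which Exact Match scores as a miss.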
- Currently, only a batch size of 1 (per_device_train_batch_size=1) is supported.
- RepoQA fine-tuning and inference currently only support Qwen models.
- Trajectory fine-tuning uses a custom trainer that processes trajectories turn-by-turn with accumulated compressed memory.
- All scripts support both ICAE and SimpleLLM models for comparison.
- Checkpoints are saved in safetensors format and can be loaded for inference or continued training.
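
The turn-by-turn idea mentioned in the notes above can be pictured with a toy loop. This is a conceptual sketch only, not the repository's custom trainer: process_trajectory, toy_compress, and toy_context_length are hypothetical stand-ins, integers stand in for tokens, and the real model produces memory embeddings with a learned encoder rather than by subsampling.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[int]


def toy_compress(turn: Sequence[int], rate: int = 4) -> Tokens:
    """Stand-in compressor: keep every `rate`-th token (assumption, not ICAE)."""
    return list(turn[::rate])


def toy_context_length(memory: Sequence[int], turn: Sequence[int]) -> int:
    """Stand-in model step: report how many positions the model would see."""
    return len(memory) + len(turn)


def process_trajectory(
    turns: Sequence[Sequence[int]],
    compress: Callable[[Sequence[int]], Tokens],
    step: Callable[[Sequence[int], Sequence[int]], int],
) -> Tuple[List[int], Tokens]:
    """Run each turn against the accumulated memory, then compress the turn."""
    memory: Tokens = []
    results: List[int] = []
    for turn in turns:
        # The current turn is processed together with all earlier compressed turns.
        results.append(step(memory, turn))
        # Only the compressed form of this turn is carried forward.
        memory = memory + compress(turn)
    return results, memory


turns = [list(range(8)), list(range(8, 16)), list(range(16, 24))]
context_lengths, memory = process_trajectory(turns, toy_compress, toy_context_length)
print(context_lengths)  # [8, 10, 12]
print(len(memory))      # 6
```

In this toy setting the context grows by 2 positions per turn instead of 8, which is the effect compression is meant to have on long trajectories.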
If you use this code, please cite:
```bibtex
@misc{gelvan2026problemsimplicitcontextcompression,
  title={On Problems of Implicit Context Compression for Software Engineering Agents},
  author={Kirill Gelvan and Igor Slinko and Felix Steinbauer and Egor Bogomolov and Florian Kofler and Yaroslav Zharov},
  year={2026},
  eprint={2605.11051},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2605.11051},
}
```