This project provides a simple script to compute alignment metrics for transformer models on various datasets. This the official code for "PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models" (https://arxiv.org/abs/2506.20629).
Install dependencies:
pip install -r requirements.txt
Run the main script:
python main.py --model <huggingface-model-handle> --dataset <math|code|history|logic> --batchsize <BATCHSIZE> --nbsamples <N> --seqlen <SEQ_LEN> --aggregation <type|layer|None> --output_dir <RESULTS_DIR>
Example:
python main.py --model meta-llama/Llama-3.2-1B-Instruct --dataset math --batchsize 8 --nbsamples 100 --seqlen 256 --aggregation type --output_dir results/
--model
: HuggingFace model handle (e.g.,google/gemma-2b
)--dataset
: Dataset name (math
,code
,history
,logic
)--batchsize
: Batch size (not used in this simple version, all samples are processed at once)--nbsamples
: Number of samples to use from the dataset--seqlen
: Sequence length for tokenization--aggregation
: How to aggregate results (type
,layer
, orNone
)--output_dir
: Directory to save results
- Raw and aggregated metrics are saved as JSON files in the specified output directory.