feat: Add hf-mistral3 adapter for Ministral-3 models #3487
Merged
+130
−0
Summary
Adds support for evaluating Ministral-3 models (3B, 8B, 14B), which use `Mistral3ForConditionalGeneration` instead of `AutoModelForCausalLM`.

Closes #3483
Motivation
This bug was discovered during an internal evaluation project at Vago Solutions, a startup based in Germany. When attempting to benchmark Ministral-3 models using the standard `hf` backend, the evaluation fails with:

```
ValueError: Unrecognized configuration class <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'> for this kind of AutoModel: AutoModelForCausalLM
```

This occurs because `lm_eval` hard-codes usage of `AutoModelForCausalLM`, while Ministral-3 models in Transformers 5.x are exposed as `Mistral3ForConditionalGeneration` (a VLM-style class).

Solution
Introduces a new model adapter `Mistral3LM` that:

- Subclasses `HFLM`
- Dynamically imports `Mistral3ForConditionalGeneration` with graceful error handling
- Uses the `causal` backend (Mistral3 is decoder-only despite the class name)
- Overrides `_model_call()` to bypass the `AutoModelForCausalLM` assertion
- Reads `text_config` for proper `max_length` detection

Usage
```shell
lm_eval --model hf-mistral3 \
    --model_args pretrained=mistralai/Ministral-3-3B-Instruct-2512-BF16,dtype=bfloat16 \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```
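The dynamic-import and registration pattern the adapter relies on can be sketched as follows. This is a minimal, stdlib-only illustration so it runs without `transformers` or `lm_eval` installed; the helper name `load_model_class` and the literal `MODEL_MAPPING` values are assumptions for illustration, not the PR's actual code.

```python
# Illustrative sketch only: the real adapter subclasses HFLM and imports
# Mistral3ForConditionalGeneration from transformers. The names below
# (load_model_class, the MODEL_MAPPING literals) are assumptions.
import importlib


def load_model_class(module_name: str, class_name: str):
    """Import `class_name` from `module_name`, raising a clear error if the
    installed library version does not expose that class."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as err:
        raise ImportError(
            f"{class_name} is not available; this adapter requires a "
            f"version of {module_name} that exposes it"
        ) from err


# Registration pattern: the new backend name maps to the adapter class,
# alongside the existing `hf` entry.
MODEL_MAPPING = {
    "hf": "HFLM",
    "hf-mistral3": "Mistral3LM",  # entry added by this PR
}
```

Failing gracefully at import time (rather than at first forward pass) means users on an older Transformers release get an actionable error message instead of an `AttributeError` deep inside model loading.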
Files Changed
- `lm_eval/models/mistral3.py` - New adapter (created)
- `lm_eval/models/__init__.py` - Adds `hf-mistral3` to `MODEL_MAPPING`

Requirements