
[Bug]: how can i load local model to finetuning #17986

Open
vangon opened this issue Mar 3, 2025 · 5 comments
Labels
bug: Something isn't working
triage: Issue needs to be triaged/prioritized

Comments


vangon commented Mar 3, 2025

Bug Description

How can I load a local model for fine-tuning?
from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sentence_transformers import SentenceTransformer
from transformers import AutoConfig, AutoModel

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

# Manually load the config and model
config = AutoConfig.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, config=config)

# Wrap it as a SentenceTransformer
sentence_model = SentenceTransformer(modules=[model])

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_path,
    model_output_path="test_model",
    val_dataset=val_dataset,
)

finetune_engine.finetune()

embed_model = finetune_engine.get_finetuned_model()

Version

0.12.20

Steps to Reproduce

from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sentence_transformers import SentenceTransformer
from transformers import AutoConfig, AutoModel

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

# Manually load the config and model
config = AutoConfig.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, config=config)

# Wrap it as a SentenceTransformer
sentence_model = SentenceTransformer(modules=[model])

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_path,
    model_output_path="test_model",
    val_dataset=val_dataset,
)

Relevant Logs/Tracebacks

File /opt/anaconda3/envs/py310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:154, in validate_repo_id(repo_id)
    151     raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
    153 if repo_id.count("/") > 1:
--> 154     raise HFValidationError(
    155         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    156         f" '{repo_id}'. Use `repo_type` argument if needed."
    157     )
    159 if not REPO_ID_REGEX.match(repo_id):
    160     raise HFValidationError(
    161         "Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are"
    162         " forbidden, '-' and '.' cannot start or end the name, max length is 96:"
    163         f" '{repo_id}'."
    164     )

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/Users/gaofan36/Documents/bge-base-zh-v1.5'. Use `repo_type` argument if needed.
vangon added the bug and triage labels on Mar 3, 2025

dosubot bot commented Mar 3, 2025

Hey @vangon! I'm here to help you with any bugs or questions you have. If you need assistance, feel free to ask.

To load a local model for fine-tuning using SentenceTransformersFinetuneEngine and resolve the HFValidationError, ensure that the model_id parameter is correctly specified to point to your local model path. You should provide the path directly without any prefixes. Here's how you can do it:

  1. Set the model_id to the path of your local model directory. This should be a valid path on your filesystem where the model is stored.

  2. Ensure that the path is correctly formatted and accessible.

Here's an example of how you might initialize the SentenceTransformersFinetuneEngine with a local model:

from llama_index.finetuning.embeddings.sentence_transformer import SentenceTransformersFinetuneEngine
from llama_index.finetuning.embeddings.common import EmbeddingQAFinetuneDataset

# Assuming `train_dataset` and `val_dataset` are instances of EmbeddingQAFinetuneDataset
model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset=train_dataset,
    model_id=model_path,  # Use the local path to your model
    model_output_path="test_model",
    val_dataset=val_dataset,
)

# Proceed with fine-tuning
finetune_engine.finetune()

embed_model = finetune_engine.get_finetuned_model()

Make sure that the path you provide is accessible and contains the necessary model files. If you encounter HFValidationError, double-check the path format and ensure it is correctly pointing to the local model directory [1].
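
As a quick sanity check, you can also try loading the same directory directly with SentenceTransformer, since the finetune engine constructs a SentenceTransformer from model_id internally. The sketch below assumes the local path from the report; if it raises the same HFValidationError, the problem lies in the local sentence-transformers / huggingface_hub setup or the directory contents rather than in SentenceTransformersFinetuneEngine:

# Minimal sketch: load the local checkpoint directory directly.
# The path is the one from the bug report; adjust it to your machine.
from sentence_transformers import SentenceTransformer

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

model = SentenceTransformer(model_path)
print(model.encode("hello world").shape)  # e.g. (768,) for bge-base-zh-v1.5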



vangon commented Mar 3, 2025

Thanks, but it does not work.

(screenshot)

I'm sure the path is valid.

(screenshot)


vangon commented Mar 3, 2025

(screenshot)


vangon commented Mar 3, 2025

> (quoted Dosu's earlier reply in full)

Thanks, please help me.


dosubot bot commented Mar 3, 2025

To resolve the HFValidationError when using a local model path with SentenceTransformersFinetuneEngine, ensure that the model_id is correctly set to the local path of your model directory. The error you're encountering suggests that the path might be incorrectly interpreted as a repository ID.

Here are a few steps to troubleshoot and resolve the issue:

  1. Check the Path Format: Make sure the model_id is a valid local path string without any prefixes or special characters that might be misinterpreted as a remote repository ID.

  2. Use the Correct Method for Local Paths: The SentenceTransformersFinetuneEngine should handle local paths correctly if the path is directly provided. Ensure that the path is accessible and contains all necessary model files like config.json, pytorch_model.bin, etc.

  3. Verify the Code: Double-check your code to ensure that the model_id is being passed correctly. It should look something like this:

    finetune_engine = SentenceTransformersFinetuneEngine(
        train_dataset=train_dataset,
        model_id="/Users/XXX/Documents/bge-base-zh-v1.5",  # Local path
        model_output_path="test_model",
        val_dataset=val_dataset,
    )
  4. Check for Updates: Ensure that you are using the latest version of the llama_index and sentence_transformers libraries, as updates might have fixed related issues.

If the problem persists, consider checking the implementation details of the SentenceTransformersFinetuneEngine to ensure it supports local paths as expected [1].
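
If the constructor still fails, a simple pre-flight check is to confirm that the directory really contains the files a SentenceTransformer checkpoint usually ships with and to note the installed library versions, since behavior around local paths can depend on the sentence-transformers / huggingface_hub releases in use. This is a sketch using the path from the report and a typical (not exhaustive) file list:

# Hypothetical pre-flight check: directory contents and library versions.
# File names are those a typical SentenceTransformer checkpoint ships with;
# not every checkpoint includes all of them.
import os

import huggingface_hub
import sentence_transformers
import transformers

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

print("directory exists:", os.path.isdir(model_path))
for name in (
    "config.json",
    "modules.json",
    "config_sentence_transformers.json",
    "pytorch_model.bin",
    "model.safetensors",
):
    print(name, "->", os.path.isfile(os.path.join(model_path, name)))

print("sentence-transformers:", sentence_transformers.__version__)
print("transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)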

