
[Bug]: how can i load local model to finetuning #17986

Open
vangon opened this issue Mar 3, 2025 · 5 comments
Labels
bug: Something isn't working
triage: Issue needs to be triaged/prioritized

Comments


vangon commented Mar 3, 2025

Bug Description

How can I load a local model for fine-tuning?
from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sentence_transformers import SentenceTransformer
from transformers import AutoConfig, AutoModel

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

# Manually load the config and model
config = AutoConfig.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, config=config)

# Wrap it as a SentenceTransformer
sentence_model = SentenceTransformer(modules=[model])

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_path,
    model_output_path="test_model",
    val_dataset=val_dataset,
)

finetune_engine.finetune()

embed_model = finetune_engine.get_finetuned_model()

Version

0.12.20

Steps to Reproduce

from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sentence_transformers import SentenceTransformer
from transformers import AutoConfig, AutoModel

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

# Manually load the config and model
config = AutoConfig.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, config=config)

# Wrap it as a SentenceTransformer
sentence_model = SentenceTransformer(modules=[model])

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_path,
    model_output_path="test_model",
    val_dataset=val_dataset,
)

Relevant Logs/Tracebacks

File /opt/anaconda3/envs/py310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:154, in validate_repo_id(repo_id)
    151     raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
    153 if repo_id.count("/") > 1:
--> 154     raise HFValidationError(
    155         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    156         f" '{repo_id}'. Use `repo_type` argument if needed."
    157     )
    159 if not REPO_ID_REGEX.match(repo_id):
    160     raise HFValidationError(
    161         "Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are"
    162         " forbidden, '-' and '.' cannot start or end the name, max length is 96:"
    163         f" '{repo_id}'."
    164     )

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/Users/gaofan36/Documents/bge-base-zh-v1.5'. Use `repo_type` argument if needed.
vangon added the bug and triage labels on Mar 3, 2025

dosubot bot commented Mar 3, 2025

Hey @vangon! I'm here to help you with any bugs or questions you have. If you need assistance, feel free to ask.

To load a local model for fine-tuning using SentenceTransformersFinetuneEngine and resolve the HFValidationError, ensure that the model_id parameter is correctly specified to point to your local model path. You should provide the path directly without any prefixes. Here's how you can do it:

  1. Set the model_id to the path of your local model directory. This should be a valid path on your filesystem where the model is stored.

  2. Ensure that the path is correctly formatted and accessible.

Here's an example of how you might initialize the SentenceTransformersFinetuneEngine with a local model:

from llama_index.finetuning.embeddings.sentence_transformer import SentenceTransformersFinetuneEngine
from llama_index.finetuning.embeddings.common import EmbeddingQAFinetuneDataset

# Assuming `train_dataset` and `val_dataset` are instances of EmbeddingQAFinetuneDataset
model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset=train_dataset,
    model_id=model_path,  # Use the local path to your model
    model_output_path="test_model",
    val_dataset=val_dataset,
)

# Proceed with fine-tuning
finetune_engine.finetune()

embed_model = finetune_engine.get_finetuned_model()

Make sure that the path you provide is accessible and contains the necessary model files. If you encounter HFValidationError, double-check the path format and ensure it is correctly pointing to the local model directory [1].
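
As a quick sanity check, you can also try loading the same directory directly with SentenceTransformer, since the finetune engine constructs a SentenceTransformer from model_id internally. The sketch below assumes the local path from the report; if it raises the same HFValidationError, the problem lies in the local sentence-transformers / huggingface_hub setup or the directory contents rather than in SentenceTransformersFinetuneEngine:

# Minimal sketch: load the local checkpoint directory directly.
# The path is the one from the bug report; adjust it to your machine.
from sentence_transformers import SentenceTransformer

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

model = SentenceTransformer(model_path)
print(model.encode("hello world").shape)  # e.g. (768,) for bge-base-zh-v1.5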



vangon commented Mar 3, 2025

Thanks, but it does not work.

(screenshot)

I'm sure the path is valid.

(screenshot)


vangon commented Mar 3, 2025

(screenshot)


vangon commented Mar 3, 2025

> (quoted Dosu's earlier reply in full)

Thanks, please help me.


dosubot bot commented Mar 3, 2025

To resolve the HFValidationError when using a local model path with SentenceTransformersFinetuneEngine, ensure that the model_id is correctly set to the local path of your model directory. The error you're encountering suggests that the path might be incorrectly interpreted as a repository ID.

Here are a few steps to troubleshoot and resolve the issue:

  1. Check the Path Format: Make sure the model_id is a valid local path string without any prefixes or special characters that might be misinterpreted as a remote repository ID.

  2. Use the Correct Method for Local Paths: The SentenceTransformersFinetuneEngine should handle local paths correctly if the path is directly provided. Ensure that the path is accessible and contains all necessary model files like config.json, pytorch_model.bin, etc.

  3. Verify the Code: Double-check your code to ensure that the model_id is being passed correctly. It should look something like this:

    finetune_engine = SentenceTransformersFinetuneEngine(
        train_dataset=train_dataset,
        model_id="/Users/XXX/Documents/bge-base-zh-v1.5",  # Local path
        model_output_path="test_model",
        val_dataset=val_dataset,
    )
  4. Check for Updates: Ensure that you are using the latest version of the llama_index and sentence_transformers libraries, as updates might have fixed related issues.

If the problem persists, consider checking the implementation details of the SentenceTransformersFinetuneEngine to ensure it supports local paths as expected [1].
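
If the constructor still fails, a simple pre-flight check is to confirm that the directory really contains the files a SentenceTransformer checkpoint usually ships with and to note the installed library versions, since behavior around local paths can depend on the sentence-transformers / huggingface_hub releases in use. This is a sketch using the path from the report and a typical (not exhaustive) file list:

# Hypothetical pre-flight check: directory contents and library versions.
# File names are those a typical SentenceTransformer checkpoint ships with;
# not every checkpoint includes all of them.
import os

import huggingface_hub
import sentence_transformers
import transformers

model_path = "/Users/XXX/Documents/bge-base-zh-v1.5"

print("directory exists:", os.path.isdir(model_path))
for name in (
    "config.json",
    "modules.json",
    "config_sentence_transformers.json",
    "pytorch_model.bin",
    "model.safetensors",
):
    print(name, "->", os.path.isfile(os.path.join(model_path, name)))

print("sentence-transformers:", sentence_transformers.__version__)
print("transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)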

