How to serve with 2 model folders? #3360
You can have multiple models in the same folder and then start djl-serving against that folder.
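For example, a minimal sketch of that kind of setup. The folder names and the model-store option below are assumptions rather than anything from this thread, so check serving --help (or config.properties) for the exact option your build supports:

    # Hypothetical layout: each model lives in its own subfolder of one parent folder
    #   /home/sidney/app/idea/model_root/
    #     model_en       (original llama3)
    #     model_en_ft    (fine-tuned llama3)
    #
    # Start djl-serving with the parent folder as the model store, so each
    # subfolder is registered as a separate model (the -s/--model-store option
    # is an assumption; some setups configure model_store=... in config.properties)
    ./app/serving-djl/bin/serving -s /home/sidney/app/idea/model_root

Each model can then be invoked by name through the /predictions/<model_name> endpoint.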
If you want to use a workflow to chain two models, see this example: https://github.com/deepjavalibrary/djl-demo/tree/master/djl-serving
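As a rough sketch, a hypothetical workflow.json that chains the two models so the base model's output is fed into the fine-tuned model. The names, local paths, and schema details are assumptions based on the djl-serving workflow docs, not on anything in this thread:

    {
      "name": "llama3-chain",
      "version": "0.1",
      "models": {
        "base": "/home/sidney/app/idea/model_root/model_en",
        "finetuned": "/home/sidney/app/idea/model_root/model_en_ft"
      },
      "workflow": {
        "out": ["finetuned", ["base", "in"]]
      }
    }

djl-serving would then be started against this workflow definition instead of a plain model folder; the demo repository linked above shows the exact schema and start-up flags.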
You should be using the Python engine. Can you share your serving.properties?
    engine=PyTorch
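Following the suggestion above, a minimal sketch of a serving.properties that switches to the Python engine. The property names are standard djl-serving options, but the values and the local path are only illustrative:

    engine=Python
    option.model_id=/home/sidney/app/idea/model_root/model_en
    option.dtype=fp16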
You should use our lmi container:
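For example, a sketch of running the model with the LMI container. The image tag, mount point, and port mapping are assumptions to adapt to your setup (the images are published under deepjavalibrary/djl-serving; pick the LMI tag matching your version):

    # Mount the local model folder (with its serving.properties) into the
    # container's default model directory and expose the inference port
    docker run -it --gpus all \
      -v /home/sidney/app/idea/model_root/model_en:/opt/ml/model \
      -p 8080:8080 \
      deepjavalibrary/djl-serving:0.29.0-lmi   # tag is an assumption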
    curl -X POST http://127.0.0.1:8080/invocations { ... }
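For reference, a complete form of that request might look like the following; the JSON payload shape is an assumption, since the exact fields depend on the handler the model is served with:

    curl -X POST http://127.0.0.1:8080/invocations \
      -H "Content-Type: application/json" \
      -d '{"inputs": "Hello, how are you?", "parameters": {"max_new_tokens": 64}}'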
The model loaded successfully; however, it failed during inference.
We don't have ...
    [sidney@tech68 ~]$ ./app/serving-djl/bin/serving -m /home/sidney/app/idea/model_root/model_en
    curl -X POST http://127.0.0.1:8090/invocations { ... }

Same error with both Python 3.9 and 3.10 when running djl-serving for the llama3 model from a local folder.
You still have this error during inference.
Does your model work using pure Python? Does the peft library version support your model?
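One quick way to check the second question is to print the installed library versions and compare them with what the model (and unsloth/peft) require. These are standard pip and python commands; the package list is an assumption about what this stack uses:

    pip show peft transformers accelerate
    python -c "import peft, transformers, accelerate; print(peft.__version__, transformers.__version__, accelerate.__version__)"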
    model, tokenizer = FastLanguageModel.from_pretrained( ... )
    FastLanguageModel.for_inference(model)
    inputs = tokenizer( ... )

Can run pure Python successfully.
    ValueError: We need an offload_dir to dispatch this model according to this device_map, the following submodules ...

This error occurs when the serving server starts, not at inference time.
I have 2 model folders for llama3: one is the original and the other is the fine-tuned version. How do I configure djl-serving to use both model folders?