Dynamically Serving Inference Model Adapters with ORT #21406
-
That's not possible with ORT, as those methods only take effect when the session is initialized. You could potentially add a single empty LoRA to an ONNX UNet model and then supply the weights for it as an initializer, but you'd need to destroy the session and recreate it every time you wanted to change the LoRA, and you'd also need a different ONNX model for LoRAs of different sizes or ones placed on different layers. I'm not sure how to create the ONNX model with the attached LoRA automatically, though it's definitely possible if you edit the ONNX protobuf (but it would be nasty to do by hand). The tests for |
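To make the recreate-the-session pattern concrete, here is a minimal Java sketch. It assumes the UNet graph was exported with LoRA delta weights as initializers named `lora_down` and `lora_up` (hypothetical names), and that your ORT release's Java binding exposes `addExternalInitializers` on `OrtSession.SessionOptions` (mirroring the C++ `AddExternalInitializers`); check the Javadoc for your version, including whether the tensors may be closed once the session is created:

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OnnxTensorLike;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

import java.nio.FloatBuffer;
import java.util.HashMap;
import java.util.Map;

/** Swaps LoRA weights by tearing down and rebuilding the session. */
public final class LoraSwapper implements AutoCloseable {
  private final OrtEnvironment env = OrtEnvironment.getEnvironment();
  private final String modelPath; // UNet exported with an empty LoRA attached
  private OrtSession session;

  public LoraSwapper(String modelPath) {
    this.modelPath = modelPath;
  }

  /** Rebuilds the session with new LoRA weights; initializers only apply at creation time. */
  public void swapAdapter(float[] down, long[] downShape, float[] up, long[] upShape)
      throws OrtException {
    if (session != null) {
      session.close(); // the old session must be destroyed before new weights can take effect
    }
    try (OrtSession.SessionOptions opts = new OrtSession.SessionOptions();
        OnnxTensor downT = OnnxTensor.createTensor(env, FloatBuffer.wrap(down), downShape);
        OnnxTensor upT = OnnxTensor.createTensor(env, FloatBuffer.wrap(up), upShape)) {
      Map<String, OnnxTensorLike> inits = new HashMap<>();
      inits.put("lora_down", downT); // hypothetical initializer names baked into the graph
      inits.put("lora_up", upT);
      opts.addExternalInitializers(inits); // assumed Java name; verify against your Javadoc
      session = env.createSession(modelPath, opts);
    }
  }

  public OrtSession session() {
    return session;
  }

  @Override
  public void close() throws OrtException {
    if (session != null) {
      session.close();
    }
  }
}
```

Note that every adapter switch then pays a full session rebuild (including graph optimization), so this only makes sense when switches are infrequent.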
-
Hi Guys,
I'm trying to build an inference server in ORT (Java) and want to add dynamic model adapter serving capabilities as in this HF blog post here. Specifically, I want to be able to dynamically "enable" and "disable" preloaded LoRAs and DoRAs for a UNet. From what I've seen so far, I should be doing this with methods like `AddExternalInitializers()`, `AddExternalInitializersFromFilesInMemory()`, and `AddInitializer()`. The thing is that I have no idea how to use these methods. Can anybody point me to code examples, ideally in Java? Thanks!