On-device ONNX Runtime training - Skipping exporting the model for inference, as it takes too much memory to generate. Possibility of a new idea 💡 #21860
-
Interesting idea. Currently, our save operation is inefficient and causes a lot of memory usage. Since that operation is the bottleneck, even if there are fewer weights to update, the very act of converting an in-memory model to its on-disk form would still use a lot of memory.

I can see a version of this working where the generate_artifacts API generates an inference model that references the checkpoint.data file for its external data -- in that case, we would bypass the save / export step, at the cost of a more bloated external data file for inference.

Edit: actually, this might not yield memory savings, because you would still have to update / save the inference model after writing an updated checkpoint file to disk.

It is unlikely that we redesign the checkpoint file -- it would require updating the flatbuffers schema for the checkpoint, which would be a lot of work and would also make some checkpoint files incompatible with some versions of ORT, which is something that we generally want to avoid.
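For reference, a minimal sketch of how the offline artifact-generation step looks today; the model path, parameter names, and output directory are illustrative assumptions. The hypothetical variant discussed above would have this same step also emit an inference-only graph whose initializers point at the checkpoint's external data, rather than requiring a separate export on device.

```python
# Minimal sketch of today's offline artifact generation, assuming a simple
# classification model "model.onnx"; the parameter names "fc.weight"/"fc.bias"
# and the output directory are illustrative only.
import onnx
from onnxruntime.training import artifacts

base_model = onnx.load("model.onnx")

artifacts.generate_artifacts(
    base_model,
    requires_grad=["fc.weight", "fc.bias"],    # parameters to train on device
    frozen_params=[],                          # everything else stays fixed
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    # Writes training_model.onnx, eval_model.onnx, optimizer_model.onnx and the
    # checkpoint (plus the checkpoint.data external file mentioned above for
    # larger models) into this directory.
    artifact_directory="artifacts",
)
```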
-
Hello everyone, recently I have been experimenting with ONNX Runtime training in Python as well as on Android.
While experimenting, I have noticed that with ONNX Runtime training, after finishing the training we need to export the model for inferencing with export_model_for_inferencing. In the offline phase this is usually not a problem, but when working with limited memory this step sometimes results in the OS killing the process because it runs out of memory. A sketch of the flow I mean is below.
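To make the problem concrete, here is a minimal sketch of the current on-device flow in Python, assuming artifacts were generated as in the snippet above; the file names and the graph output name "output" are illustrative assumptions.

```python
# Minimal sketch of the current on-device training flow; the final export step
# is the one that can exhaust memory on constrained devices.
from onnxruntime.training.api import CheckpointState, Module, Optimizer

state = CheckpointState.load_checkpoint("artifacts/checkpoint")
module = Module(
    "artifacts/training_model.onnx",
    state,
    "artifacts/eval_model.onnx",
)
optimizer = Optimizer("artifacts/optimizer_model.onnx", module)

# ... training loop: module.train(); loss = module(inputs, labels); optimizer.step(); ...

# Rebuilds and saves an inference-only graph from the in-memory training model.
# This is the step that sometimes gets the process killed on low-memory devices.
module.export_model_for_inferencing("inference_model.onnx", ["output"])
```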
In place of this, I suggest perhaps another idea, though I am unsure whether such a thing is possible yet. Instead of needing to export for inference every time to remove all the training nodes, we export the inference model in advance, thus creating both a training model (already produced by generate_artifacts) and an inference model, assuming we keep the same model graph. I understand this would mean keeping a duplicate model in internal storage, but it may avoid using too much of the limited RAM. A rough sketch of what updating such a pre-exported model could look like follows.
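To illustrate the suggestion, a rough sketch under stated assumptions: ship a pre-exported inference_model.onnx next to the training artifacts, and after on-device training copy only the trained weights into it instead of re-exporting the whole graph. How the trained weights are read back out of the ORT checkpoint is exactly the open question in this discussion; below, trained_weights is a hypothetical dict of parameter name to numpy array standing in for that step.

```python
# Sketch only: patch the initializers of a pre-exported inference-only model
# with freshly trained weights, leaving the rest of the graph untouched.
# `trained_weights` (name -> numpy array) is a hypothetical stand-in for
# however the trained parameters would be extracted from the checkpoint.
import onnx
from onnx import numpy_helper

def patch_inference_model(inference_path, trained_weights, out_path):
    model = onnx.load(inference_path)
    for init in model.graph.initializer:
        if init.name in trained_weights:
            # Overwrite the stored initializer with the trained tensor.
            new_tensor = numpy_helper.from_array(trained_weights[init.name], init.name)
            init.CopyFrom(new_tensor)
    onnx.save(model, out_path)

# Example (hypothetical): patch_inference_model("inference_model.onnx",
#                                                trained_weights,
#                                                "inference_model_updated.onnx")
```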
This does raise a couple of issues, as I don't know how it would work on mobile or in Python, and perhaps even more that I am missing, as I am still a beginner with the framework.
Please let me know, and let's discuss, if I have missed something or whether such an option is even possible (sorry if this is a redundant idea).