On-device ONNX Runtime training - Skipping exporting the model for inference, as it takes too much memory to generate. Possibility of a new idea 💡 #21860
-
Interesting idea. Currently, our save operation is inefficient and causes a lot of memory usage. Since that operation is the bottleneck, even if there are fewer weights to update, the very act of converting an in-memory model to its on-disk form would still use a lot of memory.

I can see a version of this working where the generate_artifacts API generates an inference model that references the checkpoint.data file for its external data -- in that case, we would bypass the save / export step, at the cost of a more bloated external data file for inference.

Edit: actually, this might not yield memory savings, because you would still have to update / save the inference model after writing an updated checkpoint file to disk.

It is unlikely that we redesign the checkpoint file -- it would require updating the flatbuffers schema for the checkpoint, which would be a lot of work and would also make some checkpoint files incompatible with some versions of ORT, which is something that we generally want to avoid.
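For reference, a minimal sketch of how the offline artifact-generation step looks today; the model path, parameter names, and output directory are illustrative assumptions. The hypothetical variant discussed above would have this same step also emit an inference-only graph whose initializers point at the checkpoint's external data, rather than requiring a separate export on device.

```python
# Minimal sketch of today's offline artifact generation, assuming a simple
# classification model "model.onnx"; the parameter names "fc.weight"/"fc.bias"
# and the output directory are illustrative only.
import onnx
from onnxruntime.training import artifacts

base_model = onnx.load("model.onnx")

artifacts.generate_artifacts(
    base_model,
    requires_grad=["fc.weight", "fc.bias"],    # parameters to train on device
    frozen_params=[],                          # everything else stays fixed
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    # Writes training_model.onnx, eval_model.onnx, optimizer_model.onnx and the
    # checkpoint (plus the checkpoint.data external file mentioned above for
    # larger models) into this directory.
    artifact_directory="artifacts",
)
```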
-
Hello everyone, recently I have been experimenting with ONNX Runtime training in Python as well as on Android.
While experimenting, I have noticed that with ONNX Runtime training, after finishing the training we need to export the model for inferencing with export_model_for_inferencing. In the offline phase this is usually not a problem, but when working with limited memory this step sometimes results in the OS killing the process because it runs out of memory. A sketch of the flow I mean is below.
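To make the problem concrete, here is a minimal sketch of the current on-device flow in Python, assuming artifacts were generated as in the snippet above; the file names and the graph output name "output" are illustrative assumptions.

```python
# Minimal sketch of the current on-device training flow; the final export step
# is the one that can exhaust memory on constrained devices.
from onnxruntime.training.api import CheckpointState, Module, Optimizer

state = CheckpointState.load_checkpoint("artifacts/checkpoint")
module = Module(
    "artifacts/training_model.onnx",
    state,
    "artifacts/eval_model.onnx",
)
optimizer = Optimizer("artifacts/optimizer_model.onnx", module)

# ... training loop: module.train(); loss = module(inputs, labels); optimizer.step(); ...

# Rebuilds and saves an inference-only graph from the in-memory training model.
# This is the step that sometimes gets the process killed on low-memory devices.
module.export_model_for_inferencing("inference_model.onnx", ["output"])
```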
In place of this, I suggest perhaps another idea, though I am unsure whether such a thing is possible yet. Instead of needing to export for inference every time to remove all the training nodes, we export the inference model in advance, thus creating both a training model (already produced by generate_artifacts) and an inference model, assuming we keep the same model graph. I understand this would mean keeping a duplicate model in internal storage, but it may avoid using too much of the limited RAM. A rough sketch of what updating such a pre-exported model could look like follows.
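To illustrate the suggestion, a rough sketch under stated assumptions: ship a pre-exported inference_model.onnx next to the training artifacts, and after on-device training copy only the trained weights into it instead of re-exporting the whole graph. How the trained weights are read back out of the ORT checkpoint is exactly the open question in this discussion; below, trained_weights is a hypothetical dict of parameter name to numpy array standing in for that step.

```python
# Sketch only: patch the initializers of a pre-exported inference-only model
# with freshly trained weights, leaving the rest of the graph untouched.
# `trained_weights` (name -> numpy array) is a hypothetical stand-in for
# however the trained parameters would be extracted from the checkpoint.
import onnx
from onnx import numpy_helper

def patch_inference_model(inference_path, trained_weights, out_path):
    model = onnx.load(inference_path)
    for init in model.graph.initializer:
        if init.name in trained_weights:
            # Overwrite the stored initializer with the trained tensor.
            new_tensor = numpy_helper.from_array(trained_weights[init.name], init.name)
            init.CopyFrom(new_tensor)
    onnx.save(model, out_path)

# Example (hypothetical): patch_inference_model("inference_model.onnx",
#                                                trained_weights,
#                                                "inference_model_updated.onnx")
```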
This does raise a couple of issues, as I don't know how it would work on mobile or in Python, and perhaps even more that I am missing, as I am still a beginner with the framework.
Please let me know, and let's discuss, if I have missed something or whether such an option is even possible (sorry if this is a redundant idea).