Training Standard ONNX models #5969
Replies: 2 comments 1 reply
-
Hello, thank you for trying out the ONNX Runtime Training feature. Currently, ONNX Runtime Training is designed to accelerate distributed training of PyTorch models in multi-GPU environments. There is also a limitation in the ONNX spec: it defines no operators for training. That said, we would love to learn more about your use case of training standalone ONNX models, so please share any additional context. We also encourage you to try ORT Training for Transformer-based models in PyTorch.
-
Hello, thank you for the additional context; it was helpful in understanding the scenario better. Regarding the limitation of the ONNX spec, please raise an issue in the ONNX repo requesting operator support for training. Regarding your question about an API to run the ONNX model and get its outputs: we do not currently support such an option, and the faster route would be the ONNX spec update for training.
-
Hello, I couldn't find a way to run a standalone training ONNX graph using its embedded training information. Is there a way to do this? I could only find examples focused on running PyTorch models.
The hacky workaround I'm using is configuring an ORTTrainer and accessing the private member
trainer._training_session.get_state()
to get the updated parameters. I'm wondering if there is a better way to run a trainable ONNX model and get its outputs (updated weights + loss), preferably using the embedded training information and the Gradient operator, so I have more control over the training loop.
Thank you for your attention.