[Training] Linux on-device training with onnxruntime fails when loading checkpoint: Segmentation fault (core dumped) #21918

scofild429 · 2024-08-29T21:15:14Z

Describe the issue

After creating the on-device training artifacts with Python, I want to use them for on-device training on Linux with C/C++. I started with the CPU provider. Inference with the corresponding ONNX model works, but a segmentation fault (core dumped) occurs when I load the checkpoint for on-device training. In C:

g_ort_training_api->LoadCheckpoint(checkpoint_path.c_str(), &state_c);

and in C++:
auto state_cpp = Ort::CheckpointState::LoadCheckpoint(checkpoint_path);
I get the same error in both cases.

I have tried several onnxruntime versions (17, 18, 19), and the problem remains the same.
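
One cheap thing to rule out before the load call is a bad checkpoint path. The following is only a minimal sketch using the C++17 standard library; `describe_checkpoint_problem` is a hypothetical helper, not part of ONNX Runtime, and it does not explain the segmentation fault by itself. It assumes nothing about the checkpoint format beyond "the path must exist, and if it is a regular file it must be non-empty and readable".

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper (NOT an ONNX Runtime API): returns an empty string when
// the checkpoint path looks usable, otherwise a short description of the problem.
std::string describe_checkpoint_problem(const std::string& checkpoint_path) {
    namespace fs = std::filesystem;
    std::error_code ec;  // non-throwing overloads report errors through ec

    if (checkpoint_path.empty()) {
        return "path is empty";
    }
    if (!fs::exists(checkpoint_path, ec) || ec) {
        return "path does not exist";
    }
    // Only regular files are inspected further; other path types are accepted as-is.
    if (fs::is_regular_file(checkpoint_path, ec)) {
        if (fs::file_size(checkpoint_path, ec) == 0 || ec) {
            return "file is empty";
        }
        std::ifstream in(checkpoint_path, std::ios::binary);
        if (!in.is_open()) {
            return "file cannot be opened for reading";
        }
    }
    return "";
}
```

If `describe_checkpoint_problem(checkpoint_path)` returns a non-empty string, the path can be fixed before calling `Ort::CheckpointState::LoadCheckpoint(checkpoint_path)`. If it returns an empty string and the crash persists, another possible cause worth checking (an assumption, not confirmed in this report) is that the pointer stored in `g_ort_training_api` is null because the installed library was built without training support.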

To reproduce

Download onnxruntime-linux-x64-gpu*, install the header files and libraries on the system, and call the checkpoint loading function in a CMake project or a single-file project.

Urgency

Urgent.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

18

PyTorch Version

2.3.1+cu121

Execution Provider

Default CPU

Execution Provider Library Version

No response

scofild429 added the training label (issues related to ONNX Runtime training; typically submitted using template) on Aug 29, 2024

scofild429 closed this as completed Aug 31, 2024
