A way to continue training #83

Open
Garshishka opened this issue Apr 21, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Garshishka

Considering that this program can train on a CPU or on low-VRAM cards, how about adding an option or parameter to continue training from a saved splat.ply? Is this even feasible?

pierotofy added the enhancement label on Apr 21, 2024
@pierotofy
Owner

pierotofy commented Apr 21, 2024

I don't see why not.

  1. Modify savePly to store the current step count (in a PLY header comment, maybe).
  2. Read the PLY back into the tensors (the reverse of savePly) and read the step count.
  3. Resume from the previous step count.
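
As a rough illustration of steps 1 and 2, here is a minimal standard-library sketch of storing a step count in a PLY header `comment` line and reading it back. The `training_step` key, the `writeHeader` and `readStep` helpers, and the three-property vertex layout are all hypothetical; this is not OpenSplat's actual savePly code, and the real header would list every splat property.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical comment key; neither OpenSplat nor the PLY format defines it.
const std::string kStepKey = "training_step";

// Write a minimal PLY header that records the step count in a comment line.
// Readers that do not know the key simply skip comment lines.
void writeHeader(std::ostream &out, std::size_t numPoints, long step) {
    assert(step >= 0);
    out << "ply\n"
        << "format binary_little_endian 1.0\n"
        << "comment " << kStepKey << " " << step << "\n"
        << "element vertex " << numPoints << "\n"
        << "property float x\n"
        << "property float y\n"
        << "property float z\n"
        << "end_header\n";
}

// Scan header lines up to end_header. Returns true and sets `step` if the
// comment is present; otherwise returns false and leaves `step` untouched.
// On return the stream is positioned at the start of the vertex payload.
bool readStep(std::istream &in, long &step) {
    bool found = false;
    std::string line;
    while (std::getline(in, line)) {
        if (line == "end_header") break;
        std::istringstream fields(line);
        std::string keyword, key;
        long value = 0;
        if (fields >> keyword >> key >> value &&
            keyword == "comment" && key == kStepKey) {
            step = value;
            found = true;
        }
    }
    return found;
}
```

A file written without the comment makes `readStep` return false, so a loader could fall back to starting at step 0.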

For a numerically correct resume, one should also dump the optimizer state, but I don't think that would matter much for the end result.
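
To make the optimizer-state point concrete, here is a toy sketch that assumes an Adam-style optimizer (the thread does not say which optimizer is used) and a one-parameter quadratic loss. `AdamState`, `adamStep`, and `train` are hypothetical names. Resuming with the saved moment estimates and step counter continues the original trajectory, while resuming with a fresh state takes slightly different steps even though it still heads toward the same minimum. How much that matters for a real splat is not something this toy can show.

```cpp
#include <cassert>
#include <cmath>

// State an Adam-style optimizer carries between steps for one parameter.
struct AdamState {
    double m = 0.0;  // first-moment (mean) estimate of the gradient
    double v = 0.0;  // second-moment estimate of the gradient
    long t = 0;      // number of updates applied so far
};

// One Adam update of parameter x for gradient g, with the usual defaults.
double adamStep(double x, double g, AdamState &s, double lr = 0.01) {
    const double beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    s.t += 1;
    s.m = beta1 * s.m + (1.0 - beta1) * g;
    s.v = beta2 * s.v + (1.0 - beta2) * g * g;
    // Bias correction depends on t, so the step counter is part of the state.
    const double mHat = s.m / (1.0 - std::pow(beta1, static_cast<double>(s.t)));
    const double vHat = s.v / (1.0 - std::pow(beta2, static_cast<double>(s.t)));
    return x - lr * mHat / (std::sqrt(vHat) + eps);
}

// Minimize the toy loss f(x) = (x - 3)^2 for `steps` updates.
double train(double x, AdamState &s, int steps) {
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x = adamStep(x, 2.0 * (x - 3.0), s);
    }
    return x;
}
```

Running 200 steps in one go, or 100 steps followed by 100 more with a copied `AdamState`, should give matching results; passing a default-constructed `AdamState` for the second half gives a different value.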

We'd welcome a pull request for this. Interested?

@Garshishka
Author

I would if I could :(
But C++ and ML are unknown territory to me.

@stefvfx

stefvfx commented Apr 28, 2024

I think it would be very useful.

@Itox001

Itox001 commented Jul 6, 2024

+1 for this feature. Currently I can only reasonably train ~3000 iterations before RAM consumption exhausts my resources because of the memory leak on MPS devices. I am hoping that stopping and resuming the training would reset this, allowing me to train for longer.

@eloquentarduino

+1. I'm not a C++ guy so I can't help here.

@AsherJingkongChen

> +1 for this feature. Currently I can only reasonably train ~3000 iterations before RAM consumption exhausts my resources because of the memory leak on MPS devices. I am hoping that stopping and resuming the training would reset this, allowing me to train for longer.

That's a shame. I hope a solution turns up for you 😊.

@elliotmarks06

I would also love to see this added!
