Compute MLE of biases and weights at BMM node construction #53
Draft
Cesar199999 wants to merge 1 commit into main from
Conversation
Contributor
That's indeed a shame to have the generics propagate. Maybe @jakehemmerle can jump on a live session with you to see if there's any alternative?
Contributor
Author
Sounds good! I'll book a session
The goal of this PR is to provide a solution for #29: currently, the MLEs of the weight matrix and bias vector are recomputed at each commit and prove step. The proposed solution is to store the MLEs as attributes of the BMMNode struct and compute them only once, at BMMNode construction. This saves a lot of recomputation and removes a duplicated chunk of code. Moreover, even though the memory increase can be high (roughly 8x for 256-bit fields), it shouldn't become a bottleneck in the near future: for a GPT-2-sized model (~700 MB), the cached MLEs shouldn't exceed 8 GB.
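A minimal sketch of the eager approach described above, using hypothetical simplified types (the `Mle` struct and `u64` field elements stand in for the crate's actual multilinear-extension and field types, which are not shown in this PR):

```rust
// Hypothetical simplified MLE: over the boolean hypercube, a multilinear
// extension is determined by its evaluation table, modeled here as a flat Vec.
#[derive(Clone, Debug, PartialEq)]
struct Mle {
    evals: Vec<u64>, // field elements, modeled as u64 for this sketch
}

fn mle_from_matrix(m: &[Vec<u64>]) -> Mle {
    // Placeholder: row-major flattening of the matrix as the evaluation table.
    Mle { evals: m.iter().flatten().copied().collect() }
}

fn mle_from_vector(v: &[u64]) -> Mle {
    Mle { evals: v.to_vec() }
}

struct BmmNode {
    weights: Vec<Vec<u64>>,
    bias: Vec<u64>,
    // Cached at construction: computed once, reused by every commit/prove call.
    weight_mle: Mle,
    bias_mle: Mle,
}

impl BmmNode {
    fn new(weights: Vec<Vec<u64>>, bias: Vec<u64>) -> Self {
        // The MLEs are computed exactly once, here, instead of inside
        // both commit and prove (the duplicated chunk the PR removes).
        let weight_mle = mle_from_matrix(&weights);
        let bias_mle = mle_from_vector(&bias);
        Self { weights, bias, weight_mle, bias_mle }
    }

    fn commit(&self) -> &Mle {
        // commit can keep taking &self: the cached MLE already exists.
        &self.weight_mle
    }
}
```

Note that `commit` stays an immutable borrow here, which is the main ergonomic difference from the lazy alternative discussed next.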
@Antonio95 proposed to lazily compute the MLEs at commit time and store them as optional attributes of the BMMNode struct. However, this requires changing the signature of commit for every node type to allow a mutable reference to self. A drawback of this solution is that BMM nodes must now include the field F as a type parameter; note that this would be the case for any solution that stores the MLEs as attributes. Unfortunately, the generic F bubbles up all the way to the Model struct, which adds an undesired layer of complexity. @HungryCatsStudio/maintainers, do you think there is a better solution to this problem? If not, do you think this solution is acceptable?
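For comparison, a hedged sketch of the lazy alternative (again with hypothetical simplified types), showing why `commit` needs `&mut self` and why the field type parameter `F` then propagates to any container of nodes:

```rust
// Hypothetical lazy variant: the MLE is an Option filled in on first commit.
struct LazyBmmNode<F: Clone> {
    weights: Vec<Vec<F>>,
    weight_mle: Option<Vec<F>>, // None until the first commit call
}

impl<F: Clone> LazyBmmNode<F> {
    fn new(weights: Vec<Vec<F>>) -> Self {
        Self { weights, weight_mle: None }
    }

    // Must take &mut self so the first call can write the cached MLE,
    // which is the signature change the PR description mentions.
    fn commit(&mut self) -> &Vec<F> {
        if self.weight_mle.is_none() {
            // Placeholder MLE: row-major flattening of the weight matrix.
            let evals: Vec<F> = self.weights.iter().flatten().cloned().collect();
            self.weight_mle = Some(evals);
        }
        self.weight_mle.as_ref().unwrap()
    }
}

// The generic bubbles up: any struct holding such nodes must carry F too,
// all the way to a top-level Model, as the PR description notes.
struct Model<F: Clone> {
    nodes: Vec<LazyBmmNode<F>>,
}
```

Either way the MLE lives on the node, so the `F` parameter appears in both designs; laziness only changes when the cache is filled and whether `commit` needs mutability.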