
Compute MLE of biases and weights at BMM node construction #53

Draft
Cesar199999 wants to merge 1 commit into main from store-parameters-as-polynomial


@Cesar199999
Contributor

The goal of this PR is to solve #29: the MLEs of the weight matrix and bias vector are currently recomputed at every commit and prove step. The proposed solution is to store the MLEs as attributes of the BMMNode struct and to compute them only once, when the BMMNode is constructed. This saves a lot of recomputation and removes a duplicated chunk of code. Although the memory overhead can be high (roughly 8-10x for 256-bit fields), it should not become a bottleneck in the near future: for instance, on a GPT-2 model (~700 MB), the cached MLEs shouldn't exceed 8 GB.
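A minimal sketch of the eager approach, under stated assumptions: the real field type is replaced by a toy `u64` prime field modulo 2^31 - 1 to keep the example dependency-free, each MLE is represented densely by its evaluations over the Boolean hypercube (zero-padded to a power of two), and `BmmNode`, `DenseMle`, and their fields are hypothetical simplifications rather than the repository's actual types. The MLEs are built once in `new`; commit and prove steps would then only read `weights_mle` and `bias_mle`.

```rust
// Toy prime field standing in for the generic field F (assumption).
const P: u64 = (1 << 31) - 1;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Dense multilinear extension, stored as its evaluations over {0,1}^num_vars.
#[derive(Clone, Debug)]
struct DenseMle {
    num_vars: usize,
    evals: Vec<u64>,
}

impl DenseMle {
    /// Builds the evaluation table, zero-padding to the next power of two.
    fn from_values(values: &[u64]) -> Self {
        let len = values.len().max(1).next_power_of_two();
        let num_vars = len.trailing_zeros() as usize;
        let mut evals: Vec<u64> = values.iter().map(|v| v % P).collect();
        evals.resize(len, 0);
        DenseMle { num_vars, evals }
    }

    /// Evaluates the MLE at `point` by folding one variable at a time;
    /// point[0] binds the lowest-order bit of the table index.
    fn evaluate(&self, point: &[u64]) -> u64 {
        assert_eq!(point.len(), self.num_vars);
        let mut table = self.evals.clone();
        for &r in point {
            let r = r % P;
            let half = table.len() / 2;
            for i in 0..half {
                let (lo, hi) = (table[2 * i], table[2 * i + 1]);
                table[i] = add(lo, mul(r, sub(hi, lo)));
            }
            table.truncate(half);
        }
        table[0]
    }
}

/// Hypothetical simplified BMM node: the MLEs are computed once in `new`
/// and reused afterwards instead of being rebuilt at each commit/prove.
#[allow(dead_code)]
struct BmmNode {
    weights: Vec<Vec<u64>>, // row-major, assumed non-empty and rectangular
    bias: Vec<u64>,
    weights_mle: DenseMle,
    bias_mle: DenseMle,
}

impl BmmNode {
    fn new(weights: Vec<Vec<u64>>, bias: Vec<u64>) -> Self {
        // Pad each row to a power of two so the flattened index splits
        // cleanly into (column bits, row bits).
        let cols = weights[0].len().next_power_of_two();
        let mut flat = Vec::new();
        for row in &weights {
            let mut padded = row.clone();
            padded.resize(cols, 0);
            flat.extend(padded);
        }
        let weights_mle = DenseMle::from_values(&flat);
        let bias_mle = DenseMle::from_values(&bias);
        BmmNode { weights, bias, weights_mle, bias_mle }
    }
}
```

For example, `BmmNode::new(vec![vec![1, 2], vec![3, 4]], vec![5, 6])` builds a 2-variable weight MLE, and evaluating it at the Boolean point `[1, 0]` returns the stored entry `2` (row 0, column 1) without rebuilding anything. The trade-off is the one described above: the evaluation tables stay in memory for the lifetime of the node.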

@Antonio95 proposed lazily computing the MLEs at commit time and storing them as optional attributes of the BMMNode struct. However, this would require changing the signature of `commit` for every node type so that it takes a mutable reference to `self`.
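To illustrate the signature issue, here is a minimal sketch of the lazy alternative. It assumes a toy `u64` field modulo 2^31 - 1, and the names `LazyCommit`, `LazyBmmNode`, `compute_mle`, and `toy_commit` are hypothetical, not the repository's API; `toy_commit` is a placeholder and not a real polynomial commitment. The cached MLE is an `Option` filled on first use, so `commit` has to take `&mut self`.

```rust
const P: u64 = (1 << 31) - 1;

/// Placeholder MLE: the evaluation table over the Boolean hypercube,
/// i.e. the values zero-padded to a power-of-two length.
fn compute_mle(values: &[u64]) -> Vec<u64> {
    let mut evals: Vec<u64> = values.iter().map(|v| v % P).collect();
    evals.resize(values.len().max(1).next_power_of_two(), 0);
    evals
}

/// Placeholder "commitment" (NOT a real polynomial commitment scheme).
fn toy_commit(evals: &[u64]) -> u64 {
    evals
        .iter()
        .enumerate()
        .fold(0, |acc, (i, &e)| (acc + (i as u64 + 1) * e) % P)
}

/// Lazy variant: because the MLE is cached on first use, `commit` must
/// take `&mut self`, and so must every node type implementing the trait.
trait LazyCommit {
    fn commit(&mut self) -> u64;
}

struct LazyBmmNode {
    weights: Vec<u64>,
    weights_mle: Option<Vec<u64>>,
    mle_computations: usize, // how many times the MLE was actually built
}

impl LazyCommit for LazyBmmNode {
    fn commit(&mut self) -> u64 {
        if self.weights_mle.is_none() {
            // Mutating the cache is what rules out a `&self` receiver.
            self.weights_mle = Some(compute_mle(&self.weights));
            self.mle_computations += 1;
        }
        toy_commit(self.weights_mle.as_ref().unwrap())
    }
}
```

Calling `commit` twice on the same node builds the MLE only once, but every caller now needs a mutable borrow of the node (and, transitively, of whatever owns it), which is the signature change described above.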

A drawback of this solution is that BMM nodes must now include the field `F` as a type parameter; note that this would be the case for any solution that stores the MLEs as an attribute. Unfortunately, the generic `F` bubbles up all the way to the `Model` struct, which adds an undesired layer of complexity.
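The propagation of `F` can be sketched as follows, with hypothetical types (`Field`, `Node`, `ReluNode`, `Model`) standing in for the real ones: once `BmmNode` stores field elements it needs `F`, and every container above it (the node enum, then the model) must carry `F` as well, even though other node kinds store no field data.

```rust
/// Placeholder for the real field bound (assumption: the project uses a
/// generic prime-field trait; `u64` stands in for a concrete field here).
trait Field: Copy {}
impl Field for u64 {}

/// Caching the MLE makes the BMM node generic over `F`.
struct BmmNode<F: Field> {
    weights_mle: Vec<F>,
}

/// A node kind that stores no field elements (hypothetical).
struct ReluNode;

/// The enum of all node kinds must now carry `F` too...
enum Node<F: Field> {
    Bmm(BmmNode<F>),
    Relu(ReluNode),
}

/// ...and so must the `Model` that owns the nodes, and anything holding a `Model`.
struct Model<F: Field> {
    nodes: Vec<Node<F>>,
}

impl<F: Field> Model<F> {
    /// Total number of cached MLE evaluations across all BMM nodes.
    fn cached_mle_len(&self) -> usize {
        self.nodes
            .iter()
            .map(|node| match node {
                Node::Bmm(bmm) => bmm.weights_mle.len(),
                Node::Relu(_) => 0,
            })
            .sum()
    }
}
```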

@HungryCatsStudio/maintainers, do you think there is a better solution to this problem? If not, do you think this solution is acceptable?

Cesar199999 requested a review from Antonio95 on March 26, 2024 11:43
@mmagician
Contributor

That's indeed a shame to have the generics propagate. Maybe @jakehemmerle can jump on a live session with you to see if there's any alternative?

@Cesar199999
Contributor Author

> That's indeed a shame to have the generics propagate. Maybe @jakehemmerle can jump on a live session with you to see if there's any alternative?

Sounds good! I'll book a session.


2 participants