buildGPT

Let's see how far we can get with this one.

Broadcasting: Align the dimensions of the two operands to the right, then check each pair of dimensions. Every pair must be equal, or one of the two must be 1, or one of the two must be missing.

Train - (used to train the parameters)
Validation - (used to tune the hyperparameters)
Test - (used to evaluate the performance at the end)
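To make the broadcasting rule concrete, here is a minimal sketch in plain Python. The helper `broadcast_shape` is a hypothetical name used only for illustration; it is not part of this repo. It walks two shapes from the rightmost dimension and applies the usual check: a pair is compatible if the sizes match, one of them is 1, or one of them is missing.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of shapes a and b, or raise ValueError."""
    result = []
    # Align to the right: iterate from the trailing dimension.
    # A missing dimension is treated as size 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(reversed(result))


print(broadcast_shape((32, 100), (100,)))   # (32, 100)
print(broadcast_shape((32, 1), (1, 100)))   # (32, 100)
```

A shape pair such as `(32, 100)` and `(32,)` fails, because after right-alignment 100 is compared with 32. This is the classic source of silent bugs when a sum over a dimension is taken without keeping that dimension.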

As the model gets more complex and deeper, it becomes less forgiving, i.e., you need to tune your hyperparameters very carefully.

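For a batch size of 32, a momentum of 0.1 is dangerously large for estimating the running mean and variance of the batch: each small batch gives a noisy estimate, so the running values jump around. We have used 0.001 for the same batch size.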
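The effect can be sketched with a toy simulation, assuming the value in question is the momentum of the exponential moving average used for batch-norm running statistics. Everything here (the Gaussian toy activation, the constants, the function name `run`) is an assumption for illustration and not code from this repo. It tracks only the running mean; the same update would apply to the variance.

```python
import random
import statistics

random.seed(0)

BATCH_SIZE = 32
TRUE_MEAN, TRUE_STD = 2.0, 1.0   # assumed toy distribution of one activation
STEPS, WINDOW = 10_000, 2_000    # measure the spread over the last WINDOW steps


def run(momentums):
    """Return how much each running mean fluctuates once training has settled."""
    running = {m: 0.0 for m in momentums}
    history = {m: [] for m in momentums}
    for step in range(STEPS):
        batch = [random.gauss(TRUE_MEAN, TRUE_STD) for _ in range(BATCH_SIZE)]
        batch_mean = statistics.fmean(batch)
        for m in momentums:
            # Exponential moving average: keep (1 - m) of the old value,
            # blend in m of the current batch statistic.
            running[m] = (1 - m) * running[m] + m * batch_mean
            if step >= STEPS - WINDOW:
                history[m].append(running[m])
    return {m: statistics.pstdev(h) for m, h in history.items()}


spread = run([0.1, 0.001])
print(spread)
```

The running mean with momentum 0.1 should fluctuate noticeably more than the one with 0.001, because each update lets a single noisy 32-sample batch move the estimate by 10%. The trade-off is that the smaller momentum needs many more steps to converge from its initial value.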