buildGPT

Let's see how far we can get with this one.

Broadcasting: Align the dimensions of the two operands to the right and then check each pair of dimensions Train - (used to train the parameters)
Validation - (used to train the hyperparameters)
Test - (used to evaluate the perfomance at the end)

As the model gets more complex and deeper, it becomes less forgiving, ie you need to fine-tune your hyperparameters very scrupulously.

For a batch size of 32, 0.1 dangerously big to estimate the running mean and bias of the batch. We have used 0.001 for the same batch size.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

buildGPT

Files

README.md

Latest commit

History

README.md

File metadata and controls

buildGPT