Training Compute-Optimal Large Language Models

Status: Pending

Author: Jordan Hoffmann, Laurent Sifre, Oriol Vinyals, Sebastian Borgeaud, et al.

Topic: Large-Language-Models, Transformers

Category: Architecture, Optimization-No. of params, Pre-Training, Tips & Tricks

Conference: arXiv

Year: 2022

Link: https://arxiv.org/abs/2203.15556

Questions

What did the authors try to accomplish?

What were the key elements of the approach?

What can you use yourself from this paper? (see the sketch after these questions)

What other references are worth following?
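For the "what can you use yourself" question, a minimal sketch of the paper's headline rule of thumb: compute-optimal training uses roughly 20 tokens per parameter, with training compute commonly approximated as C ≈ 6ND FLOPs. The constants below are round approximations rather than the paper's exact fitted coefficients, and the helper names (`compute_optimal_tokens`, `training_flops`) are purely illustrative.

```python
# Rough sketch of the Chinchilla rule of thumb, assuming:
#  - ~20 training tokens per parameter (approximate compute-optimal ratio)
#  - C ≈ 6 * N * D as the standard training-FLOPs approximation
# These constants are illustrative, not the paper's exact fits.

TOKENS_PER_PARAM = 20      # approximate compute-optimal tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6  # C ≈ 6ND training-FLOPs approximation


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a model of n_params."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via C ≈ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    # Example: a 70B-parameter model (Chinchilla's size) suggests ~1.4T tokens.
    n = 70e9
    d = compute_optimal_tokens(n)
    print(f"params={n:.2e}, tokens≈{d:.2e}, FLOPs≈{training_flops(n, d):.2e}")
```

Running the example reproduces the paper's Chinchilla configuration in order of magnitude: a 70B-parameter model paired with about 1.4T training tokens, for roughly 6e23 FLOPs of training compute.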