Training Compute-Optimal Large Language Models

Status: Pending

Author: Jordan Hoffmann, Laurent Sifre, Oriol Vinyals, Sebastian Borgeaud, et al.

Topic: Large-Language-Models, Transformers

Category: Architecture, Optimization-No. of params, Pre-Training, Tips & Tricks

Conference: arXiv

Year: 2022

Link: https://arxiv.org/abs/2203.15556

Questions

What did the authors try to accomplish?

What were the key elements of the approach?

What can you use yourself from this paper? (see the sketch after these questions)

What other references are worth following?
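For the "what can you use yourself" question, a minimal sketch of the paper's headline rule of thumb: compute-optimal training uses roughly 20 tokens per parameter, with training compute commonly approximated as C ≈ 6ND FLOPs. The constants below are round approximations rather than the paper's exact fitted coefficients, and the helper names (`compute_optimal_tokens`, `training_flops`) are purely illustrative.

```python
# Rough sketch of the Chinchilla rule of thumb, assuming:
#  - ~20 training tokens per parameter (approximate compute-optimal ratio)
#  - C ≈ 6 * N * D as the standard training-FLOPs approximation
# These constants are illustrative, not the paper's exact fits.

TOKENS_PER_PARAM = 20      # approximate compute-optimal tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6  # C ≈ 6ND training-FLOPs approximation


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a model of n_params."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via C ≈ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    # Example: a 70B-parameter model (Chinchilla's size) suggests ~1.4T tokens.
    n = 70e9
    d = compute_optimal_tokens(n)
    print(f"params={n:.2e}, tokens≈{d:.2e}, FLOPs≈{training_flops(n, d):.2e}")
```

Running the example reproduces the paper's Chinchilla configuration in order of magnitude: a 70B-parameter model paired with about 1.4T training tokens, for roughly 6e23 FLOPs of training compute.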