2 changes: 1 addition & 1 deletion README.md
@@ -174,7 +174,7 @@ Pre-training is a very long and costly process, which is why this is not the foc
* [BLOOM](https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4) by BigScience: Notion pages that describe how the BLOOM model was built, with a lot of useful information about the engineering work and the problems that were encountered.
* [OPT-175 Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf) by Meta: Research logs showing what went wrong and what went right. Useful if you're planning to pre-train a very large language model (in this case, 175B parameters).
* [LLM 360](https://www.llm360.ai/): A framework for open-source LLMs with training and data preparation code, data, metrics, and models.

* [Causal language modeling vs Masked language modeling](https://medium.com/@tom_21755/understanding-causal-llms-masked-llm-s-and-seq2seq-a-guide-to-language-model-training-d4457bbd07fa) by Tomas Vykruta: Explains the difference between causal language modeling, masked language modeling, and Seq2Seq (see the sketch below).
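
To make the distinction concrete, here is a minimal sketch of the three objectives using the Hugging Face `transformers` library. The `gpt2`, `bert-base-uncased`, and `t5-small` checkpoints are illustrative choices, not models prescribed by the article: a causal LM predicts the next token from left context only, a masked LM fills in a hidden token using context on both sides, and a Seq2Seq model maps an input sequence to an output sequence with an encoder-decoder.

```python
# Minimal sketch of the three training objectives (illustrative checkpoints only).
from transformers import (
    AutoModelForCausalLM,
    AutoModelForMaskedLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Causal LM (GPT-style): predict the next token from left context only.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tok.decode(out[0]))

# Masked LM (BERT-style): predict a hidden token using both left and right context.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tok("The capital of France is [MASK].", return_tensors="pt")
logits = model(**inputs).logits
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
print(tok.decode(logits[0, mask_pos].argmax().item()))

# Seq2Seq (T5-style): an encoder reads the input, a decoder generates the output.
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tok("translate English to French: Hello, how are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```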
---
### 4. Supervised Fine-Tuning
