From ce9725b8fcf33876079595c04282fbc3d122a4da Mon Sep 17 00:00:00 2001
From: Daniel Bevenius
Date: Sat, 10 Aug 2024 11:10:53 +0200
Subject: [PATCH] docs: add note about CLS pooling

---
 notes/embeddings.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/notes/embeddings.md b/notes/embeddings.md
index d96240fc..006c8481 100644
--- a/notes/embeddings.md
+++ b/notes/embeddings.md
@@ -557,6 +557,14 @@ representation of this token in the context of the entire sequence.
 The effectiveness of last pooling can vary depending on the specific
 architecture of the model. Some models might be better at encoding
 full-sequence information into the final token than others.
-s
+Then there is `CLS` (Classification) pooling, which is similar to last pooling
+but uses only the first token embedding in the sequence, the embedding of the
+special `CLS` token.
+This approach is inspired by models like BERT, which place a special [CLS]
+token at the beginning of each sequence. The idea is that during training the
+model learns to encode sequence-level information into this first token.
+This method assumes that the first token can adequately represent the entire
+sequence, which may or may not hold depending on the model's architecture and
+training.
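
A minimal sketch of the difference between CLS pooling and last pooling as
described in the added note, assuming the per-token embeddings are already
available as a `(seq_len, n_embd)` array; the function names here are
illustrative and not taken from any particular library:

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # CLS pooling: use only the embedding of the first token (the special
    # [CLS] token in BERT-style models) as the sequence representation.
    return token_embeddings[0]

def last_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # Last pooling: use only the embedding of the final token, which in a
    # causal model has attended to the entire sequence.
    return token_embeddings[-1]

# Dummy example: 5 tokens, embedding dimension 8.
token_embeddings = np.random.rand(5, 8)
print(cls_pool(token_embeddings).shape)   # (8,)
print(last_pool(token_embeddings).shape)  # (8,)
```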