From a2298313bbe8fc90975f43d2f2ec0435b38871f1 Mon Sep 17 00:00:00 2001
From: Daniel Bevenius
Date: Tue, 6 Aug 2024 06:58:34 +0200
Subject: [PATCH] docs: fix typos in mamba notes

---
 notes/mamba.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/notes/mamba.md b/notes/mamba.md
index c8fbf2ea..0974d8b1 100644
--- a/notes/mamba.md
+++ b/notes/mamba.md
@@ -15,9 +15,9 @@ training as they can be parallelized, incontrast to RNNs which are sequential.
 
 But, the issue with transformers is that they don't scale to long sequences
 which is because the self attention mechanism is quadratic in the sequence
-length. Every token has to attend to every other token in a sequenc (n²). So if
+length. Every token has to attend to every other token in a sequence (n²). So if
 we have 40 tokens that means 1600 attention operations, which means more
-computation and this just increases the longer the input sequence it.
+computation and this just increases the longer the input sequence is.
 In this respect RNNs are more performant as they don't have the quadratic
 scaling issue that the self attention mechanism has (but do have other like
 slower training).
@@ -26,7 +26,7 @@ The core of Mamba is state space models (SSMs). Before we go further it might
 make sense to review [RNNs](./rnn.md) and [SSMs](./state-space-models.md).
 
 Selective state space models, which Mamaba is a type of, give us a linear
-recurrent network simliar to RRNs, but also have the fast training that we gets
+recurrent network similar to RNNs, but also have the fast training that we get
 from transformers. So we get the best of both worlds.
 
 One major difference with state space models is that they have state which is
@@ -37,7 +37,7 @@ like RNNs do have state, but recall that they process the input sequentially.
 To understand how Mamba fits in I found it useful to compare it to how
 transformers look in an neural network:
 ```
-Residul ↑
+Residual ↑
 +---------> |
 |           |
 |  +-------------------+
@@ -62,7 +62,7 @@
 ```
 And then we have Mamba:
 ```
-Residul ↑
+Residual ↑
 +---------> |
 |           |
 |  +-------------------+
@@ -132,7 +132,7 @@ definition of a state space model. This makes sense if we think about it as
 this is not specific to neural networks or even computers. Think about an analog
 system, for example an IoT device that reads the temperature from a sensor
 connected to it. To process this signal it needs to be converted into digital
-form. A simliar thing needs to be done in this case as we can't use continous
+form. A similar thing needs to be done in this case, as we can't use continuous
 signals with computers, just like an IoT can't process an analog signal
 directly. So we need to convert into descrete time steps, similar to how an
 Analog-to-Digital Converter ([ADC]) would convert the signal into quantized
@@ -140,14 +140,14 @@ Analog-to-Digital Converter ([ADC]) would convert the signal into quantized
 
 [ADC]: https://github.com/danbev/learning-iot/tree/master?tab=readme-ov-file#analog-to-digital-converter-adc
 
-So instead of the using functions as shown above we concrete values we will
+So instead of using the functions as shown above with concrete values we will
 transform A and B into discrete values and the equations become:
 ```
      _       _
 hₜ = Ahₜ₋₁ + Bxₜ
 yₜ = Chₜ+ Dxₜ
 ```
-To get the A_hat and B_hat values a process called discretization is used.
+To get the `A_hat` and `B_hat` values a process called discretization is used.
 
 ### Discretization
 So we will first discretize the parameters A, and B of the state space model,
@@ -339,7 +339,7 @@ transform A and B into discrete values and the equations become:
 hₜ = Ahₜ₋₁ + Bxₜ
 yₜ = Chₜ+ Dxₜ
 ```
-Where A_hat and B_hat are:
+Where `A_hat` and `B_hat` are:
 ```
 A_hat = (I - Δ/2 A)⁻¹       (⁻¹ inverse bilinear transform)
 B_hat = (I - Δ/2 A)⁻¹ ΔB    (⁻¹ inverse bilinear transform)
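
For readers of the notes being patched, the discretization step referenced above can be sanity-checked with a few lines of NumPy. This is only a sketch under assumptions, not code from the notes or from any Mamba implementation: the `discretize` and `ssm_scan` names, the toy matrices, and the use of the full bilinear (Tustin) form `A_hat = (I - Δ/2·A)⁻¹(I + Δ/2·A)` are illustrative choices; the excerpt above shows only the `(I - Δ/2 A)⁻¹` factor for `A_hat`.

```python
# Hypothetical sketch (not from the patched notes): discretize a tiny SSM with
# the bilinear transform and run the recurrence
#   hₜ = A_hat·hₜ₋₁ + B_hat·xₜ
#   yₜ = C·hₜ + D·xₜ
import numpy as np

def discretize(A, B, delta):
    # A_hat = (I - Δ/2·A)⁻¹ (I + Δ/2·A),  B_hat = (I - Δ/2·A)⁻¹ ΔB
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - delta / 2.0 * A)
    return inv @ (I + delta / 2.0 * A), inv @ (delta * B)

def ssm_scan(A_hat, B_hat, C, D, xs):
    # Sequential recurrence over a 1D input signal xs, one step per element.
    h = np.zeros(A_hat.shape[0])
    ys = []
    for x in xs:
        h = A_hat @ h + B_hat[:, 0] * x
        ys.append(C @ h + D * x)
    return np.array(ys)

# Toy 2-dimensional state with scalar input and output (shapes are assumptions).
A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([1.0, 1.0])
D = 0.0

A_hat, B_hat = discretize(A, B, delta=0.1)
ys = ssm_scan(A_hat, B_hat, C, D, np.sin(np.linspace(0.0, 3.0, 30)))
print(ys[:5])
```

Running it only confirms the shape of the computation the notes describe: discretize A and B once, then apply the recurrence one time step at a time, which is the sequential RNN-like behaviour the notes contrast with quadratic self attention.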