diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ddb02da
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,6 @@
+.venv
+logs
+*.tfrecord
+*.npz
+*.zip
+*.pkl
diff --git a/exam/questions.md b/exam/questions.md
index 06452e5..2499f71 100644
--- a/exam/questions.md
+++ b/exam/questions.md
@@ -108,6 +108,8 @@
 
 - Compare Cutout and DropBlock. [5]
 
+- Describe in detail how CutMix is performed. [5]
+
 - Describe Squeeze and Excitation applied to a ResNet block. [5]
 
 - Draw the Mobile inverted bottleneck block (including explanation of separable
@@ -119,3 +121,91 @@
   channels. Write down (or derive) the equation of transposed convolution
   (or equivalently backpropagation through a convolution to its inputs). [5]
 
+#### Questions@:, Lecture 6 Questions
+- Describe the differences among semantic segmentation, image classification,
+  object detection, and instance segmentation, and write down which metrics
+  are used for these tasks. [5]
+
+- Write down how $\mathit{AP}_{50}$ is computed. [5]
+
+- Considering a Fast-RCNN architecture, draw the overall network architecture,
+  explain what a RoI-pooling layer is, show how the network parametrizes
+  bounding boxes and write down the loss. Finally, describe non-maximum
+  suppression and how the Fast-RCNN prediction is performed. [10]
+
+- Considering a Faster-RCNN architecture, describe the region proposal network
+  (what anchors are, the architecture including both heads, how the coordinates
+  of proposals are parametrized, and what the loss looks like). [10]
+
+- Considering a Mask-RCNN architecture, describe the additions to a Faster-RCNN
+  architecture (the RoI-Align layer, the new mask-producing head). [5]
+
+- Write down the focal loss with class weighting, including the commonly used
+  hyperparameter values. [5]
+
+- Draw the overall architecture of RetinaNet (the computation of
+  $C_1, \ldots, C_7$, the FPN architecture computing $P_3, \ldots, P_7$
+  including the block combining feature maps of different resolutions; the
+  classification and bounding box generation heads, including their output
+  size). Write down the losses for both heads. [10]
+
+- Describe GroupNorm, and compare it to BatchNorm and LayerNorm. [5]
+
+#### Questions@:, Lecture 8 Questions
+- Write down how the Long Short-Term Memory (LSTM) cell operates, including
+  the explicit formulas. Also mention the forget gate bias. [10]
+
+- Write down how the Gated Recurrent Unit (GRU) operates, including
+  the explicit formulas. [10]
+
+- Describe Highway network computation. [5]
+
+- Why can the usual dropout not be used on the recurrent state? Describe
+  how the problem can be alleviated with variational dropout. [5]
+
+- Describe layer normalization including all its parameters, and write down how
+  it is computed (be sure to explicitly state over what is being normalized in
+  case of fully connected layers and convolutional layers). [5]
+
+- Draw a tagger architecture utilizing word embeddings, recurrent
+  character-level word embeddings (including how these are computed from
+  individual characters), and two sentence-level bidirectional RNNs (explaining
+  the bidirectionality) with a residual connection. Where would you put the
+  dropout layers? [10]
+
+#### Questions@:, Lecture 9 Questions
+- In the context of named entity recognition, describe what the BIO encoding
+  is and why it is used. [5]
+
+- Write down the dynamic programming algorithm for decoding a BIO-tag sequence,
+  including its asymptotic complexity. [10]
+
+- In the context of CTC loss, describe regular and extended labelings and
+  write down the algorithm for computing the log probability of a gold label
+  sequence $\boldsymbol y$. [10]
+
+- Describe how CTC predictions are performed using a beam-search. [5]
+
+- Draw the CBOW architecture from `word2vec`, including the sizes of the inputs
+  and outputs, and the non-linearities used. Also make sure to
+  indicate where the embeddings are being trained. [5]
+
+- Draw the SkipGram architecture from `word2vec`, including the sizes of the
+  inputs and outputs, and the non-linearities used. Also make sure
+  to indicate where the embeddings are being trained. [5]
+
+- Describe the hierarchical softmax used in `word2vec`. [5]
+
+- Describe the negative sampling proposed in `word2vec`, including
+  the choice of distribution of negative samples. [5]
+
+#### Questions@:, Lecture 10 Questions
+- Write down why subword units are used in text processing, and describe the BPE
+  algorithm for constructing a subword dictionary from a large corpus. [5]
+
+- Write down why subword units are used in text processing, and describe the
+  WordPieces algorithm for constructing a subword dictionary from a large
+  corpus. [5]
+
+- Pinpoint the differences between the BPE and WordPieces algorithms, both
+  during dictionary construction and during inference. [5]
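For the focal-loss question above, a minimal NumPy sketch of the class-weighted (alpha-balanced) binary focal loss FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), using the commonly cited RetinaNet values alpha = 0.25 and gamma = 2. The helper name `binary_focal_loss` is hypothetical. Note that Keras's `BinaryFocalCrossentropy` (used in `labs/06/Untitled-1.py`) only applies the alpha weighting when constructed with `apply_class_balancing=True`.

```python
import numpy as np


def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-element alpha-balanced binary focal loss (hypothetical helper).

    y_true: array of {0, 1} targets; y_pred: predicted probabilities of class 1.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    # p_t is the probability the model assigns to the gold class.
    p_t = np.where(y_true == 1, p, 1 - p)
    # alpha weights the positive class, 1 - alpha the negative class.
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    # The modulating factor (1 - p_t)^gamma down-weights well-classified examples.
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)


if __name__ == "__main__":
    y_true = np.array([1, 0, 1, 0])
    y_pred = np.array([0.9, 0.1, 0.3, 0.8])
    print(binary_focal_loss(y_true, y_pred))
```

With gamma = 0 and alpha = 0.5 the expression reduces to half of the ordinary binary cross-entropy, which is a quick sanity check for any implementation.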
diff --git a/labs/.gitignore b/labs/.gitignore
index 6319f80..acfd147 100644
--- a/labs/.gitignore
+++ b/labs/.gitignore
@@ -3,5 +3,5 @@ logs/
 *.h5
 *.keras
 *.npz
-*.pickle
+*.tfrecord
 *.zip
diff --git a/labs/04/cifar10.py b/labs/04/cifar10.py
index 0ed0533..ec06755 100644
--- a/labs/04/cifar10.py
+++ b/labs/04/cifar10.py
@@ -33,7 +33,8 @@ def dataset(self, transform: Callable[[dict[str, np.ndarray]], Any] | None = Non
             return CIFAR10.TorchDataset(self, transform)
 
     class TorchDataset(torch.utils.data.Dataset):
-        def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None:
+        def __init__(self, dataset: "CIFAR10.Dataset",
+                     transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None:
             self._dataset = dataset
             self._transform = transform
 
diff --git a/labs/04/cifar10_v2.py b/labs/04/cifar10_v2.py
new file mode 100644
index 0000000..e6b6748
--- /dev/null
+++ b/labs/04/cifar10_v2.py
@@ -0,0 +1,99 @@
+import os
+import sys
+from typing import Any, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+
+import numpy as np
+import torch
+
+
+class CIFAR10:
+    H: int = 32
+    W: int = 32
+    C: int = 3
+    LABELS: list[str] = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
+
+    Element = TypedDict("Element", {"image": np.ndarray, "label": np.ndarray})
+    Elements = TypedDict("Elements", {"images": np.ndarray, "labels": np.ndarray})
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/cifar10_competition.npz"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, data: "CIFAR10.Elements") -> None:
+            self._data = data
+            self._data["labels"] = self._data["labels"].ravel()
+
+        @property
+        def data(self) -> "CIFAR10.Elements":
+            return self._data
+
+        def __len__(self) -> int:
+            return len(self._data["images"])
+
+        def __getitem__(self, index: int) -> "CIFAR10.Element":
+            return {key.removesuffix("s"): value[index] for key, value in self._data.items()}
+
+        def transform(self, transform: Callable[["CIFAR10.Element"], Any]) -> "CIFAR10.TransformedDataset":
+            return CIFAR10.TransformedDataset(self, transform)
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def transform(self, transform: Callable[..., Any]) -> "CIFAR10.TransformedDataset":
+            return CIFAR10.TransformedDataset(self, transform)
+
+    def __init__(self, size: dict[str, int] = {}) -> None:
+        path = os.path.basename(self._URL)
+        if not os.path.exists(path):
+            print("Downloading CIFAR-10 dataset...", file=sys.stderr)
+            urllib.request.urlretrieve(self._URL, filename="{}.tmp".format(path))
+            os.rename("{}.tmp".format(path), path)
+
+        cifar = np.load(path)
+        for dataset in ["train", "dev", "test"]:
+            data = {key[len(dataset) + 1:]: cifar[key][:size.get(dataset, None)]
+                    for key in cifar if key.startswith(dataset)}
+            setattr(self, dataset, self.Dataset(data))
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
+    # Evaluation infrastructure.
+    @staticmethod
+    def evaluate(gold_dataset: Dataset, predictions: Sequence[int]) -> float:
+        gold = gold_dataset.data["labels"]
+
+        if len(predictions) != len(gold):
+            raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format(
+                len(predictions), len(gold)))
+
+        correct = sum(gold[i] == predictions[i] for i in range(len(gold)))
+        return 100 * correct / len(gold)
+
+    @staticmethod
+    def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float:
+        predictions = [int(line) for line in predictions_file]
+        return CIFAR10.evaluate(gold_dataset, predictions)
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            accuracy = CIFAR10.evaluate_file(getattr(CIFAR10(), args.dataset), predictions_file)
+        print("CIFAR10 accuracy: {:.2f}%".format(accuracy))
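`CIFAR10.TransformedDataset.__getitem__` splats tuple items into the transform, so transforms can be chained: the first maps an element dict to a tuple, and the next receives that tuple's parts as separate arguments. A torch-free sketch of the same pattern; `ToyDataset` and `Transformed` are hypothetical stand-ins for `CIFAR10.Dataset` and `CIFAR10.TransformedDataset`.

```python
from typing import Any, Callable


class Transformed:
    """Minimal stand-in for CIFAR10.TransformedDataset (hypothetical, torch-free)."""

    def __init__(self, dataset: Any, transform: Callable[..., Any]) -> None:
        self._dataset = dataset
        self._transform = transform

    def __len__(self) -> int:
        return len(self._dataset)

    def __getitem__(self, index: int) -> Any:
        item = self._dataset[index]
        # Tuples are splatted into positional arguments; anything else is passed as-is.
        return self._transform(*item) if isinstance(item, tuple) else self._transform(item)

    def transform(self, transform: Callable[..., Any]) -> "Transformed":
        return Transformed(self, transform)


class ToyDataset:
    """Minimal stand-in for CIFAR10.Dataset returning dict elements."""

    def __init__(self, images: list, labels: list) -> None:
        self._data = {"images": images, "labels": labels}

    def __len__(self) -> int:
        return len(self._data["images"])

    def __getitem__(self, index: int) -> dict:
        # "images" -> "image", "labels" -> "label", as in CIFAR10.Dataset.
        return {key.removesuffix("s"): value[index] for key, value in self._data.items()}

    def transform(self, transform: Callable[..., Any]) -> Transformed:
        return Transformed(self, transform)


toy = ToyDataset(images=[10, 20, 30], labels=[0, 1, 2])
# First transform: dict -> (input, target) tuple; second: receives the tuple unpacked.
pipeline = toy.transform(lambda e: (e["image"], e["label"])).transform(lambda x, y: (x / 10, y))
print([pipeline[i] for i in range(len(pipeline))])  # [(1.0, 0), (2.0, 1), (3.0, 2)]
```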
diff --git a/labs/05/cags_dataset.py b/labs/05/cags_dataset.py
index 782d19f..127bc21 100644
--- a/labs/05/cags_dataset.py
+++ b/labs/05/cags_dataset.py
@@ -1,7 +1,8 @@
+import array
 import os
 import sys
 import struct
-from typing import Any, Callable, Sequence, TextIO
+from typing import Any, Callable, Sequence, TextIO, TypedDict
 import urllib.request
 os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
 
@@ -28,45 +29,60 @@ class CAGS:
         "scottish_terrier", "shiba_inu", "staffordshire_bull_terrier",
         "wheaten_terrier", "yorkshire_terrier",
     ]
+    Element = TypedDict("Element", {"image": torch.Tensor, "mask": torch.Tensor, "label": torch.Tensor})
 
     _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
 
     class Dataset(torch.utils.data.Dataset):
-        def __init__(self, path: str, size: int) -> None:
-            self._path = path
-            self._data = None
+        def __init__(self, path: str, size: int, decode_on_demand: bool) -> None:
             self._size = size
 
+            arrays, indices = CAGS._load_data(path, size)
+            if decode_on_demand:
+                self._data, self._arrays, self._indices = None, arrays, indices
+            else:
+                self._data = [self._decode(arrays, indices, i) for i in range(size)]
+
         def __len__(self) -> int:
             return self._size
 
-        def __getitem__(self, index: int) -> dict[str, torch.Tensor]:
-            if self._data is None:
-                self._data = []
-                for entry in CAGS._load_data(self._path, self._size):
-                    entry["image"] = torchvision.io.decode_image(
-                        torch.from_numpy(entry["image"]), torchvision.io.ImageReadMode.RGB).permute(1, 2, 0)
-                    entry["mask"] = (torchvision.io.decode_image(torch.from_numpy(entry["mask"])).to(
-                        dtype=torch.float32) / 255).permute(1, 2, 0)
-                    entry["label"] = torch.tensor(entry["label"][0])
-                    self._data.append(entry)
-            return self._data[index]
-
-        def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset:
+        def __getitem__(self, index: int) -> "CAGS.Element":
+            if self._data is not None:
+                return self._data[index]
+            return self._decode(self._arrays, self._indices, index)
+
+        def transform(self, transform: Callable[["CAGS.Element"], Any]) -> "CAGS.TransformedDataset":
             return CAGS.TransformedDataset(self, transform)
 
+        def _decode(self, data: dict, indices: dict, index: int) -> "CAGS.Element":
+            return {
+                "image": torchvision.io.decode_image(
+                    torch.frombuffer(data["image"], dtype=torch.uint8, offset=indices["image"][index],
+                                     count=indices["image"][index + 1] - indices["image"][index]),
+                    torchvision.io.ImageReadMode.RGB).permute(1, 2, 0),
+                "mask": torchvision.io.decode_image(
+                    torch.frombuffer(data["mask"], dtype=torch.uint8, offset=indices["mask"][index],
+                                     count=indices["mask"][index + 1] - indices["mask"][index]),
+                    torchvision.io.ImageReadMode.GRAY).to(dtype=torch.float32).div(255).permute(1, 2, 0),
+                "label": torch.tensor(data["label"][index]),
+            }
+
     class TransformedDataset(torch.utils.data.Dataset):
-        def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None:
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
             self._dataset = dataset
             self._transform = transform
 
         def __len__(self) -> int:
-            return self._dataset._size
+            return len(self._dataset)
 
         def __getitem__(self, index: int) -> Any:
-            return self._transform(self._dataset[index])
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
 
-    def __init__(self) -> None:
+        def transform(self, transform: Callable[..., Any]) -> "CAGS.TransformedDataset":
+            return CAGS.TransformedDataset(self, transform)
+
+    def __init__(self, decode_on_demand: bool = False) -> None:
         for dataset, size in [("train", 2_142), ("dev", 306), ("test", 612)]:
             path = "cags.{}.tfrecord".format(dataset)
             if not os.path.exists(path):
@@ -74,7 +90,7 @@ def __init__(self) -> None:
                 urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
                 os.rename("{}.tmp".format(path), path)
 
-            setattr(self, dataset, self.Dataset(path, size))
+            setattr(self, dataset, self.Dataset(path, size, decode_on_demand))
 
     train: Dataset
     dev: Dataset
@@ -82,24 +98,22 @@ def __init__(self) -> None:
 
     # TFRecord loading
     @staticmethod
-    def _load_data(path: str, items: int) -> list[dict[str, Any]]:
-        def get_value() -> int:
+    def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]:
+        def get_value() -> np.int64:
             nonlocal data, offset
             value = np.int64(data[offset] & 0x7F); start = offset; offset += 1
             while data[offset - 1] & 0x80:
                 value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1
             return value
 
-        def get_value_of_kind(kind: int) -> int:
+        def get_value_of_kind(kind: int) -> np.int64:
             nonlocal data, offset
             assert data[offset] == kind; offset += 1
             return get_value()
 
-        entries = []
+        arrays, indices = {}, {}
         with open(path, "rb") as file:
-            while len(entries) < items:
-                entries.append({})
-
+            for _ in range(items):
                 length = file.read(8); assert len(length) == 8
                 length, = struct.unpack("<Q", length)
                 assert len(file.read(4)) == 4
@@ -113,27 +127,27 @@ def get_value_of_kind(kind: int) -> int:
                     get_value_of_kind(0x0A)
                     length = get_value_of_kind(0x0A)
                     key = data[offset:offset + length].decode("utf-8"); offset += length
-
                     get_value_of_kind(0x12)
+                    if key not in arrays:
+                        arrays[key] = array.array({0x0A: "B", 0x1A: "q", 0x12: "f"}.get(data[offset], "B"))
+                        indices[key] = array.array("L", [0])
+
                     if data[offset] == 0x0A:
-                        get_value_of_kind(0x0A)
-                        length = get_value_of_kind(0x0A)
-                        entries[-1][key] = np.frombuffer(data, np.uint8, length, offset).copy(); offset += length
+                        length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(data[offset:offset + length]); offset += length
                     elif data[offset] == 0x1A:
-                        get_value_of_kind(0x1A)
-                        length = get_value_of_kind(0x0A)
-                        values, target_offset = [], offset + length
+                        length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A)
+                        target_offset = offset + length
                         while offset < target_offset:
-                            values.append(get_value())
-                        entries[-1][key] = np.array(values, dtype=np.int64)
+                            arrays[key].append(get_value())
                     elif data[offset] == 0x12:
-                        get_value_of_kind(0x12)
-                        length = get_value_of_kind(0x0A)
-                        entries[-1][key] = np.frombuffer(
-                            data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).copy(); offset += length
+                        length = get_value_of_kind(0x12) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(np.frombuffer(
+                            data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).tobytes()); offset += length
                     else:
                         raise ValueError("Unsupported data tag {}".format(data[offset]))
-        return entries
+                    indices[key].append(len(arrays[key]))
+        return arrays, indices
 
     # Keras IoU metric
     class MaskIoUMetric(keras.metrics.Mean):
@@ -203,18 +217,20 @@ def evaluate_segmentation_file(gold_dataset: Dataset, predictions_file: TextIO)
 if __name__ == "__main__":
     import argparse
     parser = argparse.ArgumentParser()
-    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
     parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
     parser.add_argument("--task", default="classification", type=str, help="Task to evaluate")
     args = parser.parse_args()
 
     if args.evaluate:
+        gold_dataset = getattr(CAGS(decode_on_demand=True), args.dataset)
+
         if args.task == "classification":
             with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
-                accuracy = CAGS.evaluate_classification_file(getattr(CAGS(), args.dataset), predictions_file)
+                accuracy = CAGS.evaluate_classification_file(gold_dataset, predictions_file)
             print("CAGS accuracy: {:.2f}%".format(accuracy))
 
         if args.task == "segmentation":
             with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
-                iou = CAGS.evaluate_segmentation_file(getattr(CAGS(), args.dataset), predictions_file)
+                iou = CAGS.evaluate_segmentation_file(gold_dataset, predictions_file)
             print("CAGS IoU: {:.2f}%".format(iou))
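`CAGS._load_data` now concatenates each feature of all examples into one flat `array.array` and records prefix offsets in `indices[key]` (starting with 0), so example `i` occupies `indices[key][i]:indices[key][i + 1]`, which `_decode` reads with `torch.frombuffer`. A standard-library sketch of that layout; the `pack` and `get` helper names are hypothetical.

```python
import array


def pack(records: list[bytes]) -> tuple[array.array, array.array]:
    """Concatenate byte records into one buffer plus prefix-sum offsets."""
    data, offsets = array.array("B"), array.array("L", [0])
    for record in records:
        data.frombytes(record)
        offsets.append(len(data))  # End offset of this record = start of the next one.
    return data, offsets


def get(data: array.array, offsets: array.array, index: int) -> bytes:
    """Return record `index`; a constant-time lookup thanks to the stored offsets."""
    start, end = offsets[index], offsets[index + 1]
    return data[start:end].tobytes()


data, offsets = pack([b"cat", b"", b"shiba"])
print(list(offsets))          # [0, 3, 3, 8]
print(get(data, offsets, 2))  # b'shiba'
```

Storing one buffer per feature keeps memory compact compared with a list of per-example NumPy arrays, and decoding can be deferred until an item is requested (the `decode_on_demand` mode).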
diff --git a/labs/05/cnn_manual.py b/labs/05/cnn_manual.py
index 26d3303..ac6c99b 100644
--- a/labs/05/cnn_manual.py
+++ b/labs/05/cnn_manual.py
@@ -72,13 +72,13 @@ def backward(
         if self._verify:
             inputs.requires_grad_(True)
             inputs.grad = self._kernel.value.grad = self._bias.value.grad = None
-            reference = keras.ops.relu(keras.ops.conv(inputs, self._kernel, self._stride) + self._bias)
+            reference = (outputs > 0) * (keras.ops.conv(inputs, self._kernel, self._stride) + self._bias)
             reference.backward(gradient=outputs_gradient, inputs=[inputs, self._kernel.value, self._bias.value])
             for name, computed, reference in zip(
-                    ["Inputs", "Kernel", "Bias"], [inputs_gradient, kernel_gradient, bias_gradient],
-                    [inputs.grad, self._kernel.value.grad, self._bias.value.grad]):
+                    ["Bias", "Kernel", "Inputs"], [bias_gradient, kernel_gradient, inputs_gradient],
+                    [self._bias.value.grad, self._kernel.value.grad, inputs.grad]):
                 np.testing.assert_allclose(keras.ops.convert_to_numpy(computed), keras.ops.convert_to_numpy(reference),
-                                           atol=1e-4, err_msg=name + " gradient differs!")
+                                           atol=2e-4, err_msg=name + " gradient differs!")
 
         # Return the inputs gradient, the layer variables, and their gradients.
         return inputs_gradient, [self._kernel, self._bias], [kernel_gradient, bias_gradient]
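The reference in `backward` now uses `(outputs > 0) * (conv + bias)` rather than `relu(conv + bias)`. Assuming `outputs` holds the layer's post-ReLU activations, the boolean mask is the ReLU derivative, so both forms have the same value and the same gradient away from zero. A small NumPy sketch with random stand-in arrays (all names hypothetical) checks this against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.normal(size=(2, 5))                  # Pre-activations, standing in for conv(inputs) + bias.
outputs = np.maximum(z, 0)                   # Forward pass: ReLU.
outputs_gradient = rng.normal(size=z.shape)  # Upstream gradient dL/d(outputs).

# Same value: masking z by (outputs > 0) reproduces ReLU(z).
masked = (outputs > 0) * z
assert np.array_equal(masked, outputs)

# Analytic gradient w.r.t. z for both forms: the upstream gradient where z > 0, zero elsewhere.
z_gradient = outputs_gradient * (outputs > 0)


def objective(pre_activations: np.ndarray) -> float:
    """Scalar whose gradient w.r.t. the pre-activations is z_gradient."""
    return float(np.sum(outputs_gradient * np.maximum(pre_activations, 0)))


# Central finite differences confirm the analytic gradient.
h = 1e-6
numeric = np.zeros_like(z)
for idx in np.ndindex(*z.shape):
    step = np.zeros_like(z)
    step[idx] = h
    numeric[idx] = (objective(z + step) - objective(z - step)) / (2 * h)
print(np.max(np.abs(numeric - z_gradient)))
```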
diff --git a/labs/06/Untitled-1.py b/labs/06/Untitled-1.py
new file mode 100644
index 0000000..f3c5870
--- /dev/null
+++ b/labs/06/Untitled-1.py
@@ -0,0 +1,201 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+
+import torch.utils
+import torch.utils.data
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch, torchvision
+
+import bboxes_utils
+from svhn_dataset import SVHN
+
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=64, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument("--learning_rate", default=0.001, type=float, help="Learning rate for training.")
+parser.add_argument("--image_size", default=224, type=int, help="A fixed image size.")
+parser.add_argument("--iou_threshold", default=0.7, type=float, help="The intersection over union threshold.")
+parser.add_argument("--model_file", default=None, type=str, help="Pretrained model to load.")
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
+                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
+            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    # Load the data. The individual examples are dictionaries with the keys:
+    # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,
+    # - "classes", a `[num_digits]` vector with classes of image digits,
+    # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits.
+    svhn = SVHN()
+    # One square anchor per backbone output pixel, as [top, left, bottom, right] in pixels; e.g., a 224x224 image with a 7x7 output gives 32x32-pixel anchors.
+    def get_anchors(backbone_output_pixel=14, image_size=args.image_size):
+        square_anchors = []
+        img_H, img_W = image_size, image_size
+        square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel
+        for h in range(0, img_H, square_anchor_h):
+            for w in range(0, img_W, square_anchor_w):
+                square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w])
+        return np.array(square_anchors)
+
+
+    def prepare_data(example):
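+        gold_bboxes = example["bboxes"] * args.image_size / torch.tensor(example["image"].shape[:2] * 2)  # [top, left, bottom, right] in resized-image pixels, like the anchors.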
+        image_resized = keras.ops.image.resize(example["image"], (args.image_size, args.image_size))
+        gold_classes, iou_threshold = example["classes"], args.iou_threshold
+        anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold)
+        anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS)
+        classes_sample_weight = keras.ops.ones_like(anchor_classes)
+        bboxes_sample_weight = anchor_classes > 0
+        return image_resized, (anchor_classes_one_hot, anchor_bboxes), (classes_sample_weight, bboxes_sample_weight)
+
+    anchors = get_anchors()  # Needed both for training targets and for decoding test predictions.
+    test = torch.utils.data.DataLoader(svhn.test, batch_size=args.batch_size)
+
+    if args.model_file:
+        model = keras.models.load_model(args.model_file)
+    else:
+
+        anchors = get_anchors()
+        svhn.train, svhn.dev, svhn.test = svhn.train.transform(prepare_data), svhn.dev.transform(prepare_data), svhn.test.transform(prepare_data)
+        print(svhn.train[0])
+        train = torch.utils.data.DataLoader(
+            svhn.train, batch_size=args.batch_size, shuffle=True) #num_workers=1, persistent_workers=True)
+        dev = torch.utils.data.DataLoader(svhn.dev, batch_size=args.batch_size)
+        #train_imgs, train_labels, train_sample_weights = np.array([e[0] for e in train]), np.array([e[1] for e in train]), np.array([e[2] for e in train])
+        #dev_imgs, dev_labels, dev_sample_weights = np.array([e[0] for e in dev]), np.array([e[1] for e in dev]), np.array([e[2] for e in dev])
+
+        # Load the EfficientNetV2-B0 model. It assumes the input images are
+        # represented in the [0-255] range.
+        backbone = keras.applications.EfficientNetV2B0(include_top=False)
+
+        # Extract features of different resolution. Assuming 224x224 input images
+        # (you can set this explicitly via `input_shape` of the above constructor),
+        # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
+        backbone = keras.Model(
+            inputs=backbone.input,
+            outputs=[backbone.get_layer(layer).output for layer in [
+                "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]]
+        )
+
+        # TODO: Create the model and train it
+        backbone.trainable = False
+        inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3))
+        # Backbone outputs from the coarsest to the finest resolution: top (7x7x1280), block5e (14x14x112),
+        # block3b (28x28x48), block2b (56x56x32), block1a (112x112x16).
+        top, block5e, block3b, block2b, block1a = backbone(inputs)
+
+        def bn_relu(inputs):
+            return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs))
+
+        ### classification and bbox regression head
+        ### RetinaNet uses 9 anchors per location (3 scales x 3 aspect ratios); this baseline defaults to 1.
+        def heads(input_feature, type="classification", anchor_number=1):
+            activ, output_size = None, 0
+            if type.lower() == "classification":
+                activ, output_size = "sigmoid", svhn.LABELS*anchor_number
+            elif type.lower() == "regression":
+                activ, output_size = None, 4*anchor_number
+            else:
+                raise ValueError("Type can only be 'classification' or 'regression'!")
+            conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(input_feature))
+            conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv1))
+            conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv2))
+            outputs = keras.layers.Conv2D(output_size, 3, 1, "same", activation=activ)(conv3)
+            outputs = keras.layers.Reshape((-1, output_size // anchor_number))(outputs)
+            return outputs
+
+        # Use only the 14x14 block5e features, matching the default 14x14 anchor grid of get_anchors().
+        feature = block5e
+        cls_output = heads(feature)
+        reg_output = heads(feature, "regression")
+
+        model = keras.Model(inputs, [cls_output, reg_output], name="baseline")
+        model.summary()
+
+        model.compile(
+            optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate),
+            loss=(
+                keras.losses.BinaryFocalCrossentropy(),
+                keras.losses.Huber()),
+            metrics=[keras.metrics.BinaryCrossentropy(name="binaryce"),
+                    keras.metrics.MeanSquaredError(name="mse")],
+        )
+
+        model.fit(train, epochs=args.epochs, validation_data=dev)
+        model.save("svhn_model.keras")
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.
+    os.makedirs(args.logdir, exist_ok=True)
+    with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the digits and their bounding boxes on the test set.
+        # Assume that for a single test image we get
+        # - `predicted_classes`: a 1D array with the predicted digits,
+        # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;
+        pred_classes, pred_rcnns = model.predict(test)
+        print(pred_classes.shape, pred_rcnns.shape)
+        # shape of pred_classes, pred_rcnns: (4535, 196, 10) (4535, 196, 4)
+        for predicted_classes, predicted_bboxes in zip(pred_classes, pred_rcnns):
+            scores = torch.tensor(np.max(predicted_classes, axis=-1))
+            predicted_bboxes = torch.tensor(bboxes_utils.bboxes_from_rcnn(anchors, predicted_bboxes), dtype=torch.float32)
+            chosen_bboxes = torchvision.ops.nms(predicted_bboxes, scores, args.iou_threshold)
+            output = []
+            for bbox_id in chosen_bboxes:
+                label = np.argmax(predicted_classes[bbox_id])
+                bbox = predicted_bboxes[bbox_id]
+                output += [label] + bbox.tolist()
+            print(*output, file=predictions_file)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
diff --git a/labs/06/Untitled-2.py b/labs/06/Untitled-2.py
new file mode 100644
index 0000000..63cf2b5
--- /dev/null
+++ b/labs/06/Untitled-2.py
@@ -0,0 +1,207 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch, torchvision
+import pickle
+
+import bboxes_utils
+from svhn_dataset import SVHN
+
+# Jonas Glerup Røssum <jglr@itu.dk>
+# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang <xiao.wang@student.uni-tuebingen.de>
+# 91d4d1d7-b800-4765-96b9-df098ac36a66
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=64, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument("--learning_rate", default=0.001, type=float, help="Learning rate for training.")
+parser.add_argument("--image_size", default=224, type=int, help="A fixed image size.")
+parser.add_argument("--iou_threshold", default=0.7, type=float, help="The intersection over union threshold.")
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
+                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
+            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    print("📊 Loading data")
+
+    # Load the data. The individual examples are dictionaries with the keys:
+    # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,
+    # - "classes", a `[num_digits]` vector with classes of image digits,
+    # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits.
+    svhn = SVHN()
+    # Tile the image with square anchors, one per cell of the backbone feature map; e.g., a 224x224
+    # image and a 14x14 feature map give 224/14 = 16x16-pixel anchors, 196 in total.
+    def get_anchors(backbone_output_pixel=14, image_size=args.image_size):
+        square_anchors = []
+        img_H, img_W = image_size, image_size
+        square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel
+        for h in range(0, img_H, square_anchor_h):
+            for w in range(0, img_W, square_anchor_w):
+                square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w])
+        return np.array(square_anchors)
+
+    print("📊 Creating anchors")
+
+    anchors = get_anchors()
+
+
+
+    def prepare_data(example):
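+        # Rescale the gold bboxes to the resized image, i.e., to the pixel coordinates of the anchors.
+        gold_bboxes = example["bboxes"] * (args.image_size / example["image"].shape[0])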
+        image_resized = keras.ops.image.resize(example["image"], (args.image_size, args.image_size))
+        gold_classes, iou_threshold = example["classes"], args.iou_threshold
+        anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold)
+        anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS)
+        classes_sample_weight = keras.ops.ones_like(anchor_classes)
+        bboxes_sample_weight = anchor_classes > 0
+        return image_resized, (anchor_classes_one_hot, anchor_bboxes), (classes_sample_weight, bboxes_sample_weight)
+
+    print("📊 Transforming data")
+
+    def to_arrays(dataset):
+        # `transform` yields (image, labels, weights) triples; transpose them and stack
+        # every component into a batched NumPy array that `model.fit` can consume.
+        images, labels, weights = zip(*dataset.transform(prepare_data))
+        stack = lambda xs: np.stack([keras.ops.convert_to_numpy(x) for x in xs])
+        return stack(images), tuple(map(stack, zip(*labels))), tuple(map(stack, zip(*weights)))
+
+    train_imgs, train_labels, train_weights = to_arrays(svhn.train)
+    dev_imgs, dev_labels, dev_weights = to_arrays(svhn.dev)
+    test_imgs, _, _ = to_arrays(svhn.test)
+
+    with open("data.pkl", "wb") as data_file:
+        pickle.dump((train_imgs, train_labels, train_weights, dev_imgs, dev_labels, dev_weights, test_imgs), data_file)
+    # train_imgs, train_labels, train_weights, dev_imgs, dev_labels, dev_weights, test_imgs = pickle.load(open("data.pkl", "rb"))
+
+    print("📊 Creating efficientnet")
+    # Load the EfficientNetV2-B0 model. It assumes the input images are
+    # represented in the [0-255] range.
+    backbone = keras.applications.EfficientNetV2B0(include_top=False)
+
+    print("📊 Instantiating model")
+    # Extract features of different resolution. Assuming 224x224 input images
+    # (you can set this explicitly via `input_shape` of the above constructor),
+    # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
+    backbone = keras.Model(
+        inputs=backbone.input,
+        outputs=[backbone.get_layer(layer).output for layer in [
+            "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]]
+    )
+
+    # TODO: Create the model and train it
+    backbone.trainable = False
+    inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3))
+    # backbone outputs, from the coarsest to the finest resolution:
+    # top 7x7x1280, block5e 14x14x112, block3b 28x28x48, block2b 56x56x32, block1a 112x112x16
+    top, block5e, block3b, block2b, block1a = backbone(inputs)
+
+    def bn_relu(inputs):
+        return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs))
+
+    ### classification and bbox regression head
+    ### RetinaNet uses 9 anchors per location (3 scales x 3 aspect ratios); we use 1 by default.
+    def heads(input_feature, type="classification", anchor_number=1):
+        if type.lower() == "classification":
+            activ, output_size = "sigmoid", svhn.LABELS*anchor_number
+        elif type.lower() == "regression":
+            activ, output_size = None, 4*anchor_number
+        else:
+            raise ValueError("Type can only be 'classification' or 'regression'!")
+        conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(input_feature))
+        conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv1))
+        conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv2))
+        outputs = keras.layers.Conv2D(output_size, 3, 1, "same", activation=activ)(conv3)
+        # Flatten [H, W, anchor_number * size] into [H * W * anchor_number, size], matching the
+        # row-major order of the anchors produced by `get_anchors` and of the training targets.
+        return keras.layers.Reshape((-1, output_size // anchor_number))(outputs)
+
+    print("📊 Preprocessing")
+
+    # Only use the 14x14 `block5e` output, so there is one prediction per anchor from `get_anchors`.
+    feature = block5e
+    cls_output = heads(feature)
+    reg_output = heads(feature, "regression")
+
+    print("📊 Creating model")
+
+    model = keras.Model(inputs, [cls_output, reg_output], name="baseline")
+    model.summary()
+
+    print("📊 Compiling")
+
+    model.compile(
+        optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate),
+        loss=(
+            keras.losses.BinaryFocalCrossentropy(),
+            keras.losses.Huber()),
+        # Keras 3 expects one list of metrics per output of a multi-output model.
+        metrics=[["accuracy"], ["mse"]],
+    )
+
+    print("📊 Training")
+
+    model.fit(train_imgs, train_labels,
+              batch_size=args.batch_size, epochs=args.epochs,
+              validation_data=(dev_imgs, dev_labels, dev_weights),
+              sample_weight=train_weights,
+    )
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.
+    os.makedirs(args.logdir, exist_ok=True)
+    with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the digits and their bounding boxes on the test set.
+        # Assume that for a single test image we get
+        # - `predicted_classes`: a 1D array with the predicted digits,
+        # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;
+        pre_classes, pre_rcnns = model.predict(test_imgs)
+        pre_bboxes = bboxes_utils.bboxes_from_rcnn(anchors, pre_rcnns)
+        for predicted_classes, predicted_bboxes in zip(pre_classes, pre_bboxes):
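+            # `torchvision.ops.nms` expects float torch tensors; (top, left, bottom, right) boxes work
+            # as well as (x1, y1, x2, y2) ones, because IoU does not depend on the axis order.
+            scores = torch.tensor(np.max(predicted_classes, axis=-1), dtype=torch.float32)
+            boxes = torch.tensor(predicted_bboxes, dtype=torch.float32)
+            chosen_bboxes = torchvision.ops.nms(boxes, scores, args.iou_threshold).tolist()
+            output = []
+            for bbox_id in chosen_bboxes:
+                label = np.argmax(predicted_classes[bbox_id])
+                output += [label] + list(predicted_bboxes[bbox_id])
+            print(*output, file=predictions_file)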
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
diff --git a/labs/06/bboxes_utils.py b/labs/06/bboxes_utils.py
new file mode 100644
index 0000000..332ea79
--- /dev/null
+++ b/labs/06/bboxes_utils.py
@@ -0,0 +1,181 @@
+#!/usr/bin/env python3
+import argparse
+from typing import Callable
+import unittest
+
+import numpy as np
+
+# Bounding boxes and anchors are expected to be Numpy tensors,
+# where the last dimension has size 4.
+
+# For bounding boxes in pixel coordinates, the 4 values correspond to:
+TOP: int = 0
+LEFT: int = 1
+BOTTOM: int = 2
+RIGHT: int = 3
+
+
+def bboxes_area(bboxes: np.ndarray) -> np.ndarray:
+    """ Compute area of given set of bboxes.
+
+    Each bbox is parametrized as a four-tuple (top, left, bottom, right).
+
+    If the bboxes.shape is [..., 4], the output shape is bboxes.shape[:-1].
+    """
+    return np.maximum(bboxes[..., BOTTOM] - bboxes[..., TOP], 0) \
+        * np.maximum(bboxes[..., RIGHT] - bboxes[..., LEFT], 0)
+
+
+def bboxes_iou(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
+    """ Compute IoU of corresponding pairs from two sets of bboxes `xs` and `ys`.
+
+    Each bbox is parametrized as a four-tuple (top, left, bottom, right).
+
+    Note that broadcasting is supported, so passing inputs with
+    `xs.shape=[num_xs, 1, 4]` and `ys.shape=[1, num_ys, 4]` produces an output
+    with shape `[num_xs, num_ys]`, computing IoU for all pairs of bboxes from
+    `xs` and `ys`. Formally, the output shape is `np.broadcast(xs, ys).shape[:-1]`.
+    """
+    intersections = np.stack([
+        np.maximum(xs[..., TOP], ys[..., TOP]),
+        np.maximum(xs[..., LEFT], ys[..., LEFT]),
+        np.minimum(xs[..., BOTTOM], ys[..., BOTTOM]),
+        np.minimum(xs[..., RIGHT], ys[..., RIGHT]),
+    ], axis=-1)
+
+    xs_area, ys_area, intersections_area = bboxes_area(xs), bboxes_area(ys), bboxes_area(intersections)
+
+    return intersections_area / (xs_area + ys_area - intersections_area)
+
+
+def bboxes_to_rcnn(anchors: np.ndarray, bboxes: np.ndarray) -> np.ndarray:
+    """ Convert `bboxes` to a R-CNN-like representation relative to `anchors`.
+
+    The `anchors` and `bboxes` are arrays of four-tuples (top, left, bottom, right);
+    you can use the TOP, LEFT, BOTTOM, RIGHT constants as indices of the
+    respective coordinates.
+
+    The resulting representation of a single bbox is a four-tuple with:
+    - (bbox_y_center - anchor_y_center) / anchor_height
+    - (bbox_x_center - anchor_x_center) / anchor_width
+    - log(bbox_height / anchor_height)
+    - log(bbox_width / anchor_width)
+
+    If the `anchors.shape` is `[anchors_len, 4]` and `bboxes.shape` is `[anchors_len, 4]`,
+    the output shape is `[anchors_len, 4]`.
+    """
+
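+    anchor_height = anchors[..., BOTTOM] - anchors[..., TOP]
+    anchor_width = anchors[..., RIGHT] - anchors[..., LEFT]
+    bbox_height = bboxes[..., BOTTOM] - bboxes[..., TOP]
+    bbox_width = bboxes[..., RIGHT] - bboxes[..., LEFT]
+    # Center differences are computed as ((top + bottom) - (top + bottom)) / 2, and analogously for x.
+    return np.stack([
+        ((bboxes[..., TOP] + bboxes[..., BOTTOM]) - (anchors[..., TOP] + anchors[..., BOTTOM])) / 2 / anchor_height,
+        ((bboxes[..., LEFT] + bboxes[..., RIGHT]) - (anchors[..., LEFT] + anchors[..., RIGHT])) / 2 / anchor_width,
+        np.log(bbox_height / anchor_height),
+        np.log(bbox_width / anchor_width),
+    ], axis=-1)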
+
+
+def bboxes_from_rcnn(anchors: np.ndarray, rcnns: np.ndarray) -> np.ndarray:
+    """ Convert R-CNN-like representation relative to `anchor` to a `bbox`.
+
+    If the `anchors.shape` is `[anchors_len, 4]` and `rcnns.shape` is `[anchors_len, 4]`,
+    the output shape is `[anchors_len, 4]`.
+    """
+
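+    anchor_height = anchors[..., BOTTOM] - anchors[..., TOP]
+    anchor_width = anchors[..., RIGHT] - anchors[..., LEFT]
+    # Invert the R-CNN parametrization to get the bbox center and size.
+    bbox_y = anchors[..., TOP] + anchor_height / 2 + rcnns[..., 0] * anchor_height
+    bbox_x = anchors[..., LEFT] + anchor_width / 2 + rcnns[..., 1] * anchor_width
+    bbox_height, bbox_width = anchor_height * np.exp(rcnns[..., 2]), anchor_width * np.exp(rcnns[..., 3])
+    # Stack in the (top, left, bottom, right) order given by the TOP, LEFT, BOTTOM, RIGHT indices.
+    return np.stack([bbox_y - bbox_height / 2, bbox_x - bbox_width / 2,
+                     bbox_y + bbox_height / 2, bbox_x + bbox_width / 2], axis=-1)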
+
+
+def bboxes_training(
+    anchors: np.ndarray, gold_classes: np.ndarray, gold_bboxes: np.ndarray, iou_threshold: float
+) -> tuple[np.ndarray, np.ndarray]:
+    """ Compute training data for object detection.
+
+    Arguments:
+    - `anchors` is an array of four-tuples (top, left, bottom, right)
+    - `gold_classes` is an array of zero-based classes of the gold objects
+    - `gold_bboxes` is an array of four-tuples (top, left, bottom, right)
+      of the gold objects
+    - `iou_threshold` is a given threshold
+
+    Returns:
+    - `anchor_classes` contains for every anchor either 0 for background
+      (if no gold object is assigned) or `1 + gold_class` if a gold object
+      with `gold_class` is assigned to it
+    - `anchor_bboxes` contains for every anchor a four-tuple
+      `(center_y, center_x, height, width)` representing the gold bbox of
+      a chosen object using parametrization of R-CNN; zeros if no gold object
+      was assigned to the anchor
+    If the `anchors` shape is `[anchors_len, 4]`, the `anchor_classes` shape
+    is `[anchors_len]` and the `anchor_bboxes` shape is `[anchors_len, 4]`.
+
+    Algorithm:
+    - First, for each gold object, assign it to an anchor with the largest IoU
+      (the anchor with smaller index if there are several). In case several gold
+      objects are assigned to a single anchor, use the gold object with smaller
+      index.
+    - For each unused anchor, find the gold object with the largest IoU
+      (again the gold object with smaller index if there are several), and if
+      the IoU is >= iou_threshold, assign the object to the anchor.
+    """
+
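+    anchor_classes = np.zeros(len(anchors), np.int32)
+    anchor_bboxes = np.zeros([len(anchors), 4], np.float32)
+    if len(gold_classes) == 0:
+        return anchor_classes, anchor_bboxes
+    # IoUs of all (gold, anchor) pairs, with shape [num_golds, num_anchors].
+    ious = bboxes_iou(gold_bboxes[:, np.newaxis], anchors[np.newaxis, :])
+    # First, assign each gold object to the anchor with the largest IoU (`np.argmax` returns the
+    # smaller index on ties); iterating in reverse lets smaller gold indices overwrite larger ones.
+    used = np.zeros(len(anchors), bool)
+    for gold in reversed(range(len(gold_classes))):
+        anchor = np.argmax(ious[gold])
+        used[anchor] = True
+        anchor_classes[anchor] = 1 + gold_classes[gold]
+        anchor_bboxes[anchor] = bboxes_to_rcnn(anchors[anchor], gold_bboxes[gold])
+    # For each unused anchor, take the gold object with the largest IoU (smaller index on ties).
+    best_golds = np.argmax(ious, axis=0)
+    for anchor in np.flatnonzero(~used):
+        gold = best_golds[anchor]
+        if ious[gold, anchor] >= iou_threshold:
+            anchor_classes[anchor] = 1 + gold_classes[gold]
+            anchor_bboxes[anchor] = bboxes_to_rcnn(anchors[anchor], gold_bboxes[gold])
+
+    return anchor_classes, anchor_bboxes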
+
+
+def main(args: argparse.Namespace) -> tuple[Callable, Callable, Callable]:
+    return bboxes_to_rcnn, bboxes_from_rcnn, bboxes_training
+
+
+class Tests(unittest.TestCase):
+    def test_bboxes_to_from_rcnn(self):
+        data = [
+            [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 0, 0]],
+            [[0, 0, 10, 10], [5, 0, 15, 10], [.5, 0, 0, 0]],
+            [[0, 0, 10, 10], [0, 5, 10, 15], [0, .5, 0, 0]],
+            [[0, 0, 10, 10], [0, 0, 20, 30], [.5, 1, np.log(2), np.log(3)]],
+            [[0, 9, 10, 19], [2, 10, 5, 16], [-0.15, -0.1, -1.20397, -0.51083]],
+            [[5, 3, 15, 13], [7, 7, 10, 9], [-0.15, 0, -1.20397, -1.60944]],
+            [[7, 6, 17, 16], [9, 10, 12, 13], [-0.15, 0.05, -1.20397, -1.20397]],
+            [[5, 6, 15, 16], [7, 7, 10, 10], [-0.15, -0.25, -1.20397, -1.20397]],
+            [[6, 3, 16, 13], [8, 5, 12, 8], [-0.1, -0.15, -0.91629, -1.20397]],
+            [[5, 2, 15, 12], [9, 6, 12, 8], [0.05, 0, -1.20397, -1.60944]],
+            [[2, 10, 12, 20], [6, 11, 8, 17], [0, -0.1, -1.60944, -0.51083]],
+            [[10, 9, 20, 19], [12, 13, 17, 16], [-0.05, 0.05, -0.69315, -1.20397]],
+            [[6, 7, 16, 17], [10, 11, 12, 14], [0, 0.05, -1.60944, -1.20397]],
+            [[2, 2, 12, 12], [3, 5, 8, 8], [-0.15, -0.05, -0.69315, -1.20397]],
+        ]
+        # First run on individual anchors, and then on all together
+        for anchors, bboxes, rcnns in [map(lambda x: [x], row) for row in data] + [zip(*data)]:
+            anchors, bboxes, rcnns = [np.array(data, np.float32) for data in [anchors, bboxes, rcnns]]
+            np.testing.assert_almost_equal(bboxes_to_rcnn(anchors, bboxes), rcnns, decimal=3)
+            np.testing.assert_almost_equal(bboxes_from_rcnn(anchors, rcnns), bboxes, decimal=3)
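+
+    def test_bboxes_iou_broadcasting(self):
+        # Added check (not part of the original template) of the broadcasting described in the
+        # `bboxes_iou` docstring: shapes [num_xs, 1, 4] and [1, num_ys, 4] give [num_xs, num_ys].
+        xs = np.array([[0, 0, 10, 10], [0, 0, 20, 20]], np.float32)
+        ys = np.array([[0, 0, 10, 10], [10, 10, 20, 20], [0, 0, 5, 5]], np.float32)
+        ious = bboxes_iou(xs[:, np.newaxis], ys[np.newaxis, :])
+        self.assertEqual(ious.shape, (2, 3))
+        np.testing.assert_almost_equal(ious, [[1, 0, .25], [.25, .25, .0625]], decimal=5)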
+
+    def test_bboxes_training(self):
+        anchors = np.array([[0, 0, 10, 10], [0, 10, 10, 20], [10, 0, 20, 10], [10, 10, 20, 20]], np.float32)
+        for gold_classes, gold_bboxes, anchor_classes, anchor_bboxes, iou in [
+                [[1], [[14., 14, 16, 16]], [0, 0, 0, 2], [[0, 0, 0, 0]] * 3 + [[0, 0, np.log(.2), np.log(.2)]], 0.5],
+                [[2], [[0., 0, 20, 20]], [3, 0, 0, 0], [[.5, .5, np.log(2), np.log(2)]] + [[0, 0, 0, 0]] * 3, 0.26],
+                [[2], [[0., 0, 20, 20]], [3, 3, 3, 3],
+                 [[y, x, np.log(2), np.log(2)] for y in [.5, -.5] for x in [.5, -.5]], 0.24],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 0, 1],
+                 [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [-0.35, -0.45, 0.53062, 0.40546]], 0.5],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 2, 1],
+                 [[0, 0, 0, 0], [0, 0, 0, 0], [-0.1, 0.6, -0.22314, 0.69314], [-0.35, -0.45, 0.53062, 0.40546]], 0.3],
+                [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 1, 2, 1],
+                 [[0, 0, 0, 0], [0.65, -0.45, 0.53062, 0.40546], [-0.1, 0.6, -0.22314, 0.69314],
+                  [-0.35, -0.45, 0.53062, 0.40546]], 0.17],
+        ]:
+            gold_classes, anchor_classes = np.array(gold_classes, np.int32), np.array(anchor_classes, np.int32)
+            gold_bboxes, anchor_bboxes = np.array(gold_bboxes, np.float32), np.array(anchor_bboxes, np.float32)
+            computed_classes, computed_bboxes = bboxes_training(anchors, gold_classes, gold_bboxes, iou)
+            np.testing.assert_almost_equal(computed_classes, anchor_classes, decimal=3)
+            np.testing.assert_almost_equal(computed_bboxes, anchor_bboxes, decimal=3)
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/labs/06/svhn.ipynb b/labs/06/svhn.ipynb
new file mode 100644
index 0000000..c5d935f
--- /dev/null
+++ b/labs/06/svhn.ipynb
@@ -0,0 +1,375 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#!/usr/bin/env python3\n",
+    "import argparse\n",
+    "import datetime\n",
+    "import os\n",
+    "import re\n",
+    "os.environ.setdefault(\"KERAS_BACKEND\", \"torch\")  # Use PyTorch backend unless specified otherwise\n",
+    "\n",
+    "import keras\n",
+    "import numpy as np\n",
+    "import torch, torchvision\n",
+    "\n",
+    "import bboxes_utils\n",
+    "from svhn_dataset import SVHN\n",
+    "\n",
+    "# Jonas Glerup Røssum <jglr@itu.dk>\n",
+    "# 31a0a96a-c590-4486-b194-f72765b2ce25\n",
+    "# Xiao Wang <xiao.wang@student.uni-tuebingen.de>\n",
+    "# 91d4d1d7-b800-4765-96b9-df098ac36a66\n",
+    "\n",
+    "# TODO: Define reasonable defaults and optionally more parameters.\n",
+    "# Also, you can set the number of threads to 0 to use all your CPU cores.\n",
+    "parser = argparse.ArgumentParser()\n",
+    "parser.add_argument(\"--batch_size\", default=64, type=int, help=\"Batch size.\")\n",
+    "parser.add_argument(\"--epochs\", default=10, type=int, help=\"Number of epochs.\")\n",
+    "parser.add_argument(\"--seed\", default=42, type=int, help=\"Random seed.\")\n",
+    "parser.add_argument(\"--threads\", default=1, type=int, help=\"Maximum number of threads to use.\")\n",
+    "parser.add_argument(\"--learning_rate\", default=0.001, type=float, help=\"Learning rate for training.\")\n",
+    "parser.add_argument(\"--image_size\", default=224, type=int, help=\"A fixed image size.\")\n",
+    "parser.add_argument(\"--iou_threshold\", default=0.7, type=float, help=\"The intersection over union threshold.\")\n",
+    "\n",
+    "\n",
+    "class TorchTensorBoardCallback(keras.callbacks.Callback):\n",
+    "    def __init__(self, path):\n",
+    "        self._path = path\n",
+    "        self._writers = {}\n",
+    "\n",
+    "    def writer(self, writer):\n",
+    "        if writer not in self._writers:\n",
+    "            import torch.utils.tensorboard\n",
+    "            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))\n",
+    "        return self._writers[writer]\n",
+    "\n",
+    "    def add_logs(self, writer, logs, step):\n",
+    "        if logs:\n",
+    "            for key, value in logs.items():\n",
+    "                self.writer(writer).add_scalar(key, value, step)\n",
+    "            self.writer(writer).flush()\n",
+    "\n",
+    "    def on_epoch_end(self, epoch, logs=None):\n",
+    "        if logs:\n",
+    "            if isinstance(getattr(self.model, \"optimizer\", None), keras.optimizers.Optimizer):\n",
+    "                logs = logs | {\"learning_rate\": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}\n",
+    "            self.add_logs(\"train\", {k: v for k, v in logs.items() if not k.startswith(\"val_\")}, epoch + 1)\n",
+    "            self.add_logs(\"val\", {k[4:]: v for k, v in logs.items() if k.startswith(\"val_\")}, epoch + 1)\n",
+    "\n",
+    "args = parser.parse_args([] if \"__file__\" not in globals() else None)\n",
+    "\n",
+    "def get_anchors(backbone_output_pixel=14, image_size=args.image_size):\n",
+    "    square_anchors = []\n",
+    "    img_H, img_W = image_size, image_size\n",
+    "    square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel\n",
+    "    for h in range(0, img_H, square_anchor_h):\n",
+    "        for w in range(0, img_W, square_anchor_w):\n",
+    "            square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w])\n",
+    "    return np.array(square_anchors)\n",
+    "\n",
+    "def prepare_data(example):\n",
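+    "    # Rescale the gold bboxes to the resized image, i.e., to the pixel coordinates of the anchors.\n",
+    "    gold_bboxes = example[\"bboxes\"] * (args.image_size / example[\"image\"].shape[0])\n",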
+    "    image_resized = keras.ops.image.resize(example[\"image\"], (args.image_size, args.image_size))\n",
+    "    gold_classes, iou_threshold = example[\"classes\"], args.iou_threshold\n",
+    "    anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold)\n",
+    "    anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS)\n",
+    "    classes_sample_weight = keras.ops.ones_like(anchor_classes)\n",
+    "    bboxes_sample_weight = anchor_classes > 0\n",
+    "    return image_resized, (anchor_classes_one_hot, anchor_bboxes), (classes_sample_weight, bboxes_sample_weight)\n",
+    "\n",
+    "def bn_relu(inputs):\n",
+    "    return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs))\n",
+    "\n",
+    "### classification and bbox regression head\n",
+    "### RetinaNet uses 9 anchors per location (3 scales x 3 aspect ratios); we use 1 by default.\n",
+    "def heads(input_feature, type=\"classification\", anchor_number=1):\n",
+    "    if type.lower() == \"classification\":\n",
+    "        activ, output_size = \"sigmoid\", svhn.LABELS*anchor_number\n",
+    "    elif type.lower() == \"regression\":\n",
+    "        activ, output_size = None, 4*anchor_number\n",
+    "    else:\n",
+    "        raise ValueError(\"Type can only be 'classification' or 'regression'!\")\n",
+    "    conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(input_feature))\n",
+    "    conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(conv1))\n",
+    "    conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(conv2))\n",
+    "    outputs = keras.layers.Conv2D(output_size, 3, 1, \"same\", activation=activ)(conv3)\n",
+    "    # Flatten [H, W, anchor_number * size] into [H * W * anchor_number, size], matching the\n",
+    "    # row-major order of the anchors produced by `get_anchors` and of the training targets.\n",
+    "    return keras.layers.Reshape((-1, output_size // anchor_number))(outputs)\n",
+    "\n",
+    "\n",
+    "\n",
+    "# Set the random seed and the number of threads.\n",
+    "keras.utils.set_random_seed(args.seed)\n",
+    "if args.threads:\n",
+    "    torch.set_num_threads(args.threads)\n",
+    "    # torch.set_num_interop_threads(args.threads)\n",
+    "\n",
+    "# Create logdir name\n",
+    "args.logdir = os.path.join(\"logs\", \"{}-{}-{}\".format(\n",
+    "    os.path.basename(globals().get(\"__file__\", \"notebook\")),\n",
+    "    datetime.datetime.now().strftime(\"%Y-%m-%d_%H%M%S\"),\n",
+    "    \",\".join((\"{}={}\".format(re.sub(\"(.)[^_]*_?\", r\"\\1\", k), v) for k, v in sorted(vars(args).items())))\n",
+    "))\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "📊 Loading data\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"📊 Loading data\")\n",
+    "\n",
+    "# Load the data. The individual examples are dictionaries with the keys:\n",
+    "# - \"image\", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,\n",
+    "# - \"classes\", a `[num_digits]` vector with classes of image digits,\n",
+    "# - \"bboxes\", a `[num_digits, 4]` vector with bounding boxes of image digits.\n",
+    "svhn = SVHN()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'image': tensor([[[124,  93,  64],\n",
+       "          [128,  93,  63],\n",
+       "          [127,  92,  60],\n",
+       "          ...,\n",
+       "          [118,  82,  60],\n",
+       "          [115,  80,  58],\n",
+       "          [113,  78,  58]],\n",
+       " \n",
+       "         [[126,  95,  64],\n",
+       "          [129,  94,  62],\n",
+       "          [128,  94,  59],\n",
+       "          ...,\n",
+       "          [118,  82,  58],\n",
+       "          [115,  80,  58],\n",
+       "          [113,  78,  58]],\n",
+       " \n",
+       "         [[124,  93,  62],\n",
+       "          [128,  94,  59],\n",
+       "          [127,  93,  56],\n",
+       "          ...,\n",
+       "          [118,  82,  58],\n",
+       "          [115,  81,  56],\n",
+       "          [113,  78,  58]],\n",
+       " \n",
+       "         ...,\n",
+       " \n",
+       "         [[106,  74,  49],\n",
+       "          [108,  74,  49],\n",
+       "          [108,  74,  49],\n",
+       "          ...,\n",
+       "          [ 90,  72,  62],\n",
+       "          [ 89,  72,  62],\n",
+       "          [ 89,  72,  62]],\n",
+       " \n",
+       "         [[106,  74,  49],\n",
+       "          [108,  74,  49],\n",
+       "          [108,  74,  49],\n",
+       "          ...,\n",
+       "          [ 95,  75,  66],\n",
+       "          [ 95,  77,  67],\n",
+       "          [ 96,  78,  68]],\n",
+       " \n",
+       "         [[105,  73,  48],\n",
+       "          [107,  73,  48],\n",
+       "          [105,  71,  46],\n",
+       "          ...,\n",
+       "          [104,  79,  72],\n",
+       "          [104,  81,  73],\n",
+       "          [104,  81,  73]]], dtype=torch.uint8),\n",
+       " 'classes': array([4], dtype=int64),\n",
+       " 'bboxes': array([[ 5, 13, 32, 33]], dtype=int64)}"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "svhn.train[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "📊 Creating anchors\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"📊 Creating anchors\")\n",
+    "\n",
+    "anchors = get_anchors()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"📊 Transforming data\")\n",
+    "\n",
+    "def to_arrays(dataset):\n",
+    "    # `transform` yields (image, labels, weights) triples rather than three separate\n",
+    "    # sequences; transpose them and stack every component into a batched NumPy array.\n",
+    "    images, labels, weights = zip(*dataset.transform(prepare_data))\n",
+    "    stack = lambda xs: np.stack([keras.ops.convert_to_numpy(x) for x in xs])\n",
+    "    return stack(images), tuple(map(stack, zip(*labels))), tuple(map(stack, zip(*weights)))\n",
+    "\n",
+    "train_imgs, train_labels, train_weights = to_arrays(svhn.train)\n",
+    "dev_imgs, dev_labels, dev_weights = to_arrays(svhn.dev)\n",
+    "test_imgs, _, _ = to_arrays(svhn.test)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "print(\"📊 Creating efficientnet\")\n",
+    "# Load the EfficientNetV2-B0 model. It assumes the input images are\n",
+    "# represented in the [0-255] range.\n",
+    "backbone = keras.applications.EfficientNetV2B0(include_top=False)\n",
+    "\n",
+    "print(\"📊 Instantiating model\")\n",
+    "# Extract features of different resolution. Assuming 224x224 input images\n",
+    "# (you can set this explicitly via `input_shape` of the above constructor),\n",
+    "# the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.\n",
+    "backbone = keras.Model(\n",
+    "    inputs=backbone.input,\n",
+    "    outputs=[backbone.get_layer(layer).output for layer in [\n",
+    "        \"top_activation\", \"block5e_add\", \"block3b_add\", \"block2b_add\", \"block1a_project_activation\"]]\n",
+    ")\n",
+    "\n",
+    "# TODO: Create the model and train it\n",
+    "backbone.trainable = False\n",
+    "inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3))\n",
+    "# backbone outputs, from the coarsest to the finest resolution:\n",
+    "# top 7x7x1280, block5e 14x14x112, block3b 28x28x48, block2b 56x56x32, block1a 112x112x16\n",
+    "top, block5e, block3b, block2b, block1a = backbone(inputs)\n",
+    "\n",
+    "print(\"📊 Preprocessing\")\n",
+    "\n",
+    "# use only the block5e output (stride 16), not the top layer\n",
+    "feature = block5e\n",
+    "cls_output = heads(feature)\n",
+    "reg_output = heads(feature, \"regression\")\n",
+    "\n",
+    "print(\"📊 Creating model\")\n",
+    "\n",
+    "model = keras.Model(inputs, [cls_output, reg_output], name=\"baseline\")\n",
+    "model.summary()\n",
+    "\n",
+    "print(\"📊 Compiling\")\n",
+    "\n",
+    "model.compile(\n",
+    "    optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate),\n",
+    "    loss=(\n",
+    "        keras.losses.BinaryFocalCrossentropy(),\n",
+    "        keras.losses.Huber()),\n",
+    "    metrics=[\"accuracy\"],\n",
+    ")\n",
+    "\n",
+    "print(\"📊 Training\")\n",
+    "\n",
+    "model.fit(train_imgs, train_labels,\n",
+    "          batch_size=args.batch_size, epochs=args.epochs,\n",
+    "          validation_data=(dev_imgs, dev_labels),\n",
+    "          sample_weight=train_weights,\n",
+    ")\n",
+    "\n",
+    "# Generate test set annotations, but in `args.logdir` to allow parallel execution.\n",
+    "os.makedirs(args.logdir, exist_ok=True)\n",
+    "with open(os.path.join(args.logdir, \"svhn_competition.txt\"), \"w\", encoding=\"utf-8\") as predictions_file:\n",
+    "    # TODO: Predict the digits and their bounding boxes on the test set.\n",
+    "    # Assume that for a single test image we get\n",
+    "    # - `predicted_classes`: a 1D array with the predicted digits,\n",
+    "    # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;\n",
+    "    pre_classes, pre_rcnns = model.predict(test_imgs)\n",
+    "    pre_bboxes = bboxes_utils.bboxes_from_rcnn(anchors, pre_rcnns)\n",
+    "    for predicted_classes, predicted_bboxes in zip(pre_classes, pre_bboxes):\n",
+    "        scores = np.max(predicted_classes, axis=-1)\n",
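+    "        # nms returns the indices of the kept boxes, not the boxes themselves\n",
+    "        keep = torchvision.ops.nms(torch.as_tensor(predicted_bboxes, dtype=torch.float32), torch.as_tensor(scores, dtype=torch.float32), args.iou_threshold).numpy()\n",
+    "        output = []\n",
+    "        for label, bbox in zip(np.argmax(predicted_classes[keep], axis=-1), predicted_bboxes[keep]):\n",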
+    "            output += [label] + list(bbox)\n",
+    "        print(*output, file=predictions_file)\n",
+    "\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "undefined.undefined.undefined"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
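The notebook's prediction loop relies on `torchvision.ops.nms`, which returns the *indices* of the kept boxes (sorted by decreasing score), not the boxes themselves. Because SVHN boxes are `(top, left, bottom, right)`, that is `(y1, x1, y2, x2)`, swapping both axes consistently leaves every IoU unchanged, so they can be passed to it as-is. Below is a NumPy-only sketch of greedy NMS that also applies a score threshold first. The function name `nms_keep` and the `score_threshold` default of 0.2 are hypothetical choices, not values from the assignment.

```python
import numpy as np


def nms_keep(bboxes: np.ndarray, scores: np.ndarray,
             iou_threshold: float = 0.5, score_threshold: float = 0.2) -> np.ndarray:
    """Greedy non-maximum suppression returning indices of kept boxes.

    `bboxes` has shape [N, 4] in SVHN (top, left, bottom, right) order and
    `scores` has shape [N]. Boxes scoring below `score_threshold` are dropped
    first; the kept indices are returned in decreasing order of score.
    """
    order = np.argsort(-scores, kind="stable")
    order = order[scores[order] >= score_threshold]
    heights = np.maximum(bboxes[:, 2] - bboxes[:, 0], 0)
    widths = np.maximum(bboxes[:, 3] - bboxes[:, 1], 0)
    areas = heights * widths
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Intersection of the chosen box with every remaining candidate.
        top = np.maximum(bboxes[best, 0], bboxes[rest, 0])
        left = np.maximum(bboxes[best, 1], bboxes[rest, 1])
        bottom = np.minimum(bboxes[best, 2], bboxes[rest, 2])
        right = np.minimum(bboxes[best, 3], bboxes[rest, 3])
        intersection = np.maximum(bottom - top, 0) * np.maximum(right - left, 0)
        iou = intersection / np.maximum(areas[best] + areas[rest] - intersection, 1e-9)
        # Like torchvision.ops.nms, discard candidates with IoU > iou_threshold.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)


bboxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms_keep(bboxes, scores))  # → [0 2]
```

The kept indices select both the boxes and the per-anchor class probabilities, for example `labels = np.argmax(predicted_classes[keep], axis=-1)`. Without some score threshold, every non-overlapping anchor would be reported as a digit.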
diff --git a/labs/06/svhn_competition.py b/labs/06/svhn_competition.py
new file mode 100644
index 0000000..ef3e6d0
--- /dev/null
+++ b/labs/06/svhn_competition.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch
+
+import bboxes_utils
+from svhn_dataset import SVHN
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
+parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
+                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
+            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    # Load the data. The individual examples are dictionaries with the keys:
+    # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,
+    # - "classes", a `[num_digits]` vector with classes of image digits,
+    # - "bboxes", a `[num_digits, 4]` matrix with bounding boxes of image digits.
+    svhn = SVHN()
+
+    # Load the EfficientNetV2-B0 model. It assumes the input images are
+    # represented in the [0-255] range.
+    backbone = keras.applications.EfficientNetV2B0(include_top=False)
+
+    # Extract features of different resolution. Assuming 224x224 input images
+    # (you can set this explicitly via `input_shape` of the above constructor),
+    # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
+    backbone = keras.Model(
+        inputs=backbone.input,
+        outputs=[backbone.get_layer(layer).output for layer in [
+            "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]]
+    )
+
+    # TODO: Create the model and train it
+    model = ...
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.
+    os.makedirs(args.logdir, exist_ok=True)
+    with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the digits and their bounding boxes on the test set.
+        # Assume that for a single test image we get
+        # - `predicted_classes`: a 1D array with the predicted digits,
+        # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;
+        for predicted_classes, predicted_bboxes in ...:
+            output = []
+            for label, bbox in zip(predicted_classes, predicted_bboxes):
+                output += [label] + list(bbox)
+            print(*output, file=predictions_file)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
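Each line of `svhn_competition.txt` describes one test image as a flat sequence of `label top left bottom right` groups. `SVHN.evaluate_file` parses exactly this format and rejects lines whose value count is not a multiple of 5. The round-trip sketch below uses two hypothetical helpers, `format_prediction` and `parse_prediction`. The parser mirrors the loop in `evaluate_file`.

```python
def format_prediction(classes: list[int], bboxes: list[list[float]]) -> str:
    """Serialize one image's predictions as `label top left bottom right ...`."""
    output = []
    for label, bbox in zip(classes, bboxes):
        output += [int(label)] + [float(value) for value in bbox]
    return " ".join(str(value) for value in output)


def parse_prediction(line: str) -> tuple[list[int], list[list[float]]]:
    """Inverse of `format_prediction`, following `SVHN.evaluate_file`."""
    values = line.split()
    if len(values) % 5:
        raise RuntimeError("Each prediction must contain multiple of 5 numbers, found {}".format(len(values)))
    classes, bboxes = [], []
    for i in range(0, len(values), 5):
        classes.append(int(values[i]))
        bboxes.append([float(value) for value in values[i + 1:i + 5]])
    return classes, bboxes


line = format_prediction([1, 7], [[2.0, 3.0, 20.0, 12.0], [2.5, 13.0, 21.0, 22.0]])
print(line)  # → 1 2.0 3.0 20.0 12.0 7 2.5 13.0 21.0 22.0
```

An image with no detected digits produces an empty line. That line parses back to empty lists, and `SVHN.evaluate` counts it as correct only when the gold image also has no digits.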
diff --git a/labs/06/svhn_dataset.py b/labs/06/svhn_dataset.py
new file mode 100644
index 0000000..b111b19
--- /dev/null
+++ b/labs/06/svhn_dataset.py
@@ -0,0 +1,253 @@
+import array
+import os
+import sys
+import struct
+from typing import Any, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+
+import numpy as np
+import torch
+import torchvision
+
+
+class SVHN:
+    LABELS: int = 10
+
+    # Type alias for a bounding box -- a list of floats.
+    BBox = list[float]
+
+    # The indices of the bounding box coordinates.
+    TOP: int = 0
+    LEFT: int = 1
+    BOTTOM: int = 2
+    RIGHT: int = 3
+
+    Element = TypedDict("Element", {"image": torch.Tensor, "classes": np.ndarray, "bboxes": np.ndarray})
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, path: str, size: int, decode_on_demand: bool) -> None:
+            self._size = size
+
+            arrays, indices = SVHN._load_data(path, size)
+            if decode_on_demand:
+                self._data, self._arrays, self._indices = None, arrays, indices
+            else:
+                self._data = [self._decode(arrays, indices, i) for i in range(size)]
+
+        def __len__(self) -> int:
+            return self._size
+
+        def __getitem__(self, index: int) -> "SVHN.Element":
+            if self._data:
+                return self._data[index]
+            return self._decode(self._arrays, self._indices, index)
+
+        def transform(self, transform: Callable[["SVHN.Element"], Any]) -> "SVHN.TransformedDataset":
+            return SVHN.TransformedDataset(self, transform)
+
+        def _decode(self, data: dict, indices: dict, index: int) -> "SVHN.Element":
+            return {
+                "image": torchvision.io.decode_image(
+                    torch.frombuffer(data["image"], dtype=torch.uint8, offset=indices["image"][:-1][index],
+                                     count=indices["image"][1:][index] - indices["image"][:-1][index]),
+                    torchvision.io.ImageReadMode.RGB).permute(1, 2, 0),
+                "classes": np.frombuffer(
+                    data["classes"], dtype=np.int64, offset=indices["classes"][:-1][index] << 3,
+                    count=indices["classes"][1:][index] - indices["classes"][:-1][index]),
+                "bboxes": np.frombuffer(
+                    data["bboxes"], dtype=np.int64, offset=indices["bboxes"][:-1][index] << 3,
+                    count=indices["bboxes"][1:][index] - indices["bboxes"][:-1][index]).reshape(-1, 4),
+            }
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def transform(self, transform: Callable[..., Any]) -> "SVHN.TransformedDataset":
+            return SVHN.TransformedDataset(self, transform)
+
+    def __init__(self, decode_on_demand: bool = False) -> None:
+        for dataset, size in [("train", 10_000), ("dev", 1_267), ("test", 4_535)]:
+            path = "svhn.{}.tfrecord".format(dataset)
+            if not os.path.exists(path):
+                print("Downloading file {}...".format(path), file=sys.stderr)
+                urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
+                os.rename("{}.tmp".format(path), path)
+
+            setattr(self, dataset, self.Dataset(path, size, decode_on_demand))
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
+    # TFRecord loading
+    @staticmethod
+    def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]:
+        def get_value() -> np.int64:
+            nonlocal data, offset
+            value = np.int64(data[offset] & 0x7F); start = offset; offset += 1
+            while data[offset - 1] & 0x80:
+                value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1
+            return value
+
+        def get_value_of_kind(kind: int) -> np.int64:
+            nonlocal data, offset
+            assert data[offset] == kind; offset += 1
+            return get_value()
+
+        arrays, indices = {}, {}
+        with open(path, "rb") as file:
+            for _ in range(items):
+                length = file.read(8); assert len(length) == 8
+                length, = struct.unpack("<Q", length)
+                assert len(file.read(4)) == 4
+                data = file.read(length); assert len(data) == length
+                assert len(file.read(4)) == 4
+
+                offset = 0
+                length = get_value_of_kind(0x0A)
+                assert len(data) - offset == length
+                while offset < len(data):
+                    get_value_of_kind(0x0A)
+                    length = get_value_of_kind(0x0A)
+                    key = data[offset:offset + length].decode("utf-8"); offset += length
+                    get_value_of_kind(0x12)
+                    if key not in arrays:
+                        arrays[key] = array.array({0x0A: "B", 0x1A: "Q", 0x12: "f"}.get(data[offset], "B"))
+                        indices[key] = array.array("L", [0])
+
+                    if data[offset] == 0x0A:
+                        length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(data[offset:offset + length]); offset += length
+                    elif data[offset] == 0x1A:
+                        length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A)
+                        target_offset = offset + length
+                        while offset < target_offset:
+                            arrays[key].append(get_value())
+                    elif data[offset] == 0x12:
+                        length = get_value_of_kind(0x12) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(np.frombuffer(
+                            data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).tobytes()); offset += length
+                    else:
+                        raise ValueError("Unsupported data tag {}".format(data[offset]))
+                    indices[key].append(len(arrays[key]))
+        return arrays, indices
+
+    # Evaluation infrastructure.
+    @staticmethod
+    def evaluate(
+        gold_dataset: "SVHN.Dataset", predictions: Sequence[tuple[list[int], list[BBox]]], iou_threshold: float = 0.5,
+    ) -> float:
+        def bbox_iou(x: SVHN.BBox, y: SVHN.BBox) -> float:
+            def area(bbox: SVHN.BBox) -> float:
+                return max(bbox[SVHN.BOTTOM] - bbox[SVHN.TOP], 0) * max(bbox[SVHN.RIGHT] - bbox[SVHN.LEFT], 0)
+            intersection = [max(x[SVHN.TOP], y[SVHN.TOP]), max(x[SVHN.LEFT], y[SVHN.LEFT]),
+                            min(x[SVHN.BOTTOM], y[SVHN.BOTTOM]), min(x[SVHN.RIGHT], y[SVHN.RIGHT])]
+            x_area, y_area, intersection_area = area(x), area(y), area(intersection)
+            return intersection_area / (x_area + y_area - intersection_area)
+
+        gold = [(np.array(example["classes"]), np.array(example["bboxes"])) for example in gold_dataset]
+
+        if len(predictions) != len(gold):
+            raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format(
+                len(predictions), len(gold)))
+
+        correct = 0
+        for (gold_classes, gold_bboxes), (prediction_classes, prediction_bboxes) in zip(gold, predictions):
+            if len(gold_classes) != len(prediction_classes):
+                continue
+
+            used = [False] * len(gold_classes)
+            for cls, bbox in zip(prediction_classes, prediction_bboxes):
+                best = None
+                for i in range(len(gold_classes)):
+                    if used[i] or gold_classes[i] != cls:
+                        continue
+                    iou = bbox_iou(bbox, gold_bboxes[i])
+                    if iou >= iou_threshold and (best is None or iou > best_iou):
+                        best, best_iou = i, iou
+                if best is None:
+                    break
+                used[best] = True
+            correct += all(used)
+
+        return 100 * correct / len(gold)
+
+    @staticmethod
+    def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float:
+        predictions = []
+        for line in predictions_file:
+            values = line.split()
+            if len(values) % 5:
+                raise RuntimeError("Each prediction must contain multiple of 5 numbers, found {}".format(len(values)))
+
+            predictions.append(([], []))
+            for i in range(0, len(values), 5):
+                predictions[-1][0].append(int(values[i]))
+                predictions[-1][1].append([float(value) for value in values[i + 1:i + 5]])
+
+        return SVHN.evaluate(gold_dataset, predictions)
+
+    # Visualization infrastructure.
+    @staticmethod
+    def visualize(image: np.ndarray, labels: list[Any], bboxes: list[BBox], show: bool):
+        """Visualize the given image plus recognized objects.
+
+        Arguments:
+        - `image` is NumPy input image with pixels in range [0-255];
+        - `labels` is a list of labels to be shown using the `str` method;
+        - `bboxes` is a list of `BBox`es (four-tuples TOP, LEFT, BOTTOM, RIGHT);
+        - `show` controls whether to show the figure or return it:
+          - if `True`, the figure is shown using `plt.show()`;
+          - if `False`, the `plt.Figure` instance is returned; it can be saved
+            to TensorBoard using the `add_figure` method of a `SummaryWriter`.
+        """
+        import matplotlib.pyplot as plt
+
+        figure = plt.figure(figsize=(4, 4))
+        plt.axis("off")
+        plt.imshow(np.asarray(image, np.uint8))
+        for label, (top, left, bottom, right) in zip(labels, bboxes):
+            plt.gca().add_patch(plt.Rectangle(
+                [left, top], right - left, bottom - top, fill=False, edgecolor=[1, 0, 1], linewidth=2))
+            plt.gca().text(left, top, str(label), bbox={"facecolor": [1, 0, 1], "alpha": 0.5},
+                           clip_box=plt.gca().clipbox, clip_on=True, ha="left", va="top")
+
+        if show:
+            plt.show()
+        else:
+            return figure
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--visualize", default=None, type=str, help="Prediction file to visualize")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            accuracy = SVHN.evaluate_file(getattr(SVHN(decode_on_demand=True), args.dataset), predictions_file)
+        print("SVHN accuracy: {:.2f}%".format(accuracy))
+
+    if args.visualize:
+        with open(args.visualize, "r", encoding="utf-8-sig") as predictions_file:
+            for line, example in zip(predictions_file, getattr(SVHN(decode_on_demand=True), args.dataset)):
+                values = line.split()
+                classes, bboxes = [], []
+                for i in range(0, len(values), 5):
+                    classes.append(values[i])
+                    bboxes.append([float(value) for value in values[i + 1:i + 5]])
+                SVHN.visualize(example["image"], classes, bboxes, show=True)
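`SVHN._load_data` reads the TFRecord framing by hand and then walks the protobuf payload of each record. Each record consists of four parts:

- an 8-byte little-endian record length,
- a 4-byte CRC,
- the serialized `tf.train.Example`,
- another 4-byte CRC.

The CRCs are skipped, not verified. The nested `get_value` helper decodes a base-128 varint. Every byte contributes its low 7 bits, least-significant group first, and a set high bit (`0x80`) means another byte follows. The tags the code checks are protobuf keys of the form `(field_number << 3) | 2` for length-delimited fields. So `0x0A`, `0x12` and `0x1A` select fields 1, 2 and 3, which in `tf.train.Feature` are `bytes_list`, `float_list` and `int64_list`.

Below is a standalone sketch of the varint step, with a matching encoder for checking it. Both function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a varint starting at `offset`; return (value, next_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # same accumulation as `get_value`
        shift += 7
        if not byte & 0x80:
            return value, offset


encoded = encode_varint(300)
print(encoded.hex())           # → ac02
print(decode_varint(encoded))  # → (300, 2)
```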
diff --git a/labs/07/3d_recognition.py b/labs/07/3d_recognition.py
new file mode 100644
index 0000000..fefe6d1
--- /dev/null
+++ b/labs/07/3d_recognition.py
@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+os.environ.setdefault("KERAS_BACKEND", "torch")  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch
+
+from modelnet import ModelNet
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=..., type=int, help="Batch size.")
+parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.")
+parser.add_argument("--modelnet", default=20, type=int, help="ModelNet dimension.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer):
+                logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1)
+            self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1)
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    # Load the data
+    modelnet = ModelNet(args.modelnet)
+
+    # TODO: Create the model and train it
+    model = ...
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.
+    os.makedirs(args.logdir, exist_ok=True)
+    with open(os.path.join(args.logdir, "3d_recognition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the probabilities on the test set
+        test_probabilities = model.predict(...)
+
+        for probs in test_probabilities:
+            print(np.argmax(probs), file=predictions_file)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
diff --git a/labs/07/modelnet.py b/labs/07/modelnet.py
new file mode 100644
index 0000000..2e7d513
--- /dev/null
+++ b/labs/07/modelnet.py
@@ -0,0 +1,108 @@
+import os
+import sys
+from typing import Any, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+
+import numpy as np
+import torch
+
+
+class ModelNet:
+    # The D, H, W are set in the constructor depending
+    # on requested resolution and are only instance variables.
+    D: int
+    H: int
+    W: int
+    C: int = 1
+    LABELS: list[str] = [
+        "bathtub", "bed", "chair", "desk", "dresser", "monitor", "night_stand", "sofa", "table", "toilet",
+    ]
+    Element = TypedDict("Element", {"grid": np.ndarray, "label": np.ndarray})
+    Elements = TypedDict("Elements", {"grids": np.ndarray, "labels": np.ndarray})
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/modelnet{}.npz"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, data: "ModelNet.Elements", seed: int = 42) -> None:
+            self._data = data
+
+        @property
+        def data(self) -> "ModelNet.Elements":
+            return self._data
+
+        def __len__(self) -> int:
+            return len(self._data["grids"])
+
+        def __getitem__(self, index: int) -> "ModelNet.Element":
+            return {key.removesuffix("s"): value[index] for key, value in self._data.items()}
+
+        def transform(self, transform: Callable[["ModelNet.Element"], Any]) -> "ModelNet.TransformedDataset":
+            return ModelNet.TransformedDataset(self, transform)
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def transform(self, transform: Callable[..., Any]) -> "ModelNet.TransformedDataset":
+            return ModelNet.TransformedDataset(self, transform)
+
+    # The resolution parameter can be either 20 or 32.
+    def __init__(self, resolution: int) -> None:
+        assert resolution in [20, 32], "Only 20 or 32 resolution is supported"
+
+        self.D = self.H = self.W = resolution
+        url = self._URL.format(resolution)
+
+        path = os.path.basename(url)
+        if not os.path.exists(path):
+            print("Downloading {} dataset...".format(path), file=sys.stderr)
+            urllib.request.urlretrieve(url, filename="{}.tmp".format(path))
+            os.rename("{}.tmp".format(path), path)
+
+        modelnet = np.load(path)
+        for dataset, _size in [("train", 3_718), ("dev", 273), ("test", 908)]:
+            data = dict((key[len(dataset) + 1:], modelnet[key]) for key in modelnet if key.startswith(dataset))
+            setattr(self, dataset, self.Dataset(data))
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
+    # Evaluation infrastructure.
+    @staticmethod
+    def evaluate(gold_dataset: Dataset, predictions: Sequence[int]) -> float:
+        gold = gold_dataset.data["labels"]
+
+        if len(predictions) != len(gold):
+            raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format(
+                len(predictions), len(gold)))
+
+        correct = sum(gold[i] == predictions[i] for i in range(len(gold)))
+        return 100 * correct / len(gold)
+
+    @staticmethod
+    def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float:
+        predictions = [int(line) for line in predictions_file]
+        return ModelNet.evaluate(gold_dataset, predictions)
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    parser.add_argument("--dim", default=20, type=int, help="ModelNet dimensionality to use")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            accuracy = ModelNet.evaluate_file(getattr(ModelNet(args.dim), args.dataset), predictions_file)
+        print("ModelNet accuracy: {:.2f}%".format(accuracy))
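`ModelNet.__init__` loads a single `.npz` archive and keeps, for each split, only the keys starting with the split name. It strips that prefix, so an assumed key `train_grids` becomes `grids`. `ModelNet.Dataset.__getitem__` then drops the trailing `s` to produce per-example keys (`grid`, `label`). The sketch below replays both steps on a tiny synthetic archive held in memory. The key names follow the `Elements` type, and the array shapes and dtypes are illustrative only, not the real ModelNet contents.

```python
import io

import numpy as np

# A tiny synthetic archive standing in for modelnet20.npz (illustrative shapes).
buffer = io.BytesIO()
np.savez(buffer,
         train_grids=np.zeros([3, 20, 20, 20, 1], np.uint8), train_labels=np.array([0, 4, 9]),
         dev_grids=np.zeros([1, 20, 20, 20, 1], np.uint8), dev_labels=np.array([2]))
buffer.seek(0)
archive = np.load(buffer)

splits = {}
for dataset in ["train", "dev"]:
    # Same prefix stripping as in ModelNet.__init__: "train_grids" -> "grids".
    splits[dataset] = {key[len(dataset) + 1:]: archive[key] for key in archive if key.startswith(dataset)}

# Same per-example view as ModelNet.Dataset.__getitem__: "grids" -> "grid".
example = {key.removesuffix("s"): value[1] for key, value in splits["train"].items()}
print(sorted(example), example["grid"].shape, example["label"])  # → ['grid', 'label'] (20, 20, 20, 1) 4
```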
diff --git a/labs/08/morpho_analyzer.py b/labs/08/morpho_analyzer.py
new file mode 100644
index 0000000..d0f0fa7
--- /dev/null
+++ b/labs/08/morpho_analyzer.py
@@ -0,0 +1,45 @@
+import os
+import sys
+import urllib.request
+import zipfile
+
+
+class MorphoAnalyzer:
+    """Loads morphological analyses in a vertical format.
+
+    The analyzer provides only a method `get(word: str)` returning a list
+    of analyses, each containing two fields `lemma` and `tag`.
+    If an analysis of the word is not found, an empty list is returned.
+    """
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    class LemmaTag:
+        def __init__(self, lemma: str, tag: str) -> None:
+            self.lemma = lemma
+            self.tag = tag
+
+        def __repr__(self) -> str:
+            return "(lemma: {}, tag: {})".format(self.lemma, self.tag)
+
+    def __init__(self, dataset: str) -> None:
+        path = "{}.zip".format(dataset)
+        if not os.path.exists(path):
+            print("Downloading dataset {}...".format(dataset), file=sys.stderr)
+            urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
+            os.rename("{}.tmp".format(path), path)
+
+        self.analyses = {}
+        with zipfile.ZipFile(path, "r") as zip_file:
+            with zip_file.open("{}.txt".format(dataset), "r") as analyses_file:
+                for line in analyses_file:
+                    line = line.decode("utf-8").rstrip("\n")
+                    columns = line.split("\t")
+
+                    analyses = []
+                    for i in range(1, len(columns) - 1, 2):
+                        analyses.append(self.LemmaTag(columns[i], columns[i + 1]))
+                    self.analyses[columns[0]] = analyses
+
+    def get(self, word: str) -> list[LemmaTag]:
+        return self.analyses.get(word, [])
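The comments at the top of `labs/08/morpho_dataset.py` describe `cle_batch`, which deduplicates the forms in a batch so that each distinct word is embedded by the character-level network only once. The sketch below illustrates that contract with plain nested lists instead of tensors. The name `cle_batch_sketch` and the `char_index` mapping are hypothetical. Reserving row 0 of `unique_forms` as a dummy form for padding positions is also an assumption of the sketch, not necessarily what the file does. Character indices start at 4, following `Vocabulary` (`[PAD]`=0, `[UNK]`=1) plus the `[BOW]`/`[EOW]` entries that `Factor.finalize` prepends.

```python
PAD, UNK = 0, 1  # same special indices as Vocabulary / MorphoDataset


def cle_batch_sketch(sentences: list[list[str]],
                     char_index: dict[str, int]) -> tuple[list[list[int]], list[list[int]]]:
    """Return (unique_forms, forms_indices) as padded nested lists.

    Row 0 of `unique_forms` is a dummy form referenced by the padding
    positions of `forms_indices` (an assumption of this sketch).
    """
    unique, form_to_row = [], {}
    for sentence in sentences:
        for form in sentence:
            if form not in form_to_row:
                form_to_row[form] = len(unique) + 1  # +1 skips the dummy row 0
                unique.append(form)

    max_form_length = max((len(form) for form in unique), default=1)
    unique_forms = [[PAD] * max_form_length]  # dummy row for padding positions
    for form in unique:
        chars = [char_index.get(char, UNK) for char in form]  # unknown chars -> UNK
        unique_forms.append(chars + [PAD] * (max_form_length - len(chars)))

    max_sentence_length = max((len(sentence) for sentence in sentences), default=0)
    forms_indices = [[form_to_row[form] for form in sentence] + [PAD] * (max_sentence_length - len(sentence))
                     for sentence in sentences]
    return unique_forms, forms_indices


# Hypothetical character vocabulary: indices 2 and 3 are taken by [BOW] and [EOW].
chars = {char: index for index, char in enumerate("abcdeo", start=4)}
unique_forms, forms_indices = cle_batch_sketch([["a", "dog"], ["a", "cat", "a"]], chars)
print(forms_indices)  # → [[1, 2, 0], [1, 3, 1]]
```

The repeated form `a` is stored once, and padding positions reuse index 0. A character-level encoder would therefore run over 4 rows instead of 5 word occurrences in this toy batch.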
diff --git a/labs/08/morpho_dataset.py b/labs/08/morpho_dataset.py
new file mode 100644
index 0000000..5a47c41
--- /dev/null
+++ b/labs/08/morpho_dataset.py
@@ -0,0 +1,253 @@
+import os
+import sys
+from typing import Any, BinaryIO, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+import zipfile
+
+import torch
+
+
+# A class for managing mapping between strings and indices.
+# It provides:
+# - `__len__`: number of strings in the vocabulary
+# - `string(index: int) -> str`: string for a given index to the vocabulary
+# - `strings(indices: Sequence[int]) -> list[str]`: list of strings for given indices
+# - `index(string: str) -> int`: index of a given string in the vocabulary
+# - `indices(strings: Sequence[str]) -> list[int]`: list of indices for given strings
+class Vocabulary:
+    PAD: int = 0
+    UNK: int = 1
+
+    def __init__(self, strings: Sequence[str]) -> None:
+        self._strings = ["[PAD]", "[UNK]"]
+        self._strings.extend(strings)
+        self._string_map = {string: index for index, string in enumerate(self._strings)}
+
+    def __len__(self) -> int:
+        return len(self._strings)
+
+    def string(self, index: int) -> str:
+        return self._strings[index]
+
+    def strings(self, indices: Sequence[int]) -> list[str]:
+        return [self._strings[index] for index in indices]
+
+    def index(self, string: str) -> int:
+        return self._string_map.get(string, Vocabulary.UNK)
+
+    def indices(self, strings: Sequence[str]) -> list[int]:
+        return [self._string_map.get(string, Vocabulary.UNK) for string in strings]
+
+
+# Loads a morphological dataset in a vertical format.
+# - The data consists of three datasets
+#   - `train`
+#   - `dev`
+#   - `test`
+# - Each dataset is a `torch.utils.data.Dataset` providing
+#   - `__len__`: number of sentences in the dataset
+#   - `__getitem__`: return the requested sentence as an `Element`
+#     instance, which is a dictionary with keys "forms"/"lemmas"/"tags",
+#     each being a list of strings
+#   - `forms`, `lemmas`, `tags`: instances of type `Factor` containing
+#     the following fields:
+#     - `strings`: a Python list containing input sentences, each being
+#       a list of strings (forms/lemmas/tags)
+#     - `word_vocab`: a `Vocabulary` object capable of mapping words to
+#       indices. It is constructed on the train set and shared by the dev
+#       and test sets
+#     - `char_vocab`: a `Vocabulary` object capable of mapping characters
+#       to indices. It is constructed on the train set and shared by the dev
+#       and test sets
+#   - `cle_batch`: a method for creating inputs for character-level embeddings.
+#     It takes a list of sentences, each being a list of string forms, and produces
+#     a tuple of two tensors:
+#     - `unique_forms` with shape `[num_unique_forms + 1, max_form_length]` containing
+#       a dummy row 0 (referenced by padding) followed by each unique form as character ids
+#     - `forms_indices` with shape `[num_sentences, max_sentence_length]`
+#       containing for every form its index in `unique_forms`
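+# For example (illustrative only; `dataset` stands for any of `train`/`dev`/`test`,
+# and the order of rows 1 and 2 is arbitrary, since the unique forms come from a set):
+#   unique_forms, forms_indices = dataset.cle_batch([["a", "b"], ["b"]])
+#   unique_forms.shape  # [3, 1]: row 0 is a dummy entry, rows 1-2 are "a" and "b"
+#   forms_indices       # [[i_a, i_b], [i_b, 0]], padding pointing to the dummy row 0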
+class MorphoDataset:
+    PAD: int = 0
+    UNK: int = 1
+    BOW: int = 2
+    EOW: int = 3
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    Element = TypedDict("Element", {"forms": list[str], "lemmas": list[str], "tags": list[str]})
+
+    class Factor:
+        word_vocab: Vocabulary
+        char_vocab: Vocabulary
+        strings: list[list[str]]
+
+        def __init__(self) -> None:
+            self.strings = []
+
+        def finalize(self, train: Any | None = None) -> None:
+            # Create vocabularies
+            if train:
+                self.word_vocab = train.word_vocab
+                self.char_vocab = train.char_vocab
+            else:
+                strings = sorted(set(string for sentence in self.strings for string in sentence))
+                self.word_vocab = Vocabulary(strings)
+
+                bow_eow = ["[BOW]", "[EOW]"]
+                self.char_vocab = Vocabulary(bow_eow + sorted(set(char for string in strings for char in string)))
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, data_file: BinaryIO, train: Any | None = None, max_sentences: int | None = None) -> None:
+            # Create factors
+            self._factors = (MorphoDataset.Factor(), MorphoDataset.Factor(), MorphoDataset.Factor())
+            self._factors_tensors = None
+
+            # Load the data
+            self._size = 0
+            in_sentence = False
+            for line in data_file:
+                line = line.decode("utf-8").rstrip("\r\n")
+                if line:
+                    if not in_sentence:
+                        for factor in self._factors:
+                            factor.strings.append([])
+                        self._size += 1
+
+                    columns = line.split("\t")
+                    assert len(columns) == len(self._factors)
+                    for column, factor in zip(columns, self._factors):
+                        factor.strings[-1].append(column)
+
+                    in_sentence = True
+                else:
+                    in_sentence = False
+                    if max_sentences is not None and self._size >= max_sentences:
+                        break
+
+            # Finalize the mappings
+            for i, factor in enumerate(self._factors):
+                factor.finalize(train._factors[i] if train is not None else None)
+
+        @property
+        def forms(self) -> "MorphoDataset.Factor":
+            return self._factors[0]
+
+        @property
+        def lemmas(self) -> "MorphoDataset.Factor":
+            return self._factors[1]
+
+        @property
+        def tags(self) -> "MorphoDataset.Factor":
+            return self._factors[2]
+
+        def __len__(self) -> int:
+            return self._size
+
+        def __getitem__(self, index: int) -> "MorphoDataset.Element":
+            return {"forms": self.forms.strings[index],
+                    "lemmas": self.lemmas.strings[index],
+                    "tags": self.tags.strings[index]}
+
+        def transform(self, transform: Callable[["MorphoDataset.Element"], Any]) -> "MorphoDataset.TransformedDataset":
+            return MorphoDataset.TransformedDataset(self, transform)
+
+        def cle_batch(self, forms: list[list[str]]) -> tuple[torch.Tensor, torch.Tensor]:
+            unique_strings = list(set(form for sentence in forms for form in sentence))
+            unique_string_map = {form: index + 1 for index, form in enumerate(unique_strings)}
+            unique_forms = torch.nn.utils.rnn.pad_sequence(
+                [torch.tensor([MorphoDataset.UNK])]
+                + [torch.tensor(self.forms.char_vocab.indices(form)) for form in unique_strings], batch_first=True)
+            forms_indices = torch.nn.utils.rnn.pad_sequence(
+                [torch.tensor([unique_string_map[form] for form in sentence]) for sentence in forms], batch_first=True)
+            return unique_forms, forms_indices
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def transform(self, transform: Callable[..., Any]) -> "MorphoDataset.TransformedDataset":
+            return MorphoDataset.TransformedDataset(self, transform)
+
+    def __init__(self, dataset, max_sentences=None):
+        path = "{}.zip".format(dataset)
+        if not os.path.exists(path):
+            print("Downloading dataset {}...".format(dataset), file=sys.stderr)
+            urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
+            os.rename("{}.tmp".format(path), path)
+
+        with zipfile.ZipFile(path, "r") as zip_file:
+            for dataset in ["train", "dev", "test"]:
+                with zip_file.open("{}_{}.txt".format(os.path.splitext(path)[0], dataset), "r") as dataset_file:
+                    setattr(self, dataset, self.Dataset(
+                        dataset_file, train=self.train if dataset != "train" else None,
+                        max_sentences=max_sentences))
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
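+    # Evaluation infrastructure. The predictions contain one string (tag or lemma)
+    # per word, with sentences separated by empty lines.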
+    @staticmethod
+    def evaluate(gold_dataset: "MorphoDataset.Factor", predictions: Sequence[str]) -> float:
+        gold_sentences = gold_dataset.strings
+
+        predicted_sentences, in_sentence = [], False
+        for line in predictions:
+            line = line.rstrip("\n")
+            if not line:
+                in_sentence = False
+            else:
+                if not in_sentence:
+                    predicted_sentences.append([])
+                    in_sentence = True
+                predicted_sentences[-1].append(line)
+
+        if len(predicted_sentences) != len(gold_sentences):
+            raise RuntimeError("The predictions contain a different number of sentences than the gold data: {} vs {}".format(
+                len(predicted_sentences), len(gold_sentences)))
+
+        correct, total = 0, 0
+        for i, (predicted_sentence, gold_sentence) in enumerate(zip(predicted_sentences, gold_sentences)):
+            if len(predicted_sentence) != len(gold_sentence):
+                raise RuntimeError("Predicted sentence {} has a different number of words than the gold one: {} vs {}".format(
+                    i + 1, len(predicted_sentence), len(gold_sentence)))
+            correct += sum(predicted == gold for predicted, gold in zip(predicted_sentence, gold_sentence))
+            total += len(predicted_sentence)
+
+        return 100 * correct / total
+
+    @staticmethod
+    def evaluate_file(gold_dataset: "MorphoDataset.Factor", predictions_file: TextIO) -> float:
+        predictions = predictions_file.readlines()
+        return MorphoDataset.evaluate(gold_dataset, predictions)
+
+
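+# Command-line evaluation of a prediction file, for example (assuming this module
+# is saved as `morpho_dataset.py`; the prediction file name is illustrative):
+#   python3 morpho_dataset.py --evaluate tagger_predictions.txt --dataset dev --task tagger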
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--corpus", default="czech_pdt", type=str, help="The corpus to evaluate")
+    parser.add_argument("--dataset", default="dev", type=str, help="The dataset to evaluate (dev/test)")
+    parser.add_argument("--task", default="tagger", type=str, help="Task to evaluate (tagger/lemmatizer)")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        gold = getattr(MorphoDataset(args.corpus), args.dataset)
+        if args.task == "tagger":
+            gold = gold.tags
+        elif args.task == "lemmatizer":
+            gold = gold.lemmas
+        else:
+            raise ValueError("Unknown task '{}', valid values are only 'tagger' or 'lemmatizer'".format(args.task))
+
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            accuracy = MorphoDataset.evaluate_file(gold, predictions_file)
+        print("{} accuracy: {:.2f}%".format(args.task.title(), accuracy))
diff --git a/labs/08/sequence_classification.py b/labs/08/sequence_classification.py
new file mode 100644
index 0000000..05a43bb
--- /dev/null
+++ b/labs/08/sequence_classification.py
@@ -0,0 +1,222 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+
+os.environ.setdefault(
+    "KERAS_BACKEND", "torch"
+)  # Use PyTorch backend unless specified otherwise
+
+import keras
+import numpy as np
+import torch
+
+parser = argparse.ArgumentParser()
+# These arguments will be set appropriately by ReCodEx, even if you change them.
+parser.add_argument("--batch_size", default=16, type=int, help="Batch size.")
+parser.add_argument(
+    "--clip_gradient", default=None, type=float, help="Norm for gradient clipping."
+)
+parser.add_argument("--epochs", default=20, type=int, help="Number of epochs.")
+parser.add_argument(
+    "--hidden_layer", default=0, type=int, help="Additional hidden layer after RNN."
+)
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
+parser.add_argument(
+    "--rnn",
+    default="LSTM",
+    choices=["LSTM", "GRU", "SimpleRNN"],
+    help="RNN layer type.",
+)
+parser.add_argument("--rnn_dim", default=10, type=int, help="RNN layer dimension.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument(
+    "--sequence_dim", default=1, type=int, help="Sequence element dimension."
+)
+parser.add_argument("--sequence_length", default=50, type=int, help="Sequence length.")
+parser.add_argument(
+    "--test_sequences", default=1000, type=int, help="Number of testing sequences."
+)
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
+parser.add_argument(
+    "--train_sequences", default=10000, type=int, help="Number of training sequences."
+)
+# If you add more arguments, ReCodEx will keep them with your default values.
+
+
+class TorchTensorBoardCallback(keras.callbacks.Callback):
+    def __init__(self, path):
+        super().__init__()
+        self._path = path
+        self._writers = {}
+
+    def writer(self, writer):
+        if writer not in self._writers:
+            import torch.utils.tensorboard
+
+            self._writers[writer] = torch.utils.tensorboard.SummaryWriter(
+                os.path.join(self._path, writer)
+            )
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        if logs:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    def on_epoch_end(self, epoch, logs=None):
+        if logs:
+            if isinstance(
+                getattr(self.model, "optimizer", None), keras.optimizers.Optimizer
+            ):
+                logs = logs | {
+                    "learning_rate": keras.ops.convert_to_numpy(
+                        self.model.optimizer.learning_rate
+                    )
+                }
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("val_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "val",
+                {k[4:]: v for k, v in logs.items() if k.startswith("val_")},
+                epoch + 1,
+            )
+
+
+# Dataset for generating sequences, with labels predicting whether the cumulative sum
+# is odd/even.
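+# For example, the elements [1, 0, 1, 1] have cumulative sums [1, 1, 2, 3], so
+# their labels are [1, 1, 0, 1], 1 marking an odd sum. When `sequence_dim` > 1,
+# the elements are drawn from [0, sequence_dim) and then one-hot encoded.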
+class Dataset:
+    def __init__(
+        self, sequences_num: int, sequence_length: int, sequence_dim: int, seed: int
+    ) -> None:
+        sequences = np.zeros([sequences_num, sequence_length, sequence_dim], np.int32)
+        labels = np.zeros([sequences_num, sequence_length, 1], bool)
+        generator = np.random.RandomState(seed)
+        for i in range(sequences_num):
+            sequences[i, :, 0] = generator.randint(
+                0, max(2, sequence_dim), size=[sequence_length]
+            )
+            labels[i, :, 0] = np.bitwise_and(np.cumsum(sequences[i, :, 0]), 1)
+            if sequence_dim > 1:
+                sequences[i] = np.eye(sequence_dim)[sequences[i, :, 0]]
+        self._data = {"sequences": sequences.astype(np.float32), "labels": labels}
+        self._size = sequences_num
+
+    @property
+    def data(self) -> dict[str, np.ndarray]:
+        return self._data
+
+    @property
+    def size(self) -> int:
+        return self._size
+
+
+class Model(keras.Model):
+    def __init__(self, args: argparse.Namespace) -> None:
+        # Construct the model.
+        sequences = keras.Input(shape=[args.sequence_length, args.sequence_dim])
+
+        # DO: Process the sequence using an RNN with type `args.rnn` and
+        # with dimensionality `args.rnn_dim`. Use `return_sequences=True`
+        # to get outputs for all sequence elements.
+        #
+        # Prefer `keras.layers.{LSTM,GRU,SimpleRNN}` to
+        # `keras.layers.RNN` wrapper with `keras.layers.{LSTM,GRU,SimpleRNN}Cell`,
+        # because the former is considerably faster (even if their GPU support
+        # with the PyTorch backend is not optimal in the current Keras 3.2.1).
+
+        layer_type = (
+            keras.layers.LSTM
+            if args.rnn == "LSTM"
+            else keras.layers.GRU
+            if args.rnn == "GRU"
+            else keras.layers.SimpleRNN
+        )
+
+        hidden = layer_type(units=args.rnn_dim, return_sequences=True)(sequences)
+
+        # DO: If `args.hidden_layer` is nonzero, process the result using
+        # a ReLU-activated fully connected layer with `args.hidden_layer` units.
+
+        if args.hidden_layer:
+            hidden = keras.layers.Dense(args.hidden_layer, activation="relu")(hidden)
+
+        # DO: Generate `predictions` using a fully connected layer
+        # with one output and sigmoid activation.
+
+        predictions = keras.layers.Dense(1, activation="sigmoid")(hidden)
+
+        super().__init__(inputs=sequences, outputs=predictions)
+
+        self.compile(
+            # DO: Create an Adam optimizer, passing the option `clipnorm=args.clip_gradient`
+            # to clip the gradient, with `None` representing no clipping (the default).
+            optimizer=keras.optimizers.Adam(clipnorm=args.clip_gradient),
+            loss=keras.losses.BinaryCrossentropy(),
+            metrics=[keras.metrics.BinaryAccuracy("accuracy")],
+        )
+
+        self.tb_callback = TorchTensorBoardCallback(args.logdir)
+
+
+def main(args: argparse.Namespace) -> dict[str, float]:
+    # Set the random seed and the number of threads.
+    keras.utils.set_random_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
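+    # (each argument name is abbreviated to the first letter of each of its
+    # underscore-separated words, e.g., `batch_size=16` becomes `bs=16`)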
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
+
+    # Create the data
+    train = Dataset(
+        args.train_sequences, args.sequence_length, args.sequence_dim, seed=42
+    )
+    test = Dataset(
+        args.test_sequences, args.sequence_length, args.sequence_dim, seed=43
+    )
+
+    # Create the model and train
+    model = Model(args)
+
+    logs = model.fit(
+        train.data["sequences"],
+        train.data["labels"],
+        batch_size=args.batch_size,
+        epochs=args.epochs,
+        validation_data=(test.data["sequences"], test.data["labels"]),
+        callbacks=[model.tb_callback],
+    )
+
+    # Return development metrics for ReCodEx to validate.
+    return {
+        metric: values[-1]
+        for metric, values in logs.history.items()
+        if metric.startswith("val_")
+    }
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
diff --git a/labs/08/tagger_cle.py b/labs/08/tagger_cle.py
new file mode 100644
index 0000000..b2cda59
--- /dev/null
+++ b/labs/08/tagger_cle.py
@@ -0,0 +1,562 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+
+import numpy as np
+import torch
+import torchmetrics
+
+from morpho_dataset import MorphoDataset
+
+parser = argparse.ArgumentParser()
+# These arguments will be set appropriately by ReCodEx, even if you change them.
+parser.add_argument("--batch_size", default=10, type=int, help="Batch size.")
+parser.add_argument("--cle_dim", default=32, type=int, help="CLE embedding dimension.")
+parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.")
+parser.add_argument(
+    "--max_sentences",
+    default=None,
+    type=int,
+    help="Maximum number of sentences to load.",
+)
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
+parser.add_argument(
+    "--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type."
+)
+parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
+parser.add_argument("--we_dim", default=64, type=int, help="Word embedding dimension.")
+parser.add_argument(
+    "--word_masking",
+    default=0.0,
+    type=float,
+    help="Mask words with the given probability.",
+)
+# If you add more arguments, ReCodEx will keep them with your default values.
+
+
+class TrainableModule(torch.nn.Module):
+    """A simple Keras-like module for training with raw PyTorch.
+
+    The module provides fit/evaluate/predict methods, computes loss and metrics,
+    and generates both TensorBoard and console logs. By default, it uses GPU
+    if available, and CPU otherwise. Additionally, it offers a Keras-like
+    initialization of the weights.
+
+    The current implementation supports models with either a single input or
+    a tuple of inputs; however, only one output is currently supported.
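+
+    A typical usage might look as follows (a sketch; `model` is an instance of
+    a subclass, and `train` and `dev` are hypothetical data loaders yielding
+    `(inputs, target)` pairs):
+
+        model.configure(optimizer=torch.optim.Adam(model.parameters()),
+                        loss=torch.nn.CrossEntropyLoss(), logdir="logs")
+        logs = model.fit(train, epochs=5, dev=dev)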
+    """
+
+    from torch.utils.tensorboard import SummaryWriter as _SummaryWriter
+    from time import time as _time
+    from tqdm import tqdm as _tqdm
+
+    def configure(
+        self,
+        *,
+        optimizer=None,
+        schedule=None,
+        loss=None,
+        metrics={},
+        logdir=None,
+        device="auto",
+    ):
+        """Configure the module for training and evaluation.
+
+        - `optimizer` is the optimizer to use for training;
+        - `schedule` is an optional learning rate scheduler used after every batch;
+        - `loss` is the loss function to minimize;
+        - `metrics` is a dictionary of additional metrics to compute;
+        - `logdir` is an optional directory where TensorBoard logs should be written;
+        - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise.
+        """
+        self.optimizer = optimizer
+        self.schedule = schedule
+        self.loss, self.loss_metric = loss, torchmetrics.MeanMetric()
+        self.metrics = torchmetrics.MetricCollection(metrics)
+        self.logdir, self._writers = logdir, {}
+        self.device = torch.device(
+            ("cuda" if torch.cuda.is_available() else "cpu")
+            if device == "auto"
+            else device
+        )
+        self.to(self.device)
+
+    def load_weights(self, path, device="auto"):
+        """Load the model weights from the given path."""
+        self.device = torch.device(
+            ("cuda" if torch.cuda.is_available() else "cpu")
+            if device == "auto"
+            else device
+        )
+        self.load_state_dict(torch.load(path, map_location=self.device))
+
+    def save_weights(self, path):
+        """Save the model weights to the given path."""
+        state_dict = self.state_dict()
+        torch.save(state_dict, path)
+
+    def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1):
+        """Train the model on the given dataset.
+
+        - `dataloader` is the training dataset, each element a pair of inputs and an output;
+          the inputs can be either a single tensor or a tuple of tensors;
+        - `dev` is an optional development dataset;
+        - `epochs` is the number of epochs to train;
+        - `callbacks` is a list of callbacks to call after each epoch with
+          arguments `self`, `epoch`, and `logs`;
+        - `verbose` controls the verbosity: 0 for silent, 1 for a persistent progress bar,
+          2 for a progress bar only when writing to a console.
+        """
+        for epoch in range(epochs):
+            self.train()
+            self.loss_metric.reset()
+            self.metrics.reset()
+            start = self._time()
+            epoch_message = f"Epoch={epoch+1}/{epochs}"
+            data_and_progress = self._tqdm(
+                dataloader,
+                epoch_message,
+                unit="batch",
+                leave=False,
+                disable=None if verbose == 2 else not verbose,
+            )
+            for xs, y in data_and_progress:
+                xs, y = (
+                    tuple(
+                        x.to(self.device)
+                        for x in (xs if isinstance(xs, tuple) else (xs,))
+                    ),
+                    y.to(self.device),
+                )
+                logs = self.train_step(xs, y)
+                message = [epoch_message] + [
+                    f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}"
+                    for k, v in logs.items()
+                ]
+                data_and_progress.set_description(" ".join(message), refresh=False)
+            if dev is not None:
+                logs |= {
+                    "dev_" + k: v for k, v in self.evaluate(dev, verbose=0).items()
+                }
+            for callback in callbacks:
+                callback(self, epoch, logs)
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("dev_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "dev",
+                {k[4:]: v for k, v in logs.items() if k.startswith("dev_")},
+                epoch + 1,
+            )
+            verbose and print(
+                epoch_message,
+                "{:.1f}s".format(self._time() - start),
+                *[
+                    f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}"
+                    for k, v in logs.items()
+                ],
+            )
+        return logs
+
+    def train_step(self, xs, y):
+        """An overridable method performing a single training step.
+
+        A dictionary with the loss and metrics should be returned."""
+        self.zero_grad()
+        y_pred = self.forward(*xs)
+        loss = self.loss(y_pred, y)
+        loss.backward()
+        with torch.no_grad():
+            self.optimizer.step()
+            self.schedule is not None and self.schedule.step()
+            self.loss_metric.update(loss)
+            self.metrics.update(y_pred, y)
+            return (
+                {"loss": self.loss_metric.compute()}
+                | ({"lr": self.schedule.get_last_lr()[0]} if self.schedule else {})
+                | self.metrics.compute()
+            )
+
+    def evaluate(self, dataloader, verbose=1):
+        """An evaluation of the model on the given dataset.
+
+        - `dataloader` is the dataset to evaluate on, each element a pair of inputs
+          and an output, the inputs either a single tensor or a tuple of tensors;
+        - `verbose` controls the verbosity: 0 for silent, 1 for a single message."""
+        self.eval()
+        self.loss_metric.reset()
+        self.metrics.reset()
+        for xs, y in dataloader:
+            xs, y = (
+                tuple(
+                    x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))
+                ),
+                y.to(self.device),
+            )
+            logs = self.test_step(xs, y)
+        verbose and print(
+            "Evaluation",
+            *[f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}" for k, v in logs.items()],
+        )
+        return logs
+
+    def test_step(self, xs, y):
+        """An overridable method performing a single evaluation step.
+
+        A dictionary with the loss and metrics should be returned."""
+        with torch.no_grad():
+            y_pred = self.forward(*xs)
+            self.loss_metric.update(self.loss(y_pred, y))
+            self.metrics.update(y_pred, y)
+            return {"loss": self.loss_metric.compute()} | self.metrics.compute()
+
+    def predict(self, dataloader, as_numpy=True):
+        """Compute predictions for the given dataset.
+
+        - `dataloader` is the dataset to predict on, each element either
+          directly the input or a tuple whose first element is the input;
+          the input can be either a single tensor or a tuple of tensors;
+        - `as_numpy` is a flag controlling whether the output should be
+          converted to a numpy array or kept as a PyTorch tensor.
+
+        The method returns a Python list whose elements are predictions
+        of the individual examples. Note that if the input was padded, so
+        will be the predictions, which will then need to be trimmed."""
+        self.eval()
+        predictions = []
+        for batch in dataloader:
+            xs = batch[0] if isinstance(batch, tuple) else batch
+            xs = tuple(
+                x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))
+            )
+            batch = self.predict_step(xs)
+            predictions.extend(batch.numpy(force=True) if as_numpy else batch)
+        return predictions
+
+    def predict_step(self, xs):
+        """An overridable method performing a single prediction step."""
+        with torch.no_grad():
+            return self.forward(*xs)
+
+    def writer(self, writer):
+        """Possibly create and return a TensorBoard writer for the given name."""
+        if writer not in self._writers:
+            self._writers[writer] = self._SummaryWriter(
+                os.path.join(self.logdir, writer)
+            )
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        """Log the given dictionary to TensorBoard with a given name and step number."""
+        if logs and self.logdir:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    @staticmethod
+    def keras_init(module):
+        """Initialize weights using the Keras defaults."""
+        if isinstance(
+            module,
+            (
+                torch.nn.Linear,
+                torch.nn.Conv1d,
+                torch.nn.Conv2d,
+                torch.nn.Conv3d,
+                torch.nn.ConvTranspose1d,
+                torch.nn.ConvTranspose2d,
+                torch.nn.ConvTranspose3d,
+            ),
+        ):
+            torch.nn.init.xavier_uniform_(module.weight)
+            if module.bias is not None:
+                torch.nn.init.zeros_(module.bias)
+        if isinstance(module, (torch.nn.Embedding, torch.nn.EmbeddingBag)):
+            torch.nn.init.uniform_(module.weight, -0.05, 0.05)
+        if isinstance(module, (torch.nn.RNNBase, torch.nn.RNNCellBase)):
+            for name, parameter in module.named_parameters():
+                "weight_ih" in name and torch.nn.init.xavier_uniform_(parameter)
+                "weight_hh" in name and torch.nn.init.orthogonal_(parameter)
+                "bias" in name and torch.nn.init.zeros_(parameter)
+                # PyTorch orders the LSTM gates as (input, forget, cell, output), so
+                # the second quarter of each bias vector belongs to the forget gate.
+                if "bias" in name and isinstance(
+                    module, (torch.nn.LSTM, torch.nn.LSTMCell)
+                ):
+                    parameter.data[module.hidden_size : module.hidden_size * 2] = 1
+
+
+class Model(TrainableModule):
+    class MaskElements(torch.nn.Module):
+        """A layer randomly masking elements with a given value."""
+
+        def __init__(self, mask_probability, mask_value):
+            super().__init__()
+            self._mask_probability = mask_probability
+            self._mask_value = mask_value
+
+        def forward(self, inputs):
+            # Only mask during training and when the mask probability is non-zero.
+            if self.training and self._mask_probability:
+                # DID: Generate a mask tensor of `torch.float32`s of the same shape
+                # as `inputs` using either `torch.rand` or `torch.rand_like`.
+                # Then replace the inputs elements whose mask value is less than
+                # `self._mask_probability` with the value of `self._mask_value`.
+                inputs = torch.where(
+                    torch.rand_like(inputs, dtype=torch.float32)
+                    < self._mask_probability,
+                    self._mask_value,
+                    inputs,
+                )
+            return inputs
+
+    def __init__(self, args: argparse.Namespace, train: MorphoDataset.Dataset) -> None:
+        super().__init__()
+
+        # Create all needed layers.
+        # DID: Create a word masking layer `self.MaskElements` with the given
+        # `args.word_masking` probability and `MorphoDataset.UNK` as the masking value.
+        self._word_masking = self.MaskElements(args.word_masking, MorphoDataset.UNK)
+
+        # DID: Create a `torch.nn.Embedding` layer for embedding the character ids
+        # from `train.forms.char_vocab` to dimensionality `args.cle_dim`.
+        self._char_embedding = torch.nn.Embedding(
+            len(train.forms.char_vocab), args.cle_dim
+        )
+
+        # DID: Create a `torch.nn.GRU` layer processing the character embeddings,
+        # producing output of dimensionality `args.cle_dim`, concatenating the
+        # outputs of forward and backward directions. Also pass `batch_first=True`.
+        self._char_rnn = torch.nn.GRU(
+            input_size=args.cle_dim,
+            hidden_size=args.cle_dim,
+            bidirectional=True,
+            batch_first=True,
+        )
+
+        # DID(tagger_we): Create a `torch.nn.Embedding` layer, embedding the form ids
+        # from `train.forms.word_vocab` to dimensionality `args.we_dim`.
+        self._word_embedding = torch.nn.Embedding(
+            len(train.forms.word_vocab), args.we_dim
+        )
+
+        # DID: Create an RNN layer, either `torch.nn.LSTM` or `torch.nn.GRU` depending
+        # on `args.rnn`. The layer should be bidirectional (`bidirectional=True`), summing
+        # the outputs of forward and backward directions. The layer processes the above
+        # embeddings generated by the `self._word_embedding` layer, **now concatenated
+        # with the character-level embeddings**, and produces output of dimensionality
+        # `args.rnn_dim`; pass `batch_first=True` to the constructor.
+        self._word_rnn = (torch.nn.LSTM if args.rnn == "LSTM" else torch.nn.GRU)(
+            input_size=args.we_dim + 2 * args.cle_dim,
+            hidden_size=args.rnn_dim,
+            bidirectional=True,
+            batch_first=True,
+        )
+
+        # TODO(tagger_we): Create an output linear layer (`torch.nn.Linear`) processing the RNN output,
+        # producing logits for tag prediction; `train.tags.word_vocab` is the tag vocabulary.
+        self._output_layer = torch.nn.Linear(args.rnn_dim, len(train.tags.word_vocab))
+
+        # Initialize the layers using the Keras-inspired initialization. You can try
+        # removing this line to see how much worse the default PyTorch initialization is.
+        self.apply(self.keras_init)
+
+    def forward(
+        self,
+        form_ids: torch.Tensor,
+        unique_forms: torch.Tensor,
+        form_indices: torch.Tensor,
+    ) -> torch.Tensor:
+        # DID: Mask the input `form_ids` using the `self._word_masking` layer, keeping the unmasked ids for the lengths.
+        masked_form_ids = self._word_masking(form_ids)
+
+        # DID(tagger_we): Embed the masked `form_ids` using the word embedding layer.
+        hidden = self._word_embedding(masked_form_ids)
+
+        # DID: Embed the `unique_forms` using the character embedding layer.
+        cle = self._char_embedding(unique_forms)
+
+        # DID: Pass the character embeddings through the character-level RNN.
+        # As during word-level RNN, start by packing the input sequence.
+        packed = torch.nn.utils.rnn.pack_padded_sequence(
+            input=cle,
+            lengths=torch.sum(unique_forms != MorphoDataset.PAD, dim=-1).cpu(),
+            batch_first=True,
+            enforce_sorted=False,
+        )
+
+        # Pass the `PackedSequence` through the character RNN. Note that this time
+        # we are interested only in the second output (the last hidden state of the RNN).
+        _, cle = self._char_rnn(packed)
+
+        forward_pass = cle[0, :, :]
+        backward_pass = cle[1, :, :]
+
+        # DID: Concatenate the states of the forward and backward directions.
+        cle = torch.cat((forward_pass, backward_pass), dim=-1)
+
+        # DID: With `cle` being the character-level embeddings of the unique forms
+        # of shape `[num_unique_forms, 2 * cle_dim]`, create the representation of the
+        # (not necessarily unique) sentence forms by indexing the character-level
+        # embeddings with the `form_indices`. The result should have a shape
+        # `[batch_size, max_sentence_length, 2 * cle_dim]`. You can use for example
+        # the `torch.nn.functional.embedding` function.
+        cle = torch.nn.functional.embedding(form_indices, cle)
+
+        # DID: Concatenate the word embeddings with the character-level embeddings (in this order).
+        hidden = torch.cat((hidden, cle), dim=-1)
+
+        # DID(tagger_we): Process the embeddings through the RNN layer. Because the sentences
+        # have different length, you have to use `torch.nn.utils.rnn.pack_padded_sequence`
+        # to construct a variable-length `PackedSequence` from the input. You need to compute
+        # the length of each sentence in the batch (by counting non-`MorphoDataset.PAD` tokens);
+        # note that these lengths must be on CPU, so you might need to use the `.cpu()` method.
+        # Finally, also pass `batch_first=True` and `enforce_sorted=False` to the call.
+        packed = torch.nn.utils.rnn.pack_padded_sequence(
+            input=hidden,
+            lengths=torch.sum(form_ids != MorphoDataset.PAD, dim=-1).cpu(),
+            batch_first=True,
+            enforce_sorted=False,
+        )
+
+        # Pass the `PackedSequence` through the RNN.
+        hidden, _ = self._word_rnn(packed)
+
+        # DID(tagger_we): Unpack the RNN output using the `torch.nn.utils.rnn.pad_packed_sequence` with
+        # `batch_first=True` argument. Then sum the outputs of forward and backward directions.
+        stacked, _ = torch.nn.utils.rnn.pad_packed_sequence(hidden, batch_first=True)
+
+        forward_pass, backward_pass = torch.chunk(stacked, 2, dim=-1)
+
+        hidden = forward_pass + backward_pass
+
+        # DID(tagger_we): Pass the RNN output through the output layer. Such an output has a shape
+        # `[batch_size, sequence_length, num_tags]`, but the loss and the metric expect
+        # the `num_tags` dimension to be in front (`[batch_size, num_tags, sequence_length]`),
+        # so you need to reorder the dimension.
+
+        hidden = self._output_layer(hidden).permute(0, 2, 1)
+
+        return hidden
+
+
+def main(args: argparse.Namespace) -> dict[str, float]:
+    # Set the random seed and the number of threads.
+    np.random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
+
+    # Load the data
+    morpho = MorphoDataset("czech_cac", max_sentences=args.max_sentences)
+
+    # Create the model and train
+    model = Model(args, morpho.train)
+
+    def prepare_tagging_data(example):
+        # TODO(tagger_we): Construct a single example, each consisting of the following pair:
+        # - a PyTorch tensor of integer ids of input forms as input,
+        # - a PyTorch tensor of integer tag ids as targets.
+        # To create the ids, use `word_vocab` of `morpho.train.forms` and `morpho.train.tags`.
+        form_ids = torch.tensor(
+            morpho.train.forms.word_vocab.indices(example["forms"]),
+            dtype=torch.int64,
+        )
+        tag_ids = torch.tensor(
+            morpho.train.tags.word_vocab.indices(example["tags"]),
+            dtype=torch.int64,
+        )
+        # Note that compared to `tagger_we`, we also return the original
+        # forms in order to be able to compute the character-level embeddings.
+        return form_ids, example["forms"], tag_ids
+
+    train = morpho.train.transform(prepare_tagging_data)
+    dev = morpho.dev.transform(prepare_tagging_data)
+
+    def prepare_batch(data):
+        # Construct a single batch, where `data` is a list of examples
+        # generated by `prepare_tagging_data`.
+        form_ids, forms, tag_ids = zip(*data)
+        # TODO(tagger_we): Combine `form_ids` into a single tensor, padding shorter
+        # sequences to length of the longest sequence in the batch with zeros
+        # using `torch.nn.utils.rnn.pad_sequence` with `batch_first=True` argument.
+        form_ids = torch.nn.utils.rnn.pad_sequence(form_ids, batch_first=True)
+        # DID: Create required inputs for the character-level embeddings using
+        # the provided `morpho.train.cle_batch` function on `forms`. The function
+        # returns a pair of two PyTorch tensors:
+        # - `unique_forms` with shape `[num_unique_forms, max_form_length]` containing
+        #   each unique form as a sequence of character ids,
+        # - `forms_indices` with shape `[num_sentences, max_sentence_length]`
+        #   containing for every form its index in `unique_forms`.
+        unique_forms, forms_indices = morpho.train.cle_batch(forms)
+        # TODO(tagger_we): Process `tag_ids` analogously to `form_ids`.
+        tag_ids = torch.nn.utils.rnn.pad_sequence(tag_ids, batch_first=True)
+        return (form_ids, unique_forms, forms_indices), tag_ids
+
+    train = torch.utils.data.DataLoader(
+        train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True
+    )
+    dev = torch.utils.data.DataLoader(
+        dev, batch_size=args.batch_size, collate_fn=prepare_batch
+    )
+
+    model.configure(
+        # TODO(tagger_we): Create the optimizer by creating an instance of
+        # `torch.optim.Adam` which will train the `model.parameters()`.
+        optimizer=torch.optim.Adam(model.parameters()),
+        # TODO(tagger_we): Use `torch.nn.CrossEntropyLoss` to instantiate the loss function.
+        # Pass `ignore_index=morpho.PAD` to the constructor so that the padded
+        # tags are ignored during the loss computation. Note that the loss
+        # expects the input to be of shape `[batch_size, num_tags, sequence_length]`.
+        loss=torch.nn.CrossEntropyLoss(
+            ignore_index=morpho.PAD,
+        ),
+        # TODO(tagger_we): Create a `torchmetrics.Accuracy` metric, passing "multiclass" as
+        # the first argument, `num_classes` set to the number of unique tags, and
+        # again `ignore_index=morpho.PAD` to ignore the padded tags.
+        metrics={
+            "accuracy": torchmetrics.Accuracy(
+                "multiclass",
+                num_classes=len(morpho.train.tags.word_vocab),
+                ignore_index=morpho.PAD,
+            )
+        },
+        logdir=args.logdir,
+        device="cpu",
+    )
+
+    logs = model.fit(train, dev=dev, epochs=args.epochs)
+
+    # Return development metrics for ReCodEx to validate.
+    return {
+        metric: value for metric, value in logs.items() if metric.startswith("dev_")
+    }
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
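The character-level embedding steps of `Model.forward` above (embed every unique form, run the bidirectional GRU over the packed characters, concatenate the two final states, then gather one row per sentence token) can be sketched in isolation as follows. This is a minimal standalone sketch: the function name `char_level_embeddings` and the constant `PAD = 0` are illustrative assumptions rather than part of `MorphoDataset`, and it assumes every row of `unique_forms` contains at least one non-padding character, since `pack_padded_sequence` rejects zero lengths.

```python
import torch

PAD = 0  # assumed padding id (stands in for MorphoDataset.PAD)


def char_level_embeddings(
    unique_forms: torch.Tensor,  # [num_unique_forms, max_form_length], character ids
    form_indices: torch.Tensor,  # [batch_size, max_sentence_length], rows of unique_forms
    char_embedding: torch.nn.Embedding,
    char_rnn: torch.nn.GRU,  # bidirectional, batch_first=True
) -> torch.Tensor:
    """Return [batch_size, max_sentence_length, 2 * cle_dim] character-level embeddings."""
    embedded = char_embedding(unique_forms)
    # pack_padded_sequence requires the lengths to be on the CPU.
    lengths = (unique_forms != PAD).sum(dim=-1).cpu()
    packed = torch.nn.utils.rnn.pack_padded_sequence(
        embedded, lengths, batch_first=True, enforce_sorted=False
    )
    # h_n has shape [2, num_unique_forms, cle_dim]: index 0 holds the forward
    # direction's last state, index 1 the backward direction's last state.
    _, h_n = char_rnn(packed)
    cle = torch.cat((h_n[0], h_n[1]), dim=-1)
    # Gather one embedding per sentence token, using `cle` as a lookup table.
    return torch.nn.functional.embedding(form_indices, cle)


torch.manual_seed(0)
char_embedding = torch.nn.Embedding(10, 4)
char_rnn = torch.nn.GRU(4, 4, bidirectional=True, batch_first=True)
unique_forms = torch.tensor([[1, 0, 0], [2, 3, 0], [4, 5, 6]])
form_indices = torch.tensor([[1, 2, 0], [2, 2, 1]])
with torch.no_grad():
    out = char_level_embeddings(unique_forms, form_indices, char_embedding, char_rnn)
print(out.shape)  # torch.Size([2, 3, 8])
```

Computing the GRU once per unique form and then indexing avoids rerunning the character RNN for repeated words in a batch.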
diff --git a/labs/08/tagger_competition.py b/labs/08/tagger_competition.py
new file mode 100644
index 0000000..ab8b80c
--- /dev/null
+++ b/labs/08/tagger_competition.py
@@ -0,0 +1,297 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+
+import numpy as np
+import torch
+import torchmetrics
+
+from morpho_analyzer import MorphoAnalyzer
+from morpho_dataset import MorphoDataset
+
+from tagger_cle1 import Model
+# from tagger_model import Model
+# TODO: Always use masking!!!
+
+# TODO: Define reasonable defaults and optionally more parameters.
+# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser()
+parser.add_argument("--batch_size", default=64, type=int, help="Batch size.")
+parser.add_argument("--cle_dim", default=32, type=int, help="CLE embedding dimension.")
+parser.add_argument("--epochs", default=3, type=int, help="Number of epochs.")
+parser.add_argument("--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type.")
+parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.")
+parser.add_argument("--we_dim", default=64, type=int, help="Word embedding dimension.")
+parser.add_argument("--word_masking", default=0.05, type=float, help="Mask words with the given probability.")
+
+
+class TrainableModule(torch.nn.Module):
+    """A simple Keras-like module for training with raw PyTorch.
+
+    The module provides fit/evaluate/predict methods, computes loss and metrics,
+    and generates both TensorBoard and console logs. By default, it uses GPU
+    if available, and CPU otherwise. Additionally, it offers a Keras-like
+    initialization of the weights.
+
+    The current implementation supports models with either single input or
+    a tuple of inputs; however, only one output is currently supported.
+    """
+    from torch.utils.tensorboard import SummaryWriter as _SummaryWriter
+    from time import time as _time
+    from tqdm import tqdm as _tqdm
+
+    def configure(self, *, optimizer=None, schedule=None, loss=None, metrics={}, logdir=None, device="auto"):
+        """Configure the module process.
+
+        - `optimizer` is the optimizer to use for training;
+        - `schedule` is an optional learning rate scheduler used after every batch;
+        - `loss` is the loss function to minimize;
+        - `metrics` is a dictionary of additional metrics to compute;
+        - `logdir` is an optional directory where TensorBoard logs should be written;
+        - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise.
+        """
+        self.optimizer = optimizer
+        self.schedule = schedule
+        self.loss, self.loss_metric = loss, torchmetrics.MeanMetric()
+        self.metrics = torchmetrics.MetricCollection(metrics)
+        self.logdir, self._writers = logdir, {}
+        self.device = torch.device(("cuda" if torch.cuda.is_available() else "cpu") if device == "auto" else device)
+        self.to(self.device)
+
+    def load_weights(self, path, device="auto"):
+        """Load the model weights from the given path."""
+        self.device = torch.device(("cuda" if torch.cuda.is_available() else "cpu") if device == "auto" else device)
+        self.load_state_dict(torch.load(path, map_location=self.device))
+
+    def save_weights(self, path):
+        """Save the model weights to the given path."""
+        state_dict = self.state_dict()
+        torch.save(state_dict, path)
+
+    def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1):
+        """Train the model on the given dataset.
+
+        - `dataloader` is the training dataset, each element a pair of inputs and an output;
+          the inputs can be either a single tensor or a tuple of tensors;
+        - `dev` is an optional development dataset;
+        - `epochs` is the number of epochs to train;
+        - `callbacks` is a list of callbacks to call after each epoch with
+          arguments `self`, `epoch`, and `logs`;
+        - `verbose` controls the verbosity: 0 for silent, 1 for persistent progress bar,
+          2 for a progress bar only when writing to a console.
+        """
+        for epoch in range(epochs):
+            self.train()
+            self.loss_metric.reset()
+            self.metrics.reset()
+            start = self._time()
+            epoch_message = f"Epoch={epoch+1}/{epochs}"
+            data_and_progress = self._tqdm(
+                dataloader, epoch_message, unit="batch", leave=False, disable=None if verbose == 2 else not verbose)
+            for xs, y in data_and_progress:
+                xs, y = tuple(x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))), y.to(self.device)
+                logs = self.train_step(xs, y)
+                message = [epoch_message] + [f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}" for k, v in logs.items()]
+                data_and_progress.set_description(" ".join(message), refresh=False)
+            if dev is not None:
+                logs |= {"dev_" + k: v for k, v in self.evaluate(dev, verbose=0).items()}
+            for callback in callbacks:
+                callback(self, epoch, logs)
+            self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("dev_")}, epoch + 1)
+            self.add_logs("dev", {k[4:]: v for k, v in logs.items() if k.startswith("dev_")}, epoch + 1)
+            verbose and print(epoch_message, "{:.1f}s".format(self._time() - start),
+                              *[f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}" for k, v in logs.items()])
+        return logs
+
+    def train_step(self, xs, y):
+        """An overridable method performing a single training step.
+
+        A dictionary with the loss and metrics should be returned."""
+        self.zero_grad()
+        y_pred = self.forward(*xs)
+        loss = self.compute_loss(y_pred, y, *xs)
+        loss.backward()
+        with torch.no_grad():
+            self.optimizer.step()
+            self.schedule is not None and self.schedule.step()
+            self.loss_metric.update(loss)
+            return {"loss": self.loss_metric.compute()} \
+                | ({"lr": self.schedule.get_last_lr()[0]} if self.schedule else {}) \
+                | self.compute_metrics(y_pred, y, *xs, training=True)
+
+    def compute_loss(self, y_pred, y, *xs):
+        """Compute the loss of the model given the inputs, predictions, and target outputs."""
+        return self.loss(y_pred, y)
+
+    def compute_metrics(self, y_pred, y, *xs, training):
+        """Compute and return metrics given the inputs, predictions, and target outputs."""
+        self.metrics.update(y_pred, y)
+        return self.metrics.compute()
+
+    def evaluate(self, dataloader, verbose=1):
+        """An evaluation of the model on the given dataset.
+
+        - `dataloader` is the dataset to evaluate on, each element a pair of inputs
+          and an output, the inputs either a single tensor or a tuple of tensors;
+        - `verbose` controls the verbosity: 0 for silent, 1 for a single message."""
+        self.eval()
+        self.loss_metric.reset()
+        self.metrics.reset()
+        for xs, y in dataloader:
+            xs, y = tuple(x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))), y.to(self.device)
+            logs = self.test_step(xs, y)
+        verbose and print("Evaluation", *[f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}" for k, v in logs.items()])
+        return logs
+
+    def test_step(self, xs, y):
+        """An overridable method performing a single evaluation step.
+
+        A dictionary with the loss and metrics should be returned."""
+        with torch.no_grad():
+            y_pred = self.forward(*xs)
+            self.loss_metric.update(self.compute_loss(y_pred, y, *xs))
+            return {"loss": self.loss_metric.compute()} | self.compute_metrics(y_pred, y, *xs, training=False)
+
+    def predict(self, dataloader, as_numpy=True):
+        """Compute predictions for the given dataset.
+
+        - `dataloader` is the dataset to predict on, each element either
+          directly the input or a tuple whose first element is the input;
+          the input can be either a single tensor or a tuple of tensors;
+        - `as_numpy` is a flag controlling whether the output should be
+          converted to a numpy array or kept as a PyTorch tensor.
+
+        The method returns a Python list whose elements are predictions
+        of the individual examples. Note that if the input was padded, so
+        will be the predictions, which will then need to be trimmed."""
+        self.eval()
+        predictions = []
+        for batch in dataloader:
+            xs = batch[0] if isinstance(batch, tuple) else batch
+            xs = tuple(x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,)))
+            predictions.extend(self.predict_step(xs, as_numpy=as_numpy))
+        return predictions
+
+    def predict_step(self, xs, as_numpy=True):
+        """An overridable method performing a single prediction step."""
+        with torch.no_grad():
+            batch = self.forward(*xs)
+            return batch.numpy(force=True) if as_numpy else batch
+
+    def writer(self, writer):
+        """Possibly create and return a TensorBoard writer for the given name."""
+        if writer not in self._writers:
+            self._writers[writer] = self._SummaryWriter(os.path.join(self.logdir, writer))
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        """Log the given dictionary to TensorBoard with a given name and step number."""
+        if logs and self.logdir:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    @staticmethod
+    def keras_init(module):
+        """Initialize weights using the Keras defaults."""
+        if isinstance(module, (torch.nn.Linear, torch.nn.Conv1d, torch.nn.Conv2d, torch.nn.Conv3d,
+                               torch.nn.ConvTranspose1d, torch.nn.ConvTranspose2d, torch.nn.ConvTranspose3d)):
+            torch.nn.init.xavier_uniform_(module.weight)
+            if module.bias is not None:
+                torch.nn.init.zeros_(module.bias)
+        if isinstance(module, (torch.nn.Embedding, torch.nn.EmbeddingBag)):
+            torch.nn.init.uniform_(module.weight, -0.05, 0.05)
+        if isinstance(module, (torch.nn.RNNBase, torch.nn.RNNCellBase)):
+            for name, parameter in module.named_parameters():
+                "weight_ih" in name and torch.nn.init.xavier_uniform_(parameter)
+                "weight_hh" in name and torch.nn.init.orthogonal_(parameter)
+                "bias" in name and torch.nn.init.zeros_(parameter)
+                if "bias" in name and isinstance(module, (torch.nn.LSTM, torch.nn.LSTMCell)):
+                    parameter.data[module.hidden_size:module.hidden_size * 2] = 1
+
+
+def main(args: argparse.Namespace) -> None:
+    # Set the random seed and the number of threads.
+    np.random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join("logs", "{}-{}-{}".format(
+        os.path.basename(globals().get("__file__", "notebook")),
+        datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+        ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items())))
+    ))
+
+    # Load the data. Using analyses is only optional.
+    morpho = MorphoDataset("czech_pdt")
+    analyses = MorphoAnalyzer("czech_pdt_analyses")
+
+    # TODO: Create the model and train it
+    model = Model(args, morpho.train)
+
+    def prepare_tagging_data(example):
+        form_ids = torch.tensor(data=morpho.train.forms.word_vocab.indices(example["forms"]), dtype=torch.int64)
+        tag_ids = torch.tensor(data=morpho.train.tags.word_vocab.indices(example["tags"]), dtype=torch.int64)
+        return form_ids, example["forms"], tag_ids
+    train = morpho.train.transform(prepare_tagging_data)
+    dev = morpho.dev.transform(prepare_tagging_data)
+
+    # Create a function that prepares test data
+    def prepare_testing_data(example):
+        form_ids = torch.tensor(data=morpho.train.forms.word_vocab.indices(example["forms"]), dtype=torch.int64)
+        return form_ids, example["forms"]
+    test = morpho.test.transform(prepare_testing_data)
+
+    def prepare_batch(data):
+        form_ids, forms, tag_ids = zip(*data)
+        form_ids = torch.nn.utils.rnn.pad_sequence(sequences=form_ids, batch_first=True)
+        unique_forms, forms_indices = morpho.train.cle_batch(forms)
+        tag_ids = torch.nn.utils.rnn.pad_sequence(sequences=tag_ids, batch_first=True)
+        return (form_ids, unique_forms, forms_indices), tag_ids
+    train = torch.utils.data.DataLoader(train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True)
+    dev = torch.utils.data.DataLoader(dev, batch_size=args.batch_size, collate_fn=prepare_batch)
+
+    # Create a function that creates test batches
+    def prepare_test_batch(data):
+        form_ids, forms = zip(*data)  # test examples carry no tags
+        form_ids = torch.nn.utils.rnn.pad_sequence(sequences=form_ids, batch_first=True)
+        unique_forms, forms_indices = morpho.train.cle_batch(forms)
+        return ((form_ids, unique_forms, forms_indices),)  # `predict` uses `batch[0]` of a tuple as the inputs
+    test = torch.utils.data.DataLoader(test, batch_size=args.batch_size, collate_fn=prepare_test_batch)
+
+    model.configure(
+        optimizer=torch.optim.Adam(model.parameters()),
+        loss=torch.nn.CrossEntropyLoss(ignore_index=morpho.PAD),
+        metrics={"accuracy": torchmetrics.Accuracy(task="multiclass", num_classes=len(morpho.train.tags.word_vocab), ignore_index=morpho.PAD)},
+        logdir=args.logdir,
+    )
+
+    model.fit(train, dev=dev, epochs=args.epochs)
+
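+    # Save the model, making sure `args.logdir` exists first
+    # (otherwise it is created only when TensorBoard logs are written).
+    os.makedirs(args.logdir, exist_ok=True)
+    model.save_weights(os.path.join(args.logdir, "model.pth"))
+
+    # Generate test set annotations, but in `args.logdir` to allow parallel execution.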
+    with open(os.path.join(args.logdir, "tagger_competition.txt"), "w", encoding="utf-8") as predictions_file:
+        # TODO: Predict the tags on the test set; update the following code
+        # if you use other output structure than in tagger_we.
+        predictions = model.predict(test)
+
+        for predicted_tags, forms in zip(predictions, morpho.test.forms.strings):
+            for predicted_tag in np.argmax(predicted_tags[:, :len(forms)], axis=0):
+                print(morpho.train.tags.word_vocab.string(predicted_tag), file=predictions_file)
+            print(file=predictions_file)
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
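Each element returned by `predict` in `tagger_competition.py` is a padded logit matrix of shape `[num_tags, max_sentence_length]`, which is why the writing loop slices off the padding and takes `argmax` over axis 0. A minimal NumPy sketch of that decoding step follows; the helper `decode_tags` and the plain `tag_strings` list are hypothetical stand-ins for the loop and for `morpho.train.tags.word_vocab.string`.

```python
import numpy as np


def decode_tags(predictions, sentences, tag_strings):
    """Decode padded logits into one list of tag strings per sentence.

    Each element of `predictions` has shape [num_tags, max_sentence_length];
    only the first len(forms) columns correspond to real tokens.
    """
    decoded = []
    for logits, forms in zip(predictions, sentences):
        tag_ids = np.argmax(logits[:, : len(forms)], axis=0)  # best tag per real token
        decoded.append([tag_strings[tag_id] for tag_id in tag_ids])
    return decoded


predictions = [np.array([[0.1, 0.9, 5.0], [0.8, 0.2, 0.0]])]  # 2 tags, 3 positions
print(decode_tags(predictions, [["Ahoj", "světe"]], ["A", "B"]))  # [['B', 'A']]
```

The third column is padding, so its large logit never influences the output.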
diff --git a/labs/08/tagger_we.ps1 b/labs/08/tagger_we.ps1
new file mode 100644
index 0000000..de55694
--- /dev/null
+++ b/labs/08/tagger_we.ps1
@@ -0,0 +1,10 @@
+"👉 TEST 1"
+" Expected: Epoch=1/1 3.1s loss=2.3541 accuracy=0.3138 dev_loss=2.0320 dev_accuracy=0.3611"
+# Actual: Epoch=1/1 3.6s loss=2.3641 accuracy=0.2857 dev_loss=2.0174 dev_accuracy=0.3669
+python ./labs/08/tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16
+
+"👉 TEST 2"
+" Expected: Epoch=1/1 3.2s loss=2.1970 accuracy=0.4233 dev_loss=1.5569 dev_accuracy=0.5121"
+# Actual: Epoch=1/1 3.5s loss=2.2395 accuracy=0.4611 dev_loss=1.5898 dev_accuracy=0.5481
+python ./labs/08/tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16
+
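The word-level RNN in `tagger_we.py` (and in the model above) follows a pack, run, unpack, and sum pattern. The sketch below isolates it; `run_word_rnn` and `PAD = 0` are illustrative assumptions. It passes the lengths as a CPU tensor, as `pack_padded_sequence` requires, and uses the `total_length` argument of `pad_packed_sequence` so that the output keeps the input's padded length regardless of the longest sentence in the batch.

```python
import torch

PAD = 0  # assumed padding id (stands in for MorphoDataset.PAD)


def run_word_rnn(embedded: torch.Tensor, form_ids: torch.Tensor, rnn: torch.nn.RNNBase) -> torch.Tensor:
    """Run a bidirectional, batch_first RNN over padded sentences and sum both directions.

    embedded: [batch_size, max_len, input_dim]; form_ids: [batch_size, max_len].
    Returns [batch_size, max_len, hidden_size]; padded positions are zeros.
    """
    lengths = (form_ids != PAD).sum(dim=-1).cpu()  # must be a CPU tensor
    packed = torch.nn.utils.rnn.pack_padded_sequence(
        embedded, lengths, batch_first=True, enforce_sorted=False
    )
    output, _ = rnn(packed)
    stacked, _ = torch.nn.utils.rnn.pad_packed_sequence(
        output, batch_first=True, total_length=form_ids.shape[1]
    )
    # The last dimension holds [forward; backward] outputs, each of size hidden_size.
    forward, backward = torch.chunk(stacked, 2, dim=-1)
    return forward + backward


torch.manual_seed(0)
form_ids = torch.tensor([[3, 4, 5], [6, 7, PAD]])
embedding = torch.nn.Embedding(10, 4)
lstm = torch.nn.LSTM(4, 5, bidirectional=True, batch_first=True)
with torch.no_grad():
    hidden = run_word_rnn(embedding(form_ids), form_ids, lstm)
print(hidden.shape)  # torch.Size([2, 3, 5])
```

Packing makes the backward direction start at each sentence's last real token instead of at the padding.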
diff --git a/labs/08/tagger_we.py b/labs/08/tagger_we.py
new file mode 100644
index 0000000..f1b48ae
--- /dev/null
+++ b/labs/08/tagger_we.py
@@ -0,0 +1,468 @@
+#!/usr/bin/env python3
+import argparse
+import datetime
+import os
+import re
+
+import numpy as np
+import torch
+import torchmetrics
+
+from morpho_dataset import MorphoDataset
+
+parser = argparse.ArgumentParser()
+# These arguments will be set appropriately by ReCodEx, even if you change them.
+parser.add_argument("--batch_size", default=10, type=int, help="Batch size.")
+parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.")
+parser.add_argument(
+    "--max_sentences",
+    default=None,
+    type=int,
+    help="Maximum number of sentences to load.",
+)
+parser.add_argument(
+    "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx."
+)
+parser.add_argument(
+    "--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type."
+)
+parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.")
+parser.add_argument("--seed", default=42, type=int, help="Random seed.")
+parser.add_argument(
+    "--threads", default=1, type=int, help="Maximum number of threads to use."
+)
+parser.add_argument("--we_dim", default=128, type=int, help="Word embedding dimension.")
+# If you add more arguments, ReCodEx will keep them with your default values.
+
+
+class TrainableModule(torch.nn.Module):
+    """A simple Keras-like module for training with raw PyTorch.
+
+    The module provides fit/evaluate/predict methods, computes loss and metrics,
+    and generates both TensorBoard and console logs. By default, it uses GPU
+    if available, and CPU otherwise. Additionally, it offers a Keras-like
+    initialization of the weights.
+
+    The current implementation supports models with either single input or
+    a tuple of inputs; however, only one output is currently supported.
+    """
+
+    from torch.utils.tensorboard import SummaryWriter as _SummaryWriter
+    from time import time as _time
+    from tqdm import tqdm as _tqdm
+
+    def configure(
+        self,
+        *,
+        optimizer=None,
+        schedule=None,
+        loss=None,
+        metrics={},
+        logdir=None,
+        device="auto",
+    ):
+        """Configure the module process.
+
+        - `optimizer` is the optimizer to use for training;
+        - `schedule` is an optional learning rate scheduler used after every batch;
+        - `loss` is the loss function to minimize;
+        - `metrics` is a dictionary of additional metrics to compute;
+        - `logdir` is an optional directory where TensorBoard logs should be written;
+        - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise.
+        """
+        self.optimizer = optimizer
+        self.schedule = schedule
+        self.loss, self.loss_metric = loss, torchmetrics.MeanMetric()
+        self.metrics = torchmetrics.MetricCollection(metrics)
+        self.logdir, self._writers = logdir, {}
+        self.device = torch.device(
+            ("cuda" if torch.cuda.is_available() else "cpu")
+            if device == "auto"
+            else device
+        )
+        self.to(self.device)
+
+    def load_weights(self, path, device="auto"):
+        """Load the model weights from the given path."""
+        self.device = torch.device(
+            ("cuda" if torch.cuda.is_available() else "cpu")
+            if device == "auto"
+            else device
+        )
+        self.load_state_dict(torch.load(path, map_location=self.device))
+
+    def save_weights(self, path):
+        """Save the model weights to the given path."""
+        state_dict = self.state_dict()
+        torch.save(state_dict, path)
+
+    def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1):
+        """Train the model on the given dataset.
+
+        - `dataloader` is the training dataset, each element a pair of inputs and an output;
+          the inputs can be either a single tensor or a tuple of tensors;
+        - `dev` is an optional development dataset;
+        - `epochs` is the number of epochs to train;
+        - `callbacks` is a list of callbacks to call after each epoch with
+          arguments `self`, `epoch`, and `logs`;
+        - `verbose` controls the verbosity: 0 for silent, 1 for persistent progress bar,
+          2 for a progress bar only when writing to a console.
+        """
+        for epoch in range(epochs):
+            self.train()
+            self.loss_metric.reset()
+            self.metrics.reset()
+            start = self._time()
+            epoch_message = f"Epoch={epoch+1}/{epochs}"
+            data_and_progress = self._tqdm(
+                dataloader,
+                epoch_message,
+                unit="batch",
+                leave=False,
+                disable=None if verbose == 2 else not verbose,
+            )
+            for xs, y in data_and_progress:
+                xs, y = (
+                    tuple(
+                        x.to(self.device)
+                        for x in (xs if isinstance(xs, tuple) else (xs,))
+                    ),
+                    y.to(self.device),
+                )
+                logs = self.train_step(xs, y)
+                message = [epoch_message] + [
+                    f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}"
+                    for k, v in logs.items()
+                ]
+                data_and_progress.set_description(" ".join(message), refresh=False)
+            if dev is not None:
+                logs |= {
+                    "dev_" + k: v for k, v in self.evaluate(dev, verbose=0).items()
+                }
+            for callback in callbacks:
+                callback(self, epoch, logs)
+            self.add_logs(
+                "train",
+                {k: v for k, v in logs.items() if not k.startswith("dev_")},
+                epoch + 1,
+            )
+            self.add_logs(
+                "dev",
+                {k[4:]: v for k, v in logs.items() if k.startswith("dev_")},
+                epoch + 1,
+            )
+            verbose and print(
+                epoch_message,
+                "{:.1f}s".format(self._time() - start),
+                *[
+                    f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}"
+                    for k, v in logs.items()
+                ],
+            )
+        return logs
+
+    def train_step(self, xs, y):
+        """An overridable method performing a single training step.
+
+        A dictionary with the loss and metrics should be returned."""
+        self.zero_grad()
+        y_pred = self.forward(*xs)
+        loss = self.loss(y_pred, y)
+        loss.backward()
+        with torch.no_grad():
+            self.optimizer.step()
+            self.schedule is not None and self.schedule.step()
+            self.loss_metric.update(loss)
+            self.metrics.update(y_pred, y)
+            return (
+                {"loss": self.loss_metric.compute()}
+                | ({"lr": self.schedule.get_last_lr()[0]} if self.schedule else {})
+                | self.metrics.compute()
+            )
+
+    def evaluate(self, dataloader, verbose=1):
+        """An evaluation of the model on the given dataset.
+
+        - `dataloader` is the dataset to evaluate on, each element a pair of inputs
+          and an output, the inputs either a single tensor or a tuple of tensors;
+        - `verbose` controls the verbosity: 0 for silent, 1 for a single message."""
+        self.eval()
+        self.loss_metric.reset()
+        self.metrics.reset()
+        for xs, y in dataloader:
+            xs, y = (
+                tuple(
+                    x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))
+                ),
+                y.to(self.device),
+            )
+            logs = self.test_step(xs, y)
+        verbose and print(
+            "Evaluation",
+            *[f"{k}={v:.{0<abs(v)<2e-4 and '3g' or '4f'}}" for k, v in logs.items()],
+        )
+        return logs
+
+    def test_step(self, xs, y):
+        """An overridable method performing a single evaluation step.
+
+        A dictionary with the loss and metrics should be returned."""
+        with torch.no_grad():
+            y_pred = self.forward(*xs)
+            self.loss_metric.update(self.loss(y_pred, y))
+            self.metrics.update(y_pred, y)
+            return {"loss": self.loss_metric.compute()} | self.metrics.compute()
+
+    def predict(self, dataloader, as_numpy=True):
+        """Compute predictions for the given dataset.
+
+        - `dataloader` is the dataset to predict on, each element either
+          directly the input or a tuple whose first element is the input;
+          the input can be either a single tensor or a tuple of tensors;
+        - `as_numpy` is a flag controlling whether the output should be
+          converted to a numpy array or kept as a PyTorch tensor.
+
+        The method returns a Python list whose elements are predictions
+        of the individual examples. Note that if the input was padded, so
+        will be the predictions, which will then need to be trimmed."""
+        self.eval()
+        predictions = []
+        for batch in dataloader:
+            xs = batch[0] if isinstance(batch, tuple) else batch
+            xs = tuple(
+                x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))
+            )
+            batch = self.predict_step(xs)
+            predictions.extend(batch.numpy(force=True) if as_numpy else batch)
+        return predictions
+
+    def predict_step(self, xs):
+        """An overridable method performing a single prediction step."""
+        with torch.no_grad():
+            return self.forward(*xs)
+
+    def writer(self, writer):
+        """Possibly create and return a TensorBoard writer for the given name."""
+        if writer not in self._writers:
+            self._writers[writer] = self._SummaryWriter(
+                os.path.join(self.logdir, writer)
+            )
+        return self._writers[writer]
+
+    def add_logs(self, writer, logs, step):
+        """Log the given dictionary to TensorBoard with a given name and step number."""
+        if logs and self.logdir:
+            for key, value in logs.items():
+                self.writer(writer).add_scalar(key, value, step)
+            self.writer(writer).flush()
+
+    @staticmethod
+    def keras_init(module):
+        """Initialize weights using the Keras defaults."""
+        if isinstance(
+            module,
+            (
+                torch.nn.Linear,
+                torch.nn.Conv1d,
+                torch.nn.Conv2d,
+                torch.nn.Conv3d,
+                torch.nn.ConvTranspose1d,
+                torch.nn.ConvTranspose2d,
+                torch.nn.ConvTranspose3d,
+            ),
+        ):
+            torch.nn.init.xavier_uniform_(module.weight)
+            if module.bias is not None:
+                torch.nn.init.zeros_(module.bias)
+        if isinstance(module, (torch.nn.Embedding, torch.nn.EmbeddingBag)):
+            torch.nn.init.uniform_(module.weight, -0.05, 0.05)
+        if isinstance(module, (torch.nn.RNNBase, torch.nn.RNNCellBase)):
+            for name, parameter in module.named_parameters():
+                "weight_ih" in name and torch.nn.init.xavier_uniform_(parameter)
+                "weight_hh" in name and torch.nn.init.orthogonal_(parameter)
+                "bias" in name and torch.nn.init.zeros_(parameter)
+                if "bias" in name and isinstance(
+                    module, (torch.nn.LSTM, torch.nn.LSTMCell)
+                ):
+                    parameter.data[module.hidden_size : module.hidden_size * 2] = 1
+
+
+class Model(TrainableModule):
+    def __init__(self, args: argparse.Namespace, train: MorphoDataset.Dataset) -> None:
+        super().__init__()
+
+        # Create all needed layers.
+        # DO: Create a `torch.nn.Embedding` layer, embedding the form ids
+        # from `train.forms.word_vocab` to dimensionality `args.we_dim`.
+        self._word_embedding = torch.nn.Embedding(
+            num_embeddings=len(train.forms.word_vocab), embedding_dim=args.we_dim
+        )
+
+        # DO: Create an RNN layer, either `torch.nn.LSTM` or `torch.nn.GRU` depending
+        # on `args.rnn`. The layer should be bidirectional (`bidirectional=True`), summing
+        # the outputs of forward and backward directions. The layer processes the word
+        # embeddings generated by the `self._word_embedding` layer and produces output
+        # of dimensionality `args.rnn_dim`. Finally, pass `batch_first=True` to the constructor.
+        self._word_rnn = (
+            torch.nn.LSTM(
+                input_size=args.we_dim,
+                hidden_size=args.rnn_dim,
+                bidirectional=True,
+                batch_first=True,
+            )
+            if args.rnn == "LSTM"
+            else torch.nn.GRU(
+                input_size=args.we_dim,
+                hidden_size=args.rnn_dim,
+                bidirectional=True,
+                batch_first=True,
+            )
+        )
+
+        # DO: Create an output linear layer (`torch.nn.Linear`) processing the RNN output,
+        # producing logits for tag prediction; `train.tags.word_vocab` is the tag vocabulary.
+        self._output_layer = torch.nn.Linear(args.rnn_dim, len(train.tags.word_vocab))
+
+        # Initialize the layers using the Keras-inspired initialization. You can try
+        # removing this line to see how much worse the default PyTorch initialization is.
+        self.apply(self.keras_init)
+
+    def forward(self, form_ids: torch.Tensor) -> torch.Tensor:
+        # DO: Start by embedding the `form_ids` using the word embedding layer.
+        hidden = self._word_embedding(form_ids)
+
+        # DO: Process the embedded forms through the RNN layer. Because the sentences
+        # have different lengths, you have to use `torch.nn.utils.rnn.pack_padded_sequence`
+        # to construct a variable-length `PackedSequence` from the input. You need to compute
+        # the length of each sentence in the batch (by counting non-`MorphoDataset.PAD` tokens);
+        # note that these lengths must be on CPU, so you might need to use the `.cpu()` method.
+        # Finally, also pass `batch_first=True` and `enforce_sorted=False` to the call.
+        packed = torch.nn.utils.rnn.pack_padded_sequence(
+            hidden,
+            form_ids.ne(MorphoDataset.PAD).sum(dim=1).cpu(),
+            batch_first=True,
+            enforce_sorted=False,
+        )
+
+        # Pass the `PackedSequence` through the RNN.
+        hidden, _ = self._word_rnn(packed)
+
+        # DO: Unpack the RNN output using the `torch.nn.utils.rnn.pad_packed_sequence` with
+        # `batch_first=True` argument. Then sum the outputs of forward and backward directions.
+        stacked, _ = torch.nn.utils.rnn.pad_packed_sequence(hidden, batch_first=True)
+
+        forward_pass, backward_pass = torch.chunk(stacked, 2, dim=-1)
+
+        hidden = forward_pass + backward_pass
+
+        # DO: Pass the RNN output through the output layer. Such an output has a shape
+        # `[batch_size, sequence_length, num_tags]`, but the loss and the metric expect
+        # the `num_tags` dimension to be in front (`[batch_size, num_tags, sequence_length]`),
+        # so you need to reorder the dimensions.
+        hidden = self._output_layer(hidden).permute(0, 2, 1)
+
+        return hidden
+
+
+def main(args: argparse.Namespace) -> dict[str, float]:
+    # Set the random seed and the number of threads.
+    np.random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    if args.threads:
+        torch.set_num_threads(args.threads)
+        torch.set_num_interop_threads(args.threads)
+
+    # Create logdir name
+    args.logdir = os.path.join(
+        "logs",
+        "{}-{}-{}".format(
+            os.path.basename(globals().get("__file__", "notebook")),
+            datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"),
+            ",".join(
+                (
+                    "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v)
+                    for k, v in sorted(vars(args).items())
+                )
+            ),
+        ),
+    )
+
+    # Load the data
+    morpho = MorphoDataset("czech_cac", max_sentences=args.max_sentences)
+
+    # Create the model and train
+    model = Model(args, morpho.train)
+
+    def prepare_tagging_data(example):
+        # DO: Construct a single example, consisting of the following pair:
+        # - a PyTorch tensor of integer ids of input forms as input,
+        # - a PyTorch tensor of integer tag ids as targets.
+        # To create the ids, use `word_vocab` of `morpho.train.forms` and `morpho.train.tags`.
+
+        form_ids = torch.tensor(
+            morpho.train.forms.word_vocab.indices(example["forms"]),
+            dtype=torch.int64,
+        )
+        tag_ids = torch.tensor(
+            morpho.train.tags.word_vocab.indices(example["tags"]),
+            dtype=torch.int64,
+        )
+        return form_ids, tag_ids
+
+    train = morpho.train.transform(prepare_tagging_data)
+    dev = morpho.dev.transform(prepare_tagging_data)
+
+    def prepare_batch(data):
+        # Construct a single batch, where `data` is a list of examples
+        # generated by `prepare_tagging_data`.
+        form_ids, tag_ids = zip(*data)
+        # DO: Combine `form_ids` into a single tensor, padding shorter
+        # sequences to the length of the longest sequence in the batch with zeros
+        # using `torch.nn.utils.rnn.pad_sequence` with `batch_first=True` argument.
+        form_ids = torch.nn.utils.rnn.pad_sequence(form_ids, batch_first=True)
+        # DO: Process `tag_ids` analogously to `form_ids`.
+        tag_ids = torch.nn.utils.rnn.pad_sequence(tag_ids, batch_first=True)
+        return form_ids, tag_ids
+
+    train = torch.utils.data.DataLoader(
+        train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True
+    )
+    dev = torch.utils.data.DataLoader(
+        dev, batch_size=args.batch_size, collate_fn=prepare_batch
+    )
+
+    model.configure(
+        # DO: Create the optimizer by creating an instance of
+        # `torch.optim.Adam` which will train the `model.parameters()`.
+        optimizer=torch.optim.Adam(model.parameters()),
+        # DO: Use `torch.nn.CrossEntropyLoss` to instantiate the loss function.
+        # Pass `ignore_index=morpho.PAD` to the constructor so that the padded
+        # tags are ignored during the loss computation. Note that the loss
+        # expects the input to be of shape `[batch_size, num_tags, sequence_length]`.
+        loss=torch.nn.CrossEntropyLoss(
+            ignore_index=morpho.PAD,
+        ),
+        # DO: Create a `torchmetrics.Accuracy` metric, passing "multiclass" as
+        # the first argument, `num_classes` set to the number of unique tags, and
+        # again `ignore_index=morpho.PAD` to ignore the padded tags.
+        metrics={
+            "accuracy": torchmetrics.Accuracy(
+                "multiclass",
+                num_classes=len(morpho.train.tags.word_vocab),
+                ignore_index=morpho.PAD,
+            )
+        },
+        logdir=args.logdir,
+    )
+
+    logs = model.fit(train, dev=dev, epochs=args.epochs)
+
+    # Return development metrics for ReCodEx to validate.
+    return {
+        metric: value for metric, value in logs.items() if metric.startswith("dev_")
+    }
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
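The packing steps in `Model.forward` above can be illustrated in isolation. The following is a minimal sketch rather than part of the assignment code: the helper name `bidirectional_summed`, the toy vocabulary size, the dimensions, and `PAD = 0` are assumptions chosen for illustration. It shows that, thanks to packing, the backward direction starts at each sentence's last real token, so a padded sentence gets the same outputs as when it is run alone without padding.

```python
import torch


def bidirectional_summed(rnn: torch.nn.GRU, embedded: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    # Pack the batch so that the RNN skips padding; the lengths must be on the CPU.
    packed = torch.nn.utils.rnn.pack_padded_sequence(
        embedded, lengths.cpu(), batch_first=True, enforce_sorted=False)
    output, _ = rnn(packed)
    # Unpack to [batch, max_len, 2 * hidden]; padded positions are filled with zeros.
    padded, _ = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
    # The last dimension concatenates the forward and the backward direction.
    forward, backward = torch.chunk(padded, 2, dim=-1)
    return forward + backward


torch.manual_seed(0)
PAD = 0  # assumed padding id, playing the role of MorphoDataset.PAD
form_ids = torch.tensor([[5, 3, 7], [2, 4, PAD]])
embedding = torch.nn.Embedding(10, 8)
rnn = torch.nn.GRU(input_size=8, hidden_size=6, bidirectional=True, batch_first=True)
lengths = form_ids.ne(PAD).sum(dim=1)
hidden = bidirectional_summed(rnn, embedding(form_ids), lengths)
print(hidden.shape)
```

Without packing, the backward direction of the shorter sentence would start from the padding position, so its outputs would depend on how much padding the batch happened to need.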
diff --git a/labs/09/.gitignore b/labs/09/.gitignore
new file mode 100644
index 0000000..426a1aa
--- /dev/null
+++ b/labs/09/.gitignore
@@ -0,0 +1,2 @@
+/cs_lemma_20k/
+/en_lemma_20k/
diff --git a/labs/09/common_voice_cs.py b/labs/09/common_voice_cs.py
new file mode 100644
index 0000000..e90600e
--- /dev/null
+++ b/labs/09/common_voice_cs.py
@@ -0,0 +1,243 @@
+import array
+import os
+import struct
+import sys
+from typing import Any, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+
+import numpy as np
+import torch
+import torchaudio
+import torchmetrics
+
+
+# A class for managing mapping between strings and indices.
+# It provides:
+# - `__len__`: number of strings in the vocabulary
+# - `string(index: int) -> str`: string for a given index to the vocabulary
+# - `strings(indices: Sequence[int]) -> list[str]`: list of strings for given indices
+# - `index(string: str) -> int`: index of a given string in the vocabulary
+# - `indices(strings: Sequence[str]) -> list[int]`: list of indices for given strings
+class Vocabulary:
+    def __init__(self, strings: Sequence[str]) -> None:
+        self._strings = list(strings)
+        self._string_map = {string: index for index, string in enumerate(self._strings)}
+
+    def __len__(self) -> int:
+        return len(self._strings)
+
+    def string(self, index: int) -> str:
+        return self._strings[index]
+
+    def strings(self, indices: Sequence[int]) -> list[str]:
+        return [self._strings[index] for index in indices]
+
+    def index(self, string: str) -> int:
+        return self._string_map[string]
+
+    def indices(self, strings: Sequence[str]) -> list[int]:
+        return [self._string_map[string] for string in strings]
+
+
+class CommonVoiceCs:
+    MFCC_DIM: int = 13
+
+    LETTERS: list[str] = [
+        " ", "a", "á", "ä", "b", "c", "č", "d", "ď", "e", "é", "è", "ě",
+        "f", "g", "h", "i", "í", "ï", "j", "k", "l", "m", "n", "ň", "o",
+        "ó", "ö", "p", "q", "r", "ř", "s", "š", "t", "ť", "u", "ú", "ů",
+        "ü", "v", "w", "x", "y", "ý", "z", "ž",
+    ]
+
+    Element = TypedDict("Element", {"mfccs": torch.Tensor, "sentence": str})
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, path: str, size: int, decode_on_demand: bool) -> None:
+            self._size = size
+
+            arrays, indices = CommonVoiceCs._load_data(path, size)
+            if decode_on_demand:
+                self._data, self._arrays, self._indices = None, arrays, indices
+            else:
+                self._data = [self._decode(arrays, indices, i) for i in range(size)]
+
+        def __len__(self) -> int:
+            return self._size
+
+        def __getitem__(self, index: int) -> "CommonVoiceCs.Element":
+            if self._data:
+                return self._data[index]
+            return self._decode(self._arrays, self._indices, index)
+
+        def transform(self, transform: Callable[["CommonVoiceCs.Element"], Any]) -> "CommonVoiceCs.TransformedDataset":
+            return CommonVoiceCs.TransformedDataset(self, transform)
+
+        def _decode(self, data: dict, indices: dict, index: int) -> "CommonVoiceCs.Element":
+            return {
+                "mfccs": torch.frombuffer(
+                    data["mfccs"], dtype=torch.float32, offset=indices["mfccs"][index] * data["mfccs"].itemsize,
+                    count=indices["mfccs"][index + 1] - indices["mfccs"][index]).view(-1, CommonVoiceCs.MFCC_DIM),
+                "sentence": data["sentence"][
+                    indices["sentence"][index]:indices["sentence"][index + 1]].tobytes().decode("utf-8"),
+            }
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def transform(self, transform: Callable[..., Any]) -> "CommonVoiceCs.TransformedDataset":
+            return CommonVoiceCs.TransformedDataset(self, transform)
+
+    def __init__(self, decode_on_demand: bool = False) -> None:
+        for dataset, size in [("train", 9_773), ("dev", 904), ("test", 3_240)]:
+            path = "common_voice_cs.{}.tfrecord".format(dataset)
+            if not os.path.exists(path):
+                print("Downloading file {}...".format(path), file=sys.stderr)
+                urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path))
+                os.rename("{}.tmp".format(path), path)
+
+            setattr(self, dataset, self.Dataset(path, size, decode_on_demand))
+
+        self._letters_vocab = Vocabulary(self.LETTERS)
+
+    train: Dataset
+    dev: Dataset
+    test: Dataset
+
+    @property
+    def letters_vocab(self) -> Vocabulary:
+        return self._letters_vocab
+
+    # TFRecord loading
+    @staticmethod
+    def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]:
+        def get_value() -> np.int64:
+            nonlocal data, offset
+            value = np.int64(data[offset] & 0x7F); start = offset; offset += 1
+            while data[offset - 1] & 0x80:
+                value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1
+            return value
+
+        def get_value_of_kind(kind: int) -> np.int64:
+            nonlocal data, offset
+            assert data[offset] == kind; offset += 1
+            return get_value()
+
+        arrays, indices = {}, {}
+        with open(path, "rb") as file:
+            for _ in range(items):
+                length = file.read(8); assert len(length) == 8
+                length, = struct.unpack("<Q", length)
+                assert len(file.read(4)) == 4
+                data = file.read(length); assert len(data) == length
+                assert len(file.read(4)) == 4
+
+                offset = 0
+                length = get_value_of_kind(0x0A)
+                assert len(data) - offset == length
+                while offset < len(data):
+                    get_value_of_kind(0x0A)
+                    length = get_value_of_kind(0x0A)
+                    key = data[offset:offset + length].decode("utf-8"); offset += length
+                    get_value_of_kind(0x12)
+                    if key not in arrays:
+                        arrays[key] = array.array({0x0A: "B", 0x1A: "q", 0x12: "f"}.get(data[offset], "B"))
+                        indices[key] = array.array("L", [0])
+
+                    if data[offset] == 0x0A:
+                        length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(data[offset:offset + length]); offset += length
+                    elif data[offset] == 0x1A:
+                        length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A)
+                        target_offset = offset + length
+                        while offset < target_offset:
+                            arrays[key].append(get_value())
+                    elif data[offset] == 0x12:
+                        length = get_value_of_kind(0x12) and get_value_of_kind(0x0A)
+                        arrays[key].frombytes(np.frombuffer(
+                            data, np.dtype("<f4"), length >> 2, offset).astype(np.float32).tobytes()); offset += length
+                    else:
+                        raise ValueError("Unsupported data tag {}".format(data[offset]))
+                    indices[key].append(len(arrays[key]))
+        return arrays, indices
+
+    # Methods for generating MFCCs.
+    def load_audio(self, path: str, target_sample_rate: int | None = None) -> tuple[torch.Tensor, int]:
+        audio, sample_rate = torchaudio.load(path)
+        if target_sample_rate is not None and target_sample_rate != sample_rate:
+            audio = torchaudio.functional.resample(audio, sample_rate, target_sample_rate)
+            sample_rate = target_sample_rate
+        return torch.mean(audio, dim=0), sample_rate
+
+    # Note that while the dataset MFCCs were generated using an implementation
+    # functionally equivalent to the following, different resampling was used,
+    # so the values are not exactly the same.
+    def mfcc_extract(self, audio: torch.Tensor, sample_rate: int = 16_000) -> torch.Tensor:
+        assert sample_rate == 16000, "Only 16k sample rate is supported"
+
+        if not hasattr(self, "_mfcc_fn"):
+            # Compute a 1024-point STFT with frames of 64 ms and 75% overlap.
+            # Then warp the linear scale spectrograms into the mel-scale.
+            # Compute a stabilized log to get log-magnitude mel-scale spectrograms.
+            # Finally, compute MFCCs from log-mel-spectrograms and take the first
+            # `CommonVoiceCs.MFCC_DIM=13` of them.
+            self._mfcc_fn = torchaudio.transforms.MFCC(
+                sample_rate=16_000, n_mfcc=self.MFCC_DIM, log_mels=True,
+                melkwargs={"n_fft":1024, "win_length":1024, "hop_length":256,
+                           "f_min": 80., "f_max": 7600., "n_mels": 80, "center": False}
+            )
+        # Compute MFCCs of shape `[sequence_length, CommonVoiceCs.MFCC_DIM=13]`.
+        mfccs = self._mfcc_fn(audio).permute(1, 0)
+        return mfccs
+
+    # Torchmetrics metric computing the mean edit distance normalized by the gold length
+    class EditDistanceMetric(torchmetrics.MeanMetric):
+        def update(self, pred: Sequence[Sequence[Any]], true: Sequence[Sequence[Any]]) -> None:
+            edit_distances = []
+            for y_pred, y_true in zip(pred, true):
+                edit_distances.append(torchaudio.functional.edit_distance(y_pred, y_true) / len(y_true))
+            return super().update(edit_distances)
+
+    # Evaluation infrastructure
+    @staticmethod
+    def evaluate(gold_dataset: Dataset, predictions: Sequence[str]) -> float:
+        gold = [example["sentence"] for example in gold_dataset]
+
+        if len(predictions) != len(gold):
+            raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format(
+                len(predictions), len(gold)))
+
+        edit_distance = CommonVoiceCs.EditDistanceMetric()
+        for gold_sentence, prediction in zip(gold, predictions):
+            edit_distance([prediction], [gold_sentence])
+        return 100 * edit_distance.compute()
+
+    @staticmethod
+    def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float:
+        predictions = []
+        for line in predictions_file:
+            predictions.append(line.rstrip("\n"))
+        return CommonVoiceCs.evaluate(gold_dataset, predictions)
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate")
+    parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate")
+    args = parser.parse_args()
+
+    if args.evaluate:
+        with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file:
+            edit_distance = CommonVoiceCs.evaluate_file(getattr(CommonVoiceCs(), args.dataset), predictions_file)
+        print("CommonVoiceCs edit distance: {:.2f}%".format(edit_distance))
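`CommonVoiceCs._load_data` parses the TFRecord payloads by hand. Each record appears to hold a serialized `tf.train.Example` protocol buffer, whose integers and lengths are base-128 varints (decoded by the `get_value` helper), and whose tags such as `0x0A`, `0x12` and `0x1A` encode `(field_number << 3) | wire_type` for length-delimited fields 1, 2 and 3. Below is a minimal standalone sketch of varint decoding for illustration; `decode_varint` and `encode_varint` are hypothetical helper names, and unlike `get_value` they use plain Python integers instead of `np.int64`.

```python
def decode_varint(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at `offset`; return (value, new_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        # The low 7 bits carry the payload, least significant group first.
        value |= (byte & 0x7F) << shift
        shift += 7
        # A clear high bit marks the last byte of the varint.
        if not byte & 0x80:
            return value, offset


def encode_varint(value: int) -> bytes:
    """Inverse of `decode_varint` for non-negative integers (used for round-trip checks)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


print(decode_varint(bytes([0xAC, 0x02]), 0))  # (300, 2)
```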
diff --git a/labs/09/projector_export.py b/labs/09/projector_export.py
new file mode 100644
index 0000000..babfabd
--- /dev/null
+++ b/labs/09/projector_export.py
@@ -0,0 +1,33 @@
+#!/usr/bin/env python
+import argparse
+import os
+
+import numpy as np
+import torch
+import torch.utils.tensorboard
+
+if __name__ == "__main__":
+    # Parse arguments
+    parser = argparse.ArgumentParser()
+    parser.add_argument("input_embeddings", type=str, help="Embedding file to use.")
+    parser.add_argument("--elements", default=20000, type=int, help="Words to export.")
+    parser.add_argument("--output_dir", default="embeddings", type=str, help="Output directory.")
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+
+    # Generate the embeddings for the projector
+    with open(args.input_embeddings, "r", encoding="utf-8") as embedding_file:
+        _, dim = map(int, embedding_file.readline().split())
+
+        embeddings = np.zeros([args.elements, dim], np.float32)
+        words = []
+        for i, line in zip(range(args.elements), embedding_file):
+            word, *embedding = line.split()
+            words.append(word)
+            embeddings[i] = list(map(float, embedding))
+
+    # Save the embeddings
+    torch.utils.tensorboard.SummaryWriter(args.output_dir).add_embedding(
+        torch.tensor(embeddings[: len(words)]),  # keep only rows actually read from the file
+        metadata=words,
+        tag="embeddings",
+    )
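`projector_export.py` reads a textual embedding file whose first line holds the number of words and the dimension, followed by one `word v1 ... vD` line per word (the layout of the word2vec and fastText text formats). A minimal stdlib sketch of the same reading loop follows; `read_text_embeddings` is a hypothetical name, and it returns only as many vectors as words actually read, so the metadata and the embedding matrix always have the same length.

```python
import io
from typing import Iterable


def read_text_embeddings(lines: Iterable[str], limit: int) -> tuple[list[str], list[list[float]]]:
    lines = iter(lines)
    # Header line: "<number of words> <dimension>".
    _, dim = map(int, next(lines).split())
    words, vectors = [], []
    # `range` comes first so that no extra line is consumed once `limit` is reached.
    for _, line in zip(range(limit), lines):
        word, *values = line.split()
        assert len(values) == dim, f"expected {dim} values for {word!r}"
        words.append(word)
        vectors.append([float(value) for value in values])
    return words, vectors


sample = io.StringIO("3 2\nthe 0.1 0.2\nof -0.3 0.4\nand 0.5 -0.6\n")
words, vectors = read_text_embeddings(sample, limit=2)
print(words, vectors)
```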
diff --git a/lectures/lecture06.md b/lectures/lecture06.md
new file mode 100644
index 0000000..3631cee
--- /dev/null
+++ b/lectures/lecture06.md
@@ -0,0 +1,20 @@
+### Lecture: 6. Object Detection
+#### Date: Mar 25
+#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?06
+#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-06.pdf, PDF Slides
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.mp4, CZ Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.practicals.mp4, CZ Practicals
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.mp4, EN Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.practicals.mp4, EN Practicals
+#### Questions: #lecture_6_questions
+#### Lecture assignment: bboxes_utils
+#### Lecture assignment: svhn_competition
+
+- R-CNN [[R-CNN](https://arxiv.org/abs/1311.2524)]
+- Fast R-CNN [[Fast R-CNN](https://arxiv.org/abs/1504.08083)]
+- Proposing RoIs using Faster R-CNN [[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)]
+- Mask R-CNN [[Mask R-CNN](https://arxiv.org/abs/1703.06870)]
+- Feature Pyramid Networks [[Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)]
+- Focal Loss, RetinaNet [[Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)]
+- _EfficientDet [[EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)]_
+- Group Normalization [[Group Normalization](https://arxiv.org/abs/1803.08494)]
diff --git a/lectures/lecture07.md b/lectures/lecture07.md
new file mode 100644
index 0000000..e37111f
--- /dev/null
+++ b/lectures/lecture07.md
@@ -0,0 +1,5 @@
+### Lecture: 7. Easter Monday
+#### Date: Apr 01
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-07-czech.practicals.mp4, CZ Practicals
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-07-english.practicals.mp4, EN Practicals
+#### Lecture assignment: 3d_recognition
diff --git a/lectures/lecture08.md b/lectures/lecture08.md
new file mode 100644
index 0000000..c5c8752
--- /dev/null
+++ b/lectures/lecture08.md
@@ -0,0 +1,26 @@
+### Lecture: 8. Recurrent Neural Networks
+#### Date: Apr 8
+#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?08
+#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-08.pdf, PDF Slides
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-czech.mp4, CZ Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.mp4, EN Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.practicals-svhn_competition.mp4, EN SVHN Competition
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.practicals.mp4, EN Practicals
+#### Questions: #lecture_8_questions
+#### Lecture assignment: sequence_classification
+#### Lecture assignment: tagger_we
+#### Lecture assignment: tagger_cle
+#### Lecture assignment: tagger_competition
+
+- Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
+- The challenge of long-term dependencies [Section 10.7 of DLB]
+- Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, _[Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory](http://www.bioinf.jku.at/publications/older/2604.pdf), [Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM](ftp://ftp.idsia.ch/pub/juergen/FgGates-NC.pdf)_]
+- Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, _[Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078)_]
+- Highway Networks [[Training Very Deep Networks](https://arxiv.org/abs/1507.06228)]
+- RNN Regularization
+  - Variational Dropout [[A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](https://arxiv.org/abs/1512.05287)]
+  - Layer Normalization [[Layer Normalization](https://arxiv.org/abs/1607.06450)]
+- Bidirectional RNN [Section 10.3 of DLB]
+- Word Embeddings [Section 14.2.4 of DLB]
+- Character-level embeddings using Recurrent neural networks [C2W model from [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](http://arxiv.org/abs/1508.02096)]
+- _Character-level embeddings using Convolutional neural networks [CharCNN from [Character-Aware Neural Language Models](https://arxiv.org/abs/1508.06615)]_
diff --git a/lectures/lecture09.md b/lectures/lecture09.md
new file mode 100644
index 0000000..395c6ca
--- /dev/null
+++ b/lectures/lecture09.md
@@ -0,0 +1,21 @@
+### Lecture: 9. Structured Prediction, CTC, Word2Vec
+#### Date: Apr 15
+#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?09
+#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-09.pdf, PDF Slides
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-czech.mp4, CZ Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-czech.practicals.mp4, CZ Practicals
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-english.mp4, EN Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-english.practicals.mp4, EN Practicals
+#### Questions: #lecture_9_questions
+#### Lecture assignment: tensorboard_projector
+#### Lecture assignment: tagger_ner
+#### Lecture assignment: ctc_loss
+#### Lecture assignment: speech_recognition
+
+- Structured prediction
+- Connectionist Temporal Classification (CTC) loss [[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.cs.toronto.edu/~graves/icml_2006.pdf)]
+- `Word2vec` word embeddings, notably the CBOW and Skip-gram architectures [[Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/abs/1301.3781)]
+  - Hierarchical softmax [Section 12.4.3.2 of DLB or [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546)]
+  - Negative sampling [[Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546)]
+- _Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram ([Charagram: Embedding Words and Sentences via Character n-grams](https://arxiv.org/abs/1607.02789)), Subword Information ([Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)) or SubGram ([SubGram: Extending Skip-Gram Word Representation with Substrings](http://link.springer.com/chapter/10.1007/978-3-319-45510-5_21))]_
+- _ELMO [[Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer: Deep contextualized word representations](https://arxiv.org/abs/1802.05365)]_
diff --git a/slides/06/06.md b/slides/06/06.md
new file mode 100644
index 0000000..b795c5d
--- /dev/null
+++ b/slides/06/06.md
@@ -0,0 +1,711 @@
+title: NPFL138, Lecture 6
+class: title, langtech, cc-by-sa
+
+# Object Detection
+
+## Milan Straka
+
+### March 25, 2024
+
+---
+section: FastR-CNN
+class: middle, center
+# Beyond Image Classification
+
+# Beyond Image Classification
+
+---
+# Beyond Image Classification
+
+![w=70%,f=right](../01/object_detection.svgz)
+
+- Object detection (including location)
+<br clear="both">
+
+~~~
+![w=70%,f=right](../01/image_segmentation.svgz)
+
+- Image segmentation
+<br clear="both">
+
+~~~
+![w=70%,f=right](../01/human_pose_estimation.jpg)
+
+- Human pose estimation
+
+---
+# Beyond Image Classification
+
+![w=100%,v=middle](cv_tasks.jpg)
+
+---
+# Object Localization
+
+![w=100%](object_localization.png)
+
+We can perform object localization by predicting the bounding box coordinates
+using regression, jointly with the classification.
+
+---
+# R-CNN
+
+![w=42%,f=right](roi_generation.jpg)
+
+To be able to recognize and localize _several_ objects, assume we were given
+multiple interesting regions of the image, called **regions of interest** (RoI).
+For each of them, we decide:
+- whether it contains an object;
+- the location of the object relative to the RoI.
+
+~~~
+![w=45%,f=right](rcnn_architecture.svgz)
+
+In R-CNN, we start with a network pre-trained on ImageNet (VGG-16 is used in the
+original paper), and we use it to process _every RoI_, rescaling every one of
+them to the size of $224×224$.
+
+~~~
+For every RoI, two sibling heads are added:
+- _classification head_ predicts either _background_ or one of $K$ object types
+  ($K+1$ in total),
+~~~
+- _bounding box regression head_ predicts 4 bounding box parameters relative
+  to RoI.
+
+---
+# R-CNN – Bounding Boxes
+
+A bounding box is parametrized as follows. Let $x_r, y_r, w_r, h_r$ be the
+center coordinates, width, and height of the RoI, respectively, and let $x, y, w, h$ be
+the corresponding parameters of the bounding box. We represent the bounding box relative
+to the RoI as follows:
+$$\begin{aligned}
+t_x &= (x - x_r)/w_r, & t_y &= (y - y_r)/h_r, \\
+t_w &= \log (w/w_r), & t_h &= \log (h/h_r).
+\end{aligned}$$
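+
+The parametrization and its inverse can be sketched as follows. This is a minimal
+illustration that assumes plain tuples in the `CXCYWH` order $(x, y, w, h)$; the
+function names are hypothetical.
+
+```python
+import math
+
+# Illustrative sketch; encode_bbox/decode_bbox are hypothetical names.
+
+def encode_bbox(box, roi):
+    """Express a CXCYWH box (x, y, w, h) relative to a CXCYWH RoI as (t_x, t_y, t_w, t_h)."""
+    x, y, w, h = box
+    x_r, y_r, w_r, h_r = roi
+    return ((x - x_r) / w_r, (y - y_r) / h_r, math.log(w / w_r), math.log(h / h_r))
+
+
+def decode_bbox(t, roi):
+    """Invert encode_bbox, recovering the CXCYWH box from the (predicted) parameters."""
+    t_x, t_y, t_w, t_h = t
+    x_r, y_r, w_r, h_r = roi
+    return (x_r + t_x * w_r, y_r + t_y * h_r, w_r * math.exp(t_w), h_r * math.exp(t_h))
+
+
+# A box twice as wide as the RoI, shifted half a RoI width to the right.
+t = encode_bbox((60.0, 40.0, 40.0, 40.0), roi=(50.0, 50.0, 20.0, 40.0))
+print(t)  # (0.5, -0.25, 0.6931471805599453, 0.0)
+```
+
+Using $\log$ for the width and height makes the targets independent of the RoI scale,
+and decoding with $\exp$ always yields a positive width and height.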
+
+~~~
+In Fast R-CNN, the $\textrm{smooth}_{L_1}$ loss, or **Huber loss**, is employed for bounding box parameters:
+
+![w=19.5%,f=right](huber_loss.svgz)
+
+$$\textrm{smooth}_{L_1}(x) = \begin{cases}
+  0.5x^2    & \textrm{if }|x| < 1, \\
+  |x| - 0.5 & \textrm{otherwise}.
+\end{cases}$$
+
+~~~
+The complete loss is then ($λ=1$ is used in the Fast R-CNN paper)
+$$L(ĉ, t̂, c, t) = L_\textrm{cls}(ĉ, c) + λ ⋅ [c ≥ 1] ⋅
+  ∑\nolimits_{i ∈ \lbrace \mathrm{x, y, w, h}\rbrace} \textrm{smooth}_{L_1}(t̂_i - t_i).$$
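+
+A minimal sketch of this loss for a single RoI follows. It assumes that
+$L_\textrm{cls}$ is the cross-entropy of the predicted class distribution, that
+class 0 is the background, and that `t_pred` already holds the four parameters
+predicted for the gold class $c$. The function names are illustrative.
+
+```python
+import math
+
+# Illustrative sketch; smooth_l1 and fast_rcnn_loss are hypothetical names.
+
+def smooth_l1(x):
+    """Huber loss with threshold 1 (the smooth_L1 function above)."""
+    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5
+
+
+def fast_rcnn_loss(class_probs, t_pred, c, t_gold, lam=1.0):
+    """Loss for one RoI; class_probs covers K+1 classes with index 0 = background."""
+    loss_cls = -math.log(class_probs[c])  # cross-entropy of the gold class
+    # Iverson bracket [c >= 1]: background RoIs get no bounding box loss.
+    if c >= 1:
+        loss_box = sum(smooth_l1(p - g) for p, g in zip(t_pred, t_gold))
+    else:
+        loss_box = 0.0
+    return loss_cls + lam * loss_box
+
+
+print(smooth_l1(0.5), smooth_l1(-2.0))  # 0.125 1.5
+```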
+
+---
+# R-CNN – Bounding Boxes
+
+The described bounding box representation is usually called `CXCYWH`:
+
+![w=60%,h=center](bbox_representation_cxcywh.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+In datasets, the bounding boxes are usually represented using the `XYXY` format:
+
+![w=60%,h=center](bbox_representation_xyxy.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+Finally, you could also come across the `XYWH` format:
+
+![w=60%,h=center](bbox_representation_xywh.webp)
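+
+Converting between these formats is simple arithmetic. The helpers below are
+hypothetical. The sketch assumes that `XYWH` stores the top-left corner, as in
+COCO annotations, and that coordinates are continuous, with no +1 pixel convention.
+
+```python
+# Illustrative sketch; all function names are hypothetical.
+
+def xyxy_to_cxcywh(box):
+    x1, y1, x2, y2 = box
+    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
+
+
+def cxcywh_to_xyxy(box):
+    cx, cy, w, h = box
+    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
+
+
+def xywh_to_xyxy(box):
+    x, y, w, h = box  # (x, y) is assumed to be the top-left corner
+    return (x, y, x + w, y + h)
+
+
+def xyxy_to_xywh(box):
+    x1, y1, x2, y2 = box
+    return (x1, y1, x2 - x1, y2 - y1)
+
+
+print(xyxy_to_cxcywh((10, 20, 30, 60)))  # (20.0, 40.0, 20, 40)
+```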
+
+---
+# Fast R-CNN Architecture
+
+R-CNN is slow because it needs to process every RoI separately by the convolutional
+backbone. To speed it up, we can first process the whole image by the
+backbone once and only then extract a fixed-size representation for every RoI.
+
+~~~
+
+We achieve that using **RoI pooling**, replacing the last max-pool $14×14 → 7×7$
+VGG layer.
+
+![w=50%](roi_projection.svgz)![w=50%,mw=50%,h=center](roi_pooling.svgz)
+
+During RoI pooling, we obtain a $7×7$ RoI representation by first projecting the
+RoI to the $14×14$ resolution and then computing each of the $7×7$ values by
+**max-pooling** the corresponding “pixels” of the convolutional image features.
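+
+RoI pooling on a single channel can be sketched as follows. The sketch makes
+several simplifying assumptions:
+
+- the feature map is a plain list of rows;
+- the RoI is projected by dividing by the backbone stride (16 for VGG) and snapping
+  outwards to whole cells;
+- bins use floor/ceil boundaries, so no bin is ever empty.
+
+Real implementations such as `torchvision.ops.roi_pool` differ in these rounding
+details. The function names are hypothetical.
+
+```python
+import math
+
+# Illustrative sketch; project_roi and roi_max_pool are hypothetical names.
+
+def project_roi(roi, stride=16):
+    """Map an image-space XYXY RoI to feature-map cells, snapping outwards (quantization)."""
+    x1, y1, x2, y2 = roi
+    return (math.floor(x1 / stride), math.floor(y1 / stride),
+            math.ceil(x2 / stride), math.ceil(y2 / stride))
+
+
+def roi_max_pool(features, roi, output_size=7):
+    """Max-pool a non-empty, integer, end-exclusive RoI into output_size x output_size bins."""
+    x1, y1, x2, y2 = roi
+    h, w = y2 - y1, x2 - x1
+    pooled = []
+    for i in range(output_size):
+        # floor/ceil bin boundaries: bins may overlap slightly but are never empty.
+        r0, r1 = y1 + math.floor(i * h / output_size), y1 + math.ceil((i + 1) * h / output_size)
+        row = []
+        for j in range(output_size):
+            c0, c1 = x1 + math.floor(j * w / output_size), x1 + math.ceil((j + 1) * w / output_size)
+            row.append(max(features[r][c] for r in range(r0, r1) for c in range(c0, c1)))
+        pooled.append(row)
+    return pooled
+
+
+f = [[4 * r + c for c in range(4)] for r in range(4)]
+print(roi_max_pool(f, (0, 0, 4, 4), output_size=2))  # [[5, 7], [13, 15]]
+```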
+
+---
+# Fast R-CNN
+
+![w=85%,h=center](fast_rcnn_rumcajs.svgz)
+
+~~~
+![w=85%,h=center](fast_rcnn_vgg.png)
+
+---
+# Fast R-CNN and R-CNN Comparison
+
+![w=100%](fast_rcnn_architecture.svgz)
+
+---
+# Fast R-CNN Architecture
+
+![w=100%,v=middle](fast_rcnn.jpg)
+
+---
+# Fast R-CNN Training and Inference
+
+## Intersection over Union
+For two bounding boxes (or two masks), the _intersection over union_ (_IoU_)
+is the ratio of the area of their intersection to the area of their union.
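+
+For axis-aligned boxes in the `XYXY` format, the IoU can be computed as sketched
+below. This is a minimal illustration; for masks, the areas would instead be pixel
+counts.
+
+```python
+# Illustrative sketch; iou is a hypothetical name.
+
+def iou(a, b):
+    """Intersection over union of two XYXY boxes."""
+    inter_w = min(a[2], b[2]) - max(a[0], b[0])
+    inter_h = min(a[3], b[3]) - max(a[1], b[1])
+    inter = max(0.0, inter_w) * max(0.0, inter_h)  # zero when the boxes do not overlap
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union if union > 0 else 0.0
+
+
+print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (= 1/7)
+```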
+
+~~~
+## Choosing RoIs for Training
+During training, every minibatch consists of 2 images with 64 RoIs each. The RoIs
+are selected so that 25% have an intersection over union (IoU) overlap of at least
+0.5 with a ground-truth box; the others are chosen to have the maximum IoU in the
+range $[0.1, 0.5)$, the so-called _hard examples_.
+
+~~~
+## Running Inference
+During inference, we utilize all RoIs, but a single object can be found in
+several of them. To choose the most salient prediction, we perform **non-maximum
+suppression** – we ignore predictions which have an overlap with a higher
+scoring prediction of the _same class_, where the overlap is computed using IoU
+(0.3 threshold is used in the paper). Higher scoring predictions are the ones
+with higher probability from the _classification head_.
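+
+Greedy, class-aware non-maximum suppression can be sketched as follows. The function
+is illustrative; for real use, TorchVision provides `torchvision.ops.nms` and
+`torchvision.ops.batched_nms`. The sketch assumes that a prediction is suppressed
+when its IoU with an already kept one exceeds the threshold.
+
+```python
+# Illustrative sketch; iou and nms are hypothetical names.
+
+def iou(a, b):
+    """Intersection over union of two XYXY boxes."""
+    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union if union > 0 else 0.0
+
+
+def nms(boxes, scores, labels, iou_threshold=0.3):
+    """Greedy per-class NMS; returns indices of the kept predictions, best first."""
+    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
+    keep = []
+    for i in order:
+        # Suppress i if a kept (hence higher-scoring) prediction of the same class overlaps it too much.
+        if all(labels[k] != labels[i] or iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
+            keep.append(i)
+    return keep
+
+
+boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10), (20, 20, 30, 30)]
+print(nms(boxes, scores=[0.9, 0.8, 0.7, 0.6], labels=[1, 1, 2, 1]))  # [0, 2, 3]
+```
+
+Box 1 is suppressed by the overlapping, higher-scoring box 0 of the same class.
+Box 2 survives despite an identical position, because it has a different class.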
+
+---
+# Object Detection Evaluation
+
+## Average Precision
+Evaluation is performed using _Average Precision_ ($\mathit{AP}$ or $\mathit{AP}_{50}$).
+
+We assume all bounding boxes (or masks) produced by a system have confidence
+values which can be used to rank them. Then, for a single class, we take the
+boxes (or masks) in the order of their ranks and generate a precision/recall curve,
+considering a bounding box correct if it has IoU of at least 50% with
+a ground-truth box that has not been matched by a higher-ranked prediction.
+
+![w=60%,mw=50%,h=center](precision_recall_person.svgz)![w=60%,mw=50%,h=center](precision_recall_bottle.svgz)
+
+---
+# Object Detection Evaluation – Average Precision
+
+The general idea of AP is to compute the area under the precision/recall curve.
+
+![w=80%,mw=49%,h=center](precision_recall_curve.png)
+
+~~~
+![w=80%,mw=49%,h=center](precision_recall_curve_interpolated.jpg)
+
+We start by interpolating the precision/recall curve so that it is always
+nonincreasing: the precision at recall $r$ is replaced by the maximum precision
+at any recall $r' ≥ r$.
+
+~~~
+![w=80%,mw=49%,h=center,f=right](average_precision.jpg)
+
+Finally, the average precision for a single class is the average of the
+interpolated precision at recall levels $0.0, 0.1, 0.2, …, 1.0$.
+
+~~~
+The final AP is the mean of the average precisions of all classes.
+
+---
+class: tablewide
+style: table {line-height: 1}
+# Object Detection Evaluation – Average Precision
+
+For the COCO dataset, the AP is computed slightly differently. First, it is an
+average over 101 recall points $0.00, 0.01, 0.02, …, 1.00$.
+
+~~~
+In the original metric, IoU of 50% is enough to consider a prediction valid.
+We can generalize the definition to $\mathit{AP}_{t}$, where an object
+prediction is considered valid if IoU is at least $t$%.
+
+~~~
+The main COCO metric, denoted just $\mathit{AP}$, is the mean of
+$\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, …, \mathit{AP}_{95}$.
+
+~~~
+| Metric | Description |
+|:------:|:------------|
+| $\mathit{AP}$ | Mean of $\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, \mathit{AP}_{65}, …, \mathit{AP}_{95}$ |
+| $\mathit{AP}_{50}$ | AP at IoU 50% |
+| $\mathit{AP}_{75}$ | AP at IoU 75% |
+~~~
+| $\mathit{AP}_{S}$ | AP for small objects: $\textit{area} < 32^2$ |
+| $\mathit{AP}_{M}$ | AP for medium objects: $32^2 < \textit{area} < 96^2$ |
+| $\mathit{AP}_{L}$ | AP for large objects: $96^2 < \textit{area}$ |
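+
+The AP computation for one class can be sketched as follows. The sketch covers both
+the 11-point variant and the COCO-style average over 101 recall levels and IoU
+thresholds $0.50, 0.55, …, 0.95$. It is a simplified illustration with these
+assumptions:
+
+- detections are matched greedily, in score order, to the best still-unmatched
+  ground-truth box of the same image;
+- details of the official COCO evaluation are ignored (e.g., crowd annotations,
+  detection limits, and the area ranges of $\mathit{AP}_S, \mathit{AP}_M, \mathit{AP}_L$);
+- all names are hypothetical.
+
+The final metric would then average these per-class values over all classes.
+
+```python
+# Illustrative sketch; all names are hypothetical.
+
+def iou(a, b):
+    """Intersection over union of two XYXY boxes."""
+    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union if union > 0 else 0.0
+
+
+def pr_points(detections, ground_truth, iou_threshold):
+    """(recall, precision) after each detection, in decreasing order of confidence.
+
+    detections: list of (image_id, score, xyxy_box) of one class;
+    ground_truth: dict image_id -> list of xyxy boxes of that class (at least one box overall).
+    """
+    n_gt = sum(len(boxes) for boxes in ground_truth.values())
+    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
+    tp, points = 0, []
+    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
+    for rank, (img, _, box) in enumerate(ranked, start=1):
+        best, best_iou = None, iou_threshold
+        for j, gt in enumerate(ground_truth.get(img, [])):
+            overlap = iou(box, gt)
+            if not used[img][j] and overlap >= best_iou:
+                best, best_iou = j, overlap
+        if best is not None:  # a true positive; each ground-truth box is matched at most once
+            used[img][best] = True
+            tp += 1
+        points.append((tp / n_gt, tp / rank))
+    return points
+
+
+def average_precision(points, num_levels):
+    """Mean interpolated precision at num_levels equally spaced recall levels in [0, 1]."""
+    levels = [i / (num_levels - 1) for i in range(num_levels)]
+    # Interpolated precision at r: the maximum precision at any recall >= r (0 if none).
+    return sum(max((p for rec, p in points if rec >= r), default=0.0) for r in levels) / num_levels
+
+
+def voc_ap(detections, ground_truth):
+    return average_precision(pr_points(detections, ground_truth, 0.5), num_levels=11)
+
+
+def coco_ap(detections, ground_truth):
+    thresholds = [t / 100 for t in range(50, 100, 5)]  # 0.50, 0.55, ..., 0.95
+    return sum(average_precision(pr_points(detections, ground_truth, t), num_levels=101)
+               for t in thresholds) / len(thresholds)
+
+
+gt = {"img": [(0, 0, 10, 10), (20, 20, 30, 30)]}
+dets = [("img", 0.9, (0, 0, 10, 10)), ("img", 0.8, (50, 50, 60, 60)), ("img", 0.7, (20, 20, 30, 30))]
+print(voc_ap(dets, gt))  # approximately 0.8485 (= 28/33)
+```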
+
+
+---
+section: FasterR-CNN
+# Faster R-CNN
+
+![w=40%,f=right](fast_rcnn_speed.svgz)
+
+Even though Fast R-CNN is much faster than R-CNN, it can still be improved,
+because the most problematic and time-consuming part is generating the RoIs.
+<br clear="both">
+
+~~~
+![w=30%,f=right](faster_rcnn_architecture.png)
+
+Faster R-CNN extends Fast R-CNN by including a **region proposal
+network (RPN)**, whose goal is to generate the RoIs automatically.
+
+~~~
+The region proposal network produces the so-called **region proposals**,
+which then play the role of RoIs in the rest of the pipeline (i.e.,
+the Fast R-CNN).
+
+~~~
+The region proposals are generated similarly to how predictions are generated
+in Fast R-CNN. We start with several **anchors** and from each anchor
+we generate either a single region proposal or nothing.
+
+---
+# Faster R-CNN – Anchors
+
+If we consider the $14×14$ VGG backbone output, each “pixel” corresponds
+to a region of size $16×16$ in the original image.
+
+![w=45%,h=center](anchor_net.svgz)
+
+~~~
+We can therefore interpret each value in the $14×14$ output as a representation
+of a part of the image _centered_ in the corresponding image region, and try
+predicting a region proposal from **every one** of them.
+
+~~~
+We call the dense grid of image regions from which we are predicting the
+proposals the **anchors**. They have a fixed size, and in practice we use
+_several_ anchors per position.
+
+---
+# Faster R-CNN
+
+For every anchor, we classify it into two classes (background, object)
+and also predict the region proposal bounding box relative to the anchor,
+exactly as in (Fast) R-CNN.
+
+~~~
+![w=58%,f=right](faster_rcnn_rpn.svgz)
+
+We perform the classification and the bounding box regression by first
+running a $3×3$ convolution followed by ReLU on the $14×14$ VGG output,
+and then attaching the two heads.
+~~~
+Assuming there are $A$ anchors on every position:
+- the classification head generates $2A$ outputs, performing $\softmax$ on every
+  2 of them;
+- the regression head generates $4A$ region proposal coordinates.
+
+~~~
+The authors consider 3 scales $(128^2, 256^2, 512^2)$ and 3 aspect ratios
+$(1:1, 1:2, 2:1)$.
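+
+Generating all anchors for a feature map can be sketched as follows. This is
+a simplified illustration with a hypothetical function name. It assumes that:
+
+- each anchor is centered in its $16×16$ image region;
+- an aspect ratio $r$ means height/width;
+- the area stays equal to the squared size, so $w = s/\sqrt{r}$ and $h = s\sqrt{r}$.
+
+The reference implementation may differ in such details.
+
+```python
+import math
+
+# Illustrative sketch; generate_anchors is a hypothetical name.
+
+def generate_anchors(feature_h, feature_w, stride=16,
+                     sizes=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
+    """XYXY anchors for every feature-map position; A = len(sizes) * len(ratios) per position."""
+    anchors = []
+    for row in range(feature_h):
+        for col in range(feature_w):
+            # Center of the stride x stride image region corresponding to this position.
+            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
+            for s in sizes:
+                for r in ratios:
+                    w, h = s / math.sqrt(r), s * math.sqrt(r)  # area s^2, h / w = r
+                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
+    return anchors
+
+
+anchors = generate_anchors(14, 14)
+print(len(anchors))  # 1764 (= 14 * 14 * 9)
+print(anchors[0])    # (-56.0, -56.0, 72.0, 72.0)
+```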
+
+---
+# Faster R-CNN
+
+During training, we generate
+- positive training examples for every anchor that has the highest IoU with
+  some ground-truth box (i.e., the best-matching anchor of every ground-truth box);
+~~~
+- furthermore, a positive example is also any anchor with
+  IoU at least 0.7 for any ground-truth box;
+~~~
+- negative training examples for every anchor that has IoU at most 0.3 with all
+  ground-truth boxes;
+~~~
+- the positive and negative examples are sampled with a ratio of _up to_ 1:1
+  (fewer positives if there are not enough of them; each minibatch consists of
+  a single image and 256 anchors).
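+
+The anchor labeling and sampling rules above can be sketched as follows. The names
+are hypothetical, and the sketch assumes that:
+
+- anchors and ground-truth boxes are `XYXY` tuples;
+- label $-1$ marks anchors ignored during training;
+- the best-anchor rule is skipped for a ground-truth box that overlaps no anchor.
+
+```python
+import random
+
+# Illustrative sketch; iou, label_anchors and sample_anchors are hypothetical names.
+
+def iou(a, b):
+    """Intersection over union of two XYXY boxes."""
+    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union if union > 0 else 0.0
+
+
+def label_anchors(anchors, gt_boxes, pos_threshold=0.7, neg_threshold=0.3):
+    """Return 1 (object), 0 (background) or -1 (ignored) for every anchor."""
+    ious = [[iou(a, g) for g in gt_boxes] for a in anchors]
+    labels = []
+    for row in ious:
+        best = max(row, default=0.0)
+        labels.append(1 if best >= pos_threshold else 0 if best <= neg_threshold else -1)
+    # For every ground-truth box, its highest-IoU anchor(s) are positive as well.
+    for j in range(len(gt_boxes)):
+        top = max(row[j] for row in ious)
+        for i, row in enumerate(ious):
+            if top > 0 and row[j] == top:
+                labels[i] = 1
+    return labels
+
+
+def sample_anchors(labels, batch_size=256, rng=random):
+    """Sample up to batch_size / 2 positives and fill the rest of the batch with negatives."""
+    pos = [i for i, label in enumerate(labels) if label == 1]
+    neg = [i for i, label in enumerate(labels) if label == 0]
+    n_pos = min(len(pos), batch_size // 2)
+    n_neg = min(len(neg), batch_size - n_pos)
+    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)
+
+
+print(label_anchors([(0, 0, 10, 10), (50, 50, 60, 60)], [(0, 0, 10, 20)]))  # [1, 0]
+```
+
+In the usage line, the first anchor has IoU only 0.5, so the thresholds alone would
+ignore it. It still becomes positive because it is the best match of the ground-truth box.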
+
+~~~
+During inference, we consider all predicted non-background regions, run
+non-maximum suppression on them using a 0.7 IoU threshold, and then take $N$
+top-scored regions (i.e., the ones with the highest probability from the
+classification head) – the paper uses 300 proposals, compared to 2000 in Fast
+R-CNN.
+
+---
+# Faster R-CNN
+
+![w=94%,h=center](faster_rcnn_performance.svgz)
+
+---
+# Two-stage Detectors
+
+The Faster R-CNN is a so-called **two-stage** detector, where the regions are
+refined twice – once in the region proposal network, and then in the final
+bounding box regressor.
+
+~~~
+Several **single-stage** detector architectures have been proposed, mainly
+because they are faster and smaller, but until circa 2017 the two-stage
+detectors achieved better results.
+
+---
+section: MaskR-CNN
+# Mask R-CNN
+
+A straightforward extension of Faster R-CNN able to produce instance segmentation
+(i.e., a mask for every detected object).
+
+![w=100%,mh=80%,v=middle](../01/image_segmentation.svgz)
+
+---
+# Mask R-CNN – Architecture
+
+![w=100%,v=middle](mask_rcnn_architecture.png)
+
+---
+# Mask R-CNN – RoIAlign
+
+More precise alignment is required for the RoI in order to predict the masks.
+Instead of quantization and max-pooling in RoI pooling, **RoIAlign** uses bilinear
+interpolation of features at four regularly sampled locations in each RoI bin
+and averages them.
+
+![w=68%,mw=50%,h=center](roi_pooling.svgz)![w=68%,mw=50%,h=center](mask_rcnn_roialign.svgz)
+
+~~~
+TorchVision provides `torchvision.ops.roi_align` and `torchvision.ops.roi_pool`.
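+
+The RoIAlign sampling can be sketched on a single channel as follows. This is an
+illustration, not TorchVision's implementation. It assumes that:
+
+- the value of cell $(r, c)$ lies exactly at the continuous coordinate $(r, c)$;
+- sample positions are clamped to the feature map;
+- the four samples are the centers of a $2×2$ grid inside each bin.
+
+The function names are hypothetical.
+
+```python
+import math
+
+# Illustrative sketch; bilinear and roi_align are hypothetical names.
+
+def bilinear(features, y, x):
+    """Bilinearly interpolate a single-channel map (list of rows) at real-valued (y, x)."""
+    h, w = len(features), len(features[0])
+    y, x = min(max(y, 0.0), h - 1.0), min(max(x, 0.0), w - 1.0)  # clamp to the map
+    y0, x0 = math.floor(y), math.floor(x)
+    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
+    dy, dx = y - y0, x - x0
+    return ((1 - dy) * (1 - dx) * features[y0][x0] + (1 - dy) * dx * features[y0][x1]
+            + dy * (1 - dx) * features[y1][x0] + dy * dx * features[y1][x1])
+
+
+def roi_align(features, roi, output_size=7):
+    """RoIAlign of a continuous XYXY RoI given in feature-map coordinates (no rounding)."""
+    x1, y1, x2, y2 = roi
+    bin_h, bin_w = (y2 - y1) / output_size, (x2 - x1) / output_size
+    pooled = []
+    for i in range(output_size):
+        row = []
+        for j in range(output_size):
+            # Four samples at the centers of a 2x2 grid inside the bin, then their average.
+            samples = [bilinear(features, y1 + (i + (a + 0.5) / 2) * bin_h,
+                                x1 + (j + (b + 0.5) / 2) * bin_w)
+                       for a in range(2) for b in range(2)]
+            row.append(sum(samples) / 4)
+        pooled.append(row)
+    return pooled
+
+
+f = [[10 * r + c for c in range(5)] for r in range(5)]  # a linear "feature map"
+print(bilinear(f, 1.5, 2.25))  # 17.25
+print(roi_align(f, (0.0, 0.0, 4.0, 4.0), output_size=2))  # [[11.0, 13.0], [31.0, 33.0]]
+```
+
+Because no coordinate is rounded, shifting a RoI by a fraction of a cell shifts the
+pooled features accordingly, which is the precision the mask head needs.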
+
+---
+# Mask R-CNN
+
+Masks are predicted in a third branch of the object detector.
+
+- Higher resolution of the mask is usually needed (at least $14×14$, or even more).
+- The masks are predicted for each class separately.
+- The masks are predicted using convolutions instead of fully connected layers
+  (the upscaling convolutions are $2×2$ with stride 2).
+
+![w=79%,h=center](mask_rcnn_heads.svgz)
+
+~~~
+Improvements from Nov 2021: all convs (except for the output layer) are followed
+by BN, the _class&bbox_ head uses 4 convs instead of 2 MLPs, RPN contains
+two convs instead of one.
+
+---
+# Mask R-CNN
+
+![w=100%,v=middle](mask_rcnn_ablation.svgz)
+
+---
+# Mask R-CNN – Human Pose Estimation
+
+![w=80%,h=center](../01/human_pose_estimation.jpg)
+
+~~~
+- Tests the applicability of the Mask R-CNN architecture to other tasks.
+
+- Keypoints (e.g., left shoulder, right elbow, …) are detected
+  as independent one-hot masks of size $56×56$ with $\softmax$ output function.
+
+~~~
+![w=70%,h=center](mask_rcnn_hpe_performance.svgz)
+
+---
+section: FPN
+# Feature Pyramid Networks
+
+![w=85%,h=center](fpn_overview.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=62%,h=center](fpn_architecture.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=56%,h=center](fpn_architecture_detailed.svgz)
+
+---
+# Feature Pyramid Networks
+
+We employ FPN as a backbone in Faster R-CNN.
+
+~~~
+Assuming a ResNet-like network with a $224×224$ input, we denote by $C_2, C_3, …, C_5$
+the image features of the last convolutional layer of each stage, of sizes $56×56, 28×28, …,
+7×7$ (i.e., $C_i$ indicates a downscaling by $2^i$).
+~~~
+The FPN representations incorporating the smaller resolution features are
+denoted as $P_2, …, P_5$, each consisting of 256 channels; the classification
+heads are shared.
+
+~~~
+In both the RPN and the Fast R-CNN, the authors utilize the $P_2, …, P_5$
+representations, considering single-size anchors for every $P_i$ (of size
+$32^2, 64^2, 128^2, 256^2$, respectively). However, three aspect ratios
+$(1:1, 1:2, 2:1)$ are still used.
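+
+The top-down pathway that turns $C_i$ into $P_i$ (shown in the figures above) can be
+sketched in plain Python. The sketch follows the FPN design: a $1×1$ lateral
+convolution, $2×$ nearest-neighbour upsampling of the coarser level, and element-wise
+addition. To stay short, it omits biases and the $3×3$ convolution that FPN applies
+to each merged map. The nested-list $H×W×C$ layout and all names are assumptions of
+this sketch.
+
+```python
+# Illustrative sketch; all function names are hypothetical.
+
+def conv1x1(fmap, weight):
+    """1x1 convolution without bias: fmap is H x W x C_in, weight is C_in x C_out."""
+    return [[[sum(v * weight[k][o] for k, v in enumerate(pixel)) for o in range(len(weight[0]))]
+             for pixel in row] for row in fmap]
+
+
+def upsample2x(fmap):
+    """Nearest-neighbour 2x upsampling: repeat every row and every column twice."""
+    return [[pixel for pixel in row for _col in range(2)] for row in fmap for _row in range(2)]
+
+
+def add(a, b):
+    """Element-wise sum of two H x W x C maps."""
+    return [[[u + v for u, v in zip(pa, pb)] for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
+
+
+def fpn_top_down(cs, lateral_weights):
+    """cs = [C_2, ..., C_5] from finest to coarsest; returns [P_2, ..., P_5] (before 3x3 convs)."""
+    ps = [conv1x1(cs[-1], lateral_weights[-1])]  # the coarsest level has no top-down input
+    for c, w in zip(reversed(cs[:-1]), reversed(lateral_weights[:-1])):
+        ps.insert(0, add(upsample2x(ps[0]), conv1x1(c, w)))
+    return ps
+
+
+c_fine = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2 x 2 x 1
+c_coarse = [[[3.0]]]                       # 1 x 1 x 1
+print(fpn_top_down([c_fine, c_coarse], [[[1.0]], [[2.0]]]))
+# [[[[7.0], [8.0]], [[9.0], [10.0]]], [[[6.0]]]]
+```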
+
+~~~
+![w=100%](fpn_results.svgz)
+
+---
+section: FocalLoss
+# Focal Loss
+
+![w=46%,f=right](fast_rcnn_rumcajs.svgz)
+
+For single-stage object detection architectures, _class imbalance_ has been
+identified as the main issue preventing them from obtaining performance comparable to
+two-stage detectors. In a single-stage detector, there can be tens of thousands
+of anchors, with only dozens of useful training examples.
+
+~~~
+![w=46%,f=right](focal_loss_graph.svgz)
+
+Cross-entropy loss is computed as
+$$𝓛_\textrm{cross-entropy} = -\log p_\textrm{model}(y | x).$$
+
+~~~
+Focal loss (a loss focused on hard examples) is proposed as
+$$𝓛_\textrm{focal-loss} = -(1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
+
+---
+# Focal Loss
+
+For $γ=0$, focal loss is equal to cross-entropy loss.
+
+~~~
+The authors report that $γ=2$ worked best for them when training a single-stage
+detector.
+
+~~~
+![w=100%,mh=75%,v=bottom](focal_loss_cdf.svgz)
+
+---
+# Focal Loss and Class Imbalance
+
+Focal loss is connected to another solution to class imbalance – we might
+introduce a weighting factor $α ∈ (0, 1)$ for one class and $1 - α$ for the other
+class, arriving at
+$$ -α_y ⋅ \log p_\textrm{model}(y | x).$$
+
+~~~
+The weight $α$ might be set to the inverse class frequency or treated as
+a hyperparameter.
+
+~~~
+Although weighting focuses more on the less frequent class, it does not distinguish
+between easy and hard examples, unlike focal loss.
+
+~~~
+In practice, the focal loss is usually used together with class weighting:
+$$ -α_y ⋅ (1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
+For example, the authors report that $α=0.25$ (weight of the rare class) works best with $γ=2$.
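+
+For a single sigmoid output, as used per class in RetinaNet below, the weighted focal
+loss can be sketched as follows. The function is illustrative, and it assumes that
+`alpha` weights the positive (rare) class while $1 - α$ weights the negative one.
+
+```python
+import math
+
+# Illustrative sketch; focal_loss is a hypothetical name.
+
+def focal_loss(p, y, gamma=2.0, alpha=0.25):
+    """Weighted focal loss for one sigmoid output p = predicted probability of class y = 1."""
+    p_y = p if y == 1 else 1 - p              # p_model(y | x)
+    alpha_y = alpha if y == 1 else 1 - alpha  # alpha for the rare class, 1 - alpha otherwise
+    return -alpha_y * (1 - p_y) ** gamma * math.log(p_y)
+
+
+print(focal_loss(0.9, 1, gamma=0.0, alpha=0.5))  # half of the cross-entropy -log(0.9)
+```
+
+With $γ=2$, an example classified correctly with probability 0.9 contributes
+$(1-0.9)^2 = 0.01$ times its cross-entropy (before the $α$ weighting). As a result,
+the many easy background anchors no longer dominate the loss.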
+
+---
+section: RetinaNet
+# RetinaNet
+
+RetinaNet is a single-stage detector using a feature pyramid network
+architecture. Built on top of a ResNet, the feature pyramid
+contains levels $P_3$ through $P_7$, with each $P_l$ having 256 channels
+and a resolution $2^l$ times lower than the input. On each pyramid level $P_l$,
+we consider 9 anchors for every position, with 3 different aspect ratios ($1:1$, $1:2$, $2:1$)
+and 3 different sizes (areas) $(\{2^0, 2^{1/3}, 2^{2/3}\} ⋅ 4 ⋅ 2^l)^2$.
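+
+The anchor shapes on one pyramid level follow directly from this formula. Below is
+a minimal sketch with a hypothetical function name. It assumes that an aspect ratio
+$r$ means height/width and that the area is kept fixed while changing it.
+
+```python
+import math
+
+# Illustrative sketch; retinanet_anchor_shapes is a hypothetical name.
+
+def retinanet_anchor_shapes(level):
+    """(width, height) of the 9 anchors used at every position of pyramid level P_level."""
+    shapes = []
+    for scale in (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)):
+        side = scale * 4 * 2 ** level  # square root of the anchor area
+        for r in (1.0, 0.5, 2.0):
+            shapes.append((side / math.sqrt(r), side * math.sqrt(r)))
+    return shapes
+
+
+print(retinanet_anchor_shapes(3)[0])  # (32.0, 32.0)
+print(retinanet_anchor_shapes(7)[0])  # (512.0, 512.0)
+```
+
+The base sizes therefore grow from $32^2$ on $P_3$ to $512^2$ on $P_7$, and the three
+scales per level fill the gap between consecutive levels.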
+
+~~~
+Note that ResNet provides only $C_3$ to $C_5$ features. $C_6$ is computed
+using a $3×3$ convolution with stride 2 on $C_5$, and $C_7$ is obtained
+by applying ReLU followed by another $3×3$ stride-2 convolution. The $C_6$ and
+$C_7$ are included to improve large object detection.
+
+---
+# RetinaNet – Architecture
+
+The classification head and the bounding box regression head are fully
+convolutional and do not share parameters with each other (but the classification
+head is shared across all pyramid levels, and so is the bounding box regression head),
+generating $\mathit{anchors} ⋅ \mathit{classes}$ sigmoids and $\mathit{anchors}$
+bounding boxes per position.
+
+![w=100%](retinanet.svgz)
+
+---
+# RetinaNet
+
+During training, anchors are assigned to ground-truth object boxes if IoU is at
+least 0.5, and to background if IoU with every ground-truth box is below 0.4
+(the remaining anchors are ignored during training).
+~~~
+The classification head is trained using focal loss with $γ=2$ and $α=0.25$ (but
+according to the paper, all values of $γ$ in the $[0.5, 5]$ range work well); the
+bounding box regression head is trained using $\textrm{smooth}_{L_1}$ loss as in
+Fast(er) R-CNN.
+
+~~~
+During inference, at most 1000 top-scoring predictions with at least 5% probability
+are considered on every pyramid level; the predictions from all levels are then
+combined using non-maximum suppression with a threshold of 0.5. Training and
+testing use the same fixed image size, with models for sizes 400, 500, …, 800 pixels.
+
+~~~
+![w=68%](retinanet_results.svgz)![w=32%](retinanet_graph.svgz)
+
+---
+# RetinaNet – Ablations
+
+Ablations use ResNet-50-FPN backbone trained and tested with 600-pixel images.
+
+![w=80%,h=center](retinanet_ablations.svgz)
+
+---
+section: EfficientDet
+# EfficientDet – Architecture
+
+EfficientDet builds on EfficientNet, and it delivered state-of-the-art performance
+in Nov 2019 with minimum time and space requirements (however, its performance
+has already been surpassed significantly). It is a single-stage detector similar
+to RetinaNet, which:
+
+~~~
+- uses EfficientNet as a backbone;
+~~~
+- employs compound scaling;
+~~~
+- uses a newly proposed BiFPN, “efficient bidirectional cross-scale connections
+  and weighted feature fusion”.
+
+~~~
+![w=78%,h=center](efficientdet_architecture.svgz)
+
+---
+# EfficientDet – BiFPN
+
+In multi-scale fusion in FPN, information flows only from the pyramid levels
+with smaller resolution to the levels with higher resolution.
+
+![w=80%,h=center](efficientdet_bifpn.svgz)
+
+~~~
+BiFPN consists of several rounds of bidirectional flows. Each bidirectional flow
+employs residual connections and does not include nodes that have only one input
+edge with no feature fusion. All operations are $3×3$ separable convolutions with
+batch normalization and ReLU, upsampling is done by repeating rows and columns
+and downsampling by max-pooling.
+
+---
+# EfficientDet – Weighted BiFPN
+
+When combining features with different resolutions, it is common to resize them
+to the same resolution and sum them – therefore, all sets of features are
+considered to be of the same importance. The authors however argue that features
+from different resolutions contribute to the final result _unequally_ and propose
+to combine them with trainable weights.
+
+~~~
+- **Softmax-based fusion**: In each BiFPN node, we create a trainable weight
+  $w_i$ for every input $⇶I_i$ and the final combination (after resize, before
+  a convolution) is
+  $$∑_i \frac{e^{w_i}}{∑\nolimits_j e^{w_j}} ⇶I_i.$$
+
+~~~
+- **Fast normalized fusion**: The authors propose a simpler weighting
+  alternative:
+  $$∑_i \frac{\ReLU(w_i)}{ε + ∑\nolimits_j \ReLU(w_j)} ⇶I_i.$$
+  It uses $ε=0.0001$ for stability and is up to 30% faster on a GPU.
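+
+Both fusion variants can be sketched on feature maps flattened to equally long lists,
+assumed to be already resized to a common resolution. The names are illustrative, and
+the trainable weights are passed in as plain numbers.
+
+```python
+import math
+
+# Illustrative sketch; softmax_fusion and fast_normalized_fusion are hypothetical names.
+
+def softmax_fusion(inputs, weights):
+    """Sum of inputs weighted by softmax(weights); inputs are equally long flat lists."""
+    m = max(weights)
+    exps = [math.exp(w - m) for w in weights]  # shifted by the maximum for numerical stability
+    total = sum(exps)
+    return [sum(e / total * x[k] for e, x in zip(exps, inputs)) for k in range(len(inputs[0]))]
+
+
+def fast_normalized_fusion(inputs, weights, eps=1e-4):
+    """Sum of inputs weighted by ReLU(w_i) / (eps + sum_j ReLU(w_j))."""
+    relu = [max(w, 0.0) for w in weights]
+    total = eps + sum(relu)
+    return [sum(r / total * x[k] for r, x in zip(relu, inputs)) for k in range(len(inputs[0]))]
+
+
+inputs = [[1.0, 2.0], [3.0, 4.0]]
+print(softmax_fusion(inputs, [0.0, 0.0]))  # [2.0, 3.0]
+```
+
+In the fast variant, a negative $w_i$ switches input $i$ off entirely because the ReLU
+clips it to zero. Softmax, in contrast, always assigns every input a positive weight.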
+
+
+---
+# EfficientDet – Compound Scaling
+
+Similar to EfficientNet, the authors propose to scale various dimensions of the
+network, using a single compound coefficient $ϕ$.
+
+~~~
+After performing a grid search:
+- the width of BiFPN is scaled as $W_\mathit{BiFPN} = 64 ⋅ 1.35^ϕ,$
+- the depth of BiFPN is scaled as $D_\mathit{BiFPN} = 3 + ϕ,$
+- the box/class predictor has the same width as BiFPN and depth $D_\mathit{class} = 3 + \lfloor ϕ/3 \rfloor,$
+- input image resolution increases according to $R_\mathit{image} = 512 + 128 ⋅ ϕ.$
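+
+The scaling rules can be evaluated for a given $ϕ$ as follows. The sketch computes
+only the raw formulas (the function name is hypothetical). The published
+configurations in the table below may differ from these values, e.g., due to rounding.
+
+```python
+import math
+
+# Illustrative sketch; efficientdet_scaling is a hypothetical name.
+
+def efficientdet_scaling(phi):
+    """Raw values of the compound scaling rules for coefficient phi (no rounding)."""
+    return {
+        "bifpn_width": 64 * 1.35 ** phi,
+        "bifpn_depth": 3 + phi,
+        "class_depth": 3 + math.floor(phi / 3),
+        "resolution": 512 + 128 * phi,
+    }
+
+
+for phi in range(4):
+    print(phi, efficientdet_scaling(phi))
+```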
+
+![w=45%,h=center](efficientdet_scaling.svgz)
+
+---
+# EfficientDet – Results
+
+![w=50%](efficientdet_flops.svgz)![w=50%](efficientdet_size.svgz)
+
+---
+# EfficientDet – Results
+
+![w=83%,h=center](efficientdet_results.svgz)
+
+---
+# EfficientDet – Inference Latencies
+
+![w=100%](efficientdet_latency.svgz)
+
+---
+# EfficientDet – Ablations
+
+Given that EfficientDet employs both a powerful backbone and the new BiFPN, the authors
+quantify the improvement brought by the individual components.
+
+![w=49%,h=center](efficientdet_ablations_backbone.svgz)
+
+~~~
+The comparison with previously used cross-scale fusion architectures is also
+provided:
+
+![w=49%,h=center](efficientdet_ablations_fpn.svgz)
+
+---
+class: wide
+# EfficientDet-D0 Example
+
+![w=98%,h=center](efficientdet_example.jpg)
+
+---
+section: GroupNorm
+# Normalization
+
+## Batch Normalization
+
+The value of every neuron is normalized across the minibatch and, in the case of CNNs,
+also across all spatial positions.
+
+~~~
+## Layer Normalization
+
+The value of every neuron is normalized across all neurons of the layer.
+
+~~~
+![w=100%](normalizations.svgz)
+
+---
+# Group Normalization
+
+Group Normalization is analogous to Layer Normalization, but the channels are
+normalized in groups (by default, $G=32$).
+
+![w=40%,h=center](normalizations.svgz)
+
+~~~
+![w=40%,h=center](group_norm.svgz)
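+
+Group normalization of a single example can be sketched as follows. Each channel is
+given as a flat list of its $H⋅W$ values, the learned per-channel scale and shift are
+omitted, and the names are illustrative. With one group, it normalizes over all
+channels like Layer Normalization. With one channel per group, it normalizes every
+channel separately.
+
+```python
+import math
+
+# Illustrative sketch; group_norm is a hypothetical name.
+
+def group_norm(x, num_groups=32, eps=1e-5):
+    """Normalize one example x (C channels, each a flat list of H*W values) in groups of channels."""
+    c = len(x)
+    assert c % num_groups == 0, "the number of channels must be divisible by num_groups"
+    size = c // num_groups
+    out = []
+    for g in range(num_groups):
+        group = x[g * size:(g + 1) * size]
+        # Mean and variance over all channels and positions of the group.
+        values = [v for channel in group for v in channel]
+        mean = sum(values) / len(values)
+        var = sum((v - mean) ** 2 for v in values) / len(values)
+        out.extend([[(v - mean) / math.sqrt(var + eps) for v in channel] for channel in group])
+    return out
+
+
+print(group_norm([[1.0, 3.0], [5.0, 7.0]], num_groups=2, eps=0.0))  # [[-1.0, 1.0], [-1.0, 1.0]]
+```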
+
+---
+# Group Normalization
+
+![w=78%,h=center](group_norm_vs_batch_norm.svgz)
+
+---
+# Group Normalization
+
+![w=65%,h=center](group_norm_coco.svgz)
diff --git a/slides/06/anchor_net.svgz b/slides/06/anchor_net.svgz
new file mode 100644
index 0000000..a78b80f
Binary files /dev/null and b/slides/06/anchor_net.svgz differ
diff --git a/slides/06/anchor_net.svgz.ref b/slides/06/anchor_net.svgz.ref
new file mode 100644
index 0000000..8473ea0
--- /dev/null
+++ b/slides/06/anchor_net.svgz.ref
@@ -0,0 +1 @@
+Adapted from slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/average_precision.jpg b/slides/06/average_precision.jpg
new file mode 100644
index 0000000..aa92c3a
Binary files /dev/null and b/slides/06/average_precision.jpg differ
diff --git a/slides/06/average_precision.jpg.ref b/slides/06/average_precision.jpg.ref
new file mode 100644
index 0000000..0bdfae7
--- /dev/null
+++ b/slides/06/average_precision.jpg.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*naz02wO-XMywlwAdFzF-GA.jpeg
diff --git a/slides/06/bbox_representation_cxcywh.webp b/slides/06/bbox_representation_cxcywh.webp
new file mode 100644
index 0000000..745ad04
Binary files /dev/null and b/slides/06/bbox_representation_cxcywh.webp differ
diff --git a/slides/06/bbox_representation_cxcywh.webp.ref b/slides/06/bbox_representation_cxcywh.webp.ref
new file mode 100644
index 0000000..91b33ac
--- /dev/null
+++ b/slides/06/bbox_representation_cxcywh.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*Z80D7vwD-3UwP16asY-k6A.jpeg
diff --git a/slides/06/bbox_representation_xywh.webp b/slides/06/bbox_representation_xywh.webp
new file mode 100644
index 0000000..f82925e
Binary files /dev/null and b/slides/06/bbox_representation_xywh.webp differ
diff --git a/slides/06/bbox_representation_xywh.webp.ref b/slides/06/bbox_representation_xywh.webp.ref
new file mode 100644
index 0000000..0e2a026
--- /dev/null
+++ b/slides/06/bbox_representation_xywh.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*JLeFS2KIOzSTk6lUp1Ou2w.jpeg
diff --git a/slides/06/bbox_representation_xyxy.webp b/slides/06/bbox_representation_xyxy.webp
new file mode 100644
index 0000000..2f7d93b
Binary files /dev/null and b/slides/06/bbox_representation_xyxy.webp differ
diff --git a/slides/06/bbox_representation_xyxy.webp.ref b/slides/06/bbox_representation_xyxy.webp.ref
new file mode 100644
index 0000000..7399ff7
--- /dev/null
+++ b/slides/06/bbox_representation_xyxy.webp.ref
@@ -0,0 +1 @@
+https://miro.medium.com/1*oZcZhzOWKb3kvBHPOHYfow.jpeg
diff --git a/slides/06/cv_tasks.jpg b/slides/06/cv_tasks.jpg
new file mode 100644
index 0000000..de4459b
Binary files /dev/null and b/slides/06/cv_tasks.jpg differ
diff --git a/slides/06/cv_tasks.jpg.ref b/slides/06/cv_tasks.jpg.ref
new file mode 100644
index 0000000..1f5753a
--- /dev/null
+++ b/slides/06/cv_tasks.jpg.ref
@@ -0,0 +1 @@
+https://www.implantology.or.kr/articles/xml/RvNO/
diff --git a/slides/06/efficientdet_ablations_backbone.svgz b/slides/06/efficientdet_ablations_backbone.svgz
new file mode 100644
index 0000000..a73b0d0
Binary files /dev/null and b/slides/06/efficientdet_ablations_backbone.svgz differ
diff --git a/slides/06/efficientdet_ablations_backbone.svgz.ref b/slides/06/efficientdet_ablations_backbone.svgz.ref
new file mode 100644
index 0000000..8ea6795
--- /dev/null
+++ b/slides/06/efficientdet_ablations_backbone.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_ablations_fpn.svgz b/slides/06/efficientdet_ablations_fpn.svgz
new file mode 100644
index 0000000..ac3affa
Binary files /dev/null and b/slides/06/efficientdet_ablations_fpn.svgz differ
diff --git a/slides/06/efficientdet_ablations_fpn.svgz.ref b/slides/06/efficientdet_ablations_fpn.svgz.ref
new file mode 100644
index 0000000..dd61bd6
--- /dev/null
+++ b/slides/06/efficientdet_ablations_fpn.svgz.ref
@@ -0,0 +1 @@
+Table 5 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_architecture.svgz b/slides/06/efficientdet_architecture.svgz
new file mode 100644
index 0000000..dd376f1
Binary files /dev/null and b/slides/06/efficientdet_architecture.svgz differ
diff --git a/slides/06/efficientdet_architecture.svgz.ref b/slides/06/efficientdet_architecture.svgz.ref
new file mode 100644
index 0000000..66db1af
--- /dev/null
+++ b/slides/06/efficientdet_architecture.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_bifpn.svgz b/slides/06/efficientdet_bifpn.svgz
new file mode 100644
index 0000000..bc694d3
Binary files /dev/null and b/slides/06/efficientdet_bifpn.svgz differ
diff --git a/slides/06/efficientdet_bifpn.svgz.ref b/slides/06/efficientdet_bifpn.svgz.ref
new file mode 100644
index 0000000..86130e9
--- /dev/null
+++ b/slides/06/efficientdet_bifpn.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_example.jpg b/slides/06/efficientdet_example.jpg
new file mode 100644
index 0000000..1f1aa1b
Binary files /dev/null and b/slides/06/efficientdet_example.jpg differ
diff --git a/slides/06/efficientdet_example.jpg.ref b/slides/06/efficientdet_example.jpg.ref
new file mode 100644
index 0000000..2e9aaab
--- /dev/null
+++ b/slides/06/efficientdet_example.jpg.ref
@@ -0,0 +1 @@
+https://github.com/google/automl/blob/master/efficientdet/g3doc/street.jpg
diff --git a/slides/06/efficientdet_flops.svgz b/slides/06/efficientdet_flops.svgz
new file mode 100644
index 0000000..24d9e8c
Binary files /dev/null and b/slides/06/efficientdet_flops.svgz differ
diff --git a/slides/06/efficientdet_flops.svgz.ref b/slides/06/efficientdet_flops.svgz.ref
new file mode 100644
index 0000000..186b61d
--- /dev/null
+++ b/slides/06/efficientdet_flops.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_latency.svgz b/slides/06/efficientdet_latency.svgz
new file mode 100644
index 0000000..0a5dd99
Binary files /dev/null and b/slides/06/efficientdet_latency.svgz differ
diff --git a/slides/06/efficientdet_latency.svgz.ref b/slides/06/efficientdet_latency.svgz.ref
new file mode 100644
index 0000000..bb23a56
--- /dev/null
+++ b/slides/06/efficientdet_latency.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_results.svgz b/slides/06/efficientdet_results.svgz
new file mode 100644
index 0000000..b2e4058
Binary files /dev/null and b/slides/06/efficientdet_results.svgz differ
diff --git a/slides/06/efficientdet_results.svgz.ref b/slides/06/efficientdet_results.svgz.ref
new file mode 100644
index 0000000..c4f6073
--- /dev/null
+++ b/slides/06/efficientdet_results.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_scaling.svgz b/slides/06/efficientdet_scaling.svgz
new file mode 100644
index 0000000..675dbb8
Binary files /dev/null and b/slides/06/efficientdet_scaling.svgz differ
diff --git a/slides/06/efficientdet_scaling.svgz.ref b/slides/06/efficientdet_scaling.svgz.ref
new file mode 100644
index 0000000..5f14bba
--- /dev/null
+++ b/slides/06/efficientdet_scaling.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/efficientdet_size.svgz b/slides/06/efficientdet_size.svgz
new file mode 100644
index 0000000..f42947b
Binary files /dev/null and b/slides/06/efficientdet_size.svgz differ
diff --git a/slides/06/efficientdet_size.svgz.ref b/slides/06/efficientdet_size.svgz.ref
new file mode 100644
index 0000000..bb23a56
--- /dev/null
+++ b/slides/06/efficientdet_size.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070
diff --git a/slides/06/fast_rcnn.jpg b/slides/06/fast_rcnn.jpg
new file mode 100644
index 0000000..1803bb5
Binary files /dev/null and b/slides/06/fast_rcnn.jpg differ
diff --git a/slides/06/fast_rcnn.jpg.ref b/slides/06/fast_rcnn.jpg.ref
new file mode 100644
index 0000000..fbecdb1
--- /dev/null
+++ b/slides/06/fast_rcnn.jpg.ref
@@ -0,0 +1 @@
+Figure 1 of "Fast R-CNN", https://arxiv.org/abs/1504.08083
diff --git a/slides/06/fast_rcnn_architecture.svgz b/slides/06/fast_rcnn_architecture.svgz
new file mode 100644
index 0000000..b7bda19
Binary files /dev/null and b/slides/06/fast_rcnn_architecture.svgz differ
diff --git a/slides/06/fast_rcnn_architecture.svgz.ref b/slides/06/fast_rcnn_architecture.svgz.ref
new file mode 100644
index 0000000..6efa2ff
--- /dev/null
+++ b/slides/06/fast_rcnn_architecture.svgz.ref
@@ -0,0 +1 @@
+Slide 61 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/fast_rcnn_rumcajs.svgz b/slides/06/fast_rcnn_rumcajs.svgz
new file mode 100644
index 0000000..c774a93
Binary files /dev/null and b/slides/06/fast_rcnn_rumcajs.svgz differ
diff --git a/slides/06/fast_rcnn_rumcajs.svgz.ref b/slides/06/fast_rcnn_rumcajs.svgz.ref
new file mode 100644
index 0000000..3ebdb63
--- /dev/null
+++ b/slides/06/fast_rcnn_rumcajs.svgz.ref
@@ -0,0 +1 @@
+https://commons.wikimedia.org/wiki/File:Tišnov,_Hajánky,_garážová_ozdoba_(6597).jpg
diff --git a/slides/06/fast_rcnn_speed.svgz b/slides/06/fast_rcnn_speed.svgz
new file mode 100644
index 0000000..9f24720
Binary files /dev/null and b/slides/06/fast_rcnn_speed.svgz differ
diff --git a/slides/06/fast_rcnn_speed.svgz.ref b/slides/06/fast_rcnn_speed.svgz.ref
new file mode 100644
index 0000000..436c3bf
--- /dev/null
+++ b/slides/06/fast_rcnn_speed.svgz.ref
@@ -0,0 +1 @@
+Slide 76 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/fast_rcnn_vgg.png b/slides/06/fast_rcnn_vgg.png
new file mode 100644
index 0000000..07cfbf0
Binary files /dev/null and b/slides/06/fast_rcnn_vgg.png differ
diff --git a/slides/06/fast_rcnn_vgg.png.ref b/slides/06/fast_rcnn_vgg.png.ref
new file mode 100644
index 0000000..62ac59b
--- /dev/null
+++ b/slides/06/fast_rcnn_vgg.png.ref
@@ -0,0 +1 @@
+https://en.wikipedia.org/wiki/File:VGG_neural_network.png
diff --git a/slides/06/faster_rcnn_architecture.png b/slides/06/faster_rcnn_architecture.png
new file mode 100644
index 0000000..8464540
Binary files /dev/null and b/slides/06/faster_rcnn_architecture.png differ
diff --git a/slides/06/faster_rcnn_architecture.png.ref b/slides/06/faster_rcnn_architecture.png.ref
new file mode 100644
index 0000000..657ebdd
--- /dev/null
+++ b/slides/06/faster_rcnn_architecture.png.ref
@@ -0,0 +1 @@
+Figure 2 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/faster_rcnn_performance.svgz b/slides/06/faster_rcnn_performance.svgz
new file mode 100644
index 0000000..f2ccc58
Binary files /dev/null and b/slides/06/faster_rcnn_performance.svgz differ
diff --git a/slides/06/faster_rcnn_performance.svgz.ref b/slides/06/faster_rcnn_performance.svgz.ref
new file mode 100644
index 0000000..8796742
--- /dev/null
+++ b/slides/06/faster_rcnn_performance.svgz.ref
@@ -0,0 +1 @@
+Tables 3 and 4 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/faster_rcnn_rpn.svgz b/slides/06/faster_rcnn_rpn.svgz
new file mode 100644
index 0000000..b493b07
Binary files /dev/null and b/slides/06/faster_rcnn_rpn.svgz differ
diff --git a/slides/06/faster_rcnn_rpn.svgz.ref b/slides/06/faster_rcnn_rpn.svgz.ref
new file mode 100644
index 0000000..1fac88c
--- /dev/null
+++ b/slides/06/faster_rcnn_rpn.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497
diff --git a/slides/06/focal_loss_cdf.svgz b/slides/06/focal_loss_cdf.svgz
new file mode 100644
index 0000000..403d6d5
Binary files /dev/null and b/slides/06/focal_loss_cdf.svgz differ
diff --git a/slides/06/focal_loss_cdf.svgz.ref b/slides/06/focal_loss_cdf.svgz.ref
new file mode 100644
index 0000000..0dd7c12
--- /dev/null
+++ b/slides/06/focal_loss_cdf.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/focal_loss_graph.svgz b/slides/06/focal_loss_graph.svgz
new file mode 100644
index 0000000..44ebdf2
Binary files /dev/null and b/slides/06/focal_loss_graph.svgz differ
diff --git a/slides/06/focal_loss_graph.svgz.ref b/slides/06/focal_loss_graph.svgz.ref
new file mode 100644
index 0000000..ccc201a
--- /dev/null
+++ b/slides/06/focal_loss_graph.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/fpn_architecture.svgz b/slides/06/fpn_architecture.svgz
new file mode 100644
index 0000000..af04b27
Binary files /dev/null and b/slides/06/fpn_architecture.svgz differ
diff --git a/slides/06/fpn_architecture.svgz.ref b/slides/06/fpn_architecture.svgz.ref
new file mode 100644
index 0000000..96d788c
--- /dev/null
+++ b/slides/06/fpn_architecture.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_architecture_detailed.svgz b/slides/06/fpn_architecture_detailed.svgz
new file mode 100644
index 0000000..ff42dd0
Binary files /dev/null and b/slides/06/fpn_architecture_detailed.svgz differ
diff --git a/slides/06/fpn_architecture_detailed.svgz.ref b/slides/06/fpn_architecture_detailed.svgz.ref
new file mode 100644
index 0000000..bfb0bc8
--- /dev/null
+++ b/slides/06/fpn_architecture_detailed.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_overview.svgz b/slides/06/fpn_overview.svgz
new file mode 100644
index 0000000..c6c1574
Binary files /dev/null and b/slides/06/fpn_overview.svgz differ
diff --git a/slides/06/fpn_overview.svgz.ref b/slides/06/fpn_overview.svgz.ref
new file mode 100644
index 0000000..c00542b
--- /dev/null
+++ b/slides/06/fpn_overview.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/fpn_results.svgz b/slides/06/fpn_results.svgz
new file mode 100644
index 0000000..02db310
Binary files /dev/null and b/slides/06/fpn_results.svgz differ
diff --git a/slides/06/fpn_results.svgz.ref b/slides/06/fpn_results.svgz.ref
new file mode 100644
index 0000000..8ced9a5
--- /dev/null
+++ b/slides/06/fpn_results.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144
diff --git a/slides/06/group_norm.svgz b/slides/06/group_norm.svgz
new file mode 100644
index 0000000..0be782b
Binary files /dev/null and b/slides/06/group_norm.svgz differ
diff --git a/slides/06/group_norm.svgz.ref b/slides/06/group_norm.svgz.ref
new file mode 100644
index 0000000..6e47f02
--- /dev/null
+++ b/slides/06/group_norm.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/group_norm_coco.svgz b/slides/06/group_norm_coco.svgz
new file mode 100644
index 0000000..fe964af
Binary files /dev/null and b/slides/06/group_norm_coco.svgz differ
diff --git a/slides/06/group_norm_coco.svgz.ref b/slides/06/group_norm_coco.svgz.ref
new file mode 100644
index 0000000..86ea266
--- /dev/null
+++ b/slides/06/group_norm_coco.svgz.ref
@@ -0,0 +1 @@
+Tables 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/group_norm_vs_batch_norm.svgz b/slides/06/group_norm_vs_batch_norm.svgz
new file mode 100644
index 0000000..2c017ac
Binary files /dev/null and b/slides/06/group_norm_vs_batch_norm.svgz differ
diff --git a/slides/06/group_norm_vs_batch_norm.svgz.ref b/slides/06/group_norm_vs_batch_norm.svgz.ref
new file mode 100644
index 0000000..e6c9431
--- /dev/null
+++ b/slides/06/group_norm_vs_batch_norm.svgz.ref
@@ -0,0 +1 @@
+Figures 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/huber_loss.py b/slides/06/huber_loss.py
new file mode 100644
index 0000000..f6f93d2
--- /dev/null
+++ b/slides/06/huber_loss.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python3
+import os
+
+import matplotlib
+import matplotlib.pyplot as plt
+import numpy as np
+
+matplotlib.rcParams["mathtext.fontset"] = "cm"
+
+xs = np.linspace(-3, 3, 51)
+l2 = xs * xs / 2
+huber = np.where(np.abs(xs) <= 1, xs * xs / 2, np.abs(xs) - 0.5)
+d_huber = np.where(np.abs(xs) <= 1, xs, np.sign(xs))
+
+plt.figure(figsize=(5, 3.5))
+plt.plot(xs, l2, label="L2 loss $\\frac{1}{2} x^2$")
+plt.plot(xs, huber, label="Huber loss")
+plt.plot(xs, d_huber, label="Huber loss derivative")
+plt.gca().set_aspect(1)
+plt.grid(True)
+plt.legend(loc="upper center")
+plt.savefig("huber_loss.svg", bbox_inches="tight", transparent=True)
diff --git a/slides/06/huber_loss.svgz b/slides/06/huber_loss.svgz
new file mode 100644
index 0000000..a3362fa
Binary files /dev/null and b/slides/06/huber_loss.svgz differ
diff --git a/slides/06/huber_loss.svgz.ref b/slides/06/huber_loss.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/06/mask_rcnn_ablation.svgz b/slides/06/mask_rcnn_ablation.svgz
new file mode 100644
index 0000000..1b6b8e2
Binary files /dev/null and b/slides/06/mask_rcnn_ablation.svgz differ
diff --git a/slides/06/mask_rcnn_ablation.svgz.ref b/slides/06/mask_rcnn_ablation.svgz.ref
new file mode 100644
index 0000000..8877b9d
--- /dev/null
+++ b/slides/06/mask_rcnn_ablation.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_architecture.png b/slides/06/mask_rcnn_architecture.png
new file mode 100644
index 0000000..5b9e6ed
Binary files /dev/null and b/slides/06/mask_rcnn_architecture.png differ
diff --git a/slides/06/mask_rcnn_architecture.png.ref b/slides/06/mask_rcnn_architecture.png.ref
new file mode 100644
index 0000000..2d5bd13
--- /dev/null
+++ b/slides/06/mask_rcnn_architecture.png.ref
@@ -0,0 +1 @@
+Figure 1 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_heads.svgz b/slides/06/mask_rcnn_heads.svgz
new file mode 100644
index 0000000..f5c90b1
Binary files /dev/null and b/slides/06/mask_rcnn_heads.svgz differ
diff --git a/slides/06/mask_rcnn_heads.svgz.ref b/slides/06/mask_rcnn_heads.svgz.ref
new file mode 100644
index 0000000..5e303ff
--- /dev/null
+++ b/slides/06/mask_rcnn_heads.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_hpe_performance.svgz b/slides/06/mask_rcnn_hpe_performance.svgz
new file mode 100644
index 0000000..b79f401
Binary files /dev/null and b/slides/06/mask_rcnn_hpe_performance.svgz differ
diff --git a/slides/06/mask_rcnn_hpe_performance.svgz.ref b/slides/06/mask_rcnn_hpe_performance.svgz.ref
new file mode 100644
index 0000000..19c0665
--- /dev/null
+++ b/slides/06/mask_rcnn_hpe_performance.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/mask_rcnn_roialign.svgz b/slides/06/mask_rcnn_roialign.svgz
new file mode 100644
index 0000000..0cefb39
Binary files /dev/null and b/slides/06/mask_rcnn_roialign.svgz differ
diff --git a/slides/06/mask_rcnn_roialign.svgz.ref b/slides/06/mask_rcnn_roialign.svgz.ref
new file mode 100644
index 0000000..b4070e5
--- /dev/null
+++ b/slides/06/mask_rcnn_roialign.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Mask R-CNN", https://arxiv.org/abs/1703.06870
diff --git a/slides/06/normalizations.svgz b/slides/06/normalizations.svgz
new file mode 100644
index 0000000..6230387
Binary files /dev/null and b/slides/06/normalizations.svgz differ
diff --git a/slides/06/normalizations.svgz.ref b/slides/06/normalizations.svgz.ref
new file mode 100644
index 0000000..7b89167
--- /dev/null
+++ b/slides/06/normalizations.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Group Normalization", https://arxiv.org/abs/1803.08494
diff --git a/slides/06/object_localization.png b/slides/06/object_localization.png
new file mode 100644
index 0000000..a6d3c85
Binary files /dev/null and b/slides/06/object_localization.png differ
diff --git a/slides/06/object_localization.png.ref b/slides/06/object_localization.png.ref
new file mode 100644
index 0000000..b84eac5
--- /dev/null
+++ b/slides/06/object_localization.png.ref
@@ -0,0 +1 @@
+Slide 38 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/precision_recall_bottle.svgz b/slides/06/precision_recall_bottle.svgz
new file mode 100644
index 0000000..41de99d
Binary files /dev/null and b/slides/06/precision_recall_bottle.svgz differ
diff --git a/slides/06/precision_recall_bottle.svgz.ref b/slides/06/precision_recall_bottle.svgz.ref
new file mode 100644
index 0000000..5a828ee
--- /dev/null
+++ b/slides/06/precision_recall_bottle.svgz.ref
@@ -0,0 +1 @@
+Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
diff --git a/slides/06/precision_recall_curve.png b/slides/06/precision_recall_curve.png
new file mode 100644
index 0000000..13f8fb9
Binary files /dev/null and b/slides/06/precision_recall_curve.png differ
diff --git a/slides/06/precision_recall_curve.png.ref b/slides/06/precision_recall_curve.png.ref
new file mode 100644
index 0000000..fc537f8
--- /dev/null
+++ b/slides/06/precision_recall_curve.png.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*VenTq4IgxjmIpOXWdFb-jg.png
diff --git a/slides/06/precision_recall_curve_interpolated.jpg b/slides/06/precision_recall_curve_interpolated.jpg
new file mode 100644
index 0000000..817eae0
Binary files /dev/null and b/slides/06/precision_recall_curve_interpolated.jpg differ
diff --git a/slides/06/precision_recall_curve_interpolated.jpg.ref b/slides/06/precision_recall_curve_interpolated.jpg.ref
new file mode 100644
index 0000000..9a840d2
--- /dev/null
+++ b/slides/06/precision_recall_curve_interpolated.jpg.ref
@@ -0,0 +1 @@
+https://miro.medium.com/max/1400/1*pmSxeb4EfdGnzT6Xa68GEQ.jpeg
diff --git a/slides/06/precision_recall_person.svgz b/slides/06/precision_recall_person.svgz
new file mode 100644
index 0000000..808dd55
Binary files /dev/null and b/slides/06/precision_recall_person.svgz differ
diff --git a/slides/06/precision_recall_person.svgz.ref b/slides/06/precision_recall_person.svgz.ref
new file mode 100644
index 0000000..5a828ee
--- /dev/null
+++ b/slides/06/precision_recall_person.svgz.ref
@@ -0,0 +1 @@
+Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf
diff --git a/slides/06/pyramidnet_architecture.svgz b/slides/06/pyramidnet_architecture.svgz
new file mode 100644
index 0000000..d773f10
Binary files /dev/null and b/slides/06/pyramidnet_architecture.svgz differ
diff --git a/slides/06/pyramidnet_architecture.svgz.ref b/slides/06/pyramidnet_architecture.svgz.ref
new file mode 100644
index 0000000..321784e
--- /dev/null
+++ b/slides/06/pyramidnet_architecture.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_blocks.svgz b/slides/06/pyramidnet_blocks.svgz
new file mode 100644
index 0000000..077785f
Binary files /dev/null and b/slides/06/pyramidnet_blocks.svgz differ
diff --git a/slides/06/pyramidnet_blocks.svgz.ref b/slides/06/pyramidnet_blocks.svgz.ref
new file mode 100644
index 0000000..2fde23d
--- /dev/null
+++ b/slides/06/pyramidnet_blocks.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_cifar.svgz b/slides/06/pyramidnet_cifar.svgz
new file mode 100644
index 0000000..4f2b985
Binary files /dev/null and b/slides/06/pyramidnet_cifar.svgz differ
diff --git a/slides/06/pyramidnet_cifar.svgz.ref b/slides/06/pyramidnet_cifar.svgz.ref
new file mode 100644
index 0000000..bc183f0
--- /dev/null
+++ b/slides/06/pyramidnet_cifar.svgz.ref
@@ -0,0 +1 @@
+Table 4 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_growth_rate.svgz b/slides/06/pyramidnet_growth_rate.svgz
new file mode 100644
index 0000000..5474788
Binary files /dev/null and b/slides/06/pyramidnet_growth_rate.svgz differ
diff --git a/slides/06/pyramidnet_growth_rate.svgz.ref b/slides/06/pyramidnet_growth_rate.svgz.ref
new file mode 100644
index 0000000..12ee550
--- /dev/null
+++ b/slides/06/pyramidnet_growth_rate.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/pyramidnet_residuals.svgz b/slides/06/pyramidnet_residuals.svgz
new file mode 100644
index 0000000..c4290c1
Binary files /dev/null and b/slides/06/pyramidnet_residuals.svgz differ
diff --git a/slides/06/pyramidnet_residuals.svgz.ref b/slides/06/pyramidnet_residuals.svgz.ref
new file mode 100644
index 0000000..b53108d
--- /dev/null
+++ b/slides/06/pyramidnet_residuals.svgz.ref
@@ -0,0 +1 @@
+Figure 5 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915
diff --git a/slides/06/rcnn_architecture.svgz b/slides/06/rcnn_architecture.svgz
new file mode 100644
index 0000000..0a7cf0e
Binary files /dev/null and b/slides/06/rcnn_architecture.svgz differ
diff --git a/slides/06/rcnn_architecture.svgz.ref b/slides/06/rcnn_architecture.svgz.ref
new file mode 100644
index 0000000..1a20f30
--- /dev/null
+++ b/slides/06/rcnn_architecture.svgz.ref
@@ -0,0 +1 @@
+Slide 54 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/retinanet.svgz b/slides/06/retinanet.svgz
new file mode 100644
index 0000000..60fe4c1
Binary files /dev/null and b/slides/06/retinanet.svgz differ
diff --git a/slides/06/retinanet.svgz.ref b/slides/06/retinanet.svgz.ref
new file mode 100644
index 0000000..aab04d0
--- /dev/null
+++ b/slides/06/retinanet.svgz.ref
@@ -0,0 +1 @@
+Figure 3 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_ablations.svgz b/slides/06/retinanet_ablations.svgz
new file mode 100644
index 0000000..aec5956
Binary files /dev/null and b/slides/06/retinanet_ablations.svgz differ
diff --git a/slides/06/retinanet_ablations.svgz.ref b/slides/06/retinanet_ablations.svgz.ref
new file mode 100644
index 0000000..1e51d14
--- /dev/null
+++ b/slides/06/retinanet_ablations.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_graph.svgz b/slides/06/retinanet_graph.svgz
new file mode 100644
index 0000000..299a928
Binary files /dev/null and b/slides/06/retinanet_graph.svgz differ
diff --git a/slides/06/retinanet_graph.svgz.ref b/slides/06/retinanet_graph.svgz.ref
new file mode 100644
index 0000000..b54356d
--- /dev/null
+++ b/slides/06/retinanet_graph.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/retinanet_results.svgz b/slides/06/retinanet_results.svgz
new file mode 100644
index 0000000..80a5c4d
Binary files /dev/null and b/slides/06/retinanet_results.svgz differ
diff --git a/slides/06/retinanet_results.svgz.ref b/slides/06/retinanet_results.svgz.ref
new file mode 100644
index 0000000..38a2dcf
--- /dev/null
+++ b/slides/06/retinanet_results.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002
diff --git a/slides/06/roi_generation.jpg b/slides/06/roi_generation.jpg
new file mode 100644
index 0000000..18f7350
Binary files /dev/null and b/slides/06/roi_generation.jpg differ
diff --git a/slides/06/roi_generation.jpg.ref b/slides/06/roi_generation.jpg.ref
new file mode 100644
index 0000000..fbb2b02
--- /dev/null
+++ b/slides/06/roi_generation.jpg.ref
@@ -0,0 +1 @@
+Slide 48 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/06/roi_pooling.svgz b/slides/06/roi_pooling.svgz
new file mode 100644
index 0000000..b5d6c0d
Binary files /dev/null and b/slides/06/roi_pooling.svgz differ
diff --git a/slides/06/roi_pooling.svgz.ref b/slides/06/roi_pooling.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/06/roi_projection.svgz b/slides/06/roi_projection.svgz
new file mode 100644
index 0000000..a6aee2e
Binary files /dev/null and b/slides/06/roi_projection.svgz differ
diff --git a/slides/06/roi_projection.svgz.ref b/slides/06/roi_projection.svgz.ref
new file mode 100644
index 0000000..1cc5acc
--- /dev/null
+++ b/slides/06/roi_projection.svgz.ref
@@ -0,0 +1 @@
+Slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf.
diff --git a/slides/08/08.md b/slides/08/08.md
new file mode 100644
index 0000000..e8a103e
--- /dev/null
+++ b/slides/08/08.md
@@ -0,0 +1,594 @@
+title: NPFL138, Lecture 8
+class: title, langtech, cc-by-sa
+style: .algorithm { background-color: #eee; padding: .5em }
+
+# Recurrent Neural Networks
+
+## Milan Straka
+
+### April 8, 2024
+
+---
+section: RNN
+class: middle, center
+# Recurrent Neural Networks
+
+# Recurrent Neural Networks
+
+---
+# Recurrent Neural Networks
+
+## Single RNN cell
+
+![w=17%,h=center](rnn_cell.svgz)
+
+~~~
+
+## Unrolled RNN cells
+
+![w=60%,h=center](rnn_cell_unrolled.svgz)
+
+---
+# Basic RNN Cell
+
+![w=100%,h=center,mw=50%](rnn_cell_basic.svgz)![w=50%,h=center,mw=50%](rnn_cell_basic_as_cell.svgz)
+
+Given an input $→x^{(t)}$ and previous state $→h^{(t-1)}$, the new state is computed as
+$$→h^{(t)} = f(→h^{(t-1)}, →x^{(t)}; →θ).$$
+
+~~~
+One of the simplest possibilities (called `SimpleRNN` in Keras, `RNN` in PyTorch) is
+$$→h^{(t)} = \tanh(⇉U→h^{(t-1)} + ⇉V→x^{(t)} + →b).$$
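+
+A minimal NumPy sketch of this recurrence follows. The helper name `simple_rnn`, the sizes, and the random weights are illustrative assumptions. This is not the Keras or PyTorch API.
+
+```python
+import numpy as np
+
+def simple_rnn(xs, U, V, b, h0):
+    """Apply h_t = tanh(U h_{t-1} + V x_t + b) along the sequence xs."""
+    h, states = h0, []
+    for x in xs:
+        h = np.tanh(U @ h + V @ x + b)
+        states.append(h)
+    return np.stack(states)
+
+rng = np.random.default_rng(42)
+D, H, T = 3, 4, 5  # input dim, state dim, sequence length (arbitrary)
+states = simple_rnn(rng.normal(size=(T, D)), rng.normal(size=(H, H)),
+                    rng.normal(size=(H, D)), np.zeros(H), np.zeros(H))
+print(states.shape)  # (5, 4)
+```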
+
+---
+# Basic RNN Cell
+
+Basic RNN cells suffer heavily from vanishing/exploding gradients (the so-called
+**challenge of long-term dependencies**).
+
+~~~
+If we simplify the recurrence of states to just a linear approximation
+$$→h^{(t)} ≈ ⇉U→h^{(t-1)},$$
+
+~~~
+we get $→h^{(t)} ≈ ⇉U^t→h^{(0)}$.
+
+~~~
+If $⇉U$ has an eigenvalue decomposition of $⇉U = ⇉Q ⇉Λ ⇉Q^{-1}$, we get that
+$$→h^{(t)} ≈ ⇉Q ⇉Λ^t ⇉Q^{-1} →h^{(0)}.$$
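+The main problem is that the _same_ function is iteratively applied many times:
+components along eigenvalues with absolute value below 1 vanish, while those
+along eigenvalues with absolute value above 1 explode.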
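+
+A tiny example makes the effect of $⇉Λ^t$ visible. It assumes a diagonal $⇉U$, so $⇉Q$ is the identity, with eigenvalues 0.9 and 1.1 chosen for illustration.
+
+```python
+import numpy as np
+
+U = np.diag([0.9, 1.1])  # Q = I, Λ = diag(0.9, 1.1)
+h = np.ones(2)
+for _ in range(50):
+    h = U @ h  # after the loop, h = Λ^50 h^(0)
+print(h)  # first component ≈ 0.005 (vanishing), second ≈ 117 (exploding)
+```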
+
+~~~
+Several more complex RNN cell variants have been proposed, which alleviate
+this issue to some degree, namely **LSTM** and **GRU**.
+
+---
+section: LSTM
+# Long Short-Term Memory
+
+Hochreiter & Schmidhuber (1997) suggested that to enforce
+_constant error flow_, we would like
+$$f' = →1.$$
+
+~~~
+They propose to achieve that by a _constant error carrousel_.
+
+![w=60%,h=center](lstm_cec_idea.svgz)
+
+~~~ ~~
+They propose to achieve that by a _constant error carrousel_.
+
+![w=60%,h=center](lstm_cec.svgz)
+
+---
+# Long Short-Term Memory
+
+They also propose **input** and **output** gates, which control the flow
+of information into and out of the carrousel (**memory cell** $→c_t$).
+
+![w=40%,f=right](lstm_input_output_gates.svgz)
+
+$$\begin{aligned}
+  \textcolor{blue}     {→i_t} & ← σ(⇉W^i →x_t + ⇉V^i →h_{t-1} + →b^i) \\
+  \textcolor{darkgreen}{→o_t} & ← σ(⇉W^o →x_t + ⇉V^o →h_{t-1} + →b^o) \\
+  \textcolor{magenta}  {→c_t} & ← →c_{t-1} + →i_t ⊙ \tanh(⇉W^y →x_t + ⇉V^y →h_{t-1} + →b^y) \\
+  \textcolor{red}      {→h_t} & ← →o_t ⊙ \tanh(→c_t)
+\end{aligned}$$
+
+---
+# Long Short-Term Memory
+
+Later, Gers, Schmidhuber & Cummins (1999) added the possibility to **forget**
+information from memory cell $→c_t$.
+
+![w=40%,f=right](lstm_input_output_forget_gates.svgz)
+
+$$\begin{aligned}
+  \textcolor{blue}      {→i_t} & ← σ(⇉W^i →x_t + ⇉V^i →h_{t-1} + →b^i) \\
+  \textcolor{darkorange}{→f_t} & ← σ(⇉W^f →x_t + ⇉V^f →h_{t-1} + →b^f) \\
+  \textcolor{darkgreen} {→o_t} & ← σ(⇉W^o →x_t + ⇉V^o →h_{t-1} + →b^o) \\
+  \textcolor{magenta}   {→c_t} & ← →f_t ⊙ →c_{t-1} + →i_t ⊙ \tanh(⇉W^y →x_t + ⇉V^y →h_{t-1} + →b^y) \\
+  \textcolor{red}       {→h_t} & ← →o_t ⊙ \tanh(→c_t)
+\end{aligned}$$
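+
+The equations above can be written as one NumPy step. Storing the per-gate parameters in dicts is a readability choice made here. Keras and PyTorch instead store the four matrices concatenated in a single kernel. The helper `lstm_step` is hypothetical.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def lstm_step(x, h, c, W, V, b):
+    """One LSTM step; W, V, b are dicts keyed by gate name 'i', 'f', 'o', 'y'."""
+    pre = {g: W[g] @ x + V[g] @ h + b[g] for g in "ifoy"}
+    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
+    c = f * c + i * np.tanh(pre["y"])  # forget old memory, add new content
+    h = o * np.tanh(c)                 # expose a gated view of the memory
+    return h, c
+
+rng = np.random.default_rng(0)
+D, H = 3, 4
+W = {g: rng.normal(size=(H, D)) for g in "ifoy"}
+V = {g: rng.normal(size=(H, H)) for g in "ifoy"}
+b = {g: np.zeros(H) for g in "ifoy"}
+h, c = np.zeros(H), np.zeros(H)
+for x in rng.normal(size=(6, D)):
+    h, c = lstm_step(x, h, c, W, V, b)
+```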
+
+~~~
+Note that since 2015, following the paper
+- R. Jozefowicz et al.: _An Empirical Exploration of Recurrent Network Architectures_
+
+the forget gate bias $→b^f$ is usually initialized to 1, so that the forget gate is closer
+to 1 and the gradients can easily flow through multiple timesteps.
+~~~
+(Gers et al. advocated this in the original paper already.)
+~~~
+(BTW, I think 3 might be even better, as $σ(1) ≈ 0.731$, $σ(3) ≈ 0.953$.)
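+
+To see why the initial bias matters, assume that at the start of training the forget gate pre-activation is dominated by its bias. This is an idealization with $⇉W^f →x_t + ⇉V^f →h_{t-1} ≈ 0$. Under it, a value stored in $→c$ is scaled by $σ(b^f)$ at every step, so after $k$ steps it is scaled by $σ(b^f)^k$.
+
+```python
+import math
+
+def sigmoid(z):
+    return 1 / (1 + math.exp(-z))
+
+for bias in (0, 1, 3):
+    f = sigmoid(bias)
+    # Fraction of c_{t-k} surviving k = 10 steps, assuming f_t ≈ σ(b^f).
+    print(f"b^f={bias}: f={f:.3f}, f^10={f ** 10:.3f}")
+```
+
+With $b^f = 1$, only about 4% survives ten steps. With $b^f = 3$, over 60% survives.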
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-SimpleRNN.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-chain.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-C-line.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-focus-i.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-focus-f.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-focus-C.png)
+
+---
+# Long Short-Term Memory
+![w=100%,v=middle](LSTM3-focus-o.png)
+
+---
+section: GRU
+# Gated Recurrent Unit
+
+**Gated recurrent unit (GRU)** was proposed by Cho et al. (2014) as
+a simplification of LSTM. The main differences are
+
+![w=45%,f=right](gru.svgz)
+
+- no memory cell,
+- forgetting and updating tied together.
+
+~~~
+$$\begin{aligned}
+  \textcolor{blue}     {→r_t} & ← σ(⇉W^r →x_t + ⇉V^r →h_{t-1} + →b^r) \\
+  \textcolor{darkgreen}{→u_t} & ← σ(⇉W^u →x_t + ⇉V^u →h_{t-1} + →b^u) \\
+  \textcolor{magenta}  {→ĥ_t} & ← \tanh(⇉W^h →x_t + ⇉V^h (→r_t ⊙ →h_{t-1}) + →b^h) \\
+  \textcolor{red}      {→h_t} & ← →u_t ⊙ →h_{t-1} + (1 - →u_t) ⊙ →ĥ_t
+\end{aligned}$$
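+
+The same equations appear below as a NumPy step. The helper `gru_step` and the dict-based parameters are illustrative assumptions.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def gru_step(x, h, W, V, b):
+    """One GRU step following the equations above; W, V, b keyed by 'r', 'u', 'h'."""
+    r = sigmoid(W["r"] @ x + V["r"] @ h + b["r"])
+    u = sigmoid(W["u"] @ x + V["u"] @ h + b["u"])
+    h_candidate = np.tanh(W["h"] @ x + V["h"] @ (r * h) + b["h"])
+    return u * h + (1 - u) * h_candidate  # forgetting and updating are tied
+
+rng = np.random.default_rng(0)
+D, H = 3, 4
+W = {g: rng.normal(size=(H, D)) for g in "ruh"}
+V = {g: rng.normal(size=(H, H)) for g in "ruh"}
+b = {g: np.zeros(H) for g in "ruh"}
+h = np.zeros(H)
+for x in rng.normal(size=(6, D)):
+    h = gru_step(x, h, W, V, b)
+```
+
+Keras (with its default `reset_after=True`) and PyTorch apply the reset gate after the multiplication by the recurrent matrix. Their results therefore differ slightly from this formulation.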
+
+---
+# Gated Recurrent Unit
+![w=100%,v=middle](LSTM3-var-GRU.png)
+
+---
+# GRU and LSTM Differences
+
+The main differences between GRU and LSTM:
+~~~
+- GRU uses fewer parameters and less computation.
+
+  - six matrices $⇉W$, $⇉V$ instead of eight
+~~~
+- GRUs are easier to work with, because the state is just one tensor, while it is
+  a pair of tensors for LSTM.
+~~~
+- In most tasks, LSTM and GRU give very similar results.
+~~~
+- However, there are some tasks on which LSTM achieves (much) better results
+  than GRU.
+~~~
+  - For a demonstration of the difference in the expressive power of LSTM and GRU
+    (caused by the coupling of the forget and update gate), see the paper
+    - G. Weiss et al.: _On the Practical Computational Power of Finite Precision
+      RNNs for Language Recognition_ https://arxiv.org/abs/1805.04908
+~~~
+  - For a difference between LSTM and GRU on a real-world task, see for example
+    - T. Dozat et al.: _Deep Biaffine Attention for Neural Dependency Parsing_
+      https://arxiv.org/abs/1611.01734
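+
+The parameter counts for the formulations above can be computed as follows. They assume one bias vector per gate, and the sizes are arbitrary. Library implementations may report slightly more; for example, PyTorch keeps two bias vectors per gate.
+
+```python
+def rnn_params(input_dim, units, gates):
+    """Each gate (or candidate) has W (units x input_dim), V (units x units), b (units)."""
+    return gates * (units * input_dim + units * units + units)
+
+lstm = rnn_params(input_dim=100, units=200, gates=4)  # i, f, o and the candidate y
+gru = rnn_params(input_dim=100, units=200, gates=3)   # r, u and the candidate ĥ
+print(lstm, gru)  # 240800 180600
+```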
+
+---
+# SimpleRNN, GRU, and LSTM Initialization
+
+Recall that when we approximate $→h^{(t)} ≈ ⇉U→h^{(t-1)}$,
+assuming the eigenvalue decomposition of $⇉U = ⇉Q ⇉Λ ⇉Q^{-1}$, we get
+$$→h^{(t)} ≈ ⇉Q ⇉Λ^t ⇉Q^{-1} →h^{(0)}.$$
+
+~~~
+This motivated a specific initialization scheme for the $⇉U$ matrix –
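+the so-called **recurrent kernel** (the concatenation of all the $⇉V^i$,
+$⇉V^f$, $⇉V^o$, $⇉V^y$ matrices in LSTM) is initialized with a randomly generated
+orthogonal matrix. All eigenvalues of an orthogonal matrix have absolute value 1,
+so the $⇉Λ^t$ term in the above approximation neither vanishes nor explodes.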
+
+~~~
+This **orthogonal** initialization is the default for all RNN cells in Keras
+(the `recurrent_initializer` parameter of `SimpleRNN`, `GRU`, and `LSTM`
+defaults to `'orthogonal'`).
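+
+One common way to sample a random orthogonal matrix is a QR decomposition of a Gaussian matrix. The sketch below uses it and then checks that a linear recurrence with the result preserves the state norm. It is not necessarily the exact Keras implementation.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+H = 64
+# QR of a Gaussian matrix gives an orthogonal Q; flipping column signs by
+# the signs of diag(R) makes the result uniformly distributed.
+Q, R = np.linalg.qr(rng.normal(size=(H, H)))
+U = Q * np.sign(np.diag(R))
+
+h0 = rng.normal(size=H)
+h = h0
+for _ in range(100):
+    h = U @ h  # an orthogonal U preserves the norm of h
+print(np.linalg.norm(h0), np.linalg.norm(h))
+```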
+
+---
+section: HighwayNetworks
+class: middle, center
+# Highway Networks
+
+# Highway Networks
+
+---
+# Highway Networks
+
+For input $→x$, a fully connected layer computes
+$$→y ← H(→x, ⇉W_H).$$
+
+~~~
+Highway networks add a residual connection with gating:
+$$→y ← H(→x, ⇉W_H) ⊙ T(→x, ⇉W_T) + →x ⊙ (1 - T(→x, ⇉W_T)).$$
+
+~~~
+Usually, the gating is defined as
+$$T(→x, ⇉W_T) ← σ(⇉W_T →x + →b_T).$$
+
+~~~
+Note that the resulting update is very similar to a GRU cell without the recurrent
+inputs; for a fully connected layer $H(→x, ⇉W_H) = \tanh(⇉W_H →x + →b_H)$, it is
+exactly a GRU update (with $T$ playing the role of $1 - →u_t$), apart from copying
+$→x$ instead of $→h_{t-1}$.
+
+~~~
+Analogously to the LSTM forget gate bias, the transform gate bias $→b_T$ should be
+initialized to a negative number, so that the layer initially mostly carries its
+input $→x$ unchanged.
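+
+A minimal NumPy sketch of a highway layer follows. It assumes $H$ is a tanh fully connected layer, as in the text. The bias value −3 and the weight scale are illustrative choices, and `highway` is a hypothetical helper.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def highway(x, W_H, b_H, W_T, b_T):
+    """y = H(x) ⊙ T(x) + x ⊙ (1 - T(x)) with H a tanh fully connected layer."""
+    H = np.tanh(W_H @ x + b_H)
+    T = sigmoid(W_T @ x + b_T)  # transform gate
+    return H * T + x * (1 - T)
+
+rng = np.random.default_rng(0)
+D = 8
+x = rng.normal(size=D)
+W_H, W_T = rng.normal(size=(D, D)) * 0.1, rng.normal(size=(D, D)) * 0.1
+# A negative transform-gate bias makes T small, so y stays close to x.
+y = highway(x, W_H, np.zeros(D), W_T, np.full(D, -3.0))
+```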
+
+---
+# Highway Networks on MNIST
+
+![w=100%](highway_training.svgz)
+
+---
+# Highway Networks
+
+![w=90%,h=center](highway_activations.jpg)
+
+---
+# Highway Networks
+
+![w=95%,h=center](highway_leisoning.svgz)
+
+---
+section: RNNRegularization
+# Regularizing RNNs
+
+## Dropout
+
+- Using dropout on hidden states interferes with long-term dependencies.
+
+~~~
+
+- However, using dropout on the inputs and outputs works well and is used
+frequently.
+~~~
+    - In case residual connections are present, the output dropout needs to be
+      applied before adding the residual connection.
+
+~~~
+- Several techniques were designed to allow using dropout on hidden states.
+    - Variational Dropout
+    - Recurrent Dropout
+    - Zoneout
+
+---
+# Regularizing RNNs
+
+## Variational Dropout
+
+![w=75%,h=center](variational_rnn.svgz)
+
+~~~
+To implement variational dropout on inputs in Keras, use `noise_shape` of
+`keras.layers.Dropout` to force the same mask across time-steps.
+Variational dropout on the hidden states can be implemented using the
+`recurrent_dropout` argument of `keras.layers.{LSTM,GRU,SimpleRNN}{,Cell}`.
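+
+The idea behind the `noise_shape` trick can be shown in NumPy. For inputs of shape `[batch, time, features]`, a mask of shape `[batch, 1, features]` is broadcast over all time steps. The helper `variational_dropout` is hypothetical.
+
+```python
+import numpy as np
+
+def variational_dropout(inputs, rate, rng):
+    """Dropout on [batch, time, features] inputs with one mask per sequence."""
+    batch, _, features = inputs.shape
+    keep = rng.random((batch, 1, features)) >= rate  # same mask for all steps
+    return inputs * keep / (1 - rate)  # inverted dropout scaling
+
+rng = np.random.default_rng(0)
+x = np.ones((2, 5, 6))
+y = variational_dropout(x, rate=0.5, rng=rng)
+```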
+
+---
+# Regularizing RNNs
+
+## Recurrent Dropout
+
+Drop out only the candidate states (i.e., the values added to the memory cell in
+LSTM and to the previous state in GRU), independently in every time step.
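+
+Here is a sketch of recurrent dropout in the LSTM memory update. It assumes the gate values $→f_t$, $→i_t$ and the candidate (the tanh term) were already computed. The helper `lstm_memory_update` is hypothetical and not a library function. Because only the candidate is dropped, the stored memory itself is never zeroed.
+
+```python
+import numpy as np
+
+def lstm_memory_update(c_prev, f, i, candidate, rate, rng):
+    """c_t = f ⊙ c_{t-1} + i ⊙ dropout(candidate), with a fresh mask per call."""
+    keep = rng.random(candidate.shape) >= rate
+    return f * c_prev + i * (candidate * keep / (1 - rate))
+
+rng = np.random.default_rng(0)
+H = 4
+c = np.ones(H)
+c_new = lstm_memory_update(c, f=np.ones(H), i=np.ones(H),
+                           candidate=np.zeros(H), rate=0.5, rng=rng)
+```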
+
+~~~
+## Zoneout
+
+Randomly preserve hidden activations instead of dropping them.
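+
+A minimal zoneout sketch follows, assuming that, as in dropout, the expected value of the random mixture is used at inference. The helper `zoneout` is hypothetical.
+
+```python
+import numpy as np
+
+def zoneout(h_prev, h_new, rate, rng, training=True):
+    """Each unit keeps its previous value with probability `rate`."""
+    if not training:
+        return rate * h_prev + (1 - rate) * h_new  # expected value
+    preserve = rng.random(h_new.shape) < rate
+    return np.where(preserve, h_prev, h_new)
+
+rng = np.random.default_rng(0)
+h_prev, h_new = np.zeros(1000), np.ones(1000)
+h = zoneout(h_prev, h_new, rate=0.1, rng=rng)
+```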
+
+~~~
+## Batch Normalization
+
+![w=42%,f=right](recurrent_batch_normalization.svgz)
+
+Very fragile and sensitive to proper initialization – there were papers with
+negative results (_Dario Amodei et al., 2015: Deep Speech 2_ or _Cesar Laurent et al.,
+2016: Batch Normalized Recurrent Neural Networks_) until people managed to make
+it work (_Tim Cooijmans et al., 2016: Recurrent Batch Normalization_;
+specifically, initializing $γ=0.1$ did the trick).
+
+---
+# Regularizing RNNs
+
+## Batch Normalization
+
+Each neuron value is normalized across the minibatch, and in the case of CNNs also
+across all positions.
+
+~~~
+## Layer Normalization
+
+Each neuron value is normalized across all neurons of the layer, separately for every example.
+
+~~~
+![w=100%](../06/normalizations.svgz)
+
+---
+# Layer Normalization
+
+Consider a hidden value $→x ∈ ℝ^D$. Layer normalization (both during training and
+during inference) is performed as follows.
+
+<div class="algorithm">
+
+**Inputs**: An example $→x ∈ ℝ^D$, $ε ∈ ℝ$ with default value 0.001<br>
+**Parameters**: $→β ∈ ℝ^D$ initialized to $→0$, $→γ ∈ ℝ^D$ initialized to $→1$<br>
+**Outputs**: Normalized example $→y$
+
+~~~
+- $μ ← \frac{1}{D} ∑_{i = 1}^D x_i$
+
+~~~
+- $σ^2 ← \frac{1}{D} ∑_{i = 1}^D (x_i - μ)^2$
+~~~
+- $→x̂ ← (→x - μ) / \sqrt{σ^2 + ε}$
+~~~
+- $→y ← →γ ⊙ →x̂ + →β$
+</div>
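+
+The algorithm above maps directly to NumPy, normalizing over the last axis. Because only the last axis is used, every example in a batch is normalized independently. The helper `layer_norm` is hypothetical.
+
+```python
+import numpy as np
+
+def layer_norm(x, gamma, beta, eps=0.001):
+    """Layer normalization of the last axis, following the algorithm above."""
+    mu = x.mean(axis=-1, keepdims=True)
+    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
+    x_hat = (x - mu) / np.sqrt(var + eps)
+    return gamma * x_hat + beta
+
+D = 6
+x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [10.0, 0.0, -10.0, 5.0, 5.0, 0.0]])
+y = layer_norm(x, gamma=np.ones(D), beta=np.zeros(D))
+```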
+
+---
+# Regularizing RNNs
+
+## Layer Normalization
+
+Much more stable than batch normalization for RNN regularization.
+
+![w=70%,h=center](layer_norm.svgz)
+
+~~~
+![w=85%,h=center](layer_norm_properties.svgz)
+
+---
+# Layer Normalization
+
+In an important recent architecture (namely the Transformer), many fully
+connected layers are used, each with a residual connection and a layer normalization.
+
+![w=85%,h=center](layer_norm_residual.svgz)
+
+~~~
+This could be considered an alternative to highway networks, i.e., a suitable
+residual connection for fully connected layers.
+~~~
+Note that the architecture can be considered a variant of a mobile inverted
+bottleneck $1×1$ convolution block.
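+
+The sketch below shows one possible arrangement of such a block. The 4× expansion, the ReLU, applying layer normalization after the residual sum, and omitting $→γ$ and $→β$ are all assumptions. The figure may differ in these details.
+
+```python
+import numpy as np
+
+def layer_norm(x, eps=0.001):
+    mu = x.mean(axis=-1, keepdims=True)
+    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)
+
+def residual_fc_block(x, W1, b1, W2, b2):
+    """Expand D -> 4D, apply ReLU, project back to D, add residual, normalize."""
+    hidden = np.maximum(0, x @ W1 + b1)
+    return layer_norm(x + hidden @ W2 + b2)
+
+rng = np.random.default_rng(0)
+D = 8
+x = rng.normal(size=(3, D))  # 3 positions, each with D features
+W1, W2 = rng.normal(size=(D, 4 * D)) * 0.1, rng.normal(size=(4 * D, D)) * 0.1
+y = residual_fc_block(x, W1, np.zeros(4 * D), W2, np.zeros(D))
+```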
+
+---
+section: RNNArchitectures
+# Basic RNN Architectures and Tasks
+
+## Sequence Element Representation
+
+Create an output for every individual element, for example to classify the
+individual elements.
+
+![w=70%,h=center](rnn_cell_unrolled.svgz)
+
+~~~
+## Sequence Representation
+
+Generate a single output for the whole sequence (either the last output or the
+last state).
+
+---
+# Basic RNN Architectures and Tasks
+
+## Sequence Prediction
+
+During training, predict the next sequence element.
+
+![w=75%,h=center](sequence_prediction_training.svgz)
+
+~~~
+During inference, use predicted elements as further inputs.
+
+![w=75%,h=center](sequence_prediction_inference.svgz)
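+
+The inference loop can be sketched in plain Python. The interface `step(state, x) -> (new_state, prediction)` is a hypothetical stand-in for one RNN step followed by the output layer and an argmax. The toy step function only illustrates the feeding-back of predictions.
+
+```python
+def generate(step, state, first_input, length):
+    """Greedy autoregressive generation: each prediction becomes the next input."""
+    outputs, x = [], first_input
+    for _ in range(length):
+        state, x = step(state, x)
+        outputs.append(x)
+    return outputs
+
+def toy_step(state, x):
+    """Toy stand-in for a trained model: predicts (x + 1) mod 10."""
+    return state, (x + 1) % 10
+
+print(generate(toy_step, state=None, first_input=7, length=5))  # [8, 9, 0, 1, 2]
+```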
+
+---
+# Multilayer RNNs
+
+We might stack several layers of recurrent neural networks. Usually using two or
+three layers gives better results than just one.
+
+![w=75%,h=center](multilayer_rnn.svgz)
+
+---
+# Multilayer RNNs
+
+In the case of multiple layers, residual connections usually improve results.
+Because the dimensionality has to match, they are usually applied from the
+second layer.
+
+![w=75%,h=center](multilayer_rnn_residual.svgz)
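+
+The following NumPy sketch stacks three SimpleRNN layers with residual connections from the second layer on. The layer count, sizes, and weight scale are illustrative assumptions, and `rnn_layer` is hypothetical.
+
+```python
+import numpy as np
+
+def rnn_layer(xs, U, V, b):
+    """A SimpleRNN layer returning its state at every time step."""
+    h, outputs = np.zeros(U.shape[0]), []
+    for x in xs:
+        h = np.tanh(U @ h + V @ x + b)
+        outputs.append(h)
+    return np.stack(outputs)
+
+rng = np.random.default_rng(0)
+T, D, H, num_layers = 5, 3, 4, 3
+xs = rng.normal(size=(T, D))
+for layer in range(num_layers):
+    U = rng.normal(size=(H, H)) * 0.5
+    V = rng.normal(size=(H, xs.shape[1])) * 0.5
+    ys = rnn_layer(xs, U, V, np.zeros(H))
+    # The first layer maps D -> H dimensions, so the residual connection
+    # can only be added from the second layer on.
+    xs = ys if layer == 0 else xs + ys
+```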
+
+---
+# Bidirectional RNN
+
+To consider both the left and right contexts, a **bidirectional** RNN can be used,
+which consists of parallel application of a **forward** RNN and a **backward** RNN.
+
+![w=80%,h=center](bidirectional_rnn.svgz)
+
+~~~
+The outputs of both directions can be either **added** or **concatenated**.
+Even though adding them does not seem very intuitive, it does not increase
+the dimensionality and therefore allows residual connections to be used in
+the case of a multilayer bidirectional RNN.
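+
+A NumPy sketch of a bidirectional RNN built from two simple tanh RNNs follows; all sizes and helper names are illustrative. The backward RNN processes the reversed sequence. Its outputs are then reversed back, so that position $t$ combines the left and the right context of element $t$.
+
+```python
+import numpy as np
+
+def rnn(inputs, W, U, b):
+    # Simple tanh RNN: inputs [T, D] -> outputs [T, H].
+    state, outputs = np.zeros(U.shape[0]), []
+    for x_t in inputs:
+        state = np.tanh(x_t @ W + state @ U + b)
+        outputs.append(state)
+    return np.stack(outputs)
+
+def bidirectional_rnn(inputs, forward_params, backward_params, merge="concat"):
+    forward = rnn(inputs, *forward_params)
+    # Reverse the input for the backward RNN, then reverse its outputs back.
+    backward = rnn(inputs[::-1], *backward_params)[::-1]
+    if merge == "concat":
+        return np.concatenate([forward, backward], axis=-1)  # [T, 2H]
+    return forward + backward                                # [T, H]
+
+rng = np.random.default_rng(3)
+T, D, H = 5, 4, 6
+
+def make_params():
+    return rng.normal(size=(D, H)), 0.5 * rng.normal(size=(H, H)), np.zeros(H)
+
+fw, bw = make_params(), make_params()
+x = rng.normal(size=(T, D))
+print(bidirectional_rnn(x, fw, bw, "concat").shape)  # (5, 12)
+print(bidirectional_rnn(x, fw, bw, "add").shape)     # (5, 6)
+```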
+
+---
+section: WE
+# Word Embeddings
+
+We might represent **words** using one-hot encoding, considering all words to be
+independent of each other.
+
+~~~
+However, words are not independent – some are more similar than others.
+
+~~~
+Ideally, we would like some kind of similarity in the space of the word
+representations.
+
+~~~
+## Distributed Representation
+The idea behind distributed representation is that objects can
+be represented using a set of common underlying factors.
+
+~~~
+We therefore represent words as fixed-size **embeddings** into $ℝ^d$ space,
+with the vector elements playing the role of the common underlying factors.
+
+~~~
+These embeddings are initialized randomly and trained together with the rest of
+the network.
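+
+A small NumPy illustration of the difference follows. It uses a toy three-word vocabulary and an arbitrary uniform random initialization. Any two different one-hot vectors are orthogonal, so all pairs of words are equally dissimilar. Dense embeddings, in contrast, can express graded similarity once trained.
+
+```python
+import numpy as np
+
+vocabulary = ["cat", "dog", "car"]  # toy vocabulary, for illustration
+V, d = len(vocabulary), 4
+
+def cosine(u, v):
+    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
+
+one_hot = np.eye(V)
+# Different one-hot vectors are orthogonal: every pair of words is equally dissimilar.
+print(cosine(one_hot[0], one_hot[1]), cosine(one_hot[0], one_hot[2]))  # 0.0 0.0
+
+# Embeddings: a trainable [V, d] matrix, initialized randomly (an arbitrary choice here);
+# training can then move "cat" and "dog" closer to each other than to "car".
+embeddings = np.random.default_rng(4).uniform(-0.05, 0.05, size=(V, d))
+print(cosine(embeddings[0], embeddings[1]))  # some value in [-1, 1]
+```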
+
+---
+# Word Embeddings
+
+The word embedding layer is in fact just a fully connected layer on top of
+one-hot encoding. However, it is not implemented in that way.
+
+~~~
+Instead, the so-called **embedding** layer is used, which is much more efficient.
+When a one-hot encoded vector (all zeros except for a single 1) is multiplied by
+a matrix, the row corresponding to that 1 is selected, so the embedding layer
+can be implemented as a simple lookup.
+
+~~~
+In Keras, the embedding layer is available as
+```python
+keras.layers.Embedding(input_dim, output_dim)  # vocabulary size, embedding dimension
+```
+
+~~~
+In PyTorch, it is available as
+```python
+torch.nn.Embedding(num_embeddings, embedding_dim)  # vocabulary size, embedding dimension
+```
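+
+The equivalence between a fully connected layer on one-hot inputs and a row lookup can be checked directly in NumPy. This is a toy sketch with arbitrary sizes.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(5)
+V, d = 10, 3                        # vocabulary size, embedding dimension
+E = rng.normal(size=(V, d))         # embedding matrix = weights of the dense layer
+word_ids = np.array([7, 2, 7, 0])   # toy batch of word indices
+
+one_hot = np.eye(V)[word_ids]       # [4, V], each row all zeros except a single 1
+via_matmul = one_hot @ E            # fully connected layer (no bias) on one-hot inputs
+via_lookup = E[word_ids]            # embedding layer: just select the rows
+print(np.allclose(via_matmul, via_lookup))  # True
+```
+
+The lookup needs $O(d)$ operations per word, while the product with the one-hot vector needs $O(Vd)$. The one-hot vectors themselves would also occupy $O(V)$ memory each.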
+
+---
+# Word Embeddings
+
+Even though the embedding layer is just a fully connected layer on top of
+one-hot encoding, it is important that this layer is _shared_ across
+the whole network.
+
+~~~
+![w=37.5%](words_onehot.svgz)
+~~~
+![w=60.5%](words_embeddings.svgz)
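+
+Sharing also means that the gradients from all occurrences of a word accumulate in the same row of the embedding matrix. The following NumPy sketch uses made-up gradient values.
+
+```python
+import numpy as np
+
+V, d = 5, 2
+word_ids = np.array([3, 1, 3])   # word 3 occurs twice in the sequence
+grad_outputs = np.ones((3, d))   # made-up gradients w.r.t. the three looked-up vectors
+
+# Backpropagating through the shared lookup: gradients of all occurrences accumulate.
+# (grad_E[word_ids] += grad_outputs would count the repeated index 3 only once.)
+grad_E = np.zeros((V, d))
+np.add.at(grad_E, word_ids, grad_outputs)
+print(grad_E[3], grad_E[1], grad_E[0])  # [2. 2.] [1. 1.] [0. 0.]
+```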
+
+---
+section: CLE
+# Word Embeddings for Unknown Words
+
+![w=42%,f=right](cle_rnn.svgz)
+
+## Recurrent Character-level WEs
+
+In order to handle words not seen during training, we could find a way
+to generate a representation from the word **characters**.
+
+~~~
+A possible way to compose the representation from individual characters
+is to use RNNs – we embed _characters_ to get character representations,
+and then use an RNN to produce the representation of a whole _sequence of
+characters_.
+
+~~~
+Usually, both forward and backward directions are used, and the resulting
+representations are concatenated/added.
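+
+A NumPy sketch of a recurrent character-level word embedding follows. It assumes lowercase ASCII words; a real system would also reserve an index for unknown characters. It also assumes random, untrained weights and concatenation of the final forward and backward states.
+
+```python
+import numpy as np
+
+def rnn_final_state(inputs, W, U, b):
+    # Simple tanh RNN returning only its final state.
+    state = np.zeros(U.shape[0])
+    for x_t in inputs:
+        state = np.tanh(x_t @ W + state @ U + b)
+    return state
+
+def char_rnn_embedding(word, char_ids, char_emb, fw, bw):
+    """Embed characters, run forward and backward RNNs, concatenate final states."""
+    chars = char_emb[[char_ids[c] for c in word]]  # [len(word), D]
+    return np.concatenate([rnn_final_state(chars, *fw),
+                           rnn_final_state(chars[::-1], *bw)])
+
+rng = np.random.default_rng(6)
+alphabet = "abcdefghijklmnopqrstuvwxyz"
+char_ids = {c: i for i, c in enumerate(alphabet)}
+D, H = 8, 16
+char_emb = rng.normal(size=(len(alphabet), D))
+
+def make_params():
+    return 0.3 * rng.normal(size=(D, H)), 0.3 * rng.normal(size=(H, H)), np.zeros(H)
+
+fw, bw = make_params(), make_params()
+# Works for any word over the alphabet, including words never seen during training.
+print(char_rnn_embedding("unfriendliness", char_ids, char_emb, fw, bw).shape)  # (32,)
+```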
+
+---
+# Word Embeddings for Unknown Words
+
+## Convolutional Character-level WEs
+
+![w=32%,f=right](cle_cnn.png)
+
+Alternatively, 1D convolutions might be used.
+
+~~~
+Assume we use a 1D convolution with kernel size 3. It produces a representation
+for every character trigram of the input word, but we need a representation of
+the whole word. To that end, we use _global max-pooling_, which has an
+interpretable meaning: the kernel is a _pattern_, and the activation after the
+max-pooling is the level of the best match of the pattern anywhere in the word.
+
+~~~
+Kernels of varying sizes are usually used (because it makes sense to have
+patterns for unigrams, bigrams, trigrams, …) – for example, 25 filters for every
+kernel size $(1, 2, 3, 4, 5)$ might be used.
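+
+A NumPy sketch of the convolutional variant follows. It assumes 25 filters for each kernel width from 1 to 5 and a ReLU nonlinearity, which is an arbitrary choice here. It also assumes the word is at least as long as the widest kernel; shorter words would need padding, e.g., with begin/end-of-word characters.
+
+```python
+import numpy as np
+
+def conv1d_valid(chars, kernel):
+    """chars: [L, D]; kernel: [k, D, F] -> one F-dim response per character k-gram."""
+    k = kernel.shape[0]
+    return np.stack([np.einsum("kd,kdf->f", chars[i:i + k], kernel)
+                     for i in range(len(chars) - k + 1)])  # [L - k + 1, F]
+
+def char_cnn_embedding(chars, kernels):
+    """Kernels of several widths, ReLU, global max-pool over positions, concatenate."""
+    pooled = [np.maximum(conv1d_valid(chars, K), 0.0).max(axis=0) for K in kernels]
+    return np.concatenate(pooled)
+
+rng = np.random.default_rng(7)
+D, F = 15, 25                   # character embedding dim, filters per kernel width
+kernels = [rng.normal(size=(k, D, F)) for k in (1, 2, 3, 4, 5)]
+word = rng.normal(size=(9, D))  # stand-in for the embedded characters of a 9-letter word
+print(char_cnn_embedding(word, kernels).shape)  # (125,)
+```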
+
+~~~
+Lastly, the authors employed a highway layer after the convolutions, improving
+the results (compared to not using any layer or using a fully connected one).
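+
+A highway layer computes $\boldsymbol y = \boldsymbol t \odot g(\boldsymbol x \boldsymbol W_H + \boldsymbol b_H) + (1 - \boldsymbol t) \odot \boldsymbol x$. Here $\boldsymbol t = \sigma(\boldsymbol x \boldsymbol W_T + \boldsymbol b_T)$ is the transform gate. The NumPy sketch below assumes $g$ is ReLU. It also uses a negative gate bias at initialization, so the layer initially mostly carries its input through.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def highway(x, W_H, b_H, W_T, b_T):
+    """y = t * relu(x W_H + b_H) + (1 - t) * x with the transform gate t."""
+    t = sigmoid(x @ W_T + b_T)          # transform gate in (0, 1)
+    h = np.maximum(x @ W_H + b_H, 0.0)  # candidate transformation g(.) = ReLU
+    return t * h + (1.0 - t) * x        # input and output sizes must match
+
+rng = np.random.default_rng(8)
+n = 125  # e.g., the size of the concatenated max-pooled CNN features
+x = rng.normal(size=n)
+W_H, W_T = (rng.normal(scale=n ** -0.5, size=(n, n)) for _ in range(2))
+b_H, b_T = np.zeros(n), np.full(n, -2.0)  # negative gate bias: mostly carry at first
+print(highway(x, W_H, b_H, W_T, b_T).shape)  # (125,)
+```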
+
+---
+# Examples of Recurrent Character-level WEs
+
+![w=80%,h=center](cle_rnn_examples.svgz)
+
+---
+# Examples of Convolutional Character-level WEs
+
+![w=100%](cle_cnn_examples.svgz)
+
+---
+# Character-level WE Implementation
+
+## Training
+
+- Generate unique words per batch.
+
+~~~
+- Process the unique words in the batch.
+
+~~~
+- Copy the resulting embeddings to the corresponding positions in the batch.
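+
+The three training steps map directly onto NumPy's `np.unique(..., return_inverse=True)`. The sketch below uses a hypothetical `char_level_embedding` stand-in instead of a real character-level model, and it flattens the batch for simplicity.
+
+```python
+import numpy as np
+
+def char_level_embedding(word):
+    """Hypothetical stand-in for an expensive character-level model."""
+    return np.array([len(word), ord(word[0])], dtype=float)
+
+batch = [["the", "cat", "saw", "the", "dog"],
+         ["the", "dog", "ran"]]
+tokens = np.array([word for sentence in batch for word in sentence])
+
+# 1. Unique words of the batch, plus, for every token, the index of its unique word.
+unique_words, inverse = np.unique(tokens, return_inverse=True)
+# 2. Process only the unique words.
+unique_embeddings = np.stack([char_level_embedding(w) for w in unique_words])
+# 3. Copy the embeddings to all token positions by indexing with the inverse indices.
+token_embeddings = unique_embeddings[inverse]
+print(unique_words)                               # ['cat' 'dog' 'ran' 'saw' 'the']
+print(len(unique_words), token_embeddings.shape)  # 5 (8, 2)
+```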
+
+~~~
+## Inference
+
+- We can cache character-level word embeddings during inference.
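+
+During inference the weights are fixed, so the embedding of a given word never changes and can be memoized. The sketch below uses Python's `functools.lru_cache` around a hypothetical `cached_cle` stand-in. It counts how many times the expensive computation actually runs.
+
+```python
+import functools
+
+calls = 0
+
+@functools.lru_cache(maxsize=None)
+def cached_cle(word):
+    """Hypothetical stand-in for an expensive character-level embedding computation."""
+    global calls
+    calls += 1
+    return tuple(float(ord(c)) for c in word)  # immutable, so safe to share from the cache
+
+for word in ["the", "cat", "saw", "the", "cat"]:
+    cached_cle(word)
+print(calls)  # 3
+```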
+
+---
+# NLP Processing with CLEs
+
+![w=100%,v=middle](cle_rnn_gru.png)
+
diff --git a/slides/08/LSTM3-C-line.png b/slides/08/LSTM3-C-line.png
new file mode 100644
index 0000000..ce79157
Binary files /dev/null and b/slides/08/LSTM3-C-line.png differ
diff --git a/slides/08/LSTM3-C-line.png.ref b/slides/08/LSTM3-C-line.png.ref
new file mode 100644
index 0000000..bdccd03
--- /dev/null
+++ b/slides/08/LSTM3-C-line.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-C-line.png
diff --git a/slides/08/LSTM3-SimpleRNN.png b/slides/08/LSTM3-SimpleRNN.png
new file mode 100644
index 0000000..9472592
Binary files /dev/null and b/slides/08/LSTM3-SimpleRNN.png differ
diff --git a/slides/08/LSTM3-SimpleRNN.png.ref b/slides/08/LSTM3-SimpleRNN.png.ref
new file mode 100644
index 0000000..5f038dd
--- /dev/null
+++ b/slides/08/LSTM3-SimpleRNN.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png
diff --git a/slides/08/LSTM3-chain.png b/slides/08/LSTM3-chain.png
new file mode 100644
index 0000000..e962a3c
Binary files /dev/null and b/slides/08/LSTM3-chain.png differ
diff --git a/slides/08/LSTM3-chain.png.ref b/slides/08/LSTM3-chain.png.ref
new file mode 100644
index 0000000..5dc69bd
--- /dev/null
+++ b/slides/08/LSTM3-chain.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png
diff --git a/slides/08/LSTM3-focus-C.png b/slides/08/LSTM3-focus-C.png
new file mode 100644
index 0000000..7fc49f5
Binary files /dev/null and b/slides/08/LSTM3-focus-C.png differ
diff --git a/slides/08/LSTM3-focus-C.png.ref b/slides/08/LSTM3-focus-C.png.ref
new file mode 100644
index 0000000..a32d12a
--- /dev/null
+++ b/slides/08/LSTM3-focus-C.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png
diff --git a/slides/08/LSTM3-focus-f.png b/slides/08/LSTM3-focus-f.png
new file mode 100644
index 0000000..5808675
Binary files /dev/null and b/slides/08/LSTM3-focus-f.png differ
diff --git a/slides/08/LSTM3-focus-f.png.ref b/slides/08/LSTM3-focus-f.png.ref
new file mode 100644
index 0000000..827d9b7
--- /dev/null
+++ b/slides/08/LSTM3-focus-f.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png
diff --git a/slides/08/LSTM3-focus-i.png b/slides/08/LSTM3-focus-i.png
new file mode 100644
index 0000000..d3d82f0
Binary files /dev/null and b/slides/08/LSTM3-focus-i.png differ
diff --git a/slides/08/LSTM3-focus-i.png.ref b/slides/08/LSTM3-focus-i.png.ref
new file mode 100644
index 0000000..3f83f87
--- /dev/null
+++ b/slides/08/LSTM3-focus-i.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png
diff --git a/slides/08/LSTM3-focus-o.png b/slides/08/LSTM3-focus-o.png
new file mode 100644
index 0000000..40fc56b
Binary files /dev/null and b/slides/08/LSTM3-focus-o.png differ
diff --git a/slides/08/LSTM3-focus-o.png.ref b/slides/08/LSTM3-focus-o.png.ref
new file mode 100644
index 0000000..d9ad766
--- /dev/null
+++ b/slides/08/LSTM3-focus-o.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png
diff --git a/slides/08/LSTM3-var-GRU.png b/slides/08/LSTM3-var-GRU.png
new file mode 100644
index 0000000..6838a20
Binary files /dev/null and b/slides/08/LSTM3-var-GRU.png differ
diff --git a/slides/08/LSTM3-var-GRU.png.ref b/slides/08/LSTM3-var-GRU.png.ref
new file mode 100644
index 0000000..985df8d
--- /dev/null
+++ b/slides/08/LSTM3-var-GRU.png.ref
@@ -0,0 +1 @@
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-var-GRU.png
diff --git a/slides/08/bidirectional_rnn.ipe b/slides/08/bidirectional_rnn.ipe
new file mode 100644
index 0000000..098dfa2
--- /dev/null
+++ b/slides/08/bidirectional_rnn.ipe
@@ -0,0 +1,456 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180423065931" modified="D:20200413184300"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 0" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 -48" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -112" stroke="black" arrow="normal/normal">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 384 0" stroke="navy" arrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 384 -48" stroke="darkcyan" rarrow="normal/normal">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 160 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 64 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 224 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 288 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 192 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 352 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 416 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 320 -48" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 480 0" stroke="black" arrow="normal/normal">
+80 800 m
+32 768
+80 736 c
+</path>
+</page>
+</ipe>
diff --git a/slides/08/bidirectional_rnn.svgz b/slides/08/bidirectional_rnn.svgz
new file mode 100644
index 0000000..053e516
Binary files /dev/null and b/slides/08/bidirectional_rnn.svgz differ
diff --git a/slides/08/bidirectional_rnn.svgz.ref b/slides/08/bidirectional_rnn.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/cle_cnn.png b/slides/08/cle_cnn.png
new file mode 100644
index 0000000..286a13d
Binary files /dev/null and b/slides/08/cle_cnn.png differ
diff --git a/slides/08/cle_cnn.png.ref b/slides/08/cle_cnn.png.ref
new file mode 100644
index 0000000..833534b
--- /dev/null
+++ b/slides/08/cle_cnn.png.ref
@@ -0,0 +1 @@
+Figure 1 of "Character-Aware Neural Language Models", https://arxiv.org/abs/1508.06615
diff --git a/slides/08/cle_cnn_examples.svgz b/slides/08/cle_cnn_examples.svgz
new file mode 100644
index 0000000..3cd8541
Binary files /dev/null and b/slides/08/cle_cnn_examples.svgz differ
diff --git a/slides/08/cle_cnn_examples.svgz.ref b/slides/08/cle_cnn_examples.svgz.ref
new file mode 100644
index 0000000..5012fbf
--- /dev/null
+++ b/slides/08/cle_cnn_examples.svgz.ref
@@ -0,0 +1 @@
+Table 6 of "Character-Aware Neural Language Models", https://arxiv.org/abs/1508.06615
diff --git a/slides/08/cle_rnn.svgz b/slides/08/cle_rnn.svgz
new file mode 100644
index 0000000..46a7cfd
Binary files /dev/null and b/slides/08/cle_rnn.svgz differ
diff --git a/slides/08/cle_rnn.svgz.ref b/slides/08/cle_rnn.svgz.ref
new file mode 100644
index 0000000..ee616f0
--- /dev/null
+++ b/slides/08/cle_rnn.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation", https://arxiv.org/abs/1508.02096
diff --git a/slides/08/cle_rnn_examples.svgz b/slides/08/cle_rnn_examples.svgz
new file mode 100644
index 0000000..cc3e0e7
Binary files /dev/null and b/slides/08/cle_rnn_examples.svgz differ
diff --git a/slides/08/cle_rnn_examples.svgz.ref b/slides/08/cle_rnn_examples.svgz.ref
new file mode 100644
index 0000000..9d722a3
--- /dev/null
+++ b/slides/08/cle_rnn_examples.svgz.ref
@@ -0,0 +1 @@
+Table 2 of "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation", https://arxiv.org/abs/1508.02096
diff --git a/slides/08/cle_rnn_gru.png b/slides/08/cle_rnn_gru.png
new file mode 100644
index 0000000..bd87286
Binary files /dev/null and b/slides/08/cle_rnn_gru.png differ
diff --git a/slides/08/cle_rnn_gru.png.ref b/slides/08/cle_rnn_gru.png.ref
new file mode 100644
index 0000000..a45cbc4
--- /dev/null
+++ b/slides/08/cle_rnn_gru.png.ref
@@ -0,0 +1 @@
+Figure 1 of "Multi-Task Cross-Lingual Sequence Tagging from Scratch", https://arxiv.org/abs/1603.06270
diff --git a/slides/08/gru.ipe b/slides/08/gru.ipe
new file mode 100644
index 0000000..c443153
--- /dev/null
+++ b/slides/08/gru.ipe
@@ -0,0 +1,412 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70218" creator="Ipe 7.2.26">
+<info created="D:20180416065930" modified="D:20240415230010"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<anglesize name="22.5 deg" value="22.5"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="90 deg" value="90"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="blue" value="0 0 1"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="gray" value="0.745"/>
+<color name="green" value="0 1 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="red" value="1 0 0"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="yellow" value="1 1 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<pen name="fat" value="1.2"/>
+<pen name="heavier" value="0.8"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+96 768 m
+96 624 l
+352 624 l
+352 768 l
+h
+</path>
+<path stroke="black" arrow="normal/small">
+96 672 m
+252 672 l
+</path>
+<path matrix="1 0 0 1 80 12" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 129.745 -14.3465" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path stroke="black" arrow="normal/small">
+256 624 m
+256 640 l
+</path>
+<path matrix="1 0 0 1 80 0" stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path matrix="1 0 0 1 80 0" stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 156 -72" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 196 -40" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="darkgreen" arrow="normal/small">
+256 656 m
+256 668 l
+</path>
+<path matrix="1 0 0 1 8 0" stroke="black" arrow="normal/small">
+56 704 m
+88 704 l
+</path>
+<text matrix="1 0 0 1 -4 84" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="black" arrow="normal/small">
+96 720 m
+172 720 l
+</path>
+<path matrix="1 0 0 1 0 108" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 49.7453 81.6535" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path stroke="black" arrow="normal/small">
+176 768 m
+176 752 l
+</path>
+<path matrix="1 0 0 -1 0 1392" stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path matrix="1 0 0 -1 0 1392" stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 116 156" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="blue" arrow="normal/small">
+176 736 m
+176 724 l
+</path>
+<text matrix="1 0 0 1 76 124" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<path matrix="1 0 0 1 8 16" stroke="black" arrow="normal/small">
+56 704 m
+88 704 l
+</path>
+<text matrix="1 0 0 1 -4 68" transformations="translations" pos="76 640" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<path matrix="1 0 0 1 80 56" stroke="black">
+116 664 m
+116 648 l
+152 648 l
+152 664 l
+h
+</path>
+<text matrix="1 0 0 1 80.41 49.2755" transformations="translations" pos="124 660" stroke="black" type="label" width="19.925" height="6.918" depth="0" valign="baseline" style="math">\tanh</text>
+<path stroke="black" arrow="normal/small">
+96 704 m
+180 704 l
+196 712 l
+</path>
+<path matrix="1 0 0 1 8 -32" stroke="black" arrow="normal/small">
+56 704 m
+88 704 l
+</path>
+<text matrix="1 0 0 1 -4 36" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<text matrix="1 0 0 1 -2.17938 -11.3155" transformations="translations" pos="252 700" stroke="black" type="label" width="12.73" height="6.421" depth="0.83" valign="baseline" style="math">1-</text>
+<path stroke="black">
+244 700 m
+244 684 l
+268 684 l
+268 700 l
+h
+</path>
+<path stroke="darkgreen" arrow="normal/small">
+256 656 m
+272 672
+256 684 c
+</path>
+<path stroke="black" arrow="normal/small">
+256 700 m
+256 708 l
+</path>
+<path stroke="1 0 1" arrow="normal/small">
+232 712 m
+252 712 l
+</path>
+<path stroke="black" arrow="normal/small">
+260 712 m
+300 688 l
+</path>
+<path stroke="black" arrow="normal/small">
+260 672 m
+300 688 l
+</path>
+<path stroke="1 0 0" arrow="normal/small">
+308 688 m
+352 688 l
+</path>
+<path matrix="1 0 0 1 0 32" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<text matrix="1 0 0 1 -8 28" transformations="translations" pos="372 664" stroke="black" type="label" width="10.159" height="6.923" depth="1.49" valign="baseline" style="math">\bm h_t</text>
+<path matrix="1 0 0 1 0 64" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 6.35127 17.045" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path stroke="black" arrow="normal/small">
+180 720 m
+196 712 l
+</path>
+<path matrix="1 0 0 1 80 56" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 86.3513 9.045" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 80 16" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 86.3513 -30.955" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 128 32" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 132.455 -14.181" transformations="translations" pos="168 700" stroke="black" type="label" width="7.168" height="5.314" depth="0.83" valign="baseline" size="small" style="math">+</text>
+</page>
+</ipe>
diff --git a/slides/08/gru.svgz b/slides/08/gru.svgz
new file mode 100644
index 0000000..7ed69fa
Binary files /dev/null and b/slides/08/gru.svgz differ
diff --git a/slides/08/gru.svgz.ref b/slides/08/gru.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/highway_activations.jpg b/slides/08/highway_activations.jpg
new file mode 100644
index 0000000..7989db3
Binary files /dev/null and b/slides/08/highway_activations.jpg differ
diff --git a/slides/08/highway_activations.jpg.ref b/slides/08/highway_activations.jpg.ref
new file mode 100644
index 0000000..91c9627
--- /dev/null
+++ b/slides/08/highway_activations.jpg.ref
@@ -0,0 +1 @@
+Figure 2 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228
diff --git a/slides/08/highway_leisoning.svgz b/slides/08/highway_leisoning.svgz
new file mode 100644
index 0000000..1f37f1b
Binary files /dev/null and b/slides/08/highway_leisoning.svgz differ
diff --git a/slides/08/highway_leisoning.svgz.ref b/slides/08/highway_leisoning.svgz.ref
new file mode 100644
index 0000000..3078441
--- /dev/null
+++ b/slides/08/highway_leisoning.svgz.ref
@@ -0,0 +1 @@
+Figure 4 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228
diff --git a/slides/08/highway_training.svgz b/slides/08/highway_training.svgz
new file mode 100644
index 0000000..973bcf3
Binary files /dev/null and b/slides/08/highway_training.svgz differ
diff --git a/slides/08/highway_training.svgz.ref b/slides/08/highway_training.svgz.ref
new file mode 100644
index 0000000..997d2b8
--- /dev/null
+++ b/slides/08/highway_training.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228
diff --git a/slides/08/layer_norm.svgz b/slides/08/layer_norm.svgz
new file mode 100644
index 0000000..1951e39
Binary files /dev/null and b/slides/08/layer_norm.svgz differ
diff --git a/slides/08/layer_norm.svgz.ref b/slides/08/layer_norm.svgz.ref
new file mode 100644
index 0000000..2ff61c5
--- /dev/null
+++ b/slides/08/layer_norm.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Layer Normalization", https://arxiv.org/abs/1607.06450
diff --git a/slides/08/layer_norm_properties.svgz b/slides/08/layer_norm_properties.svgz
new file mode 100644
index 0000000..aaa0044
Binary files /dev/null and b/slides/08/layer_norm_properties.svgz differ
diff --git a/slides/08/layer_norm_properties.svgz.ref b/slides/08/layer_norm_properties.svgz.ref
new file mode 100644
index 0000000..e0b63f6
--- /dev/null
+++ b/slides/08/layer_norm_properties.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Layer Normalization", https://arxiv.org/abs/1607.06450
diff --git a/slides/08/layer_norm_residual.ipe b/slides/08/layer_norm_residual.ipe
new file mode 100644
index 0000000..4ecdd51
--- /dev/null
+++ b/slides/08/layer_norm_residual.ipe
@@ -0,0 +1,402 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70218" creator="Ipe 7.2.23">
+<info created="D:20180416065930" modified="D:20220504150041"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<anglesize name="22.5 deg" value="22.5"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="90 deg" value="90"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="blue" value="0 0 1"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="gray" value="0.745"/>
+<color name="green" value="0 1 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="red" value="1 0 0"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="yellow" value="1 1 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<pen name="fat" value="1.2"/>
+<pen name="heavier" value="0.8"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" matrix="1 0 0 1 -144 0" stroke="black" arrow="normal/small">
+292 532 m
+236 548 l
+</path>
+<path matrix="1 0 0 1 -84 -104" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 -79.5449 -150.181" transformations="translations" pos="168 700" stroke="black" type="label" width="7.168" height="5.314" depth="0.83" valign="baseline" size="small" style="math">+</text>
+<path matrix="1 0 0 1 -144 0" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -220.131 1.05841" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Fully connected layer</text>
+<path matrix="1 0 0 1 -144 -24" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -219.961 -22.7719" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="5.216" depth="0.23" valign="top" size="footnote" style="center">ReLU</text>
+<path matrix="1 0 0 1 -144 -48" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -220.131 -46.9416" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Fully connected layer</text>
+<path matrix="1 0 0 1 -144 0" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="1 0 0 1 -144 -24" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="-1 0 0 1 384 -76" stroke="black" arrow="normal/small">
+292 532 m
+236 548 l
+</path>
+<path matrix="1 0 0 1 -144 0" stroke="black" arrow="normal/small">
+236 456 m
+236 548 l
+</path>
+<path matrix="1 0 0 1 -200 -64" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="1 0 0 1 -200 48" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -276.131 49.0584" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Layer normalization</text>
+<path matrix="1 0 0 1 -200 48" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="1 0 0 1 -200 72" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<text matrix="1 0 0 1 -148 0" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="84" height="6.035" depth="1.05" valign="top" size="footnote">For example 512 values</text>
+<text matrix="1 0 0 1 -92 36" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="100" height="6.035" depth="1.05" valign="top" size="footnote">For example 2048 values</text>
+<text matrix="1 0 0 1 -92 84" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="100" height="6.035" depth="1.05" valign="top" size="footnote">For example 512 values</text>
+<text matrix="1 0 0 1 -284.131 81.0584" transformations="translations" pos="324 528" stroke="black" type="minipage" width="168" height="6.926" depth="1.93" valign="top" style="center">\bf Original ``Post-LN&apos;&apos; configuration</text>
+<path matrix="1 0 0 1 56 24" stroke="black" arrow="normal/small">
+292 532 m
+236 548 l
+</path>
+<path matrix="1 0 0 1 116 -80" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 120.455 -126.181" transformations="translations" pos="168 700" stroke="black" type="label" width="7.168" height="5.314" depth="0.83" valign="baseline" size="small" style="math">+</text>
+<path matrix="1 0 0 1 56 24" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -20.131 25.0584" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Fully connected layer</text>
+<path matrix="1 0 0 1 56 0" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -19.961 1.2281" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="5.216" depth="0.23" valign="top" size="footnote" style="center">ReLU</text>
+<path matrix="1 0 0 1 56 -24" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -20.131 -22.9416" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Fully connected layer</text>
+<path matrix="1 0 0 1 56 24" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="1 0 0 1 56 0" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="-1 0 0 1 584 -76" stroke="black" arrow="normal/small">
+292 532 m
+236 548 l
+</path>
+<path matrix="1 0 0 1 -8 0" stroke="black" arrow="normal/small">
+300 456 m
+300 572 l
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<path matrix="1 0 0 1 0 72" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+<text matrix="1 0 0 1 52 0" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="84" height="6.035" depth="1.05" valign="top" size="footnote">For example 512 values</text>
+<text matrix="1 0 0 1 108 60" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="100" height="6.035" depth="1.05" valign="top" size="footnote">For example 2048 values</text>
+<text matrix="1 0 0 1 108 108" transformations="translations" pos="244 456" stroke="darkorange" type="minipage" width="100" height="6.035" depth="1.05" valign="top" size="footnote">For example 512 values</text>
+<text matrix="1 0 0 1 -92.131 81.0584" transformations="translations" pos="324 528" stroke="black" type="minipage" width="224" height="6.926" depth="1.93" valign="top" style="center">\bf Improved ``Pre-LN&apos;&apos; configuration since 2020</text>
+<path matrix="1 0 0 1 56 -48" stroke="black">
+248 532 m
+248 520 l
+336 520 l
+336 532 l
+h
+</path>
+<text matrix="1 0 0 1 -20.131 -46.9416" transformations="translations" pos="324 528" stroke="black" type="minipage" width="88" height="6.035" depth="1.05" valign="top" size="footnote" style="center">Layer normalization</text>
+<path matrix="1 0 0 1 56 -24" stroke="black" arrow="normal/small">
+292 508 m
+292 520 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/layer_norm_residual.svgz b/slides/08/layer_norm_residual.svgz
new file mode 100644
index 0000000..ed3dcde
Binary files /dev/null and b/slides/08/layer_norm_residual.svgz differ
diff --git a/slides/08/layer_norm_residual.svgz.ref b/slides/08/layer_norm_residual.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/lstm_cec.ipe b/slides/08/lstm_cec.ipe
new file mode 100644
index 0000000..3f2f80e
--- /dev/null
+++ b/slides/08/lstm_cec.ipe
@@ -0,0 +1,312 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180416065930" modified="D:20200414001940"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+96 768 m
+96 624 l
+352 624 l
+352 768 l
+h
+</path>
+<path stroke="black" arrow="normal/small">
+64 672 m
+96 656 l
+</path>
+<path stroke="black" arrow="normal/small">
+64 640 m
+96 656 l
+</path>
+<text matrix="1 0 0 1 -4 0" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 -4 -4" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path matrix="1 0 0 1 0 -8" stroke="gray">
+40 0 0 40 224 704 e
+</path>
+<path stroke="gray" arrow="linear/large">
+96 656 m
+224 656 l
+</path>
+<path stroke="black" arrow="normal/small">
+96 656 m
+352 656 l
+</path>
+<text matrix="1 0 0 1 -8 -4" transformations="translations" pos="372 664" stroke="black" type="label" width="10.159" height="6.923" depth="1.49" valign="baseline" style="math">\bm h_t</text>
+<path stroke="black" arrow="arc/small">
+96 736 m
+352 736 l
+</path>
+<path matrix="1 0 0 1 0 80" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<text matrix="1 0 0 1 -8 76" transformations="translations" pos="372 664" stroke="black" type="label" width="8.623" height="4.432" depth="1.49" valign="baseline" style="math">\bm c_t</text>
+<path matrix="1 0 0 1 -288 80" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<text matrix="1 0 0 1 -296 76" transformations="translations" pos="372 664" stroke="black" type="label" width="18.821" height="4.432" depth="2.32" valign="baseline" style="math">\bm c_{t-1}</text>
+<path stroke="black" dash="dash dotted" arrow="linear/normal">
+184 736 m
+264 656 l
+</path>
+<path stroke="black" dash="dash dotted" arrow="linear/normal">
+184 656 m
+264 736 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/lstm_cec.svgz b/slides/08/lstm_cec.svgz
new file mode 100644
index 0000000..74d37da
Binary files /dev/null and b/slides/08/lstm_cec.svgz differ
diff --git a/slides/08/lstm_cec.svgz.ref b/slides/08/lstm_cec.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/lstm_cec_idea.ipe b/slides/08/lstm_cec_idea.ipe
new file mode 100644
index 0000000..0c948c2
--- /dev/null
+++ b/slides/08/lstm_cec_idea.ipe
@@ -0,0 +1,290 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180416065930" modified="D:20200414002010"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+96 768 m
+96 624 l
+352 624 l
+352 768 l
+h
+</path>
+<path stroke="black" arrow="normal/small">
+64 672 m
+96 656 l
+</path>
+<path stroke="black" arrow="normal/small">
+64 640 m
+96 656 l
+</path>
+<text matrix="1 0 0 1 -4 0" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 -4 -4" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path matrix="1 0 0 1 0 -8" stroke="black">
+40 0 0 40 224 704 e
+</path>
+<path stroke="black" arrow="linear/large">
+96 656 m
+224 656 l
+</path>
+<path stroke="black" arrow="normal/small">
+224 656 m
+352 656 l
+</path>
+<text matrix="1 0 0 1 -8 -4" transformations="translations" pos="372 664" stroke="black" type="label" width="10.159" height="6.923" depth="1.49" valign="baseline" style="math">\bm h_t</text>
+</page>
+</ipe>
diff --git a/slides/08/lstm_cec_idea.svgz b/slides/08/lstm_cec_idea.svgz
new file mode 100644
index 0000000..a404dfa
Binary files /dev/null and b/slides/08/lstm_cec_idea.svgz differ
diff --git a/slides/08/lstm_cec_idea.svgz.ref b/slides/08/lstm_cec_idea.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/lstm_input_output_forget_gates.ipe b/slides/08/lstm_input_output_forget_gates.ipe
new file mode 100644
index 0000000..e3e4c85
--- /dev/null
+++ b/slides/08/lstm_input_output_forget_gates.ipe
@@ -0,0 +1,444 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70218" creator="Ipe 7.2.26">
+<info created="D:20180416065930" modified="D:20240415225717"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<anglesize name="22.5 deg" value="22.5"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="90 deg" value="90"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="blue" value="0 0 1"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="gray" value="0.745"/>
+<color name="green" value="0 1 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="red" value="1 0 0"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="yellow" value="1 1 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<pen name="fat" value="1.2"/>
+<pen name="heavier" value="0.8"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" matrix="1 0 0 1 0 116" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 49.7453 89.6535" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path matrix="1 0 0 -1 0 1392" stroke="black" arrow="normal/small">
+176 624 m
+176 632 l
+</path>
+<path matrix="1 0 0 -1 0 1392" stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path matrix="1 0 0 -1 0 1392" stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 116 156" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<text matrix="1 0 0 1 76 124" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<path stroke="gray">
+32 0 0 32 216 696 e
+</path>
+<path matrix="1 0 0 1 -4 0" stroke="gray" arrow="linear/large">
+216 664 m
+220 664 l
+</path>
+<path stroke="black">
+96 768 m
+96 624 l
+352 624 l
+352 768 l
+h
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+64 672 m
+96 656 l
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+64 640 m
+96 656 l
+</path>
+<text matrix="1 0 0 1 -4 8" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 -4 4" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path stroke="red" arrow="normal/small">
+324 664 m
+352 664 l
+</path>
+<text matrix="1 0 0 1 -8 4" transformations="translations" pos="372 664" stroke="black" type="label" width="10.159" height="6.923" depth="1.49" valign="baseline" style="math">\bm h_t</text>
+<path matrix="1 0 0 1 -4 8" stroke="black">
+116 664 m
+116 648 l
+152 648 l
+152 664 l
+h
+</path>
+<text matrix="1 0 0 1 -3.59001 1.27553" transformations="translations" pos="124 660" stroke="black" type="label" width="19.925" height="6.918" depth="0" valign="baseline" style="math">\tanh</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+96 656 m
+112 656 l
+</path>
+<path matrix="1 0 0 1 0 4" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 49.7453 -22.3465" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path stroke="black" arrow="normal/small">
+176 624 m
+176 632 l
+</path>
+<path stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 76 -72" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 116 -40" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="blue" arrow="normal/small">
+176 648 m
+176 660 l
+</path>
+<path matrix="1 0 0 1 140 8" stroke="black">
+116 664 m
+116 648 l
+152 648 l
+152 664 l
+h
+</path>
+<text matrix="1 0 0 1 140.41 1.27553" transformations="translations" pos="124 660" stroke="black" type="label" width="19.925" height="6.918" depth="0" valign="baseline" style="math">\tanh</text>
+<path matrix="1 0 0 1 144 4" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 193.745 -22.3465" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+148 656 m
+172 656 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+176 624 m
+176 632 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 220 -72" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 260 -40" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="black" arrow="arc/small">
+96 728 m
+172 728 l
+</path>
+<path matrix="1 0 0 1 0 72" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<text matrix="1 0 0 1 -8 68" transformations="translations" pos="372 664" stroke="black" type="label" width="8.623" height="4.432" depth="1.49" valign="baseline" style="math">\bm c_t</text>
+<text matrix="1 0 0 1 -296 68" transformations="translations" pos="372 664" stroke="black" type="label" width="18.821" height="4.432" depth="2.32" valign="baseline" style="math">\bm c_{t-1}</text>
+<path matrix="1 0 0 1 -288 72" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 6.35127 -38.955" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 144 0" stroke="darkgreen" arrow="normal/small">
+176 648 m
+176 660 l
+</path>
+<path matrix="1 0 0 1 144 8" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 150.351 -38.955" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 144 8" stroke="black" arrow="normal/small">
+148 656 m
+172 656 l
+</path>
+<path matrix="1 0 0 1 40 72" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 44.4551 25.819" transformations="translations" pos="168 700" stroke="black" type="label" width="7.168" height="5.314" depth="0.83" valign="baseline" size="small" style="math">+</text>
+<path stroke="1 0 1" arrow="arc/small">
+220 728 m
+352 728 l
+</path>
+<path stroke="black" arrow="normal/small">
+178.851 666.885 m
+212.886 725.303 l
+</path>
+<path stroke="1 0 1" arrow="normal/small">
+219.14 725.351 m
+256 664 l
+</path>
+<path matrix="1 0 0 1 0 72" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 6.35127 25.045" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path stroke="black" arrow="arc/small">
+180 728 m
+212 728 l
+</path>
+<path stroke="darkorange" arrow="normal/small">
+176 744 m
+176 732 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/lstm_input_output_forget_gates.svgz b/slides/08/lstm_input_output_forget_gates.svgz
new file mode 100644
index 0000000..6770692
Binary files /dev/null and b/slides/08/lstm_input_output_forget_gates.svgz differ
diff --git a/slides/08/lstm_input_output_forget_gates.svgz.ref b/slides/08/lstm_input_output_forget_gates.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/lstm_input_output_gates.ipe b/slides/08/lstm_input_output_gates.ipe
new file mode 100644
index 0000000..0170d2f
--- /dev/null
+++ b/slides/08/lstm_input_output_gates.ipe
@@ -0,0 +1,411 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70218" creator="Ipe 7.2.26">
+<info created="D:20180416065930" modified="D:20240415225401"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<anglesize name="22.5 deg" value="22.5"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="90 deg" value="90"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="blue" value="0 0 1"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="gray" value="0.745"/>
+<color name="green" value="0 1 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="red" value="1 0 0"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="yellow" value="1 1 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<pen name="fat" value="1.2"/>
+<pen name="heavier" value="0.8"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="gray">
+32 0 0 32 216 696 e
+</path>
+<path matrix="1 0 0 1 -4 0" stroke="gray" arrow="linear/large">
+216 664 m
+220 664 l
+</path>
+<path stroke="black">
+96 768 m
+96 624 l
+352 624 l
+352 768 l
+h
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+64 672 m
+96 656 l
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+64 640 m
+96 656 l
+</path>
+<text matrix="1 0 0 1 -4 8" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 -4 4" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path stroke="red" arrow="normal/small">
+324 664 m
+352 664 l
+</path>
+<text matrix="1 0 0 1 -8 4" transformations="translations" pos="372 664" stroke="black" type="label" width="10.159" height="6.923" depth="1.49" valign="baseline" style="math">\bm h_t</text>
+<path matrix="1 0 0 1 -4 8" stroke="black">
+116 664 m
+116 648 l
+152 648 l
+152 664 l
+h
+</path>
+<text matrix="1 0 0 1 -3.59001 1.27553" transformations="translations" pos="124 660" stroke="black" type="label" width="19.925" height="6.918" depth="0" valign="baseline" style="math">\tanh</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+96 656 m
+112 656 l
+</path>
+<path matrix="1 0 0 1 0 4" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 49.7453 -22.3465" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path stroke="black" arrow="normal/small">
+176 624 m
+176 632 l
+</path>
+<path stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 76 -72" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 116 -40" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="blue" arrow="normal/small">
+176 648 m
+176 660 l
+</path>
+<path matrix="1 0 0 1 140 8" stroke="black">
+116 664 m
+116 648 l
+152 648 l
+152 664 l
+h
+</path>
+<text matrix="1 0 0 1 140.41 1.27553" transformations="translations" pos="124 660" stroke="black" type="label" width="19.925" height="6.918" depth="0" valign="baseline" style="math">\tanh</text>
+<path matrix="1 0 0 1 144 4" stroke="black">
+160 644 m
+160 628 l
+192 628 l
+192 644 l
+h
+</path>
+<text matrix="1 0 0 1 193.745 -22.3465" transformations="translations" pos="124 660" stroke="black" type="label" width="6.05" height="4.289" depth="0" valign="baseline" style="math">\sigma</text>
+<path matrix="1 0 0 1 0 8" stroke="black" arrow="normal/small">
+148 656 m
+172 656 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+176 624 m
+176 632 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+160 592 m
+176 624 l
+</path>
+<path matrix="1 0 0 1 144 0" stroke="black" arrow="normal/small">
+192 592 m
+176 624 l
+</path>
+<text matrix="1 0 0 1 220 -72" transformations="translations" pos="76 672" stroke="black" type="label" width="10.073" height="4.432" depth="1.49" valign="baseline" style="math">\bm x_t</text>
+<text matrix="1 0 0 1 260 -40" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\bm h_{t-1}</text>
+<path stroke="black" arrow="arc/small">
+96 728 m
+212 728 l
+</path>
+<path matrix="1 0 0 1 0 72" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<text matrix="1 0 0 1 -8 68" transformations="translations" pos="372 664" stroke="black" type="label" width="8.623" height="4.432" depth="1.49" valign="baseline" style="math">\bm c_t</text>
+<text matrix="1 0 0 1 -296 68" transformations="translations" pos="372 664" stroke="black" type="label" width="18.821" height="4.432" depth="2.32" valign="baseline" style="math">\bm c_{t-1}</text>
+<path matrix="1 0 0 1 -288 72" stroke="black" arrow="normal/small">
+352 656 m
+384 656 l
+</path>
+<path matrix="1 0 0 1 0 8" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 6.35127 -38.955" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 144 0" stroke="darkgreen" arrow="normal/small">
+176 648 m
+176 660 l
+</path>
+<path matrix="1 0 0 1 144 8" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 150.351 -38.955" transformations="translations" pos="168 700" stroke="black" type="label" width="3.321" height="5.313" depth="0" valign="baseline" size="large" style="math">\cdot</text>
+<path matrix="1 0 0 1 144 8" stroke="black" arrow="normal/small">
+148 656 m
+172 656 l
+</path>
+<path matrix="1 0 0 1 40 72" stroke="black">
+4 0 0 4 176 656 e
+</path>
+<text matrix="1 0 0 1 44.4551 25.819" transformations="translations" pos="168 700" stroke="black" type="label" width="7.168" height="5.314" depth="0.83" valign="baseline" size="small" style="math">+</text>
+<path stroke="1 0 1" arrow="arc/small">
+220 728 m
+352 728 l
+</path>
+<path stroke="black" arrow="normal/small">
+178.851 666.885 m
+212.886 725.303 l
+</path>
+<path stroke="1 0 1" arrow="normal/small">
+219.14 725.351 m
+256 664 l
+</path>
+<text matrix="1 0 0 1 116 156" transformations="translations" pos="76 640" stroke="black" type="label" width="20.357" height="6.923" depth="2.32" valign="baseline" style="math">\phantom{\bm h_{t-1}}</text>
+</page>
+</ipe>
diff --git a/slides/08/lstm_input_output_gates.svgz b/slides/08/lstm_input_output_gates.svgz
new file mode 100644
index 0000000..b26f64a
Binary files /dev/null and b/slides/08/lstm_input_output_gates.svgz differ
diff --git a/slides/08/lstm_input_output_gates.svgz.ref b/slides/08/lstm_input_output_gates.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/multilayer_rnn.ipe b/slides/08/multilayer_rnn.ipe
new file mode 100644
index 0000000..1f85067
--- /dev/null
+++ b/slides/08/multilayer_rnn.ipe
@@ -0,0 +1,490 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180423065931" modified="D:20200413181423"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 384 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 384 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 384 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/multilayer_rnn.svgz b/slides/08/multilayer_rnn.svgz
new file mode 100644
index 0000000..bbb50c0
Binary files /dev/null and b/slides/08/multilayer_rnn.svgz differ
diff --git a/slides/08/multilayer_rnn.svgz.ref b/slides/08/multilayer_rnn.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/multilayer_rnn_residual.ipe b/slides/08/multilayer_rnn_residual.ipe
new file mode 100644
index 0000000..194b03a
--- /dev/null
+++ b/slides/08/multilayer_rnn_residual.ipe
@@ -0,0 +1,550 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180423065931" modified="D:20200414020529"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 0" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 0 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="darkcyan">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="-1 0 0 1 480 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 416 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 352 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 288 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 224 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 160 -64" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 0 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 64 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 128 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 192 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 256 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="darkmagenta">
+16 0 0 16 80 768 e
+</path>
+<path matrix="1 0 0 1 320 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 0 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 64 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 128 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 192 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 256 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="1 0 0 1 320 -192" stroke="black" arrow="normal/large">
+80 816 m
+80 784 l
+</path>
+<path matrix="-1 0 0 1 480 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 416 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 352 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 288 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 224 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="-1 0 0 1 160 -128" stroke="black" arrow="normal/large">
+80 800 m
+32 768
+80 736 c
+</path>
+<path matrix="1 0 0 1 384 0" stroke="navy" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 384 -64" stroke="darkcyan" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+<path matrix="1 0 0 1 384 -128" stroke="darkmagenta" arrow="normal/large">
+32 768 m
+64 768 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/multilayer_rnn_residual.svgz b/slides/08/multilayer_rnn_residual.svgz
new file mode 100644
index 0000000..6abdf6a
Binary files /dev/null and b/slides/08/multilayer_rnn_residual.svgz differ
diff --git a/slides/08/multilayer_rnn_residual.svgz.ref b/slides/08/multilayer_rnn_residual.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/recurrent_batch_normalization.svgz b/slides/08/recurrent_batch_normalization.svgz
new file mode 100644
index 0000000..b7e6c7d
Binary files /dev/null and b/slides/08/recurrent_batch_normalization.svgz differ
diff --git a/slides/08/recurrent_batch_normalization.svgz.ref b/slides/08/recurrent_batch_normalization.svgz.ref
new file mode 100644
index 0000000..55024b1
--- /dev/null
+++ b/slides/08/recurrent_batch_normalization.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "Recurrent Batch Normalization", https://arxiv.org/abs/1603.09025
diff --git a/slides/08/rnn_cell.ipe b/slides/08/rnn_cell.ipe
new file mode 100644
index 0000000..0d76d0c
--- /dev/null
+++ b/slides/08/rnn_cell.ipe
@@ -0,0 +1,353 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180409075511" modified="D:20210701125804"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+32 0 0 32 192 672 e
+</path>
+<text matrix="1 0 0 1 12 8" transformations="translations" pos="184 728" stroke="black" type="label" width="37.874" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input}</text>
+<path stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path stroke="black" arrow="normal/large">
+192 640 m
+192 584 l
+</path>
+<text matrix="1 0 0 1 12 -128" transformations="translations" pos="184 728" stroke="black" type="label" width="46.482" height="10.586" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output}</text>
+<path stroke="black" arrow="normal/large">
+236 672 m
+256 672
+256 628
+128 628
+124 672
+160 672 c
+</path>
+<text matrix="1 0 0 1 56 -48" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path stroke="black">
+176 728 m
+176 720 l
+</path>
+<path stroke="black">
+184 728 m
+184 720 l
+</path>
+<path stroke="black">
+200 728 m
+200 720 l
+</path>
+<path stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 0 -104" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 0 -104" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 0 -104" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 0 -104" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 0 -104" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 956 480" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 956 480" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 956 480" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 956 480" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 956 480" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path stroke="black">
+224 672 m
+236 672 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/rnn_cell.svgz b/slides/08/rnn_cell.svgz
new file mode 100644
index 0000000..5ecff56
Binary files /dev/null and b/slides/08/rnn_cell.svgz differ
diff --git a/slides/08/rnn_cell.svgz.ref b/slides/08/rnn_cell.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/rnn_cell_basic.ipe b/slides/08/rnn_cell_basic.ipe
new file mode 100644
index 0000000..a97675e
--- /dev/null
+++ b/slides/08/rnn_cell_basic.ipe
@@ -0,0 +1,288 @@
+<ipe version="70206" creator="Ipe 7.2.7">
+<info created="D:20180409080635" modified="D:20180409080638"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="small" value="\small"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" matrix="1 0 0 1 32 8" stroke="black">
+80 800 m
+80 736 l
+96 736 l
+96 800 l
+h
+</path>
+<path matrix="1 0 0 1 32 0" stroke="black">
+128 768 m
+128 704 l
+144 704 l
+144 768 l
+h
+</path>
+<path matrix="1 0 0 1 32 -72" stroke="black">
+80 800 m
+80 736 l
+96 736 l
+96 800 l
+h
+</path>
+<text matrix="1 0 0 1 32 -4" transformations="translations" pos="40 780" stroke="black" type="label" width="31.562" height="9.3" depth="2.79" valign="baseline" size="Large" style="math">\textit{input}</text>
+<text matrix="1 0 0 1 -20 -88" transformations="translations" pos="40 780" stroke="black" type="label" width="83.747" height="9.3" depth="2.79" valign="baseline" size="Large" style="math">\textit{previous state}</text>
+<path stroke="black" arrow="normal/normal">
+128 776 m
+160 736 l
+</path>
+<path stroke="black" arrow="normal/normal">
+128 696 m
+160 736 l
+</path>
+<text matrix="1 0 0 1 144 -48" transformations="translations" pos="40 780" stroke="black" type="label" width="116.923" height="8.824" depth="2.79" valign="baseline" size="Large" style="math">\textit{output~=~new state}</text>
+</page>
+</ipe>
diff --git a/slides/08/rnn_cell_basic.svgz b/slides/08/rnn_cell_basic.svgz
new file mode 100644
index 0000000..f1ea259
Binary files /dev/null and b/slides/08/rnn_cell_basic.svgz differ
diff --git a/slides/08/rnn_cell_basic.svgz.ref b/slides/08/rnn_cell_basic.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/rnn_cell_basic_as_cell.ipe b/slides/08/rnn_cell_basic_as_cell.ipe
new file mode 100644
index 0000000..b0440ad
--- /dev/null
+++ b/slides/08/rnn_cell_basic_as_cell.ipe
@@ -0,0 +1,433 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70218" creator="Ipe 7.2.23">
+<info created="D:20180409075511" modified="D:20220327182625"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<anglesize name="22.5 deg" value="22.5"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="90 deg" value="90"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="blue" value="0 0 1"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="gray" value="0.745"/>
+<color name="green" value="0 1 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="red" value="1 0 0"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="yellow" value="1 1 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<pen name="fat" value="1.2"/>
+<pen name="heavier" value="0.8"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" matrix="1 0 0 1 0 -32" stroke="black">
+64 0 0 64 192 672 e
+</path>
+<text matrix="1 0 0 1 12 8" transformations="translations" pos="184 728" stroke="black" type="label" width="37.874" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input}</text>
+<path stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path stroke="black" arrow="normal/large">
+192 576 m
+192 512 l
+</path>
+<text matrix="1 0 0 1 12 -200" transformations="translations" pos="184 728" stroke="black" type="label" width="46.482" height="10.586" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output}</text>
+<path stroke="black" arrow="normal/large">
+268 640 m
+288 640
+288 556
+96 556
+96 640
+128 640 c
+</path>
+<text matrix="1 0 0 1 88 -80" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path stroke="black">
+176 728 m
+176 720 l
+</path>
+<path stroke="black">
+184 728 m
+184 720 l
+</path>
+<path stroke="black">
+200 728 m
+200 720 l
+</path>
+<path stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 0 -176" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 0 -176" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 0 -176" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 0 -176" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 0 -176" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 988 448" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 988 448" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 988 448" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 988 448" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 988 448" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 32 -32" stroke="black">
+224 672 m
+236 672 l
+</path>
+<path matrix="0 1 -1 0 904 448" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 904 448" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 904 448" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 904 448" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 904 448" stroke="black">
+208 728 m
+208 720 l
+</path>
+<text matrix="1 0 0 1 4 -84" transformations="translations" pos="184 728" stroke="black" type="label" width="31.581" height="11.955" depth="0" valign="baseline" size="LARGE" style="math">\tanh</text>
+<path stroke="black">
+176 640 m
+256 640 l
+</path>
+<path stroke="black" arrow="normal/large">
+128 640 m
+176 640 l
+</path>
+<path matrix="1 0 0 1 -24 0" stroke="black">
+184 648 m
+8 0 0 8 192 648 192 640 a
+</path>
+<path matrix="1 0 0 1 -8 0" stroke="black">
+192 672 m
+8 0 0 8 192 680 200 680 a
+</path>
+<path matrix="1 0 0 1 8 0" stroke="black">
+160 672 m
+8 0 0 8 160 664 152 664 a
+</path>
+<path stroke="black">
+192 704 m
+192 680 l
+</path>
+<path stroke="black">
+184 672 m
+168 672 l
+</path>
+<path stroke="black">
+160 664 m
+160 648 l
+</path>
+<path matrix="-1 0 0 -1 408 1280" stroke="black">
+184 648 m
+8 0 0 8 192 648 192 640 a
+</path>
+<path matrix="-1 0 0 -1 392 1280" stroke="black">
+192 672 m
+8 0 0 8 192 680 200 680 a
+</path>
+<path matrix="-1 0 0 -1 376 1280" stroke="black">
+160 672 m
+8 0 0 8 160 664 152 664 a
+</path>
+<path matrix="-1 0 0 -1 384 1280" stroke="black">
+192 704 m
+192 680 l
+</path>
+<path matrix="-1 0 0 -1 384 1280" stroke="black">
+184 672 m
+168 672 l
+</path>
+<path matrix="-1 0 0 -1 384 1280" stroke="black">
+160 664 m
+160 648 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/rnn_cell_basic_as_cell.svgz b/slides/08/rnn_cell_basic_as_cell.svgz
new file mode 100644
index 0000000..7947878
Binary files /dev/null and b/slides/08/rnn_cell_basic_as_cell.svgz differ
diff --git a/slides/08/rnn_cell_basic_as_cell.svgz.ref b/slides/08/rnn_cell_basic_as_cell.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/rnn_cell_unrolled.ipe b/slides/08/rnn_cell_unrolled.ipe
new file mode 100644
index 0000000..9063203
--- /dev/null
+++ b/slides/08/rnn_cell_unrolled.ipe
@@ -0,0 +1,606 @@
+<?xml version="1.0"?>
+<!DOCTYPE ipe SYSTEM "ipe.dtd">
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180409075511" modified="D:20210701125625"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+32 0 0 32 64 720 e
+</path>
+<text matrix="1 0 0 1 -116 56" transformations="translations" pos="184 728" stroke="black" type="label" width="52.508" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input~1}</text>
+<path matrix="1 0 0 1 -128 48" stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black" arrow="normal/large">
+192 640 m
+192 592 l
+</path>
+<text matrix="1 0 0 1 -116 -72" transformations="translations" pos="184 728" stroke="black" type="label" width="61.115" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output~1}</text>
+<text matrix="1 0 0 1 -68 0" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path stroke="black" arrow="normal/large">
+96 720 m
+160 720 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black">
+32 0 0 32 64 720 e
+</path>
+<text matrix="1 0 0 1 12 56" transformations="translations" pos="184 728" stroke="black" type="label" width="52.508" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input~2}</text>
+<path matrix="1 0 0 1 0 48" stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black" arrow="normal/large">
+192 640 m
+192 592 l
+</path>
+<text matrix="1 0 0 1 12 -72" transformations="translations" pos="184 728" stroke="black" type="label" width="61.115" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output~2}</text>
+<text matrix="1 0 0 1 60 0" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/large">
+96 720 m
+160 720 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black">
+32 0 0 32 64 720 e
+</path>
+<text matrix="1 0 0 1 140 56" transformations="translations" pos="184 728" stroke="black" type="label" width="52.508" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input~3}</text>
+<path matrix="1 0 0 1 128 48" stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black" arrow="normal/large">
+192 640 m
+192 592 l
+</path>
+<text matrix="1 0 0 1 140 -72" transformations="translations" pos="184 728" stroke="black" type="label" width="61.115" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output~3}</text>
+<text matrix="1 0 0 1 188 0" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/large">
+96 720 m
+160 720 l
+</path>
+<path matrix="1 0 0 1 384 0" stroke="black">
+32 0 0 32 64 720 e
+</path>
+<text matrix="1 0 0 1 268 56" transformations="translations" pos="184 728" stroke="black" type="label" width="52.508" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{input~4}</text>
+<path matrix="1 0 0 1 256 48" stroke="black" arrow="normal/large">
+192 752 m
+192 704 l
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black" arrow="normal/large">
+192 640 m
+192 592 l
+</path>
+<text matrix="1 0 0 1 268 -72" transformations="translations" pos="184 728" stroke="black" type="label" width="61.115" height="11.158" depth="3.35" valign="baseline" size="LARGE" style="math">\textit{output~4}</text>
+<text matrix="1 0 0 1 316 0" transformations="translations" pos="184 728" stroke="black" type="label" width="34.431" height="10.589" depth="0" valign="baseline" size="LARGE" style="math">\textit{state}</text>
+<path matrix="1 0 0 1 384 0" stroke="black" arrow="normal/large">
+96 720 m
+160 720 l
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 -128 48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 0 48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 128 48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 256 48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 -128 -48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 -128 -48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 -128 -48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 -128 -48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 -128 -48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 0 -48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 128 -48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="1 0 0 1 256 -48" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 832 528" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 832 528" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 832 528" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 832 528" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 832 528" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 960 528" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 960 528" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 960 528" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 960 528" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 960 528" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 1088 528" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 1088 528" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 1088 528" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 1088 528" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 1088 528" stroke="black">
+208 728 m
+208 720 l
+</path>
+<path matrix="0 1 -1 0 1216 528" stroke="black">
+216 728 m
+216 720 l
+168 720 l
+168 728 l
+h
+</path>
+<path matrix="0 1 -1 0 1216 528" stroke="black">
+176 728 m
+176 720 l
+</path>
+<path matrix="0 1 -1 0 1216 528" stroke="black">
+184 728 m
+184 720 l
+</path>
+<path matrix="0 1 -1 0 1216 528" stroke="black">
+200 728 m
+200 720 l
+</path>
+<path matrix="0 1 -1 0 1216 528" stroke="black">
+208 728 m
+208 720 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/rnn_cell_unrolled.svgz b/slides/08/rnn_cell_unrolled.svgz
new file mode 100644
index 0000000..1ac808c
Binary files /dev/null and b/slides/08/rnn_cell_unrolled.svgz differ
diff --git a/slides/08/rnn_cell_unrolled.svgz.ref b/slides/08/rnn_cell_unrolled.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/sequence_prediction_inference.ipe b/slides/08/sequence_prediction_inference.ipe
new file mode 100644
index 0000000..6514be8
--- /dev/null
+++ b/slides/08/sequence_prediction_inference.ipe
@@ -0,0 +1,371 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180409093007" modified="D:20200415115918"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<text matrix="1 0 0 1 -4 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="28.527" height="9.803" depth="0" valign="baseline" size="Large" style="math">\textit{BOS}</text>
+<path stroke="black" arrow="normal/normal">
+80 720 m
+80 696 l
+</path>
+<text matrix="1 0 0 1 0 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(0)}</text>
+<path stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+144 720 m
+144 696 l
+</path>
+<text matrix="1 0 0 1 64 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(1)}</text>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+208 720 m
+208 696 l
+</path>
+<text matrix="1 0 0 1 128 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(2)}</text>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+272 720 m
+272 696 l
+</path>
+<text matrix="1 0 0 1 192 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(3)}</text>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+336 720 m
+336 696 l
+</path>
+<text matrix="1 0 0 1 256 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="28.168" height="9.803" depth="0" valign="baseline" size="Large" style="math">\textit{EOS}</text>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path stroke="black" arrow="normal/normal">
+80 696 m
+88 680
+112 680
+112 792
+136 792
+144 776 c
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+80 696 m
+88 680
+112 680
+112 792
+136 792
+144 776 c
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+80 696 m
+88 680
+112 680
+112 792
+136 792
+144 776 c
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+80 696 m
+88 680
+112 680
+112 792
+136 792
+144 776 c
+</path>
+<path matrix="1 0 0 1 -64 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<text matrix="1 0 0 1 -75.7749 -40.913" transformations="translations" pos="88 780" stroke="black" type="label" width="37.665" height="4.297" depth="1.93" valign="baseline" style="math">\mathit{sequence}</text>
+<text matrix="1 0 0 1 -85.9951 -52.8593" transformations="translations" pos="88 780" stroke="black" type="label" width="62.123" height="6.536" depth="1.93" valign="baseline" style="math">\mathit{representation}</text>
+</page>
+</ipe>
diff --git a/slides/08/sequence_prediction_inference.svgz b/slides/08/sequence_prediction_inference.svgz
new file mode 100644
index 0000000..bf4030b
Binary files /dev/null and b/slides/08/sequence_prediction_inference.svgz differ
diff --git a/slides/08/sequence_prediction_inference.svgz.ref b/slides/08/sequence_prediction_inference.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/sequence_prediction_training.ipe b/slides/08/sequence_prediction_training.ipe
new file mode 100644
index 0000000..a8a2922
--- /dev/null
+++ b/slides/08/sequence_prediction_training.ipe
@@ -0,0 +1,343 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20180409093007" modified="D:20200415115831"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<path layer="alpha" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<path stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<text matrix="1 0 0 1 -4 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="28.527" height="9.803" depth="0" valign="baseline" size="Large" style="math">\textit{BOS}</text>
+<path stroke="black" arrow="normal/normal">
+80 720 m
+80 696 l
+</path>
+<text matrix="1 0 0 1 0 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(0)}</text>
+<path stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 64 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<text matrix="1 0 0 1 60 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">x^{(0)}</text>
+<path stroke="black" arrow="normal/normal">
+144 720 m
+144 696 l
+</path>
+<text matrix="1 0 0 1 64 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(1)}</text>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<text matrix="1 0 0 1 124 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">x^{(1)}</text>
+<path stroke="black" arrow="normal/normal">
+208 720 m
+208 696 l
+</path>
+<text matrix="1 0 0 1 128 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(2)}</text>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<text matrix="1 0 0 1 188 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">x^{(2)}</text>
+<path stroke="black" arrow="normal/normal">
+272 720 m
+272 696 l
+</path>
+<text matrix="1 0 0 1 192 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">\hat x^{(3)}</text>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black">
+16 0 0 16 80 736 e
+</path>
+<text matrix="1 0 0 1 252 -20" transformations="translations" pos="88 780" stroke="black" type="label" width="21.211" height="12.678" depth="0" valign="baseline" size="Large" style="math">x^{(3)}</text>
+<path stroke="black" arrow="normal/normal">
+336 720 m
+336 696 l
+</path>
+<text matrix="1 0 0 1 256 -8" transformations="translations" pos="84 712" stroke="black" type="label" width="28.168" height="9.803" depth="0" valign="baseline" size="Large" style="math">\textit{EOS}</text>
+<path matrix="1 0 0 1 64 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 128 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 192 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 256 0" stroke="black" arrow="normal/normal">
+80 776 m
+80 752 l
+</path>
+<path matrix="1 0 0 1 -64 0" stroke="black" arrow="normal/normal">
+96 736 m
+128 736 l
+</path>
+<text matrix="1 0 0 1 -75.7749 -40.913" transformations="translations" pos="88 780" stroke="black" type="label" width="37.665" height="4.297" depth="1.93" valign="baseline" style="math">\mathit{sequence}</text>
+<text matrix="1 0 0 1 -85.9951 -52.8593" transformations="translations" pos="88 780" stroke="black" type="label" width="62.123" height="6.536" depth="1.93" valign="baseline" style="math">\mathit{representation}</text>
+</page>
+</ipe>
diff --git a/slides/08/sequence_prediction_training.svgz b/slides/08/sequence_prediction_training.svgz
new file mode 100644
index 0000000..722547a
Binary files /dev/null and b/slides/08/sequence_prediction_training.svgz differ
diff --git a/slides/08/sequence_prediction_training.svgz.ref b/slides/08/sequence_prediction_training.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/variational_rnn.svgz b/slides/08/variational_rnn.svgz
new file mode 100644
index 0000000..7552c81
Binary files /dev/null and b/slides/08/variational_rnn.svgz differ
diff --git a/slides/08/variational_rnn.svgz.ref b/slides/08/variational_rnn.svgz.ref
new file mode 100644
index 0000000..51c0c59
--- /dev/null
+++ b/slides/08/variational_rnn.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks", https://arxiv.org/abs/1512.05287.pdf
diff --git a/slides/08/words_embeddings.ipe b/slides/08/words_embeddings.ipe
new file mode 100644
index 0000000..a776b44
--- /dev/null
+++ b/slides/08/words_embeddings.ipe
@@ -0,0 +1,316 @@
+<ipe version="70206" creator="Ipe 7.2.7">
+<info created="D:20180423074252" modified="D:20180423074430"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="small" value="\small"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<text layer="alpha" matrix="1 0 0 1 -16 16" transformations="translations" pos="64 688" stroke="black" type="minipage" width="96" height="26.778" depth="21.84" valign="top" size="Large" style="center">Word in one-hot encoding</text>
+<path matrix="1 0 0 1 -16 16" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<path matrix="1 0 0 1 312 112" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 160 8" transformations="translations" pos="208 776" stroke="black" type="label" width="12.053" height="9.803" depth="0" valign="baseline" size="Large" style="math">D</text>
+<text matrix="1 0 0 1 160 -8" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_1</text>
+<path matrix="1 0 0 1 312 16" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 160 -88" transformations="translations" pos="208 776" stroke="black" type="label" width="12.053" height="9.803" depth="0" valign="baseline" size="Large" style="math">D</text>
+<text matrix="1 0 0 1 160 -104" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_2</text>
+<path matrix="1 0 0 1 312 -80" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 160 -208" transformations="translations" pos="208 776" stroke="black" type="label" width="12.053" height="9.803" depth="0" valign="baseline" size="Large" style="math">D</text>
+<text matrix="1 0 0 1 160 -200" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_3</text>
+<path matrix="1 0 0 1 160 0" stroke="black" arrow="normal/large">
+136 680 m
+224 776 l
+</path>
+<path matrix="1 0 0 1 160 0" stroke="black" arrow="normal/large">
+136 680 m
+224 584 l
+</path>
+<path matrix="1 0 0 1 160 0" stroke="black" arrow="normal/large">
+136 680 m
+224 680 l
+</path>
+<path matrix="1 0 0 1 144 16" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 -8 -88" transformations="translations" pos="208 776" stroke="black" type="label" width="11.311" height="9.803" depth="0" valign="baseline" size="Large" style="math">V</text>
+<text matrix="1 0 0 1 -8 -104" transformations="translations" pos="252 820" stroke="black" type="label" width="12.053" height="9.803" depth="0" valign="baseline" size="Large" style="math">D</text>
+<path stroke="black" arrow="normal/large">
+136 680 m
+216 680 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/words_embeddings.svgz b/slides/08/words_embeddings.svgz
new file mode 100644
index 0000000..59ab2f4
Binary files /dev/null and b/slides/08/words_embeddings.svgz differ
diff --git a/slides/08/words_embeddings.svgz.ref b/slides/08/words_embeddings.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/words_onehot.ipe b/slides/08/words_onehot.ipe
new file mode 100644
index 0000000..6ab09a2
--- /dev/null
+++ b/slides/08/words_onehot.ipe
@@ -0,0 +1,303 @@
+<ipe version="70206" creator="Ipe 7.2.7">
+<info created="D:20180423074252" modified="D:20180423074252"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="small" value="\small"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textsize name="tiny" value="\tiny"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<text layer="alpha" matrix="1 0 0 1 -16 16" transformations="translations" pos="64 688" stroke="black" type="minipage" width="96" height="26.778" depth="21.84" valign="top" size="Large" style="center">Word in one-hot encoding</text>
+<path matrix="1 0 0 1 -16 16" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<path matrix="1 0 0 1 152 112" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 0 8" transformations="translations" pos="208 776" stroke="black" type="label" width="11.311" height="9.803" depth="0" valign="baseline" size="Large" style="math">V</text>
+<text matrix="1 0 0 1 0 -8" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_1</text>
+<path matrix="1 0 0 1 152 16" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 0 -88" transformations="translations" pos="208 776" stroke="black" type="label" width="11.311" height="9.803" depth="0" valign="baseline" size="Large" style="math">V</text>
+<text matrix="1 0 0 1 0 -104" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_2</text>
+<path matrix="1 0 0 1 152 -80" stroke="black">
+72 696 m
+72 632 l
+152 632 l
+152 696 l
+h
+</path>
+<text matrix="1 0 0 1 0 -208" transformations="translations" pos="208 776" stroke="black" type="label" width="11.311" height="9.803" depth="0" valign="baseline" size="Large" style="math">V</text>
+<text matrix="1 0 0 1 0 -200" transformations="translations" pos="252 820" stroke="black" type="label" width="17.143" height="9.805" depth="2.15" valign="baseline" size="Large" style="math">D_3</text>
+<path stroke="black" arrow="normal/large">
+136 680 m
+224 776 l
+</path>
+<path stroke="black" arrow="normal/large">
+136 680 m
+224 584 l
+</path>
+<path stroke="black" arrow="normal/large">
+136 680 m
+224 680 l
+</path>
+</page>
+</ipe>
diff --git a/slides/08/words_onehot.svgz b/slides/08/words_onehot.svgz
new file mode 100644
index 0000000..f283837
Binary files /dev/null and b/slides/08/words_onehot.svgz differ
diff --git a/slides/08/words_onehot.svgz.ref b/slides/08/words_onehot.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/09.md b/slides/09/09.md
new file mode 100644
index 0000000..a7bf206
--- /dev/null
+++ b/slides/09/09.md
@@ -0,0 +1,496 @@
+title: NPFL138, Lecture 9
+class: title, langtech, cc-by-sa
+style: .algorithm { background-color: #eee; padding: .5em }
+
+# Structured Prediction, CTC, Word2Vec
+
+## Milan Straka
+
+### April 15, 2024
+
+---
+section: Span Labeling
+class: middle, center
+# Structured Prediction
+
+# Structured Prediction
+
+---
+# Structured Prediction
+
+Consider generating a sequence of $y_1, \ldots, y_N ∈ 𝓨^N$ given input
+$→x_1, \ldots, →x_N$.
+
+~~~
+Predicting each sequence element independently models the distribution $P(y_i | ⇉X)$.
+
+![w=40%,h=center](labeling_independent.svgz)
+
+~~~
+However, there may be dependencies among the $y_i$ themselves, in the sense
+that not all sequences of $y_i$ are valid; but when generating each $y_i$
+independently, the model has no way of ensuring that only valid
+sequences are generated.
+
+---
+# Structured Prediction – Span Labeling
+
+Consider for example **named entity recognition**, whose goal is to locate
+_named entities_, which are single words or sequences of multiple words
+denoting real-world objects, concepts, and events.
+~~~
+The most common types of named entities include:
+- `PER`: _people_, including names of individuals, historical figures, and even
+  fictional characters;
+~~~
+- `ORG`: _organizations_, incorporating companies, government agencies,
+  educational institutions, and others;
+~~~
+- `LOC`: _locations_, encompassing countries, cities, geographical features,
+  addresses.
+
+~~~
+Compared to part-of-speech tagging, locating named entities is much more
+challenging – named entity mentions are generally multi-word spans, and
+an arbitrary number of named entities can appear in a sentence (consequently,
+we cannot use accuracy for evaluation; F1-score is commonly used).
+
+~~~
+Named entity recognition is an instance of a **span labeling** task, where
+the goal is to locate and classify spans in the input sequence.
+
+---
+# Span Labeling – BIO Encoding
+
+A possible approach to a span labeling task is to classify every sequence
+element using a specialized tag set. A common choice is the
+**BIO** encoding, which consists of
+~~~
+- `O`: _outside_, the given element is not part of any span;
+
+~~~
+- `B-PER`, `B-ORG`, `B-LOC`, …: _beginning_, the element is first in a new span;
+~~~
+- `I-PER`, `I-ORG`, `I-LOC`, …: _inside_, a continuation element of an existing
+  span.
+
+~~~
+(Formally, the described scheme is the IOB-2 format; there exist quite a few other
+possibilities like IOB-1, IOE, BILOU, …)
+
+~~~
+The described encoding can represent any set of contiguous typed spans, as long as no spans
+overlap (i.e., a single element can belong to at most one span).
+
+---
+# Span Labeling – BIO Encoding
+
+However, when predicting each of the element tags independently, invalid
+sequences might be created.
+
+~~~
+- We can decide to ignore it and use heuristics capable of recovering the spans
+  from invalid sequences of BIO tags.
+
+~~~
+- We can employ a decoding algorithm producing the most probable **valid
+  sequence** of tags during prediction.
+~~~
+  - However, during training we do not consider the validity of BIO tags.
+
+~~~
+- We might use a different loss enabling the model to consider only
+  valid BIO tag sequences also during training.
+
+---
+# Span Labeling – Decoding Algorithm
+
+Let $→x_1, \ldots, →x_N$ be an input sequence.
+
+Our goal is to produce an output sequence $y_1, …, y_N$, where each $y_t ∈ 𝓨$
+and the set $𝓨$ contains $Y$ classes.
+
+~~~
+Assume we have a model predicting $p(y_t = k | ⇉X; →θ)$, a probability that the
+$t$-th output element $y_t$ is the class $k$.
+
+~~~
+However, only some sequences $→y$ are valid.
+~~~
+We now make an assumption that the validity of a sequence depends only on the
+validity of **neighboring** output classes. In other words, if all neighboring
+pairs of output elements are valid, the whole sequence is.
+
+~~~
+- The validity of neighboring pairs can be described by a transition matrix $⇉A
+  ∈ \{0, 1\}^{Y×Y}$.
+~~~
+- Such an approach allows expressing the (in)validity of a BIO tag sequence.
+
+---
+# Span Labeling – Decoding Algorithm
+
+Let us denote by $α_t(k)$ the log probability of the most probable output sequence
+of $t$ elements with the last one being $k$.
+
+~~~
+We can compute $α_t(k)$ efficiently using dynamic programming. The core idea is
+the following:
+
+![w=38%,h=center](crf_composability.svgz)
+
+~~~
+$$α_t(k) = \log p(y_t=k | ⇉X; →θ) + \max\nolimits_{j,\textrm{~such~that~}A_{j,k}\textrm{~is~valid}} α_{t-1}(j).$$
+
+~~~
+If we consider $\log A_{j,k}$ to be $-∞$ when $A_{j,k}=0$, we can rewrite the above as
+$$α_t(k) = \log p(y_t=k | ⇉X; →θ) + \max\nolimits_j \big(α_{t-1}(j) + \log A_{j,k}\big).$$
+
+~~~
+The resulting algorithm is also called the **Viterbi algorithm**, and it is also
+a search for the path of maximum length in an acyclic graph.
+
+---
+# Span Labeling – Decoding Algorithm
+
+<div class="algorithm">
+
+**Inputs**: Input sequence of length $N$, tag set with $Y$ tags.  
+**Inputs**: Model computing $p(y_t = k | ⇉X; →θ)$, a probability that $y_t$
+should have the class $k$.  
+**Inputs**: Transition matrix $⇉A ∈ \{0, 1\}^{Y×Y}$ indicating _valid_ and _invalid_
+transitions.  
+**Outputs**: The most probable sequence $→y$ consisting of valid transitions
+only.  
+**Time Complexity**: $𝓞(N ⋅ Y^2)$ in the worst case.
+
+- For $t = 1, \ldots, N$:
+  - For $k = 1, \ldots, Y:$
+    - $α_t(k) ← \log p(y_t=k | ⇉X; →θ)$  _logits (unnormalized log probs) can also be used_
+    - If $t > 1$:
+      - $β_t(k) ← \argmax\nolimits_{j,\textrm{~such~that~}A_{j,k}\textrm{~is~valid}} α_{t-1}(j)$
+      - $α_t(k) ← α_t(k) + α_{t-1}\big(β_t(k)\big)$
+- The most probable sequence has the log probability $\max α_N$; its
+  elements can be recovered by starting from $\argmax α_N$ and following $β$ from $t=N$ down to $t=1$.
+</div>
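+
+The algorithm above can be sketched in NumPy as follows. This is a minimal
+illustration, not a reference implementation: it assumes the model outputs are
+already given as an $N×Y$ array of log probabilities (or logits), that `A` is
+a 0/1 matrix where `A[j, k] = 1` means tag $k$ may follow tag $j$, and the
+function name `viterbi_decode` is hypothetical. The toy tag set `O`, `B`, `I`
+(indices 0, 1, 2) forbids only `I` directly following `O`.
+
+```python
+import numpy as np
+
+def viterbi_decode(log_probs: np.ndarray, A: np.ndarray) -> tuple[list[int], float]:
+    """Return the most probable valid tag sequence and its log probability."""
+    N, Y = log_probs.shape
+    with np.errstate(divide="ignore"):
+        log_A = np.log(A.astype(float))  # invalid transitions become -inf
+    alpha = np.empty((N, Y))
+    beta = np.zeros((N, Y), dtype=int)
+    alpha[0] = log_probs[0]
+    for t in range(1, N):
+        # scores[j, k] = alpha_{t-1}(j) + log A_{j,k}
+        scores = alpha[t - 1][:, None] + log_A
+        beta[t] = np.argmax(scores, axis=0)
+        alpha[t] = log_probs[t] + scores[beta[t], np.arange(Y)]
+    # Backtrack from the best final tag using the stored argmaxes.
+    tags = [int(np.argmax(alpha[-1]))]
+    for t in range(N - 1, 0, -1):
+        tags.append(int(beta[t, tags[-1]]))
+    return tags[::-1], float(np.max(alpha[-1]))
+
+tags, score = viterbi_decode(
+    np.log(np.array([[0.9, 0.05, 0.05], [0.1, 0.3, 0.6]])),
+    np.array([[1, 1, 0], [1, 1, 1], [1, 1, 1]]))
+print(tags, round(float(np.exp(score)), 2))  # [0, 1] 0.27
+```
+
+Independent argmax would pick `O I` here, which is invalid; the constrained
+decoding returns `O B` instead.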
+
+---
+# Span Labeling – Other Approaches
+
+With deep learning models, constrained decoding is usually sufficient to deliver
+high performance.
+
+~~~
+Historically, there have also been other approaches:
+
+~~~
+- **Maximum Entropy Markov Models**
+
+  We might model the dependencies by explicitly conditioning on the previous
+  label:
+  $$P(y_i | ⇉X, y_{i-1}).$$
+
+~~~
+  Then, each label is predicted by a softmax from a hidden state and a
+  _previous label_.
+  ![w=35%,h=center](labeling_memm.svgz)
+
+~~~
+  The decoding can still be performed by a dynamic programming algorithm.
+
+---
+# Span Labeling – Other Approaches
+
+- **Conditional Random Fields (CRF)**
+
+  The simplest variant, the linear-chain CRF, usually abbreviated only to CRF,
+  can be considered an extension of softmax – instead of a sequence of
+  independent softmaxes, it is a sentence-level softmax, with additional weights
+  for neighboring sequence elements.
+
+~~~
+  We start by defining a score of a label sequence $→y$ as
+  $$s(⇉X, →y; →θ, ⇉A) = f(y_1 | ⇉X; →θ) + ∑\nolimits_{i=2}^N \big(⇉A_{y_{i-1}, y_i} + f(y_i | ⇉X; →θ)\big),$$
+~~~
+  and define the probability of a label sequence $→y$ using $\softmax$:
+  $$p(→y | ⇉X) = \softmax_{→z ∈ 𝓨^N}\big(s(⇉X, →z)\big)_{→y}.$$
+
+~~~
+  The log probability $\log p(→y_\textrm{gold} | ⇉X)$ can be efficiently computed
+  using dynamic programming in a differentiable way, so it can be used in NLL
+  computation.
+
+~~~
+  For more details, see [Lecture 8 of NPFL114 2022/23 slides](https://ufal.mff.cuni.cz/~straka/courses/npfl114/2223/slides/?08).
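+
+The normalization over all of $𝓨^N$ can be computed with the same dynamic
+programming as the decoding, only with $\max$ replaced by a log-sum-exp (the
+forward algorithm). Below is a minimal NumPy sketch, assuming the emission
+scores $f(y_i | ⇉X; →θ)$ are given as an $N×Y$ array `f` and the transition
+weights as a $Y×Y$ array `A`; all function names are hypothetical, and the last
+lines check the result against brute-force enumeration of all label sequences.
+
+```python
+import itertools
+import numpy as np
+
+def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
+    """Numerically stable log(sum(exp(x))) along the given axis."""
+    m = np.max(x, axis=axis, keepdims=True)
+    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))
+
+def crf_score(f: np.ndarray, A: np.ndarray, y: list[int]) -> float:
+    """s(X, y) = f(y_1) + sum_{i>=2} (A[y_{i-1}, y_i] + f(y_i))."""
+    score = f[0, y[0]]
+    for i in range(1, len(y)):
+        score += A[y[i - 1], y[i]] + f[i, y[i]]
+    return float(score)
+
+def crf_log_partition(f: np.ndarray, A: np.ndarray) -> float:
+    """log of the sum over all z in Y^N of exp(s(X, z)), via the forward algorithm."""
+    alpha = f[0]  # alpha[k]: log-sum-exp of scores of prefixes ending with k
+    for i in range(1, f.shape[0]):
+        alpha = f[i] + logsumexp(alpha[:, None] + A, axis=0)
+    return float(logsumexp(alpha, axis=0))
+
+def crf_nll(f: np.ndarray, A: np.ndarray, y_gold: list[int]) -> float:
+    """-log p(y_gold | X) = log Z - s(X, y_gold)."""
+    return crf_log_partition(f, A) - crf_score(f, A, y_gold)
+
+rng = np.random.default_rng(42)
+f, A = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))  # N=4, Y=3
+brute = np.log(sum(np.exp(crf_score(f, A, list(z)))
+                   for z in itertools.product(range(3), repeat=4)))
+print(np.isclose(crf_log_partition(f, A), brute))  # True
+```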
+
+---
+section: CTC
+# Connectionist Temporal Classification
+
+Let us again consider generating a sequence of $y_1, \ldots, y_M$ given input
+$→x_1, \ldots, →x_N$, but this time $M ≤ N$, and there is no explicit alignment
+of $→x$ and $y$ in the gold data.
+
+~~~
+![w=100%,mh=90%,v=middle](ctc_example.svgz)
+
+---
+# Connectionist Temporal Classification
+
+We enlarge the set of the output labels by a – (**blank**), and perform a classification for every
+input element to produce an **extended labeling** (in contrast to the original **regular labeling**).
+We then post-process it by the following rules (denoted as $𝓑$):
+1. We collapse multiple neighboring occurrences of the same symbol into one.
+2. We remove the blank –.
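+
+The post-processing $𝓑$ can be sketched in a few lines of Python; the blank is
+represented by the character `-` here, which is an arbitrary choice, and
+`collapse` is a hypothetical name.
+
+```python
+import itertools
+
+BLANK = "-"  # assumed representation of the blank symbol
+
+def collapse(extended: str) -> str:
+    """Apply B: merge neighboring repeated symbols, then remove blanks."""
+    merged = (symbol for symbol, _ in itertools.groupby(extended))
+    return "".join(symbol for symbol in merged if symbol != BLANK)
+
+print(collapse("hheello"))   # helo
+print(collapse("hheel-lo"))  # hello
+```
+
+The second example shows why the blank is needed: without it, the two `l`s of
+_hello_ would be merged into one.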
+
+~~~
+Because the explicit alignment of inputs and labels is not known, we consider
+_all possible_ alignments.
+
+~~~
+Denoting the probability of label $l$ at time $t$ as $p_l^t$, we define
+$$α^t(s) ≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}}} ∏_{i=1}^t p_{π_i}^i.$$
+
+---
+# Connectionist Temporal Classification
+
+## Computation
+
+When aligning an extended labeling to a regular one, we need to consider
+whether the extended labeling ends with a _blank_ or not. We therefore define
+$$\begin{aligned}
+  α_-^t(s) &≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}, π_t=-}} ∏_{i=1}^t p_{π_i}^i \\
+  α_*^t(s) &≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}, π_t≠-}} ∏_{i=1}^t p_{π_i}^i
+\end{aligned}$$
+and compute $α^t(s)$ as $α_-^t(s) + α_*^t(s)$.
+
+---
+# Connectionist Temporal Classification
+
+## Computation – Initialization
+
+![w=35%,f=right](ctc_computation.svgz)
+
+We initialize $α^1$ as follows:
+- $α_-^1(0) ← p_-^1$
+- $α_*^1(1) ← p_{y_1}^1$
+- all other $α^1$ to zeros
+
+~~~
+## Computation – Induction Step
+
+We then proceed recurrently according to:
+- $α_-^t(s) ← p_-^t \big(α_*^{t-1}(s) + α_-^{t-1}(s)\big)$
+
+~~~
+- $α_*^t(s) ← \begin{cases}
+  p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + α_*^{t-1}(s-1)\big)\textrm{, if }y_s≠y_{s-1}\\
+  p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + \sout{α_*^{t-1}(s-1)}\big)\textrm{, if }y_s=y_{s-1}\\
+\end{cases}$
+
+~~~
+  We can write the update as $p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + [y_s≠y_{s-1}] ⋅ α_*^{t-1}(s-1)\big)$.
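+
+A minimal NumPy sketch of the initialization and induction above, using plain
+probabilities for readability (a practical implementation would work with log
+probabilities to avoid underflow). It assumes `probs[t, l]` holds $p_l^t$ for
+0-based frames, the blank has index 0, and `y` contains label indices ≥ 1;
+`alpha_blank[t, s]` and `alpha_label[t, s]` play the roles of $α_-^{t+1}(s)$
+and $α_*^{t+1}(s)$. For an input of length $N$ and a regular labeling of length
+$M$, the result is $α_-^N(M) + α_*^N(M)$; the final lines compare it with
+a brute-force sum over all extended labelings.
+
+```python
+import itertools
+import numpy as np
+
+def ctc_probability(probs: np.ndarray, y: list[int], blank: int = 0) -> float:
+    """Sum of probabilities of all extended labelings collapsing to y."""
+    N, M = probs.shape[0], len(y)
+    alpha_blank = np.zeros((N, M + 1))  # alpha_blank[t, s] ~ α_-^{t+1}(s)
+    alpha_label = np.zeros((N, M + 1))  # alpha_label[t, s] ~ α_*^{t+1}(s)
+    alpha_blank[0, 0] = probs[0, blank]
+    if M > 0:
+        alpha_label[0, 1] = probs[0, y[0]]
+    for t in range(1, N):
+        for s in range(M + 1):
+            alpha_blank[t, s] = probs[t, blank] * (alpha_label[t - 1, s] + alpha_blank[t - 1, s])
+            if s > 0:
+                # A repeated label can only follow after a blank.
+                different = s == 1 or y[s - 1] != y[s - 2]
+                alpha_label[t, s] = probs[t, y[s - 1]] * (
+                    alpha_label[t - 1, s] + alpha_blank[t - 1, s - 1]
+                    + different * alpha_label[t - 1, s - 1])
+    return float(alpha_blank[N - 1, M] + alpha_label[N - 1, M])
+
+def collapse(path, blank: int = 0) -> list[int]:
+    merged = [label for label, _ in itertools.groupby(path)]
+    return [label for label in merged if label != blank]
+
+# Brute-force check on a toy problem: 5 frames, blank + 2 labels.
+rng = np.random.default_rng(0)
+probs = rng.dirichlet(np.ones(3), size=5)
+y = [1, 1, 2]
+brute = sum(np.prod(probs[np.arange(5), list(path)])
+            for path in itertools.product(range(3), repeat=5)
+            if collapse(path) == y)
+print(np.isclose(ctc_probability(probs, y), brute))  # True
+```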
+
+---
+section: CTCDecoding
+# CTC Decoding
+
+Unlike BIO-tag structured prediction, no algorithm is known for performing CTC
+decoding optimally in polynomial time.
+
+~~~
+The key observation is that while an optimal extended labeling can be extended
+into an optimal labeling of a greater length, the same does not apply to
+a regular labeling. The problem is that a regular labeling corresponds to many
+extended labelings, each of which is modified in a different way during an
+extension of the regular labeling.
+
+~~~
+![w=75%,h=center](ctc_decoding.svgz)
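+
+The gap between extended and regular labelings shows up already on a tiny
+example, assuming two frames over the alphabet {–, a} with $p_a = 0.4$ and
+$p_- = 0.6$ in each frame. The greedy approach of taking the most probable
+symbol in every frame and then applying $𝓑$ (often called best path decoding,
+an approximation not described on these slides) yields the empty sequence,
+while an exhaustive sum over all extended labelings shows that _a_ is the most
+probable regular labeling. The names below are illustrative only.
+
+```python
+import itertools
+from collections import defaultdict
+
+probs = [{"-": 0.6, "a": 0.4}, {"-": 0.6, "a": 0.4}]  # p_l^t for t = 1, 2
+
+def collapse(path: str) -> str:
+    merged = (symbol for symbol, _ in itertools.groupby(path))
+    return "".join(symbol for symbol in merged if symbol != "-")
+
+# Greedy best path: the most probable symbol in every frame, then collapse.
+best_path = "".join(max(frame, key=frame.get) for frame in probs)
+
+# Exact probability of every regular labeling (exponential, toy sizes only).
+regular = defaultdict(float)
+for path in itertools.product("-a", repeat=len(probs)):
+    p = 1.0
+    for frame, symbol in zip(probs, path):
+        p *= frame[symbol]
+    regular[collapse("".join(path))] += p
+
+print(repr(collapse(best_path)))  # ''
+print(max(regular, key=regular.get), round(regular["a"], 2))  # a 0.64
+```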
+
+---
+# CTC Decoding
+
+## Beam Search
+
+~~~
+To perform a beam search, we keep $k$ best **regular** (non-extended) labelings.
+Specifically, for each regular labeling $→y$ we keep both $α^t_-(→y)$ and
+$α^t_*(→y)$, which are the total probabilities of all (modulo beam search) extended
+labelings of length $t$ which produce the regular labeling $→y$; we therefore
+keep $k$ regular labelings with the highest $α^t_-(→y) + α^t_*(→y)$.
+
+~~~
+To compute the best regular labelings for a longer prefix of extended labelings,
+for each regular labeling in the beam we consider the following cases:
+~~~
+- adding a _blank_ symbol, i.e., contributing to $α^{t+1}_-(→y)$ both from
+  $α^t_-(→y)$ and $α^t_*(→y)$;
+~~~
+- adding a non-blank symbol, i.e., contributing to $α^{t+1}_*(•)$ from
+  $α^t_-(→y)$ and contributing to a possibly different $α^{t+1}_*(•)$ from
+  $α^t_*(→y)$.
+
+~~~
+Finally, we merge the resulting candidates according to their regular labeling, and
+keep only the $k$ best.
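+
+A minimal sketch of this beam search in plain Python (it is often called CTC
+prefix beam search). It assumes `probs[t][l]` holds $p_l^t$ with the blank at
+index 0, uses plain probabilities instead of log probabilities for readability,
+and starts from the empty labeling with $α_- = 1$; the function name is
+hypothetical. Each candidate maps a regular labeling (a tuple) to the pair
+$(α_-, α_*)$, so merging candidates with the same regular labeling happens
+automatically.
+
+```python
+from collections import defaultdict
+
+def ctc_beam_search(probs, beam_size: int, blank: int = 0):
+    """Return up to beam_size best regular labelings with their probabilities."""
+    beams = {(): (1.0, 0.0)}  # regular labeling -> (alpha_-, alpha_*)
+    for frame in probs:
+        candidates = defaultdict(lambda: [0.0, 0.0])
+        for labeling, (p_blank, p_label) in beams.items():
+            # Adding a blank keeps the regular labeling unchanged.
+            candidates[labeling][0] += frame[blank] * (p_blank + p_label)
+            for label, p in enumerate(frame):
+                if label == blank:
+                    continue
+                if labeling and labeling[-1] == label:
+                    # Repeating the last symbol collapses into the same labeling,
+                    # unless a blank separates the two occurrences.
+                    candidates[labeling][1] += p * p_label
+                    candidates[labeling + (label,)][1] += p * p_blank
+                else:
+                    candidates[labeling + (label,)][1] += p * (p_blank + p_label)
+        # Keep only the beam_size candidates with the highest alpha_- + alpha_*.
+        best = sorted(candidates.items(), key=lambda kv: -sum(kv[1]))[:beam_size]
+        beams = {labeling: tuple(alphas) for labeling, alphas in best}
+    return [(list(labeling), sum(alphas)) for labeling, alphas in beams.items()]
+
+result = ctc_beam_search([[0.6, 0.4], [0.6, 0.4]], beam_size=2)
+print([(labeling, round(p, 2)) for labeling, p in result])  # [([1], 0.64), ([], 0.36)]
+```
+
+For two frames with $p_- = 0.6$ and $p_a = 0.4$ each, a beam of size 2 ranks
+the labeling _a_ (probability 0.64) above the empty one (0.36), even though the
+single most probable extended labeling consists of two blanks.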
+
+---
+section: Word2Vec
+# Unsupervised Word Embeddings
+
+The embeddings can be trained for each task separately.
+
+~~~
+
+However, a method of precomputing word embeddings has been proposed, based on
+_distributional hypothesis_:
+
+> **Words that are used in the same contexts tend to have similar meanings**.
+
+~~~
+The distributional hypothesis is usually attributed to Firth (1957):
+> _You shall know a word by the company it keeps._
+
+---
+# Word2Vec
+
+![w=70%,h=center](word2vec.svgz)
+
+Mikolov et al. (2013) proposed two very simple architectures for precomputing
+word embeddings, together with a multi-threaded C implementation `word2vec`.
+
+---
+# Word2Vec
+
+![w=100%](word2vec_composability.svgz)
+
+---
+# Word2Vec – SkipGram Model
+
+![w=50%,h=center,mh=64%](word2vec.svgz)
+
+Considering input word $w_i$ and output $w_o$, the Skip-gram model defines
+$$p(w_o | w_i) ≝ \frac{e^{⇉V_{w_i}^\top ⇉W_{w_o}}}{∑_w e^{⇉V_{w_i}^\top ⇉W_w}}.$$
+After training, the final embeddings are the rows of the $⇉V$ matrix.
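+
+A minimal NumPy sketch of the Skip-gram distribution, assuming a toy vocabulary
+whose input and output embeddings are the rows of random `[V_SIZE, D]` matrices
+`V` and `W` (illustrative values only; all names are hypothetical). Note that
+the normalization requires a dot product with every row of `W`, which is what
+makes the full softmax expensive for large vocabularies.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(1)
+V_SIZE, D = 10, 4
+V = rng.normal(scale=0.1, size=(V_SIZE, D))  # input (center word) embeddings
+W = rng.normal(scale=0.1, size=(V_SIZE, D))  # output (context word) embeddings
+
+def skipgram_probs(w_i: int) -> np.ndarray:
+    """p(w_o | w_i) for all w_o: softmax over V[w_i] · W[w]."""
+    logits = W @ V[w_i]
+    logits -= logits.max()  # numerical stability, does not change the softmax
+    e = np.exp(logits)
+    return e / e.sum()
+
+def skipgram_nll(w_i: int, w_o: int) -> float:
+    return float(-np.log(skipgram_probs(w_i)[w_o]))
+
+p = skipgram_probs(3)
+print(p.shape, np.isclose(p.sum(), 1.0))  # (10,) True
+```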
+
+---
+# Word2Vec – Hierarchical Softmax
+
+Instead of a large softmax, we construct a binary tree over the words, with
+a sigmoid classifier for each inner node.
+
+If word $w$ corresponds to a path $n_1, n_2, \ldots, n_L$, we define
+$$p_\textrm{HS}(w | w_i) ≝ ∏_{j=1}^{L-1} σ(\textrm{[+1 if }n_{j+1}\textrm{  is right child else -1]} ⋅ ⇉V_{w_i}^\top ⇉W_{n_j}).$$
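+
+A minimal sketch of the hierarchical softmax, assuming a complete binary tree
+over a toy vocabulary of 8 words stored heap-style: node 1 is the root, node $n$
+has children $2n$ (left) and $2n+1$ (right), and leaves $8, \ldots, 15$
+correspond to words $0, \ldots, 7$. The heap layout is an illustrative choice;
+`word2vec` itself builds a Huffman tree. All names are hypothetical.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(2)
+V_SIZE, D = 8, 4                  # V_SIZE must be a power of two in this sketch
+V = rng.normal(size=(V_SIZE, D))  # input word embeddings
+W = rng.normal(size=(V_SIZE, D))  # W[n] for inner nodes n = 1..V_SIZE-1 (W[0] unused)
+
+def sigmoid(x):
+    return 1.0 / (1.0 + np.exp(-x))
+
+def hs_prob(w: int, w_i: int) -> float:
+    """p_HS(w | w_i): product of sigmoids along the root-to-leaf path of w."""
+    node, p = V_SIZE + w, 1.0     # leaves are nodes V_SIZE..2*V_SIZE-1
+    while node > 1:
+        parent = node // 2
+        sign = 1.0 if node % 2 == 1 else -1.0  # +1 for a right child
+        p *= sigmoid(sign * (V[w_i] @ W[parent]))
+        node = parent
+    return float(p)
+
+total = sum(hs_prob(w, w_i=3) for w in range(V_SIZE))
+print(np.isclose(total, 1.0))  # True
+```
+
+Each word needs only $\log_2 8 = 3$ sigmoids instead of a softmax over all 8
+words, and the probabilities still sum to one because $σ(x) + σ(-x) = 1$ at
+every inner node.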
+
+---
+# Word2Vec – Negative Sampling
+
+Instead of a large softmax, we could train individual sigmoids for all words.
+
+~~~
+We could also only sample several _negative examples_. This gives rise to the
+following _negative sampling_ objective (instead of just summing all the
+sigmoidal losses):
+$$l_\textrm{NEG}(w_o, w_i) ≝ -\log σ(⇉V_{w_i}^\top ⇉W_{w_o}) - ∑_{j=1}^k 𝔼_{w_j ∼ P(w)} \log \big(1 - σ(⇉V_{w_i}^\top ⇉W_{w_j})\big).$$
+
+~~~
+The usual number of negative samples $k$ is 5, but it can be even 2 for extremely
+large corpora.
+
+~~~
+Each expectation in the loss is estimated using a single sample.
+
+~~~
+For $P(w)$, both the uniform and the unigram distribution $U(w)$ work, but the distribution proportional to
+$$U(w)^{3/4}$$
+outperforms them significantly (this fact has been reported in several papers by
+different authors).
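+
+A minimal NumPy sketch of the negative sampling loss for a single
+$(w_i, w_o)$ pair, assuming made-up word counts; each expectation is replaced
+by a single sample from the noise distribution $U(w)^{3/4}$, renormalized to
+sum to one. The sizes, counts, and function names are illustrative only.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(3)
+V_SIZE, D, K = 10, 4, 5
+V = rng.normal(scale=0.1, size=(V_SIZE, D))  # input embeddings
+W = rng.normal(scale=0.1, size=(V_SIZE, D))  # output embeddings
+counts = np.arange(1, V_SIZE + 1) ** 2       # hypothetical word frequencies
+
+# Noise distribution: unigram distribution raised to 3/4 and renormalized.
+noise = counts / counts.sum()
+noise = noise ** 0.75
+noise /= noise.sum()
+
+def log_sigmoid(x):
+    return -np.logaddexp(0.0, -x)  # log σ(x), numerically stable
+
+def neg_sampling_loss(w_i: int, w_o: int) -> float:
+    negatives = rng.choice(V_SIZE, size=K, p=noise)  # one sample per expectation
+    positive = log_sigmoid(V[w_i] @ W[w_o])
+    # log(1 - σ(x)) = log σ(-x)
+    negative = log_sigmoid(-(W[negatives] @ V[w_i])).sum()
+    return float(-positive - negative)
+
+print(neg_sampling_loss(2, 7) > 0)  # True
+```
+
+Raising to the power 3/4 flattens the distribution, so rare words are sampled
+as negatives more often than under the plain unigram distribution.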
+
+---
+section: CLEs
+# Recurrent Character-level WEs
+
+![w=80%,h=center](../08/cle_rnn_examples.svgz)
+
+---
+# Convolutional Character-level WEs
+
+![w=100%](../08/cle_cnn_examples.svgz)
+
+---
+section: Subword Embeddings
+# Character N-grams
+
+Another simple idea appeared in three nearly simultaneous
+publications: [Charagram](https://arxiv.org/abs/1607.02789), [Subword Information](https://arxiv.org/abs/1607.04606), and [SubGram](http://link.springer.com/chapter/10.1007/978-3-319-45510-5_21).
+
+The embedding of a word is the sum of the word's own embedding and the embeddings
+of its character _n_-grams. Such embeddings can be pretrained using the same algorithms as `word2vec`.
+
+~~~
+The implementation can be
+- dictionary-based: only a limited number of the most frequent character _n_-grams is kept;
+~~~
+- hash-based: character _n_-grams are hashed into $K$ buckets
+  (usually $K ∼ 10^6$ is used).
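+
+A minimal sketch of the hash-based variant, assuming boundary markers `<` and
+`>` around the word and _n_-grams of lengths 3 to 6 (as in `fastText`). The
+CRC32 hash and the tiny bucket count are illustrative stand-ins (`fastText` uses
+a different hash function), and all tables and names are hypothetical.
+
+```python
+import zlib
+import numpy as np
+
+K, D = 1000, 8                               # buckets and dimension (toy sizes)
+rng = np.random.default_rng(4)
+word_table = {"where": rng.normal(size=D)}   # hypothetical in-vocabulary words
+ngram_table = rng.normal(size=(K, D))        # shared hashed n-gram embeddings
+
+def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
+    marked = f"<{word}>"
+    return [marked[i:i + n] for n in range(n_min, n_max + 1)
+            for i in range(len(marked) - n + 1)]
+
+def bucket(ngram: str) -> int:
+    return zlib.crc32(ngram.encode("utf-8")) % K  # stand-in hash function
+
+def embed(word: str) -> np.ndarray:
+    """Word embedding (if known) plus the embeddings of its hashed n-grams."""
+    vector = word_table.get(word, np.zeros(D)).copy()
+    for ngram in char_ngrams(word):
+        vector += ngram_table[bucket(ngram)]
+    return vector
+
+print(char_ngrams("where")[:5])  # ['<wh', 'whe', 'her', 'ere', 're>']
+print(embed("wherever").shape)   # (8,)
+```
+
+Because every word decomposes into _n_-grams, `embed` also produces a vector for
+words never seen during training, such as _wherever_ above.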
+
+---
+# Charagram WEs
+
+![w=100%,v=middle](cle_charagram_examples.svgz)
+
+---
+# Charagram WEs
+
+![w=48%,h=center](cle_charagram_ngrams.svgz)
+
+---
+# FastText
+
+The word2vec model enriched with subword embeddings is implemented in the publicly
+available `fastText` library https://fasttext.cc/.
+
+~~~
+Pre-trained embeddings for 157 languages (including Czech) trained on
+Wikipedia and CommonCrawl are also available at
+https://fasttext.cc/docs/en/crawl-vectors.html.
+
+---
+section: ELMo
+# ELMo
+
+At the end of 2017, a new type of _deep contextualized_ word representations was
+proposed by Peters et al., called ELMo, **E**mbeddings from **L**anguage
+**Mo**dels.
+
+~~~
+The ELMo embeddings were based on a two-layer pre-trained LSTM language model,
+where a language model predicts the following word based on a sentence prefix.
+~~~
+Specifically, two such models were used, one for the forward direction and the
+other one for the backward direction.
+~~~
+
+![w=30%](elmo_language_model.png)![w=68%](elmo_bidirectional.png)
+
+---
+# ELMo
+
+To compute an embedding of a word in a sentence, the concatenation of the two
+language models' hidden states is used.
+
+![w=68%,h=center](elmo_embedding.png)
+
+~~~
+To be exact, the authors propose to take a (trainable) weighted combination of
+the input embeddings and the outputs of the first and second LSTM layers.
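+
+A minimal NumPy sketch of such a weighted combination, assuming the three
+layer representations of a sentence are stacked into a `[3, N, D]` array.
+Following the ELMo paper, the weights are softmax-normalized trainable scalars
+$s_j$ and a trainable scalar $γ$ scales the result; here both are fixed numbers
+and `scalar_mix` is a hypothetical name.
+
+```python
+import numpy as np
+
+def scalar_mix(layers: np.ndarray, s_logits: np.ndarray, gamma: float) -> np.ndarray:
+    """gamma * sum_j softmax(s)_j * layers[j]; layers has shape [L, N, D]."""
+    s = np.exp(s_logits - s_logits.max())
+    s /= s.sum()
+    return gamma * np.tensordot(s, layers, axes=1)
+
+N, D = 5, 6                          # sentence length, representation size
+rng = np.random.default_rng(5)
+layers = rng.normal(size=(3, N, D))  # [input embeddings, LSTM layer 1, LSTM layer 2]
+mixed = scalar_mix(layers, s_logits=np.zeros(3), gamma=1.0)
+print(mixed.shape, np.allclose(mixed, layers.mean(axis=0)))  # (5, 6) True
+```
+
+With all $s_j$ equal, the combination reduces to a plain average of the layers;
+training the $s_j$ lets each downstream task choose which layers to emphasize.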
+
+---
+# ELMo Results
+
+Pre-trained ELMo embeddings substantially improved several NLP tasks.
+
+![w=100%](elmo_results.svgz)
+
diff --git a/slides/09/cle_charagram_examples.svgz b/slides/09/cle_charagram_examples.svgz
new file mode 100644
index 0000000..c38515c
Binary files /dev/null and b/slides/09/cle_charagram_examples.svgz differ
diff --git a/slides/09/cle_charagram_examples.svgz.ref b/slides/09/cle_charagram_examples.svgz.ref
new file mode 100644
index 0000000..c7dc8bd
--- /dev/null
+++ b/slides/09/cle_charagram_examples.svgz.ref
@@ -0,0 +1 @@
+Table 7 of "Enriching Word Vectors with Subword Information", https://arxiv.org/abs/1607.04606
diff --git a/slides/09/cle_charagram_ngrams.svgz b/slides/09/cle_charagram_ngrams.svgz
new file mode 100644
index 0000000..e47c49a
Binary files /dev/null and b/slides/09/cle_charagram_ngrams.svgz differ
diff --git a/slides/09/cle_charagram_ngrams.svgz.ref b/slides/09/cle_charagram_ngrams.svgz.ref
new file mode 100644
index 0000000..534f5a7
--- /dev/null
+++ b/slides/09/cle_charagram_ngrams.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Enriching Word Vectors with Subword Information", https://arxiv.org/abs/1607.04606
diff --git a/slides/09/crf_composability.ipe b/slides/09/crf_composability.ipe
new file mode 100644
index 0000000..0ad99fb
--- /dev/null
+++ b/slides/09/crf_composability.ipe
@@ -0,0 +1,279 @@
+<ipe version="70206" creator="Ipe 7.2.7">
+<info created="D:20180514074910" modified="D:20180514075010"/>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<view layers="alpha" active="alpha"/>
+<text layer="alpha" matrix="1 0 0 1 0 8" transformations="translations" pos="272 704" stroke="black" type="label" width="6.619" height="9.405" depth="2.79" valign="baseline" size="Large" style="math">j</text>
+<path stroke="black">
+264 728 m
+264 704 l
+284 704 l
+284 728 l
+h
+</path>
+<path stroke="black">
+264 728 m
+184 728 l
+184 704 l
+264 704 l
+</path>
+<text matrix="1 0 0 1 28 8" transformations="translations" pos="272 704" stroke="black" type="label" width="7.787" height="9.963" depth="0" valign="baseline" size="Large" style="math">k</text>
+<path matrix="1 0 0 1 28 0" stroke="black">
+264 728 m
+264 704 l
+284 704 l
+284 728 l
+h
+</path>
+<text matrix="1 0 0 1 -52 -16" transformations="translations" pos="272 704" stroke="black" type="label" width="29.63" height="9.251" depth="1.19" valign="baseline" size="Large" style="math">t-1</text>
+</page>
+</ipe>
diff --git a/slides/09/crf_composability.svgz b/slides/09/crf_composability.svgz
new file mode 100644
index 0000000..fa386b3
Binary files /dev/null and b/slides/09/crf_composability.svgz differ
diff --git a/slides/09/crf_composability.svgz.ref b/slides/09/crf_composability.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/ctc_computation.svgz b/slides/09/ctc_computation.svgz
new file mode 100644
index 0000000..e01d84d
Binary files /dev/null and b/slides/09/ctc_computation.svgz differ
diff --git a/slides/09/ctc_computation.svgz.ref b/slides/09/ctc_computation.svgz.ref
new file mode 100644
index 0000000..34bef71
--- /dev/null
+++ b/slides/09/ctc_computation.svgz.ref
@@ -0,0 +1 @@
+Figure 7.3 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/ctc_decoding.svgz b/slides/09/ctc_decoding.svgz
new file mode 100644
index 0000000..ed3162a
Binary files /dev/null and b/slides/09/ctc_decoding.svgz differ
diff --git a/slides/09/ctc_decoding.svgz.ref b/slides/09/ctc_decoding.svgz.ref
new file mode 100644
index 0000000..6ff181c
--- /dev/null
+++ b/slides/09/ctc_decoding.svgz.ref
@@ -0,0 +1 @@
+Figure 7.5 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/ctc_example.svgz b/slides/09/ctc_example.svgz
new file mode 100644
index 0000000..fbc207e
Binary files /dev/null and b/slides/09/ctc_example.svgz differ
diff --git a/slides/09/ctc_example.svgz.ref b/slides/09/ctc_example.svgz.ref
new file mode 100644
index 0000000..28ea3d5
--- /dev/null
+++ b/slides/09/ctc_example.svgz.ref
@@ -0,0 +1 @@
+Figure 7.1 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/elmo_bidirectional.png b/slides/09/elmo_bidirectional.png
new file mode 100644
index 0000000..a0cff36
Binary files /dev/null and b/slides/09/elmo_bidirectional.png differ
diff --git a/slides/09/elmo_bidirectional.png.ref b/slides/09/elmo_bidirectional.png.ref
new file mode 100644
index 0000000..78cce7a
--- /dev/null
+++ b/slides/09/elmo_bidirectional.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/elmo-forward-backward-language-model-embedding.png
diff --git a/slides/09/elmo_embedding.png b/slides/09/elmo_embedding.png
new file mode 100644
index 0000000..8b5c9d5
Binary files /dev/null and b/slides/09/elmo_embedding.png differ
diff --git a/slides/09/elmo_embedding.png.ref b/slides/09/elmo_embedding.png.ref
new file mode 100644
index 0000000..dd9c385
--- /dev/null
+++ b/slides/09/elmo_embedding.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/elmo-embedding.png
diff --git a/slides/09/elmo_language_model.png b/slides/09/elmo_language_model.png
new file mode 100644
index 0000000..8090b9e
Binary files /dev/null and b/slides/09/elmo_language_model.png differ
diff --git a/slides/09/elmo_language_model.png.ref b/slides/09/elmo_language_model.png.ref
new file mode 100644
index 0000000..180b431
--- /dev/null
+++ b/slides/09/elmo_language_model.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/Bert-language-modeling.png
diff --git a/slides/09/elmo_results.svgz b/slides/09/elmo_results.svgz
new file mode 100644
index 0000000..e363363
Binary files /dev/null and b/slides/09/elmo_results.svgz differ
diff --git a/slides/09/elmo_results.svgz.ref b/slides/09/elmo_results.svgz.ref
new file mode 100644
index 0000000..226a9ba
--- /dev/null
+++ b/slides/09/elmo_results.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Deep contextualized word representations", https://arxiv.org/abs/1802.05365
diff --git a/slides/09/labeling_independent.ipe b/slides/09/labeling_independent.ipe
new file mode 100644
index 0000000..1b3f2af
--- /dev/null
+++ b/slides/09/labeling_independent.ipe
@@ -0,0 +1,292 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20200419171731" modified="D:20200419174058"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<layer name="BBOX"/>
+<view layers="alpha" active="BBOX"/>
+<text layer="alpha" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_1</text>
+<path stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 0 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_1</text>
+<text matrix="1 0 0 1 48 0" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_2</text>
+<path matrix="1 0 0 1 48 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 48 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_2</text>
+<text matrix="1 0 0 1 96 0" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_3</text>
+<path matrix="1 0 0 1 96 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 96 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_3</text>
+<text matrix="1 0 0 1 124 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 124 0" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 124 -36" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 160 0" transformations="translations" pos="128 736" stroke="black" type="label" width="19.044" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_N</text>
+<path matrix="1 0 0 1 160 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 160 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="16.438" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_N</text>
+<path layer="BBOX" stroke="black">
+120 752 m
+120 656 l
+312 656 l
+312 752 l
+h
+</path>
+</page>
+</ipe>
diff --git a/slides/09/labeling_independent.svgz b/slides/09/labeling_independent.svgz
new file mode 100644
index 0000000..0983a44
Binary files /dev/null and b/slides/09/labeling_independent.svgz differ
diff --git a/slides/09/labeling_independent.svgz.ref b/slides/09/labeling_independent.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/labeling_memm.ipe b/slides/09/labeling_memm.ipe
new file mode 100644
index 0000000..4fbb212
--- /dev/null
+++ b/slides/09/labeling_memm.ipe
@@ -0,0 +1,312 @@
+<ipe version="70206" creator="Ipe 7.2.9">
+<info created="D:20200419171731" modified="D:20200419174153"/>
+<preamble>\usepackage{bm}</preamble>
+<ipestyle name="basic">
+<symbol name="arrow/arc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/farc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/ptarc(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fptarc(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="mark/circle(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</symbol>
+<symbol name="mark/disk(sx)" transformations="translations">
+<path fill="sym-stroke">
+0.6 0 0 0.6 0 0 e
+</path>
+</symbol>
+<symbol name="mark/fdisk(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+0.5 0 0 0.5 0 0 e
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+0.6 0 0 0.6 0 0 e
+0.4 0 0 0.4 0 0 e
+</path>
+</group>
+</symbol>
+<symbol name="mark/box(sx)" transformations="translations">
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</symbol>
+<symbol name="mark/square(sx)" transformations="translations">
+<path fill="sym-stroke">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+</path>
+</symbol>
+<symbol name="mark/fsquare(sfx)" transformations="translations">
+<group>
+<path fill="sym-fill">
+-0.5 -0.5 m
+0.5 -0.5 l
+0.5 0.5 l
+-0.5 0.5 l
+h
+</path>
+<path fill="sym-stroke" fillrule="eofill">
+-0.6 -0.6 m
+0.6 -0.6 l
+0.6 0.6 l
+-0.6 0.6 l
+h
+-0.4 -0.4 m
+0.4 -0.4 l
+0.4 0.4 l
+-0.4 0.4 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="mark/cross(sx)" transformations="translations">
+<group>
+<path fill="sym-stroke">
+-0.43 -0.57 m
+0.57 0.43 l
+0.43 0.57 l
+-0.57 -0.43 l
+h
+</path>
+<path fill="sym-stroke">
+-0.43 0.57 m
+0.57 -0.43 l
+0.43 -0.57 l
+-0.57 0.43 l
+h
+</path>
+</group>
+</symbol>
+<symbol name="arrow/fnormal(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/pointed(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/fpointed(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-0.8 0 l
+-1 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/linear(spx)">
+<path stroke="sym-stroke" pen="sym-pen">
+-1 0.333 m
+0 0 l
+-1 -0.333 l
+</path>
+</symbol>
+<symbol name="arrow/fdouble(spx)">
+<path stroke="sym-stroke" fill="white" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<symbol name="arrow/double(spx)">
+<path stroke="sym-stroke" fill="sym-stroke" pen="sym-pen">
+0 0 m
+-1 0.333 l
+-1 -0.333 l
+h
+-1 0 m
+-2 0.333 l
+-2 -0.333 l
+h
+</path>
+</symbol>
+<pen name="heavier" value="0.8"/>
+<pen name="fat" value="1.2"/>
+<pen name="ultrafat" value="2"/>
+<symbolsize name="large" value="5"/>
+<symbolsize name="small" value="2"/>
+<symbolsize name="tiny" value="1.1"/>
+<arrowsize name="large" value="10"/>
+<arrowsize name="small" value="5"/>
+<arrowsize name="tiny" value="3"/>
+<color name="red" value="1 0 0"/>
+<color name="green" value="0 1 0"/>
+<color name="blue" value="0 0 1"/>
+<color name="yellow" value="1 1 0"/>
+<color name="orange" value="1 0.647 0"/>
+<color name="gold" value="1 0.843 0"/>
+<color name="purple" value="0.627 0.125 0.941"/>
+<color name="gray" value="0.745"/>
+<color name="brown" value="0.647 0.165 0.165"/>
+<color name="navy" value="0 0 0.502"/>
+<color name="pink" value="1 0.753 0.796"/>
+<color name="seagreen" value="0.18 0.545 0.341"/>
+<color name="turquoise" value="0.251 0.878 0.816"/>
+<color name="violet" value="0.933 0.51 0.933"/>
+<color name="darkblue" value="0 0 0.545"/>
+<color name="darkcyan" value="0 0.545 0.545"/>
+<color name="darkgray" value="0.663"/>
+<color name="darkgreen" value="0 0.392 0"/>
+<color name="darkmagenta" value="0.545 0 0.545"/>
+<color name="darkorange" value="1 0.549 0"/>
+<color name="darkred" value="0.545 0 0"/>
+<color name="lightblue" value="0.678 0.847 0.902"/>
+<color name="lightcyan" value="0.878 1 1"/>
+<color name="lightgray" value="0.827"/>
+<color name="lightgreen" value="0.565 0.933 0.565"/>
+<color name="lightyellow" value="1 1 0.878"/>
+<dashstyle name="dashed" value="[4] 0"/>
+<dashstyle name="dotted" value="[1 3] 0"/>
+<dashstyle name="dash dotted" value="[4 2 1 2] 0"/>
+<dashstyle name="dash dot dotted" value="[4 2 1 2 1 2] 0"/>
+<textsize name="large" value="\large"/>
+<textsize name="small" value="\small"/>
+<textsize name="tiny" value="\tiny"/>
+<textsize name="Large" value="\Large"/>
+<textsize name="LARGE" value="\LARGE"/>
+<textsize name="huge" value="\huge"/>
+<textsize name="Huge" value="\Huge"/>
+<textsize name="footnote" value="\footnotesize"/>
+<textstyle name="center" begin="\begin{center}" end="\end{center}"/>
+<textstyle name="itemize" begin="\begin{itemize}" end="\end{itemize}"/>
+<textstyle name="item" begin="\begin{itemize}\item{}" end="\end{itemize}"/>
+<gridsize name="4 pts" value="4"/>
+<gridsize name="8 pts (~3 mm)" value="8"/>
+<gridsize name="16 pts (~6 mm)" value="16"/>
+<gridsize name="32 pts (~12 mm)" value="32"/>
+<gridsize name="10 pts (~3.5 mm)" value="10"/>
+<gridsize name="20 pts (~7 mm)" value="20"/>
+<gridsize name="14 pts (~5 mm)" value="14"/>
+<gridsize name="28 pts (~10 mm)" value="28"/>
+<gridsize name="56 pts (~20 mm)" value="56"/>
+<anglesize name="90 deg" value="90"/>
+<anglesize name="60 deg" value="60"/>
+<anglesize name="45 deg" value="45"/>
+<anglesize name="30 deg" value="30"/>
+<anglesize name="22.5 deg" value="22.5"/>
+<opacity name="10%" value="0.1"/>
+<opacity name="30%" value="0.3"/>
+<opacity name="50%" value="0.5"/>
+<opacity name="75%" value="0.75"/>
+<tiling name="falling" angle="-60" step="4" width="1"/>
+<tiling name="rising" angle="30" step="4" width="1"/>
+</ipestyle>
+<page>
+<layer name="alpha"/>
+<layer name="BBOX"/>
+<view layers="alpha" active="BBOX"/>
+<text layer="alpha" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_1</text>
+<path stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 0 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_1</text>
+<text matrix="1 0 0 1 48 0" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_2</text>
+<path matrix="1 0 0 1 48 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 48 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_2</text>
+<text matrix="1 0 0 1 96 0" transformations="translations" pos="128 736" stroke="black" type="label" width="14.934" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_3</text>
+<path matrix="1 0 0 1 96 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 96 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="12.329" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_3</text>
+<text matrix="1 0 0 1 124 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 124 0" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 124 -36" transformations="translations" pos="128 736" stroke="black" type="label" width="16.737" height="6.376" depth="0" valign="baseline" size="Large" style="math">\cdots</text>
+<text matrix="1 0 0 1 160 0" transformations="translations" pos="128 736" stroke="black" type="label" width="19.044" height="6.378" depth="2.15" valign="baseline" size="Large" style="math">\bm x_N</text>
+<path matrix="1 0 0 1 160 0" stroke="black" arrow="linear/normal">
+132 728 m
+132 680 l
+</path>
+<text matrix="1 0 0 1 160 -68" transformations="translations" pos="128 736" stroke="black" type="label" width="16.438" height="6.176" depth="2.79" valign="baseline" size="Large" style="math">y_N</text>
+<path stroke="black" pen="fat" arrow="linear/normal">
+132 680 m
+180 680 l
+</path>
+<path matrix="1 0 0 1 48 0" stroke="black" pen="fat" arrow="linear/normal">
+132 680 m
+180 680 l
+</path>
+<path stroke="black" pen="fat">
+228 680 m
+248 680 l
+</path>
+<path stroke="black" pen="fat" arrow="linear/normal">
+272 680 m
+292 680 l
+</path>
+<path stroke="black" dash="dotted" pen="fat">
+248 680 m
+272 680 l
+</path>
+<path layer="BBOX" stroke="black">
+120 752 m
+120 656 l
+312 656 l
+312 752 l
+h
+</path>
+</page>
+</ipe>
diff --git a/slides/09/labeling_memm.svgz b/slides/09/labeling_memm.svgz
new file mode 100644
index 0000000..4d8435c
Binary files /dev/null and b/slides/09/labeling_memm.svgz differ
diff --git a/slides/09/labeling_memm.svgz.ref b/slides/09/labeling_memm.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/word2vec.svgz b/slides/09/word2vec.svgz
new file mode 100644
index 0000000..a1bb3bb
Binary files /dev/null and b/slides/09/word2vec.svgz differ
diff --git a/slides/09/word2vec.svgz.ref b/slides/09/word2vec.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/word2vec_composability.svgz b/slides/09/word2vec_composability.svgz
new file mode 100644
index 0000000..e060d02
Binary files /dev/null and b/slides/09/word2vec_composability.svgz differ
diff --git a/slides/09/word2vec_composability.svgz.ref b/slides/09/word2vec_composability.svgz.ref
new file mode 100644
index 0000000..30785b2
--- /dev/null
+++ b/slides/09/word2vec_composability.svgz.ref
@@ -0,0 +1 @@
+Table 8 of "Efficient Estimation of Word Representations in Vector Space", https://arxiv.org/abs/1301.3781
diff --git a/tasks/3d_recognition.md b/tasks/3d_recognition.md
new file mode 100644
index 0000000..eb3c23b
--- /dev/null
+++ b/tasks/3d_recognition.md
@@ -0,0 +1,28 @@
+### Assignment: 3d_recognition
+#### Date: Deadline: Apr 16, 22:00
+#### Points: 3 points+4 bonus
+
+Your goal in this assignment is to perform 3D object recognition. The input
+is a voxelized representation of an object, stored as a _3D grid_ of either empty
+or occupied _voxels_, and your goal is to classify the object into one of
+10 classes. The data is available in two resolutions, either as
+[20×20×20 data](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/modelnet20.html)
+or [32×32×32 data](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/modelnet32.html).
+To load the dataset, use the
+[modelnet.py](https://github.com/ufal/npfl138/tree/master/labs/07/modelnet.py) module.
+
+The official dataset offers only train and test sets, with the **test set having
+a different distribution of labels**. Our dataset also contains a development
+set, which has **nearly the same** label distribution as the test set.
+
+If you want, it is possible to use any model from `keras.applications` in
+this assignment; however, the only way I know how to utilize such a pre-trained
+model is to render the objects to a set of 2D images and classify them instead.
+
+The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
+achieving at least _88%_ test set accuracy gets 3 points; the remaining
+4 bonus points are distributed depending on relative ordering of your solutions.
+
+You can start with the
+[3d_recognition.py](https://github.com/ufal/npfl138/tree/master/labs/07/3d_recognition.py)
+template, which among others generates test set annotations in the required format.
diff --git a/tasks/bboxes_utils.md b/tasks/bboxes_utils.md
new file mode 100644
index 0000000..64c58c8
--- /dev/null
+++ b/tasks/bboxes_utils.md
@@ -0,0 +1,26 @@
+### Assignment: bboxes_utils
+#### Date: Deadline: Apr 09, 22:00
+#### Points: 2 points
+
+This is a preparatory assignment for `svhn_competition`. The goal is to
+implement several bounding box manipulation routines in the
+[bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py)
+module. Notably, you need to implement the following methods:
+- `bboxes_to_rcnn`: convert given bounding boxes to an R-CNN-like
+  representation relative to the given anchors;
+- `bboxes_from_rcnn`: convert R-CNN-like representations relative to
+  given anchors back to bounding boxes;
+- `bboxes_training`: given a list of anchors and gold objects, assign gold
+  objects to anchors and generate suitable training data (the exact algorithm
+  is described in the template).
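+
+The following is a minimal per-box sketch of the R-CNN-like encoding. It assumes boxes in the `[TOP, LEFT, BOTTOM, RIGHT]` format used by the SVHN dataset and the usual parametrization: center offsets divided by the anchor height/width, plus log ratios of heights and widths. `bbox_to_rcnn` and `bbox_from_rcnn` are hypothetical single-box helpers. The template's `bboxes_to_rcnn` and `bboxes_from_rcnn` work on whole arrays, and their exact output order should be checked against the template.
+
+```python
+import math
+
+
+def _center_size(box):
+    """Return (center_y, center_x, height, width) of a [top, left, bottom, right] box."""
+    height, width = box[2] - box[0], box[3] - box[1]
+    return box[0] + height / 2, box[1] + width / 2, height, width
+
+
+def bbox_to_rcnn(anchor, bbox):
+    """Hypothetical encoding of `bbox` relative to `anchor` as (dy, dx, dh, dw)."""
+    a_y, a_x, a_h, a_w = _center_size(anchor)
+    b_y, b_x, b_h, b_w = _center_size(bbox)
+    return [(b_y - a_y) / a_h, (b_x - a_x) / a_w, math.log(b_h / a_h), math.log(b_w / a_w)]
+
+
+def bbox_from_rcnn(anchor, rcnn):
+    """Inverse of `bbox_to_rcnn`: decode (dy, dx, dh, dw) to [top, left, bottom, right]."""
+    a_y, a_x, a_h, a_w = _center_size(anchor)
+    b_y, b_x = a_y + rcnn[0] * a_h, a_x + rcnn[1] * a_w
+    b_h, b_w = a_h * math.exp(rcnn[2]), a_w * math.exp(rcnn[3])
+    return [b_y - b_h / 2, b_x - b_w / 2, b_y + b_h / 2, b_x + b_w / 2]
+```
+
+Decoding an encoded box should give back the original box, up to floating-point error.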
+
+The [bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py)
+module contains simple unit tests, which are run when the module is executed;
+you can use them to check the validity of your implementation. Note that
+the template does not contain type annotations because the Python typing system
+is not flexible enough to describe the tensor shape changes.
+
+When submitting to ReCodEx, the method `main` is executed, returning the
+implemented `bboxes_to_rcnn`, `bboxes_from_rcnn` and `bboxes_training`
+methods. These methods are then executed and compared to the reference
+implementation.
diff --git a/tasks/cags_classification.md b/tasks/cags_classification.md
index eedad3a..42009a4 100644
--- a/tasks/cags_classification.md
+++ b/tasks/cags_classification.md
@@ -31,8 +31,8 @@ estimates on the batch) or in inference regime. There is one exception though
 inference regime even when `training == True`._
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _93%_ test set accuracy will get 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _93%_ test set accuracy gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
 
 You may want to start with the
 [cags_classification.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_classification.py)
diff --git a/tasks/cags_segmentation.md b/tasks/cags_segmentation.md
index 7669d70..d9677e5 100644
--- a/tasks/cags_segmentation.md
+++ b/tasks/cags_segmentation.md
@@ -18,8 +18,8 @@ module, which can also evaluate your predictions (either by running with
 `evaluate_segmentation_file` method).
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _87%_ test set IoU gets 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _87%_ test set IoU gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
 
 You may want to start with the
 [cags_segmentation.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_segmentation.py)
diff --git a/tasks/cifar_competition.md b/tasks/cifar_competition.md
index ae455fb..9505c18 100644
--- a/tasks/cifar_competition.md
+++ b/tasks/cifar_competition.md
@@ -8,8 +8,9 @@ You can load the data using the
 module. Note that the test set is different than that of official CIFAR-10.
 
 The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution
-which achieves at least _70%_ test set accuracy will get 4 points; the rest
-5 points will be distributed depending on relative ordering of your solutions.
+achieving at least _70%_ test set accuracy gets 4 points; the remaining
+5 bonus points are distributed depending on relative ordering of your solutions.
+
 Note that my solutions usually need to achieve around ~85% on the development
 set to score 70% on the test set.
 
diff --git a/tasks/cnn_manual.md b/tasks/cnn_manual.md
index 6d1483a..06f3a83 100644
--- a/tasks/cnn_manual.md
+++ b/tasks/cnn_manual.md
@@ -12,9 +12,9 @@ activation and `valid` padding, specified in the `args.cnn` option.
 The `args.cnn` contains comma-separated layer specifications in the format
 `filters-kernel_size-stride`.
 
-Of course, you cannot use any TensorFlow convolutional operation (instead,
+Of course, you cannot use any PyTorch convolutional operation (instead,
 implement the forward and backward pass using matrix multiplication and other
-operations), nor the `tf.GradientTape` for gradient computation.
+operations), nor the `.backward()` method for gradient computation.
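+
+One way to compute the forward pass without a convolution operation is to flatten every input patch and multiply it by the kernel reshaped into a matrix (the im2col trick). This pure-Python sketch assumes an `H×W×C` input, a `K×K×C×F` kernel, `valid` padding, and no bias or activation. The template works with batched PyTorch tensors, and the backward pass is not shown.
+
+```python
+def conv2d_valid(image, kernel, stride):
+    """Forward pass of a `valid`-padded convolution via flattened patches.
+
+    image: H x W x C nested lists; kernel: K x K x C x F nested lists.
+    Returns an H' x W' x F nested list (no bias, no activation).
+    """
+    height, width, channels = len(image), len(image[0]), len(image[0][0])
+    size, filters = len(kernel), len(kernel[0][0][0])
+    out_height = (height - size) // stride + 1
+    out_width = (width - size) // stride + 1
+    # Flatten the kernel into a (K*K*C) x F matrix, in the same order as the patches.
+    weights = [kernel[i][j][c] for i in range(size) for j in range(size) for c in range(channels)]
+    output = []
+    for y in range(out_height):
+        row = []
+        for x in range(out_width):
+            # One row of the im2col matrix: the flattened K x K x C input patch.
+            patch = [image[y * stride + i][x * stride + j][c]
+                     for i in range(size) for j in range(size) for c in range(channels)]
+            # Patch (1 x K*K*C) times weights (K*K*C x F) gives the F output channels.
+            row.append([sum(p * w[f] for p, w in zip(patch, weights)) for f in range(filters)])
+        output.append(row)
+    return output
+```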
 
 To make debugging easier, the template supports a `--verify` option, which
 allows comparing the forward pass and the three gradients you compute in the
diff --git a/tasks/ctc_loss.md b/tasks/ctc_loss.md
new file mode 100644
index 0000000..a34cb03
--- /dev/null
+++ b/tasks/ctc_loss.md
@@ -0,0 +1,10 @@
+### Assignment: ctc_loss
+#### Date: Deadline: Apr 30, 22:00
+#### Points: 2 points
+
+**The template is being finalized; the final version will be released shortly.**
+
+This assignment is an extension of the `tagger_we` task. Using the
+`ctc_loss.py` template, manually implement the CTC loss computation
+and also greedy CTC decoding.
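+
+The CTC loss is the negative log-likelihood of all alignments that collapse to the target. It is computed with the forward (alpha) recursion over the target extended with blanks. The sketch below uses plain probabilities for a single example and assumes the blank has index 0. A practical implementation, presumably including the one the template expects, would be batched and would work in log space to avoid underflow.
+
+```python
+import math
+
+
+def ctc_loss(probs, target, blank=0):
+    """Negative log-likelihood of `target` (no blanks) given per-frame distributions.
+
+    probs[t][c] is the probability of symbol c at frame t.
+    """
+    # Extended sequence: blank, l1, blank, l2, ..., lN, blank.
+    ext = [blank]
+    for label in target:
+        ext += [label, blank]
+    size = len(ext)
+    alpha = [0.0] * size
+    alpha[0] = probs[0][ext[0]]
+    if size > 1:
+        alpha[1] = probs[0][ext[1]]
+    for t in range(1, len(probs)):
+        new_alpha = [0.0] * size
+        for s in range(size):
+            total = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
+            # Skipping a blank is allowed only between two different labels.
+            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
+                total += alpha[s - 2]
+            new_alpha[s] = total * probs[t][ext[s]]
+        alpha = new_alpha
+    # Valid alignments end either in the last label or in the final blank.
+    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
+    return math.inf if likelihood == 0.0 else -math.log(likelihood)
+```
+
+With two frames and a uniform distribution over {blank, 1}, the target `[1]` has three valid alignments (`11`, `1_`, `_1`). Its likelihood is therefore 0.75.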
+
diff --git a/tasks/mnist_ensemble.md b/tasks/mnist_ensemble.md
index 9deb3d5..7bee385 100644
--- a/tasks/mnist_ensemble.md
+++ b/tasks/mnist_ensemble.md
@@ -8,7 +8,7 @@ Your goal in this assignment is to implement model ensembling.
 The [mnist_ensemble.py](https://github.com/ufal/npfl138/tree/master/labs/03/mnist_ensemble.py)
 template trains `args.models` individual models, and your goal is to perform
 an ensemble of the first model, first two models, first three models, …, all
-models, and evaluate their accuracy on the test set.
+models, and evaluate their accuracy on the development set.
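+
+One common way to ensemble classifiers is to average their predicted class distributions and take the argmax. The sketch below assumes plain per-model probability lists rather than the template's tensors. It reports the accuracy of the ensemble of the first model, the first two models, and so on, keeping a running sum so that each additional model needs only one extra pass.
+
+```python
+def ensemble_accuracies(model_probs, gold):
+    """Accuracy of the ensemble of the first 1, 2, ..., M models.
+
+    model_probs[m][i][c] is the probability of class c for example i
+    predicted by model m; gold[i] is the gold class of example i.
+    """
+    # Running sum of the predicted distributions; dividing by the number of
+    # models would not change the argmax, so the sum is enough.
+    summed = [[0.0] * len(p) for p in model_probs[0]]
+    accuracies = []
+    for probs in model_probs:
+        for row, p in zip(summed, probs):
+            for c, value in enumerate(p):
+                row[c] += value
+        predictions = [max(range(len(row)), key=row.__getitem__) for row in summed]
+        correct = sum(pred == g for pred, g in zip(predictions, gold))
+        accuracies.append(correct / len(gold))
+    return accuracies
+```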
 
 #### Tests Start: mnist_ensemble_tests
 _Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
diff --git a/tasks/sequence_classification.md b/tasks/sequence_classification.md
new file mode 100644
index 0000000..96ce931
--- /dev/null
+++ b/tasks/sequence_classification.md
@@ -0,0 +1,127 @@
+### Assignment: sequence_classification
+#### Date: Deadline: Apr 23, 22:00
+#### Points: 2 points
+#### Tests: sequence_classification_tests
+#### Examples: sequence_classification_examples
+
+The goal of this assignment is to introduce recurrent neural networks, show
+their convergence speed, and illustrate the exploding gradient issue. The network
+should process sequences of 50 small integers and compute parity for each prefix
+of the sequence. The inputs are either 0/1, or vectors with a one-hot
+representation of a small integer.
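+
+The targets can be sketched as follows, assuming that the "parity of a prefix" means the parity of the sum of its elements. For 0/1 inputs, this is the parity of the number of ones. `prefix_parities` and `one_hot` are hypothetical helpers; the template generates its own data.
+
+```python
+def prefix_parities(values):
+    """Parity (0 or 1) of the sum of every prefix of `values`."""
+    parities, total = [], 0
+    for value in values:
+        total += value
+        parities.append(total % 2)
+    return parities
+
+
+def one_hot(values, dim):
+    """One-hot encode small integers in [0, dim) as float vectors."""
+    return [[1.0 if i == value else 0.0 for i in range(dim)] for value in values]
+```
+
+Every target depends on the whole prefix, so the network has to carry one bit of state across all 50 steps. This is what makes the task a useful probe of RNN convergence.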
+
+Your goal is to modify the
+[sequence_classification.py](https://github.com/ufal/npfl138/tree/master/labs/08/sequence_classification.py)
+template and implement the following:
+- Use the specified RNN type (`SimpleRNN`, `GRU`, or `LSTM`) and dimensionality.
+- Process the sequence using the required RNN.
+- Use an additional hidden layer on the RNN outputs if requested.
+- Implement gradient clipping if requested.
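+
+The following is a sketch of clipping by the global gradient norm, assuming this is what `--clip_gradient` means. If the norm of all gradients together exceeds the threshold, every gradient is rescaled by the same factor, which preserves the update direction. PyTorch provides this as `torch.nn.utils.clip_grad_norm_(parameters, max_norm)`, which rescales the gradients in place. The pure-Python version below only illustrates the computation.
+
+```python
+import math
+
+
+def clip_by_global_norm(gradients, max_norm):
+    """Rescale `gradients` (one list of floats per parameter) to norm <= max_norm.
+
+    Returns the (possibly rescaled) gradients and the original global norm.
+    """
+    norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
+    if norm <= max_norm:
+        return gradients, norm
+    scale = max_norm / norm
+    return [[g * scale for g in grad] for grad in gradients], norm
+```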
+
+In addition to submitting the task in ReCodEx, please also run the following
+variations and observe the results in TensorBoard.
+Concentrate on how the RNNs converge, their convergence speed, exploding
+gradient issues, and how gradient clipping helps:
+- `--rnn=SimpleRNN --sequence_dim=1`, `--rnn=GRU --sequence_dim=1`, `--rnn=LSTM --sequence_dim=1`
+- the same as above but with `--sequence_dim=3`
+- the same as above but with `--sequence_dim=10`
+- `--rnn=SimpleRNN --hidden_layer=85 --rnn_dim=30 --sequence_dim=30` and the same with `--clip_gradient=1`
+- the same as above but with `--rnn=GRU` with and without `--clip_gradient=1`
+- the same as above but with `--rnn=LSTM` with and without `--clip_gradient=1`
+
+#### Tests Start: sequence_classification_tests
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+1. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=SimpleRNN --epochs=5`
+```
+Epoch 1/5 accuracy: 0.4854 - loss: 0.7253 - val_accuracy: 0.5092 - val_loss: 0.6971
+Epoch 2/5 accuracy: 0.5101 - loss: 0.6944 - val_accuracy: 0.4990 - val_loss: 0.6914
+Epoch 3/5 accuracy: 0.5000 - loss: 0.6904 - val_accuracy: 0.5198 - val_loss: 0.6892
+Epoch 4/5 accuracy: 0.5200 - loss: 0.6887 - val_accuracy: 0.5328 - val_loss: 0.6875
+Epoch 5/5 accuracy: 0.5326 - loss: 0.6869 - val_accuracy: 0.5362 - val_loss: 0.6857
+```
+
+2. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=GRU --epochs=5`
+```
+Epoch 1/5 accuracy: 0.5277 - loss: 0.6925 - val_accuracy: 0.5217 - val_loss: 0.6921
+Epoch 2/5 accuracy: 0.5183 - loss: 0.6921 - val_accuracy: 0.5217 - val_loss: 0.6918
+Epoch 3/5 accuracy: 0.5185 - loss: 0.6919 - val_accuracy: 0.5217 - val_loss: 0.6914
+Epoch 4/5 accuracy: 0.5212 - loss: 0.6914 - val_accuracy: 0.5282 - val_loss: 0.6910
+Epoch 5/5 accuracy: 0.5320 - loss: 0.6904 - val_accuracy: 0.5355 - val_loss: 0.6905
+```
+
+3. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5`
+```
+Epoch 1/5 accuracy: 0.5359 - loss: 0.6926 - val_accuracy: 0.5361 - val_loss: 0.6925
+Epoch 2/5 accuracy: 0.5358 - loss: 0.6925 - val_accuracy: 0.5333 - val_loss: 0.6923
+Epoch 3/5 accuracy: 0.5370 - loss: 0.6923 - val_accuracy: 0.5369 - val_loss: 0.6920
+Epoch 4/5 accuracy: 0.5342 - loss: 0.6919 - val_accuracy: 0.5366 - val_loss: 0.6917
+Epoch 5/5 accuracy: 0.5378 - loss: 0.6915 - val_accuracy: 0.5444 - val_loss: 0.6914
+```
+
+4. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50`
+```
+Epoch 1/5 accuracy: 0.5377 - loss: 0.6923 - val_accuracy: 0.5414 - val_loss: 0.6911
+Epoch 2/5 accuracy: 0.5465 - loss: 0.6902 - val_accuracy: 0.5577 - val_loss: 0.6878
+Epoch 3/5 accuracy: 0.5600 - loss: 0.6862 - val_accuracy: 0.5450 - val_loss: 0.6811
+Epoch 4/5 accuracy: 0.5491 - loss: 0.6783 - val_accuracy: 0.5590 - val_loss: 0.6707
+Epoch 5/5 accuracy: 0.5539 - loss: 0.6678 - val_accuracy: 0.5433 - val_loss: 0.6591
+```
+
+5. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50 --clip_gradient=0.01`
+```
+Epoch 1/5 accuracy: 0.5421 - loss: 0.6923 - val_accuracy: 0.5409 - val_loss: 0.6910
+Epoch 2/5 accuracy: 0.5504 - loss: 0.6900 - val_accuracy: 0.5511 - val_loss: 0.6875
+Epoch 3/5 accuracy: 0.5566 - loss: 0.6860 - val_accuracy: 0.5494 - val_loss: 0.6816
+Epoch 4/5 accuracy: 0.5504 - loss: 0.6788 - val_accuracy: 0.5398 - val_loss: 0.6721
+Epoch 5/5 accuracy: 0.5539 - loss: 0.6699 - val_accuracy: 0.5494 - val_loss: 0.6624
+```
+#### Tests End:
+#### Examples Start: sequence_classification_examples
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+- `python3 sequence_classification.py --rnn=SimpleRNN --epochs=5`
+```
+Epoch 1/5 accuracy: 0.4984 - loss: 0.7004 - val_accuracy: 0.5223 - val_loss: 0.6884
+Epoch 2/5 accuracy: 0.5198 - loss: 0.6862 - val_accuracy: 0.5117 - val_loss: 0.6794
+Epoch 3/5 accuracy: 0.5132 - loss: 0.6784 - val_accuracy: 0.5121 - val_loss: 0.6732
+Epoch 4/5 accuracy: 0.5160 - loss: 0.6723 - val_accuracy: 0.5191 - val_loss: 0.6683
+Epoch 5/5 accuracy: 0.5235 - loss: 0.6680 - val_accuracy: 0.5276 - val_loss: 0.6639
+```
+
+- `python3 sequence_classification.py --rnn=GRU --epochs=5`
+```
+Epoch 1/5 accuracy: 0.5109 - loss: 0.6929 - val_accuracy: 0.5128 - val_loss: 0.6915
+Epoch 2/5 accuracy: 0.5174 - loss: 0.6894 - val_accuracy: 0.5155 - val_loss: 0.6785
+Epoch 3/5 accuracy: 0.5446 - loss: 0.6630 - val_accuracy: 0.9538 - val_loss: 0.2142
+Epoch 4/5 accuracy: 0.9812 - loss: 0.1270 - val_accuracy: 0.9987 - val_loss: 0.0304
+Epoch 5/5 accuracy: 0.9985 - loss: 0.0270 - val_accuracy: 0.9995 - val_loss: 0.0135
+```
+
+- `python3 sequence_classification.py --rnn=LSTM --epochs=5`
+```
+Epoch 1/5 accuracy: 0.5131 - loss: 0.6930 - val_accuracy: 0.5187 - val_loss: 0.6918
+Epoch 2/5 accuracy: 0.5187 - loss: 0.6892 - val_accuracy: 0.5340 - val_loss: 0.6760
+Epoch 3/5 accuracy: 0.6401 - loss: 0.5744 - val_accuracy: 1.0000 - val_loss: 0.0845
+Epoch 4/5 accuracy: 1.0000 - loss: 0.0585 - val_accuracy: 1.0000 - val_loss: 0.0194
+Epoch 5/5 accuracy: 1.0000 - loss: 0.0154 - val_accuracy: 1.0000 - val_loss: 0.0082
+```
+
+- `python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=85`
+```
+Epoch 1/5 accuracy: 0.5151 - loss: 0.6888 - val_accuracy: 0.5323 - val_loss: 0.6571
+Epoch 2/5 accuracy: 0.5387 - loss: 0.6497 - val_accuracy: 0.5575 - val_loss: 0.6321
+Epoch 3/5 accuracy: 0.5570 - loss: 0.6242 - val_accuracy: 0.6199 - val_loss: 0.5854
+Epoch 4/5 accuracy: 0.8367 - loss: 0.2854 - val_accuracy: 0.9897 - val_loss: 0.0503
+Epoch 5/5 accuracy: 0.9995 - loss: 0.0058 - val_accuracy: 0.9999 - val_loss: 0.0014
+```
+
+- `python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=85 --clip_gradient=1`
+```
+Epoch 1/5 accuracy: 0.5151 - loss: 0.6888 - val_accuracy: 0.5323 - val_loss: 0.6571
+Epoch 2/5 accuracy: 0.5387 - loss: 0.6497 - val_accuracy: 0.5582 - val_loss: 0.6321
+Epoch 3/5 accuracy: 0.5576 - loss: 0.6237 - val_accuracy: 0.6542 - val_loss: 0.5625
+Epoch 4/5 accuracy: 0.9033 - loss: 0.1909 - val_accuracy: 0.9999 - val_loss: 0.0014
+Epoch 5/5 accuracy: 0.9997 - loss: 0.0029 - val_accuracy: 1.0000 - val_loss: 4.4711e-04
+```
+#### Examples End:
diff --git a/tasks/sgd_manual.md b/tasks/sgd_manual.md
index fd268e7..d054d94 100644
--- a/tasks/sgd_manual.md
+++ b/tasks/sgd_manual.md
@@ -17,7 +17,7 @@ Start with the
 [sgd_manual.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_manual.py)
 template, which is based on
 [sgd_backpropagation.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_backpropagation.py)
-one. Be aware that these templates generates each a different output file.
+one.
 
 Note that ReCodEx disables the PyTorch automatic differentiation during
 evaluation.
diff --git a/tasks/speech_recognition.md b/tasks/speech_recognition.md
new file mode 100644
index 0000000..dc1587e
--- /dev/null
+++ b/tasks/speech_recognition.md
@@ -0,0 +1,42 @@
+### Assignment: speech_recognition
+#### Date: Deadline: Apr 30, 22:00
+#### Points: 5 points+5 bonus
+
+**The template is being finalized; the final version will be released shortly.**
+
+This assignment is a competition task in the speech recognition area. Specifically,
+your goal is to predict a sequence of letters given a spoken utterance.
+We will be using Czech recordings from the [Common Voice](https://commonvoice.mozilla.org/),
+with input sound waves passed through the usual preprocessing – computing
+[Mel-frequency cepstral coefficients (MFCCs)](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum).
+You can repeat this preprocessing on a given audio using the `load_audio` and
+`mfcc_extract` methods from the
+[common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/09/common_voice_cs.py) module.
+This module can also load the dataset, downloading it when necessary (note that
+its size is 200MB, so it might take a while). Furthermore, you can listen to the
+[development portion of the dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/common_voice_cs_dev.html).
+Lastly, the whole dataset is available for
+[download in MP3 format](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/common_voice_cs_mp3.tar)
+(you are not expected to download it, unless you would like to perform some
+custom preprocessing).
+
+The following additional data can be utilized in this assignment:
+- You can use any _unannotated_ text data (Wikipedia, Czech National Corpus, …),
+  and also any pre-trained word embeddings or language models (assuming they
+  were trained on plain texts).
+- You can use any _unannotated_ speech data.
+
+The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
+The evaluation is performed by computing the edit distance to the gold letter
+sequence, normalized by its length (a corresponding metric
+`EditDistanceMetric` is provided by the [common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/09/common_voice_cs.py)).
+Everyone who submits a solution with at most 50% test set edit distance
+gets 5 points; the remaining 5 bonus points are distributed
+depending on relative ordering of your solutions. Note that
+you can evaluate the predictions as usual using the [common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/09/common_voice_cs.py)
+module, either by running with `--evaluate=path` arguments, or using its
+`evaluate_file` method.
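+
+For intuition, the metric can be sketched as the Levenshtein distance between the predicted and gold letter sequences, divided by the gold length. The exact definition is an assumption here: check in `common_voice_cs.py` whether `EditDistanceMetric` averages per-utterance ratios or normalizes summed distances. `normalized_edit_distance` is a hypothetical helper.
+
+```python
+def edit_distance(a, b):
+    """Levenshtein distance between sequences `a` and `b` (two-row dynamic programming)."""
+    previous = list(range(len(b) + 1))
+    for i, x in enumerate(a, start=1):
+        current = [i]
+        for j, y in enumerate(b, start=1):
+            current.append(min(previous[j] + 1,            # deletion
+                               current[j - 1] + 1,         # insertion
+                               previous[j - 1] + (x != y)))  # substitution or match
+        previous = current
+    return previous[-1]
+
+
+def normalized_edit_distance(gold, prediction):
+    """Edit distance normalized by the (assumed non-empty) gold length."""
+    return edit_distance(gold, prediction) / len(gold)
+```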
+
+Start with the `speech_recognition.py`
+template, which contains instructions for using the CTC loss and the CTC decoder,
+and which generates the test set annotation in the required format.
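+
+Greedy CTC decoding takes the most probable symbol in every frame, merges consecutive repeats, and then removes blanks. The pure-Python sketch below assumes the blank has index 0; check the template's convention before relying on it.
+
+```python
+def ctc_greedy_decode(probs, blank=0):
+    """Greedy CTC decoding of per-frame distributions probs[t][c]."""
+    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
+    decoded, previous = [], None
+    for label in best:
+        # Keep a label only if it is not a blank and does not repeat the previous frame.
+        if label != previous and label != blank:
+            decoded.append(label)
+        previous = label
+    return decoded
+```
+
+A blank between two identical labels keeps them separate, so the frame sequence `1 1 _ 1 2 2 _` decodes to `1 1 2`.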
diff --git a/tasks/svhn_competition.md b/tasks/svhn_competition.md
new file mode 100644
index 0000000..902484f
--- /dev/null
+++ b/tasks/svhn_competition.md
@@ -0,0 +1,44 @@
+### Assignment: svhn_competition
+#### Date: Deadline: Apr 09, 22:00
+#### Points: 5 points+5 bonus
+
+The goal of this assignment is to implement a system performing object
+recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone
+(or any other model from `keras.applications`).
+
+The [Street View House Numbers (SVHN) dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/svhn_train.html)
+annotates every photo with all digits appearing in it, including their bounding
+boxes. The dataset can be loaded using the [svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py)
+module. Similarly to the `CAGS` dataset, the `train/dev/test` are PyTorch
+`torch.utils.data.Dataset`s, and every element is a dictionary with the following keys:
+- `"image"`: a square 3-channel image stored using PyTorch tensor of type `torch.uint8`,
+- `"classes"`: a 1D `np.ndarray` with all digit labels appearing in the image,
+- `"bboxes"`: a `[num_digits, 4]` 2D `np.ndarray` with bounding boxes of every
+  digit in the image, each represented as `[TOP, LEFT, BOTTOM, RIGHT]`.
+
+Each test set image annotation consists of a sequence of space-separated
+five-tuples _label top left bottom right_, and the annotation is considered
+correct if exactly the gold digits are predicted, each with IoU at least 0.5.
+The whole test set score is then the prediction accuracy of individual images.
+You can again evaluate your predictions using the
+[svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py)
+module, either by running with `--evaluate=path` arguments, or using its
+`evaluate_file` method.
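+
+The evaluation criterion can be sketched as follows, assuming a greedy one-to-one matching between gold and predicted digits. The official evaluation in `svhn_dataset.py` may match boxes differently and should be treated as the reference. The helpers `iou`, `parse_annotation`, and `image_correct` are hypothetical.
+
+```python
+def iou(a, b):
+    """Intersection over union of two [top, left, bottom, right] boxes."""
+    top, left = max(a[0], b[0]), max(a[1], b[1])
+    bottom, right = min(a[2], b[2]), min(a[3], b[3])
+    intersection = max(0.0, bottom - top) * max(0.0, right - left)
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
+    return intersection / union if union > 0 else 0.0
+
+
+def parse_annotation(line):
+    """Parse space-separated five-tuples `label top left bottom right`."""
+    values = line.split()
+    return [(int(values[i]), [float(v) for v in values[i + 1:i + 5]])
+            for i in range(0, len(values), 5)]
+
+
+def image_correct(gold, predicted, threshold=0.5):
+    """Whether `predicted` contains exactly the gold digits, each with IoU >= threshold."""
+    if len(gold) != len(predicted):
+        return False
+    unmatched = list(predicted)
+    for label, box in gold:
+        match = next((p for p in unmatched if p[0] == label and iou(p[1], box) >= threshold), None)
+        if match is None:
+            return False
+        unmatched.remove(match)
+    return True
+```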
+
+The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
+Everyone who submits a solution achieving at least _20%_ test set accuracy gets
+5 points; the remaining 5 bonus points are distributed depending on relative ordering
+of your solutions. Note that I usually need at least _35%_ development set
+accuracy to achieve the required test set performance.
+
+You should start with the
+[svhn_competition.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_competition.py)
+template, which generates the test set annotation in the required format.
+
+_A baseline solution can use a RetinaNet-like single-stage detector,
+using only a single level of convolutional features (no FPN)
+with single-scale and single-aspect anchors. Focal loss is available
+as [keras.losses.BinaryFocalCrossentropy](https://keras.io/api/losses/probabilistic_losses/#binaryfocalcrossentropy-class)
+and non-maximum suppression as
+[torchvision.ops.nms](https://pytorch.org/vision/main/generated/torchvision.ops.nms.html#nms) or
+[torchvision.ops.batched_nms](https://pytorch.org/vision/main/generated/torchvision.ops.batched_nms.html#batched-nms)._
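+
+For understanding, greedy non-maximum suppression can be sketched in pure Python. It repeatedly keeps the highest-scoring remaining box and drops every box whose IoU with an already kept box exceeds the threshold. This mirrors the documented behavior of `torchvision.ops.nms`, which returns the kept indices in decreasing score order. In practice, use the `torchvision` functions.
+
+```python
+def iou(a, b):
+    """Intersection over union of two [top, left, bottom, right] boxes."""
+    top, left = max(a[0], b[0]), max(a[1], b[1])
+    bottom, right = min(a[2], b[2]), min(a[3], b[3])
+    intersection = max(0.0, bottom - top) * max(0.0, right - left)
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
+    return intersection / union if union > 0 else 0.0
+
+
+def nms(boxes, scores, iou_threshold):
+    """Indices of the kept boxes, sorted by decreasing score."""
+    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
+    keep = []
+    for i in order:
+        # Keep a box only if it does not overlap too much with any kept box.
+        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
+            keep.append(i)
+    return keep
+```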
diff --git a/tasks/tagger_cle.md b/tasks/tagger_cle.md
new file mode 100644
index 0000000..57bf95f
--- /dev/null
+++ b/tasks/tagger_cle.md
@@ -0,0 +1,50 @@
+### Assignment: tagger_cle
+#### Date: Deadline: Apr 23, 22:00
+#### Points: 3 points
+#### Tests: tagger_cle_tests
+#### Examples: tagger_cle_examples
+
+This assignment is a continuation of `tagger_we`. Using the
+[tagger_cle.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_cle.py)
+template, implement character-level word embedding computation using
+a bidirectional character-level GRU.
+
+Once your solution is submitted to ReCodEx, you should experiment with the effect
+of CLEs compared to a plain `tagger_we`, and the influence of their dimensionality.
+Note that `tagger_cle` has by default smaller word embeddings, so that the size
+of the word representation (64 + 32 + 32 = 128) is the same as in the `tagger_we` assignment.
+
+#### Tests Start: tagger_cle_tests
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+1. `python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16`
+```
+Epoch=1/1 4.0s loss=2.2871 accuracy=0.2909 dev_loss=1.8784 dev_accuracy=0.4275
+```
+
+2. `python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16 --word_masking=0.1`
+```
+Epoch=1/1 4.4s loss=2.2846 accuracy=0.2875 dev_loss=1.8835 dev_accuracy=0.4289
+```
+#### Tests End:
+#### Examples Start: tagger_cle_examples
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+- `python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32`
+```
+Epoch=1/5 22.6s loss=1.0757 accuracy=0.6784 dev_loss=0.3678 dev_accuracy=0.8969
+Epoch=2/5 21.5s loss=0.1476 accuracy=0.9684 dev_loss=0.1978 dev_accuracy=0.9375
+Epoch=3/5 22.1s loss=0.0490 accuracy=0.9881 dev_loss=0.1722 dev_accuracy=0.9488
+Epoch=4/5 21.3s loss=0.0303 accuracy=0.9912 dev_loss=0.1651 dev_accuracy=0.9470
+Epoch=5/5 21.1s loss=0.0201 accuracy=0.9942 dev_loss=0.1630 dev_accuracy=0.9479
+```
+
+- `python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32 --word_masking=0.1`
+```
+Epoch=1/5 22.2s loss=1.1264 accuracy=0.6594 dev_loss=0.3980 dev_accuracy=0.8977
+Epoch=2/5 21.4s loss=0.2340 accuracy=0.9408 dev_loss=0.2175 dev_accuracy=0.9377
+Epoch=3/5 24.1s loss=0.1163 accuracy=0.9690 dev_loss=0.1624 dev_accuracy=0.9525
+Epoch=4/5 26.6s loss=0.0852 accuracy=0.9745 dev_loss=0.1493 dev_accuracy=0.9560
+Epoch=5/5 24.9s loss=0.0718 accuracy=0.9778 dev_loss=0.1450 dev_accuracy=0.9563
+```
+#### Examples End:
diff --git a/tasks/tagger_competition.md b/tasks/tagger_competition.md
new file mode 100644
index 0000000..659c7d9
--- /dev/null
+++ b/tasks/tagger_competition.md
@@ -0,0 +1,32 @@
+### Assignment: tagger_competition
+#### Date: Deadline: Apr 23, 22:00
+#### Points: 4 points+5 bonus
+
+In this assignment, you should extend `tagger_cle`
+into a real-world Czech part-of-speech tagger. We will use
+the Czech PDT dataset, loadable using the [morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py)
+module. Note that the dataset contains more than 1500 unique POS tags and that
+the POS tags have a fixed structure of 15 positions (so it is possible to
+generate the POS tag characters independently).
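+
+Because the tags have a fixed structure of 15 positions, one option is to predict each position with a separate classifier. The sketch below uses hypothetical helper names. It builds per-position character vocabularies and converts a tag to per-position class indices and back. Predicting positions independently can produce a combination never seen in training, which is the price of this design.
+
+```python
+TAG_LENGTH = 15
+
+
+def position_vocabularies(tags):
+    """Sorted list of characters observed at each of the 15 tag positions."""
+    return [sorted({tag[i] for tag in tags}) for i in range(TAG_LENGTH)]
+
+
+def tag_to_indices(tag, vocabularies):
+    """Encode a tag as one class index per position."""
+    assert len(tag) == TAG_LENGTH, f"expected a {TAG_LENGTH}-character tag, got {tag!r}"
+    return [vocabularies[i].index(char) for i, char in enumerate(tag)]
+
+
+def indices_to_tag(indices, vocabularies):
+    """Decode per-position class indices back to a tag string."""
+    return "".join(vocabularies[i][index] for i, index in enumerate(indices))
+```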
+
+You can use the following additional data in this assignment:
+- You can use outputs of a morphological analyzer loadable with
+  [morpho_analyzer.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_analyzer.py).
+  If a word form in train, dev or test PDT data is known to the analyzer,
+  all its _(lemma, POS tag)_ pairs are returned.
+- You can use any _unannotated_ text data (Wikipedia, Czech National Corpus, …),
+  and also any pre-trained word embeddings (assuming they were trained on plain
+  texts).
+
+The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
+Everyone who submits a solution with at least 92.5% label accuracy gets
+4 points; the remaining 5 bonus points are distributed depending on relative ordering
+of your solutions. Lastly, **3 bonus points** will be given to anyone surpassing
+the pre-neural-network state of the art of **96.35%**.
+
+You can start with the
+[tagger_competition.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_competition.py)
+template, which among others generates test set annotations in the required format. Note that
+you can evaluate the predictions as usual using the [morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py)
+module, either by running with `--task=tagger --evaluate=path` arguments, or using its
+`evaluate_file` method.
diff --git a/tasks/tagger_ner.md b/tasks/tagger_ner.md
new file mode 100644
index 0000000..a8dd777
--- /dev/null
+++ b/tasks/tagger_ner.md
@@ -0,0 +1,18 @@
+### Assignment: tagger_ner
+#### Date: Deadline: Apr 30, 22:00
+#### Points: 2 points
+
+**The template is being finalized; the final version will be released shortly.**
+
+This assignment is an extension of the `tagger_we` task. Using the
+`tagger_ner.py`
+template, implement optimal decoding of named entity spans from
+BIO-encoded tags.
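+
+The following sketch assumes that "optimal" decoding means finding the highest-scoring tag sequence among valid BIO encodings, where an `I-X` tag may only follow `B-X` or `I-X`. It uses the Viterbi algorithm over per-token log-probabilities. The string tags and list-based interface are assumptions; the template may encode the constraints differently, for example as a transition matrix.
+
+```python
+import math
+
+
+def allowed(previous, current):
+    """Whether `current` may follow `previous` (None = sentence start) in BIO."""
+    if current.startswith("I-"):
+        return previous is not None and previous != "O" and previous[2:] == current[2:]
+    return True
+
+
+def viterbi_bio(log_probs, tags):
+    """Best valid BIO sequence; log_probs[t][k] scores tags[k] at position t."""
+    n = len(tags)
+    scores = [log_probs[0][k] if allowed(None, tags[k]) else -math.inf for k in range(n)]
+    backpointers = []
+    for t in range(1, len(log_probs)):
+        new_scores, pointers = [], []
+        for k in range(n):
+            # Best-scoring predecessor among those allowed to precede tags[k].
+            best, best_j = -math.inf, 0
+            for j in range(n):
+                if scores[j] > best and allowed(tags[j], tags[k]):
+                    best, best_j = scores[j], j
+            new_scores.append(best + log_probs[t][k])
+            pointers.append(best_j)
+        scores = new_scores
+        backpointers.append(pointers)
+    # Follow the backpointers from the best final tag.
+    k = max(range(n), key=scores.__getitem__)
+    path = [k]
+    for pointers in reversed(backpointers):
+        k = pointers[k]
+        path.append(k)
+    return [tags[k] for k in reversed(path)]
+```
+
+Without the constraints, the per-token argmax can produce an invalid sequence such as `I-PER I-PER`, which the decoder turns into `B-PER I-PER`.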
+
+The evaluation is performed using the provided metric computing F1 score of the
+span prediction (i.e., a recognized possibly-multiword named entity is true
+positive if both the entity type and the span exactly match).
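+
+The span-level evaluation can be sketched in two steps. First, extract `(start, end, type)` spans from the BIO tags. Then compute micro-averaged F1 over exact matches. Treating an `I-X` without a preceding `B-X` as the start of a new span is one common convention and an assumption here. The provided metric is authoritative.
+
+```python
+def bio_spans(tags):
+    """Extract (start, end_exclusive, type) spans from a BIO tag sequence."""
+    spans, start, kind = [], None, None
+    for i, tag in enumerate(tags + ["O"]):  # a trailing "O" closes the last span
+        if tag.startswith("I-") and kind == tag[2:]:
+            continue
+        if start is not None:
+            spans.append((start, i, kind))
+            start, kind = None, None
+        if tag.startswith("B-") or tag.startswith("I-"):
+            start, kind = i, tag[2:]
+    return spans
+
+
+def span_f1(gold_sentences, predicted_sentences):
+    """Micro-averaged F1 of exactly matching spans."""
+    gold_total = predicted_total = correct = 0
+    for gold, predicted in zip(gold_sentences, predicted_sentences):
+        gold_spans, predicted_spans = set(bio_spans(gold)), set(bio_spans(predicted))
+        gold_total += len(gold_spans)
+        predicted_total += len(predicted_spans)
+        correct += len(gold_spans & predicted_spans)
+    precision = correct / predicted_total if predicted_total else 0.0
+    recall = correct / gold_total if gold_total else 0.0
+    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
+```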
+
+In practice, character-level embeddings (and also pre-trained word embeddings)
+would be used to obtain superior results.
+
diff --git a/tasks/tagger_we.md b/tasks/tagger_we.md
new file mode 100644
index 0000000..648b30e
--- /dev/null
+++ b/tasks/tagger_we.md
@@ -0,0 +1,56 @@
+### Assignment: tagger_we
+#### Date: Deadline: Apr 23, 22:00
+#### Points: 3 points
+#### Tests: tagger_we_tests
+#### Examples: tagger_we_examples
+
+In this assignment you will create a simple part-of-speech tagger. For training
+and evaluation, we will use a Czech dataset containing tokenized sentences, with each
+word annotated by its gold lemma and part-of-speech tag. The
+[morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py)
+module (down)loads the dataset and provides mappings between strings and integers.
+
+Your goal is to modify the
+[tagger_we.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_we.py)
+template and implement the following:
+- Use the specified RNN layer type (`GRU` or `LSTM`) and dimensionality.
+- Create word embeddings for training vocabulary.
+- Process the sentences using bidirectional RNN.
+- Predict part-of-speech tags.
+
+Note that you need to properly handle sentences of different lengths in one
+batch.
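+
+The following is a minimal sketch of handling variable-length sentences. Every sentence is padded to the batch maximum, and a mask is kept so that padded positions are ignored in metrics and in the loss. The padding id 0 is an assumption. In PyTorch, `torch.nn.utils.rnn.pack_padded_sequence` lets an RNN skip the padding, and the `ignore_index` argument of `torch.nn.CrossEntropyLoss` excludes padded targets from the loss.
+
+```python
+def pad_batch(sentences, pad_id=0):
+    """Pad lists of word ids to equal length; return padded ids and a 0/1 mask."""
+    max_length = max(len(sentence) for sentence in sentences)
+    padded = [sentence + [pad_id] * (max_length - len(sentence)) for sentence in sentences]
+    mask = [[1] * len(sentence) + [0] * (max_length - len(sentence)) for sentence in sentences]
+    return padded, mask
+
+
+def masked_accuracy(predictions, gold, mask):
+    """Accuracy computed only over unpadded positions."""
+    correct = total = 0
+    for p_row, g_row, m_row in zip(predictions, gold, mask):
+        for p, g, m in zip(p_row, g_row, m_row):
+            if m:
+                total += 1
+                correct += p == g
+    return correct / total
+```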
+
+#### Tests Start: tagger_we_tests
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+1. `python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16`
+```
+Epoch=1/1 3.1s loss=2.3541 accuracy=0.3138 dev_loss=2.0320 dev_accuracy=0.3611
+```
+
+2. `python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16`
+```
+Epoch=1/1 3.2s loss=2.1970 accuracy=0.4233 dev_loss=1.5569 dev_accuracy=0.5121
+```
+#### Tests End:
+#### Examples Start: tagger_we_examples
+_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._
+
+- `python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=64`
+```
+Epoch=1/5 21.1s loss=0.9776 accuracy=0.7080 dev_loss=0.3744 dev_accuracy=0.8814
+Epoch=2/5 19.2s loss=0.1060 accuracy=0.9736 dev_loss=0.2947 dev_accuracy=0.9013
+Epoch=3/5 19.4s loss=0.0291 accuracy=0.9921 dev_loss=0.2794 dev_accuracy=0.9057
+Epoch=4/5 19.7s loss=0.0166 accuracy=0.9960 dev_loss=0.2976 dev_accuracy=0.9015
+Epoch=5/5 19.7s loss=0.0096 accuracy=0.9978 dev_loss=0.3159 dev_accuracy=0.8957
+```
+
+- `python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=64`
+```
+Epoch=1/5 20.5s loss=0.7698 accuracy=0.7703 dev_loss=0.3432 dev_accuracy=0.8903
+Epoch=2/5 18.9s loss=0.0735 accuracy=0.9807 dev_loss=0.2999 dev_accuracy=0.8969
+Epoch=3/5 19.0s loss=0.0245 accuracy=0.9923 dev_loss=0.3244 dev_accuracy=0.8965
+Epoch=4/5 19.2s loss=0.0153 accuracy=0.9955 dev_loss=0.3302 dev_accuracy=0.8929
+Epoch=5/5 19.0s loss=0.0088 accuracy=0.9977 dev_loss=0.3641 dev_accuracy=0.8923
+```
+#### Examples End:
diff --git a/tasks/tensorboard_projector.md b/tasks/tensorboard_projector.md
new file mode 100644
index 0000000..824bf8b
--- /dev/null
+++ b/tasks/tensorboard_projector.md
@@ -0,0 +1,13 @@
+### Assignment: tensorboard_projector
+
+You can try exploring the TensorBoard Projector with pre-trained embeddings
+for the 20k most frequent lemmas in
+[Czech](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/cs_lemma_20k.zip)
+and [English](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/en_lemma_20k.zip)
+– after extracting the archive, start
+`tensorboard --logdir dir_where_the_archive_is_extracted`.
+
+In order to use the Projector tab yourself, you can take inspiration from the
+[projector_export.py](https://github.com/ufal/npfl138/tree/master/labs/09/projector_export.py)
+script, which was used to export the above pre-trained embeddings from the
+Word2vec format.
diff --git a/tasks/uppercase.md b/tasks/uppercase.md
index 66288d4..5c9a9e4 100644
--- a/tasks/uppercase.md
+++ b/tasks/uppercase.md
@@ -15,8 +15,8 @@ only used to understand the approach you took, and to indicate teams).
 Explicitly, submit **exactly one .txt file** and **at least one .py/ipynb file**.
 
 The task is also a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits
-a solution which achieves at least _98.5%_ accuracy will get 4 basic points; the
-5 bonus points will be distributed depending on relative ordering of your
+a solution achieving at least _98.5%_ accuracy gets 4 basic points; the
+remaining 5 bonus points are distributed depending on relative ordering of your
 solutions. The accuracy is computed per-character and can be evaluated
 by running [uppercase_data.py](https://github.com/ufal/npfl138/tree/master/labs/03/uppercase_data.py)
 with `--evaluate` argument, or using its `evaluate_file` method.