diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..ddb02da --- /dev/null +++ b/.gitignore @@ -0,0 +1,6 @@ +.venv +logs +*.tfrecord +*.npz +*.zip +*.pkl diff --git a/exam/questions.md b/exam/questions.md index 06452e5..2499f71 100644 --- a/exam/questions.md +++ b/exam/questions.md @@ -108,6 +108,8 @@ - Compare Cutout and DropBlock. [5] +- Describe in detail how is CutMix performed. [5] + - Describe Squeeze and Excitation applied to a ResNet block. [5] - Draw the Mobile inverted bottleneck block (including explanation of separable @@ -119,3 +121,91 @@ channels. Write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5] +#### Questions@:, Lecture 6 Questions +- Describe the differences among semantic segmentation, image classification, + object detection, and instance segmentation, and write down which metrics + are used for these tasks. [5] + +- Write down how is $\mathit{AP}_{50}$ computed. [5] + +- Considering a Fast-RCNN architecture, draw overall network architecture, + explain what a RoI-pooling layer is, show how the network parametrizes + bounding boxes and write down the loss. Finally, describe non-maximum + suppression and how the Fast-RCNN prediction is performed. [10] + +- Considering a Faster-RCNN architecture, describe the region proposal network + (what are anchors, architecture including both heads, how are the coordinates + of proposals parametrized, what does the loss look like). [10] + +- Considering Mask-RCNN architecture, describe the additions to a Faster-RCNN + architecture (the RoI-Align layer, the new mask-producing head). [5] + +- Write down the focal loss with class weighting, including the commonly used + hyperparameter values. 
[5] + +- Draw the overall architecture of RetinaNet (the computation of + $C_1, \ldots, C_7$, the FPN architecture computing $P_1, \ldots, P_7$ + including the block combining feature maps of different resolutions; the + classification and bounding box generation heads, including their output + size). Write down the losses for both heads. [10] + +- Describe GroupNorm, and compare it to BatchNorm and LayerNorm. [5] + +#### Questions@:, Lecture 8 Questions +- Write down how the Long Short-Term Memory (LSTM) cell operates, including + the explicit formulas. Also mention the forget gate bias. [10] + +- Write down how the Gated Recurrent Unit (GRU) operates, including + the explicit formulas. [10] + +- Describe Highway network computation. [5] + +- Why can the usual dropout not be used on the recurrent state? Describe + how the problem can be alleviated with variational dropout. [5] + +- Describe layer normalization including all its parameters, and write down how + it is computed (be sure to explicitly state over what is being normalized in + case of fully connected layers and convolutional layers). [5] + +- Draw a tagger architecture utilizing word embeddings, recurrent + character-level word embeddings (including how these are computed from + individual characters), and two sentence-level bidirectional RNNs (explaining + the bidirectionality) with a residual connection. Where would you put the + dropout layers? [10] + +#### Questions@:, Lecture 9 Questions +- In the context of named entity recognition, describe what the BIO encoding + is and why it is used. [5] + +- Write down the dynamic programming algorithm for decoding a BIO-tag sequence, + including its asymptotic complexity. [10] + +- In the context of CTC loss, describe regular and extended labelings and + write down the algorithm for computing the log probability of a gold label + sequence $\boldsymbol y$. [10] + +- Describe how CTC predictions are performed using a beam-search.
[5] + +- Draw the CBOW architecture from `word2vec`, including the sizes of the inputs + and the sizes of the outputs and the non-linearities used. Also make sure to + indicate where the embeddings are being trained. [5] + +- Draw the SkipGram architecture from `word2vec`, including the sizes of the + inputs and the sizes of the outputs and the non-linearities used. Also make sure + to indicate where the embeddings are being trained. [5] + +- Describe the hierarchical softmax used in `word2vec`. [5] + +- Describe the negative sampling proposed in `word2vec`, including + the choice of distribution of negative samples. [5] + +#### Questions@:, Lecture 10 Questions +- Write down why subword units are used in text processing, and describe the BPE + algorithm for constructing a subword dictionary from a large corpus. [5] + +- Write down why subword units are used in text processing, and describe the + WordPieces algorithm for constructing a subword dictionary from a large + corpus. [5] + +- Pinpoint the differences between the BPE and WordPieces algorithms, both + during dictionary construction and during inference.
[5] diff --git a/labs/.gitignore b/labs/.gitignore index 6319f80..acfd147 100644 --- a/labs/.gitignore +++ b/labs/.gitignore @@ -3,5 +3,5 @@ logs/ *.h5 *.keras *.npz -*.pickle +*.tfrecord *.zip diff --git a/labs/04/cifar10.py b/labs/04/cifar10.py index 0ed0533..ec06755 100644 --- a/labs/04/cifar10.py +++ b/labs/04/cifar10.py @@ -33,7 +33,8 @@ def dataset(self, transform: Callable[[dict[str, np.ndarray]], Any] | None = Non return CIFAR10.TorchDataset(self, transform) class TorchDataset(torch.utils.data.Dataset): - def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None: + def __init__(self, dataset: "CIFAR10.Dataset", + transform: Callable[[dict[str, np.ndarray]], Any] | None) -> None: self._dataset = dataset self._transform = transform diff --git a/labs/04/cifar10_v2.py b/labs/04/cifar10_v2.py new file mode 100644 index 0000000..e6b6748 --- /dev/null +++ b/labs/04/cifar10_v2.py @@ -0,0 +1,99 @@ +import os +import sys +from typing import Any, Callable, Sequence, TextIO, TypedDict +import urllib.request + +import numpy as np +import torch + + +class CIFAR10: + H: int = 32 + W: int = 32 + C: int = 3 + LABELS: list[str] = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"] + + Element = TypedDict("Element", {"image": np.ndarray, "label": np.ndarray}) + Elements = TypedDict("Elements", {"images": np.ndarray, "labels": np.ndarray}) + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/cifar10_competition.npz" + + class Dataset(torch.utils.data.Dataset): + def __init__(self, data: "CIFAR10.Elements") -> None: + self._data = data + self._data["labels"] = self._data["labels"].ravel() + + @property + def data(self) -> "CIFAR10.Elements": + return self._data + + def __len__(self) -> int: + return len(self._data["images"]) + + def __getitem__(self, index: int) -> "CIFAR10.Element": + return {key.removesuffix("s"): value[index] for key, value in self._data.items()} 
+ + def transform(self, transform: Callable[["CIFAR10.Element"], Any]) -> "CIFAR10.TransformedDataset": + return CIFAR10.TransformedDataset(self, transform) + + class TransformedDataset(torch.utils.data.Dataset): + def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None: + self._dataset = dataset + self._transform = transform + + def __len__(self) -> int: + return len(self._dataset) + + def __getitem__(self, index: int) -> Any: + item = self._dataset[index] + return self._transform(*item) if isinstance(item, tuple) else self._transform(item) + + def transform(self, transform: Callable[..., Any]) -> "CIFAR10.TransformedDataset": + return CIFAR10.TransformedDataset(self, transform) + + def __init__(self, size: dict[str, int] = {}) -> None: + path = os.path.basename(self._URL) + if not os.path.exists(path): + print("Downloading CIFAR-10 dataset...", file=sys.stderr) + urllib.request.urlretrieve(self._URL, filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + cifar = np.load(path) + for dataset in ["train", "dev", "test"]: + data = {key[len(dataset) + 1:]: cifar[key][:size.get(dataset, None)] + for key in cifar if key.startswith(dataset)} + setattr(self, dataset, self.Dataset(data)) + + train: Dataset + dev: Dataset + test: Dataset + + # Evaluation infrastructure. 
+ @staticmethod + def evaluate(gold_dataset: Dataset, predictions: Sequence[int]) -> float: + gold = gold_dataset.data["labels"] + + if len(predictions) != len(gold): + raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format( + len(predictions), len(gold))) + + correct = sum(gold[i] == predictions[i] for i in range(len(gold))) + return 100 * correct / len(gold) + + @staticmethod + def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float: + predictions = [int(line) for line in predictions_file] + return CIFAR10.evaluate(gold_dataset, predictions) + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + args = parser.parse_args() + + if args.evaluate: + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + accuracy = CIFAR10.evaluate_file(getattr(CIFAR10(), args.dataset), predictions_file) + print("CIFAR10 accuracy: {:.2f}%".format(accuracy)) diff --git a/labs/05/cags_dataset.py b/labs/05/cags_dataset.py index 782d19f..127bc21 100644 --- a/labs/05/cags_dataset.py +++ b/labs/05/cags_dataset.py @@ -1,7 +1,8 @@ +import array import os import sys import struct -from typing import Any, Callable, Sequence, TextIO +from typing import Any, Callable, Sequence, TextIO, TypedDict import urllib.request os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise @@ -28,45 +29,60 @@ class CAGS: "scottish_terrier", "shiba_inu", "staffordshire_bull_terrier", "wheaten_terrier", "yorkshire_terrier", ] + Element = TypedDict("Element", {"image": torch.Tensor, "mask": torch.Tensor, "label": torch.Tensor}) _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/" class Dataset(torch.utils.data.Dataset): - def __init__(self, path: str, size: 
int) -> None: - self._path = path - self._data = None + def __init__(self, path: str, size: int, decode_on_demand: bool) -> None: self._size = size + arrays, indices = CAGS._load_data(path, size) + if decode_on_demand: + self._data, self._arrays, self._indices = None, arrays, indices + else: + self._data = [self._decode(arrays, indices, i) for i in range(size)] + def __len__(self) -> int: return self._size - def __getitem__(self, index: int) -> dict[str, torch.Tensor]: - if self._data is None: - self._data = [] - for entry in CAGS._load_data(self._path, self._size): - entry["image"] = torchvision.io.decode_image( - torch.from_numpy(entry["image"]), torchvision.io.ImageReadMode.RGB).permute(1, 2, 0) - entry["mask"] = (torchvision.io.decode_image(torch.from_numpy(entry["mask"])).to( - dtype=torch.float32) / 255).permute(1, 2, 0) - entry["label"] = torch.tensor(entry["label"][0]) - self._data.append(entry) - return self._data[index] - - def transform(self, transform: Callable[[dict[str, torch.Tensor]], Any]) -> torch.utils.data.Dataset: + def __getitem__(self, index: int) -> "CAGS.Element": + if self._data: + return self._data[index] + return self._decode(self._arrays, self._indices, index) + + def transform(self, transform: Callable[["CAGS.Element"], Any]) -> "CAGS.TransformedDataset": return CAGS.TransformedDataset(self, transform) + def _decode(self, data: dict, indices: dict, index: int) -> "CAGS.Element": + return { + "image": torchvision.io.decode_image( + torch.frombuffer(data["image"], dtype=torch.uint8, offset=indices["image"][:-1][index], + count=indices["image"][1:][index] - indices["image"][:-1][index]), + torchvision.io.ImageReadMode.RGB).permute(1, 2, 0), + "mask": torchvision.io.decode_image( + torch.frombuffer(data["mask"], dtype=torch.uint8, offset=indices["mask"][:-1][index], + count=indices["mask"][1:][index] - indices["mask"][:-1][index]), + torchvision.io.ImageReadMode.GRAY).to(dtype=torch.float32).div(255).permute(1, 2, 0), + "label": 
torch.tensor(data["label"][index]), + } + class TransformedDataset(torch.utils.data.Dataset): - def __init__(self, dataset: "Dataset", transform: Callable[[dict[str, torch.Tensor]], Any]) -> None: + def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None: self._dataset = dataset self._transform = transform def __len__(self) -> int: - return self._dataset._size + return len(self._dataset) def __getitem__(self, index: int) -> Any: - return self._transform(self._dataset[index]) + item = self._dataset[index] + return self._transform(*item) if isinstance(item, tuple) else self._transform(item) - def __init__(self) -> None: + def transform(self, transform: Callable[..., Any]) -> "CAGS.TransformedDataset": + return CAGS.TransformedDataset(self, transform) + + def __init__(self, decode_on_demand: bool = False) -> None: for dataset, size in [("train", 2_142), ("dev", 306), ("test", 612)]: path = "cags.{}.tfrecord".format(dataset) if not os.path.exists(path): @@ -74,7 +90,7 @@ def __init__(self) -> None: urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) os.rename("{}.tmp".format(path), path) - setattr(self, dataset, self.Dataset(path, size)) + setattr(self, dataset, self.Dataset(path, size, decode_on_demand)) train: Dataset dev: Dataset @@ -82,24 +98,22 @@ def __init__(self) -> None: # TFRecord loading @staticmethod - def _load_data(path: str, items: int) -> list[dict[str, Any]]: - def get_value() -> int: + def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]: + def get_value() -> np.int64: nonlocal data, offset value = np.int64(data[offset] & 0x7F); start = offset; offset += 1 while data[offset - 1] & 0x80: value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1 return value - def get_value_of_kind(kind: int) -> int: + def get_value_of_kind(kind: int) -> np.int64: nonlocal data, offset assert data[offset] == kind; offset += 1 return 
get_value() - entries = [] + arrays, indices = {}, {} with open(path, "rb") as file: - while len(entries) < items: - entries.append({}) - + for _ in range(items): length = file.read(8); assert len(length) == 8 length, = struct.unpack(" int: get_value_of_kind(0x0A) length = get_value_of_kind(0x0A) key = data[offset:offset + length].decode("utf-8"); offset += length - get_value_of_kind(0x12) + if key not in arrays: + arrays[key] = array.array({0x0A: "B", 0x1A: "q", 0x12: "f"}.get(data[offset], "B")) + indices[key] = array.array("L", [0]) + if data[offset] == 0x0A: - get_value_of_kind(0x0A) - length = get_value_of_kind(0x0A) - entries[-1][key] = np.frombuffer(data, np.uint8, length, offset).copy(); offset += length + length = get_value_of_kind(0x0A) and get_value_of_kind(0x0A) + arrays[key].frombytes(data[offset:offset + length]); offset += length elif data[offset] == 0x1A: - get_value_of_kind(0x1A) - length = get_value_of_kind(0x0A) - values, target_offset = [], offset + length + length = get_value_of_kind(0x1A) and get_value_of_kind(0x0A) + target_offset = offset + length while offset < target_offset: - values.append(get_value()) - entries[-1][key] = np.array(values, dtype=np.int64) + arrays[key].append(get_value()) elif data[offset] == 0x12: - get_value_of_kind(0x12) - length = get_value_of_kind(0x0A) - entries[-1][key] = np.frombuffer( - data, np.dtype("> 2, offset).astype(np.float32).copy(); offset += length + length = get_value_of_kind(0x12) and get_value_of_kind(0x0A) + arrays[key].frombytes(np.frombuffer( + data, np.dtype("> 2, offset).astype(np.float32).tobytes()); offset += length else: raise ValueError("Unsupported data tag {}".format(data[offset])) - return entries + indices[key].append(len(arrays[key])) + return arrays, indices # Keras IoU metric class MaskIoUMetric(keras.metrics.Mean): @@ -203,18 +217,20 @@ def evaluate_segmentation_file(gold_dataset: Dataset, predictions_file: TextIO) if __name__ == "__main__": import argparse parser = 
argparse.ArgumentParser() - parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") parser.add_argument("--task", default="classification", type=str, help="Task to evaluate") args = parser.parse_args() if args.evaluate: + gold_dataset = getattr(CAGS(decode_on_demand=True), args.dataset) + if args.task == "classification": with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: - accuracy = CAGS.evaluate_classification_file(getattr(CAGS(), args.dataset), predictions_file) + accuracy = CAGS.evaluate_classification_file(gold_dataset, predictions_file) print("CAGS accuracy: {:.2f}%".format(accuracy)) if args.task == "segmentation": with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: - iou = CAGS.evaluate_segmentation_file(getattr(CAGS(), args.dataset), predictions_file) + iou = CAGS.evaluate_segmentation_file(gold_dataset, predictions_file) print("CAGS IoU: {:.2f}%".format(iou)) diff --git a/labs/05/cnn_manual.py b/labs/05/cnn_manual.py index 26d3303..ac6c99b 100644 --- a/labs/05/cnn_manual.py +++ b/labs/05/cnn_manual.py @@ -72,13 +72,13 @@ def backward( if self._verify: inputs.requires_grad_(True) inputs.grad = self._kernel.value.grad = self._bias.value.grad = None - reference = keras.ops.relu(keras.ops.conv(inputs, self._kernel, self._stride) + self._bias) + reference = (outputs > 0) * (keras.ops.conv(inputs, self._kernel, self._stride) + self._bias) reference.backward(gradient=outputs_gradient, inputs=[inputs, self._kernel.value, self._bias.value]) for name, computed, reference in zip( - ["Inputs", "Kernel", "Bias"], [inputs_gradient, kernel_gradient, bias_gradient], - [inputs.grad, self._kernel.value.grad, self._bias.value.grad]): + ["Bias", "Kernel", "Inputs"], [bias_gradient, kernel_gradient, 
inputs_gradient], + [self._bias.value.grad, self._kernel.value.grad, inputs.grad]): np.testing.assert_allclose(keras.ops.convert_to_numpy(computed), keras.ops.convert_to_numpy(reference), - atol=1e-4, err_msg=name + " gradient differs!") + atol=2e-4, err_msg=name + " gradient differs!") # Return the inputs gradient, the layer variables, and their gradients. return inputs_gradient, [self._kernel, self._bias], [kernel_gradient, bias_gradient] diff --git a/labs/06/Untitled-1.py b/labs/06/Untitled-1.py new file mode 100644 index 0000000..f3c5870 --- /dev/null +++ b/labs/06/Untitled-1.py @@ -0,0 +1,201 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re + +import torch.utils +import torch.utils.data +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch, torchvision + +import bboxes_utils +from svhn_dataset import SVHN + +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25 +# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores.
+parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=64, type=int, help="Batch size.") +parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument("--learning_rate", default=0.001, type=float, help="Learning rate for training.") +parser.add_argument("--image_size", default=224, type=int, help="A fixed image size.") +parser.add_argument("--iou_threshold", default=0.7, type=float, help="The intersection over union threshold.") +parser.add_argument("--model_file", default=None, type=str, help="Pretrained model to load.") + + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if logs: + if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): + logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} + self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) + self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + + +def main(args: argparse.Namespace) -> None: + # Set the random seed and the number of threads.
+ keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + # Load the data. The individual examples are dictionaries with the keys: + # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range, + # - "classes", a `[num_digits]` vector with classes of image digits, + # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits. + svhn = SVHN() + # e.g., image H=224, W=224, top level pixel = 7X7, anchor height, width = 224/7, 224/7=32,32 + def get_anchors(backbone_output_pixel=14, image_size=args.image_size): + square_anchors = [] + img_H, img_W = image_size, image_size + square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel + for h in range(0, img_H, square_anchor_h): + for w in range(0, img_W, square_anchor_w): + square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w]) + return np.array(square_anchors) + + + def prepare_data(example): + gold_bboxes = example["bboxes"]/example["image"].shape[0] + image_resized = keras.ops.image.resize(example["image"], (args.image_size, args.image_size)) + gold_classes, iou_threshold = example["classes"], args.iou_threshold + anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold) + anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS) + classes_sample_weight = keras.ops.ones_like(anchor_classes) + bboxes_sample_weight = anchor_classes > 0 + return image_resized, (anchor_classes_one_hot, anchor_bboxes), (classes_sample_weight, bboxes_sample_weight) + + model = None + test = 
torch.utils.data.DataLoader(svhn.test, batch_size=args.batch_size) + + if args.model_file: + model = keras.models.load_model(args.model_file) + else: + + anchors = get_anchors() + svhn.train, svhn.dev, svhn.test = svhn.train.transform(prepare_data), svhn.dev.transform(prepare_data), svhn.test.transform(prepare_data) + print(svhn.train[0]) + train = torch.utils.data.DataLoader( + svhn.train, batch_size=args.batch_size, shuffle=True) #num_workers=1, persistent_workers=True) + dev = torch.utils.data.DataLoader(svhn.dev, batch_size=args.batch_size) + #train_imgs, train_labels, train_sample_weights = np.array([e[0] for e in train]), np.array([e[1] for e in train]), np.array([e[2] for e in train]) + #dev_imgs, dev_labels, dev_sample_weights = np.array([e[0] for e in dev]), np.array([e[1] for e in dev]), np.array([e[2] for e in dev]) + + # Load the EfficientNetV2-B0 model. It assumes the input images are + # represented in the [0-255] range. + backbone = keras.applications.EfficientNetV2B0(include_top=False) + + # Extract features of different resolution. Assuming 224x224 input images + # (you can set this explicitly via `input_shape` of the above constructor), + # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112. 
+ backbone = keras.Model( + inputs=backbone.input, + outputs=[backbone.get_layer(layer).output for layer in [ + "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]] + ) + + # TODO: Create the model and train it + backbone.trainable = False + inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3)) + # backbone outputs bottom to up (layer 1-5): block1a, block2b, block3b, block5e, top + # shapes: 7x7x1280, 14x14x112, 28x28x40, 56x56x24, 112x112x16 + top, block5e, block3b, block2b, block1a = backbone(inputs) + + def bn_relu(inputs): + return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs)) + + ### classification and bbox regression head + ### 9 is the anchor number for RetinaNet + def heads(input_feature, type="classification", anchor_number=1): + activ, output_size = None, 0 + if type.lower() == "classification": + activ, output_size = "sigmoid", svhn.LABELS*anchor_number + elif type.lower() == "regression": + activ, output_size = None, 4*anchor_number + else: + print("Type can only be 'classification' or 'regression'!") + conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(input_feature)) + conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv1)) + conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv2)) + outputs = keras.layers.Conv2D(output_size, 3, 1, "same", activation=activ)(conv3) + outputs = keras.layers.Reshape((outputs.shape[1]*outputs.shape[2], outputs.shape[3]))(outputs) + return outputs + + # only use the top layer output + feature = block5e + cls_output = heads(feature) + reg_output = heads(feature, "regression") + + model = keras.Model(inputs, [cls_output, reg_output], name="baseline") + model.summary() + + model.compile( + optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate), + loss=( + keras.losses.BinaryFocalCrossentropy(), + keras.losses.Huber()), + metrics=[keras.metrics.BinaryCrossentropy(name="binaryce"), + 
keras.metrics.MeanSquaredError(name="mse")], + ) + + model.fit(train, epochs=args.epochs, validation_data=dev) + model.save("svhn_model.keras") + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the digits and their bounding boxes on the test set. + # Assume that for a single test image we get + # - `predicted_classes`: a 1D array with the predicted digits, + # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes; + pred_classes, pred_rcnns = model.predict(test) + print(pred_classes.shape, pred_rcnns.shape) + # shape of pred_classes, pred_rcnns: (4535, 196, 10) (4535, 196, 4) + for predicted_classes, predicted_bboxes in zip(pred_classes, pred_rcnns): + scores = torch.tensor(np.max(predicted_classes, axis=-1)) + predicted_bboxes = torch.tensor(bboxes_utils.bboxes_from_rcnn(anchors, predicted_bboxes), dtype=torch.float32) + chosen_bboxes = torchvision.ops.nms(predicted_bboxes, scores, args.iou_threshold) + output = [] + for cls, bbox_id in zip(predicted_classes, chosen_bboxes): + label = np.argmax(cls) + bbox = predicted_bboxes[bbox_id] + output += [label] + list(bbox) + print(*output, file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/06/Untitled-2.py b/labs/06/Untitled-2.py new file mode 100644 index 0000000..63cf2b5 --- /dev/null +++ b/labs/06/Untitled-2.py @@ -0,0 +1,207 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch, torchvision +import pickle + +import bboxes_utils +from svhn_dataset import SVHN + +# Jonas Glerup Røssum +# 31a0a96a-c590-4486-b194-f72765b2ce25
+# Xiao Wang +# 91d4d1d7-b800-4765-96b9-df098ac36a66 + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores. +parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=64, type=int, help="Batch size.") +parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument("--learning_rate", default=0.001, type=float, help="Learning rate for training.") +parser.add_argument("--image_size", default=224, type=int, help="A fixed image size.") +parser.add_argument("--iou_threshold", default=0.7, type=float, help="The intersection over union threshold.") + + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if logs: + if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): + logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} + self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) + self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + + +def main(args: argparse.Namespace) -> None: + # Set the random seed and the number of threads.
+ keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + print("📊 Loading data") + + # Load the data. The individual examples are dictionaries with the keys: + # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range, + # - "classes", a `[num_digits]` vector with classes of image digits, + # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits. + svhn = SVHN() + # e.g., image H=224, W=224, top level pixel = 7X7, anchor height, width = 224/7, 224/7=32,32 + def get_anchors(backbone_output_pixel=14, image_size=args.image_size): + square_anchors = [] + img_H, img_W = image_size, image_size + square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel + for h in range(0, img_H, square_anchor_h): + for w in range(0, img_W, square_anchor_w): + square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w]) + return np.array(square_anchors) + + print("📊 Creating anchors") + + anchors = get_anchors() + + + + def prepare_data(example): + gold_bboxes = example["bboxes"]/example["image"].shape[0] + image_resized = keras.ops.image.resize(example["image"], (args.image_size, args.image_size)) + gold_classes, iou_threshold = example["classes"], args.iou_threshold + anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold) + anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS) + classes_sample_weight = keras.ops.ones_like(anchor_classes) + bboxes_sample_weight = anchor_classes > 0 + return image_resized, (anchor_classes_one_hot,
anchor_bboxes), (classes_sample_weight, bboxes_sample_weight) + + print("📊 Transforming data") + + print("length: ", len(svhn.train.transform(prepare_data))) + train_imgs, train_labels, train_weights = zip(*[(entry[0], entry[1], entry[2]) for entry in svhn.train.transform(prepare_data)]) + dev_imgs, dev_labels, dev_weights = zip(*[(entry[0], entry[1], entry[2]) for entry in svhn.dev.transform(prepare_data)]) + test_imgs, _, _ = zip(*[(entry[0], entry[1], entry[2]) for entry in svhn.test.transform(prepare_data)]) + + pickle.dump((train_imgs, train_labels, train_weights, dev_imgs, dev_labels, dev_weights, test_imgs), open("data.pkl", "wb")) + # train_imgs, train_labels, train_weights, dev_imgs, dev_labels, dev_weights, test_imgs = pickle.load(open("data.pkl", "rb")) + + print("📊 Creating efficientnet") + # Load the EfficientNetV2-B0 model. It assumes the input images are + # represented in the [0-255] range. + backbone = keras.applications.EfficientNetV2B0(include_top=False) + + print("📊 Instantiating model") + # Extract features of different resolution. Assuming 224x224 input images + # (you can set this explicitly via `input_shape` of the above constructor), + # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.
+ backbone = keras.Model( + inputs=backbone.input, + outputs=[backbone.get_layer(layer).output for layer in [ + "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]] + ) + + # TODO: Create the model and train it + backbone.trainable = False + inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3)) + # backbone outputs, top to bottom: top, block5e, block3b, block2b, block1a + # shapes: 7x7x1280, 14x14x112, 28x28x48, 56x56x32, 112x112x16 + top, block5e, block3b, block2b, block1a = backbone(inputs) + + def bn_relu(inputs): + return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs)) + + ### classification and bbox regression head + ### 9 is the number of anchors per position in RetinaNet + def heads(input_feature, type="classification", anchor_number=1): + activ, output_size = None, 0 + if type.lower() == "classification": + activ, output_size = "sigmoid", svhn.LABELS*anchor_number + elif type.lower() == "regression": + activ, output_size = None, 4*anchor_number + else: + print("Type can only be 'classification' or 'regression'!") + conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(input_feature)) + conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv1)) + conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, "same")(conv2)) + outputs = keras.layers.Conv2D(output_size, 3, 1, "same", activation=activ)(conv3) + return outputs + + print("📊 Preprocessing") + + # use only the 14x14 `block5e` output, which matches the 14x14 anchor grid + feature = block5e + cls_output = heads(feature) + reg_output = heads(feature, "regression") + + print("📊 Creating model") + + model = keras.Model(inputs, [cls_output, reg_output], name="baseline") + model.summary() + + print("📊 Compiling") + + model.compile( + optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate), + loss=( + keras.losses.BinaryFocalCrossentropy(), + keras.losses.Huber()), + metrics=["accuracy"], + ) + + print("📊 Training") + + model.fit(train_imgs, train_labels, +
batch_size=args.batch_size, epochs=args.epochs, + validation_data=(dev_imgs, dev_labels), + sample_weight=train_weights, + ) + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the digits and their bounding boxes on the test set. + # Assume that for a single test image we get + # - `predicted_classes`: a 1D array with the predicted digits, + # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes; + pre_classes, pre_rcnns = model.predict(test_imgs) + pre_bboxes = bboxes_utils.bboxes_from_rcnn(anchors, pre_rcnns) + for predicted_classes, predicted_bboxes in zip(pre_classes, pre_bboxes): + scores = np.max(predicted_classes, axis=-1) + chosen_bboxes = torchvision.ops.nms(predicted_bboxes, scores, args.iou_threshold) + print(chosen_bboxes.shape, test_imgs.shape) + output = [] + for label, bbox in zip(predicted_classes, chosen_bboxes): + output += [label] + list(bbox) + print(*output, file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/06/bboxes_utils.py b/labs/06/bboxes_utils.py new file mode 100644 index 0000000..332ea79 --- /dev/null +++ b/labs/06/bboxes_utils.py @@ -0,0 +1,181 @@ +#!/usr/bin/env python3 +import argparse +from typing import Callable +import unittest + +import numpy as np + +# Bounding boxes and anchors are expected to be Numpy tensors, +# where the last dimension has size 4. + +# For bounding boxes in pixel coordinates, the 4 values correspond to: +TOP: int = 0 +LEFT: int = 1 +BOTTOM: int = 2 +RIGHT: int = 3 + + +def bboxes_area(bboxes: np.ndarray) -> np.ndarray: + """ Compute area of given set of bboxes. + + Each bbox is parametrized as a four-tuple (top, left, bottom, right).
+ + If the bboxes.shape is [..., 4], the output shape is bboxes.shape[:-1]. + """ + return np.maximum(bboxes[..., BOTTOM] - bboxes[..., TOP], 0) \ + * np.maximum(bboxes[..., RIGHT] - bboxes[..., LEFT], 0) + + +def bboxes_iou(xs: np.ndarray, ys: np.ndarray) -> np.ndarray: + """ Compute IoU of corresponding pairs from two sets of bboxes `xs` and `ys`. + + Each bbox is parametrized as a four-tuple (top, left, bottom, right). + + Note that broadcasting is supported, so passing inputs with + `xs.shape=[num_xs, 1, 4]` and `ys.shape=[1, num_ys, 4]` produces an output + with shape `[num_xs, num_ys]`, computing IoU for all pairs of bboxes from + `xs` and `ys`. Formally, the output shape is `np.broadcast(xs, ys).shape[:-1]`. + """ + intersections = np.stack([ + np.maximum(xs[..., TOP], ys[..., TOP]), + np.maximum(xs[..., LEFT], ys[..., LEFT]), + np.minimum(xs[..., BOTTOM], ys[..., BOTTOM]), + np.minimum(xs[..., RIGHT], ys[..., RIGHT]), + ], axis=-1) + + xs_area, ys_area, intersections_area = bboxes_area(xs), bboxes_area(ys), bboxes_area(intersections) + + return intersections_area / (xs_area + ys_area - intersections_area) + + +def bboxes_to_rcnn(anchors: np.ndarray, bboxes: np.ndarray) -> np.ndarray: + """ Convert `bboxes` to a R-CNN-like representation relative to `anchors`. + + The `anchors` and `bboxes` are arrays of four-tuples (top, left, bottom, right); + you can use the TOP, LEFT, BOTTOM, RIGHT constants as indices of the + respective coordinates. + + The resulting representation of a single bbox is a four-tuple with: + - (bbox_y_center - anchor_y_center) / anchor_height + - (bbox_x_center - anchor_x_center) / anchor_width + - log(bbox_height / anchor_height) + - log(bbox_width / anchor_width) + + If the `anchors.shape` is `[anchors_len, 4]` and `bboxes.shape` is `[anchors_len, 4]`, + the output shape is `[anchors_len, 4]`. + """ + + # TODO: Implement according to the docstring. 
+ raise NotImplementedError() + + +def bboxes_from_rcnn(anchors: np.ndarray, rcnns: np.ndarray) -> np.ndarray: + """ Convert R-CNN-like representation relative to `anchor` to a `bbox`. + + If the `anchors.shape` is `[anchors_len, 4]` and `rcnns.shape` is `[anchors_len, 4]`, + the output shape is `[anchors_len, 4]`. + """ + + # TODO: Implement according to the docstring. + raise NotImplementedError() + + +def bboxes_training( + anchors: np.ndarray, gold_classes: np.ndarray, gold_bboxes: np.ndarray, iou_threshold: float +) -> tuple[np.ndarray, np.ndarray]: + """ Compute training data for object detection. + + Arguments: + - `anchors` is an array of four-tuples (top, left, bottom, right) + - `gold_classes` is an array of zero-based classes of the gold objects + - `gold_bboxes` is an array of four-tuples (top, left, bottom, right) + of the gold objects + - `iou_threshold` is a given threshold + + Returns: + - `anchor_classes` contains for every anchor either 0 for background + (if no gold object is assigned) or `1 + gold_class` if a gold object + with `gold_class` is assigned to it + - `anchor_bboxes` contains for every anchor a four-tuple + `(center_y, center_x, height, width)` representing the gold bbox of + a chosen object using parametrization of R-CNN; zeros if no gold object + was assigned to the anchor + If the `anchors` shape is `[anchors_len, 4]`, the `anchor_classes` shape + is `[anchors_len]` and the `anchor_bboxes` shape is `[anchors_len, 4]`. + + Algorithm: + - First, for each gold object, assign it to an anchor with the largest IoU + (the anchor with smaller index if there are several). In case several gold + objects are assigned to a single anchor, use the gold object with smaller + index. + - For each unused anchor, find the gold object with the largest IoU + (again the gold object with smaller index if there are several), and if + the IoU is >= iou_threshold, assign the object to the anchor. 
+ """ + + # TODO: First, for each gold object, assign it to an anchor with the + # largest IoU (the anchor with smaller index if there are several). In case + # several gold objects are assigned to a single anchor, use the gold object + # with smaller index. + + # TODO: For each unused anchor, find the gold object with the largest IoU + # (again the gold object with smaller index if there are several), and if + # the IoU is >= threshold, assign the object to the anchor. + + anchor_classes, anchor_bboxes = ..., ... + + return anchor_classes, anchor_bboxes + + +def main(args: argparse.Namespace) -> tuple[Callable, Callable, Callable]: + return bboxes_to_rcnn, bboxes_from_rcnn, bboxes_training + + +class Tests(unittest.TestCase): + def test_bboxes_to_from_rcnn(self): + data = [ + [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 0, 0]], + [[0, 0, 10, 10], [5, 0, 15, 10], [.5, 0, 0, 0]], + [[0, 0, 10, 10], [0, 5, 10, 15], [0, .5, 0, 0]], + [[0, 0, 10, 10], [0, 0, 20, 30], [.5, 1, np.log(2), np.log(3)]], + [[0, 9, 10, 19], [2, 10, 5, 16], [-0.15, -0.1, -1.20397, -0.51083]], + [[5, 3, 15, 13], [7, 7, 10, 9], [-0.15, 0, -1.20397, -1.60944]], + [[7, 6, 17, 16], [9, 10, 12, 13], [-0.15, 0.05, -1.20397, -1.20397]], + [[5, 6, 15, 16], [7, 7, 10, 10], [-0.15, -0.25, -1.20397, -1.20397]], + [[6, 3, 16, 13], [8, 5, 12, 8], [-0.1, -0.15, -0.91629, -1.20397]], + [[5, 2, 15, 12], [9, 6, 12, 8], [0.05, 0, -1.20397, -1.60944]], + [[2, 10, 12, 20], [6, 11, 8, 17], [0, -0.1, -1.60944, -0.51083]], + [[10, 9, 20, 19], [12, 13, 17, 16], [-0.05, 0.05, -0.69315, -1.20397]], + [[6, 7, 16, 17], [10, 11, 12, 14], [0, 0.05, -1.60944, -1.20397]], + [[2, 2, 12, 12], [3, 5, 8, 8], [-0.15, -0.05, -0.69315, -1.20397]], + ] + # First run on individual anchors, and then on all together + for anchors, bboxes, rcnns in [map(lambda x: [x], row) for row in data] + [zip(*data)]: + anchors, bboxes, rcnns = [np.array(data, np.float32) for data in [anchors, bboxes, rcnns]] + 
np.testing.assert_almost_equal(bboxes_to_rcnn(anchors, bboxes), rcnns, decimal=3) + np.testing.assert_almost_equal(bboxes_from_rcnn(anchors, rcnns), bboxes, decimal=3) + + def test_bboxes_training(self): + anchors = np.array([[0, 0, 10, 10], [0, 10, 10, 20], [10, 0, 20, 10], [10, 10, 20, 20]], np.float32) + for gold_classes, gold_bboxes, anchor_classes, anchor_bboxes, iou in [ + [[1], [[14., 14, 16, 16]], [0, 0, 0, 2], [[0, 0, 0, 0]] * 3 + [[0, 0, np.log(.2), np.log(.2)]], 0.5], + [[2], [[0., 0, 20, 20]], [3, 0, 0, 0], [[.5, .5, np.log(2), np.log(2)]] + [[0, 0, 0, 0]] * 3, 0.26], + [[2], [[0., 0, 20, 20]], [3, 3, 3, 3], + [[y, x, np.log(2), np.log(2)] for y in [.5, -.5] for x in [.5, -.5]], 0.24], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 0, 1], + [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [-0.35, -0.45, 0.53062, 0.40546]], 0.5], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 0, 2, 1], + [[0, 0, 0, 0], [0, 0, 0, 0], [-0.1, 0.6, -0.22314, 0.69314], [-0.35, -0.45, 0.53062, 0.40546]], 0.3], + [[0, 1], [[3, 3, 20, 18], [10, 1, 18, 21]], [0, 1, 2, 1], + [[0, 0, 0, 0], [0.65, -0.45, 0.53062, 0.40546], [-0.1, 0.6, -0.22314, 0.69314], + [-0.35, -0.45, 0.53062, 0.40546]], 0.17], + ]: + gold_classes, anchor_classes = np.array(gold_classes, np.int32), np.array(anchor_classes, np.int32) + gold_bboxes, anchor_bboxes = np.array(gold_bboxes, np.float32), np.array(anchor_bboxes, np.float32) + computed_classes, computed_bboxes = bboxes_training(anchors, gold_classes, gold_bboxes, iou) + np.testing.assert_almost_equal(computed_classes, anchor_classes, decimal=3) + np.testing.assert_almost_equal(computed_bboxes, anchor_bboxes, decimal=3) + + +if __name__ == '__main__': + unittest.main() diff --git a/labs/06/svhn.ipynb b/labs/06/svhn.ipynb new file mode 100644 index 0000000..c5d935f --- /dev/null +++ b/labs/06/svhn.ipynb @@ -0,0 +1,375 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + 
"output_type": "stream", + "text": [ + "c:\\Users\\jonas\\p\\cu\\NPFL138\\repo\\.venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "#!/usr/bin/env python3\n", + "import argparse\n", + "import datetime\n", + "import os\n", + "import re\n", + "os.environ.setdefault(\"KERAS_BACKEND\", \"torch\") # Use PyTorch backend unless specified otherwise\n", + "\n", + "import keras\n", + "import numpy as np\n", + "import torch, torchvision\n", + "\n", + "import bboxes_utils\n", + "from svhn_dataset import SVHN\n", + "\n", + "# Jonas Glerup RΓΈssum \n", + "# 31a0a96a-c590-4486-b194-f72765b2ce25\n", + "# Xiao Wang \n", + "# 91d4d1d7-b800-4765-96b9-df098ac36a66\n", + "\n", + "# TODO: Define reasonable defaults and optionally more parameters.\n", + "# Also, you can set the number of threads to 0 to use all your CPU cores.\n", + "parser = argparse.ArgumentParser()\n", + "parser.add_argument(\"--batch_size\", default=64, type=int, help=\"Batch size.\")\n", + "parser.add_argument(\"--epochs\", default=10, type=int, help=\"Number of epochs.\")\n", + "parser.add_argument(\"--seed\", default=42, type=int, help=\"Random seed.\")\n", + "parser.add_argument(\"--threads\", default=1, type=int, help=\"Maximum number of threads to use.\")\n", + "parser.add_argument(\"--learning_rate\", default=0.001, type=float, help=\"Learning rate for training.\")\n", + "parser.add_argument(\"--image_size\", default=224, type=int, help=\"A fixed image size.\")\n", + "parser.add_argument(\"--iou_threshold\", default=0.7, type=int, help=\"The intersection over union threshold.\")\n", + "\n", + "\n", + "class TorchTensorBoardCallback(keras.callbacks.Callback):\n", + " def __init__(self, path):\n", + " self._path = path\n", + " self._writers = {}\n", + "\n", + " def writer(self, writer):\n", + " 
if writer not in self._writers:\n", + " import torch.utils.tensorboard\n", + " self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer))\n", + " return self._writers[writer]\n", + "\n", + " def add_logs(self, writer, logs, step):\n", + " if logs:\n", + " for key, value in logs.items():\n", + " self.writer(writer).add_scalar(key, value, step)\n", + " self.writer(writer).flush()\n", + "\n", + " def on_epoch_end(self, epoch, logs=None):\n", + " if logs:\n", + " if isinstance(getattr(self.model, \"optimizer\", None), keras.optimizers.Optimizer):\n", + " logs = logs | {\"learning_rate\": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)}\n", + " self.add_logs(\"train\", {k: v for k, v in logs.items() if not k.startswith(\"val_\")}, epoch + 1)\n", + " self.add_logs(\"val\", {k[4:]: v for k, v in logs.items() if k.startswith(\"val_\")}, epoch + 1)\n", + "\n", + "args = parser.parse_args([] if \"__file__\" not in globals() else None)\n", + "\n", + "def get_anchors(backbone_output_pixel=14, image_size=args.image_size):\n", + " square_anchors = []\n", + " img_H, img_W = image_size, image_size\n", + " square_anchor_h, square_anchor_w = img_H//backbone_output_pixel, img_W//backbone_output_pixel\n", + " for h in range(0, img_H, square_anchor_h):\n", + " for w in range(0, img_W, square_anchor_w):\n", + " square_anchors.append([h, w, h+square_anchor_h, w+square_anchor_w])\n", + " return np.array(square_anchors)\n", + "\n", + "def prepare_data(example):\n", + " gold_bboxes = example[\"bboxes\"]/example[\"image\"].shape[0]\n", + " image_resized = keras.ops.image.resize(example[\"image\"], (args.image_size, args.image_size))\n", + " gold_classes, iou_threshold = example[\"classes\"], args.iou_threshold\n", + " anchor_classes, anchor_bboxes = bboxes_utils.bboxes_training(anchors, gold_classes, gold_bboxes, iou_threshold)\n", + " anchor_classes_one_hot = keras.ops.one_hot(anchor_classes-1, svhn.LABELS)\n", + " classes_sample_weight = 
keras.ops.ones_like(anchor_classes)\n", + " bboxes_sample_weight = anchor_classes > 0\n", + " return image_resized, (anchor_classes_one_hot, anchor_bboxes), (classes_sample_weight, bboxes_sample_weight)\n", + "\n", + "def bn_relu(inputs):\n", + " return keras.layers.ReLU()(keras.layers.BatchNormalization()(inputs))\n", + "\n", + "### classification and bbox regression head\n", + "### 9 is the number of anchors per position in RetinaNet\n", + "def heads(input_feature, type=\"classification\", anchor_number=1):\n", + " activ, output_size = None, 0\n", + " if type.lower() == \"classification\":\n", + " activ, output_size = \"sigmoid\", svhn.LABELS*anchor_number\n", + " elif type.lower() == \"regression\":\n", + " activ, output_size = None, 4*anchor_number\n", + " else:\n", + " print(\"Type can only be 'classification' or 'regression'!\")\n", + " conv1 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(input_feature))\n", + " conv2 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(conv1))\n", + " conv3 = bn_relu(keras.layers.Conv2D(256, 3, 1, \"same\")(conv2))\n", + " outputs = keras.layers.Conv2D(output_size, 3, 1, \"same\", activation=activ)(conv3)\n", + " return outputs\n", + "\n", + "\n", + "\n", + " # Set the random seed and the number of threads.\n", + "keras.utils.set_random_seed(args.seed)\n", + "if args.threads:\n", + " torch.set_num_threads(args.threads)\n", + " # torch.set_num_interop_threads(args.threads)\n", + "\n", + "# Create logdir name\n", + "args.logdir = os.path.join(\"logs\", \"{}-{}-{}\".format(\n", + " os.path.basename(globals().get(\"__file__\", \"notebook\")),\n", + " datetime.datetime.now().strftime(\"%Y-%m-%d_%H%M%S\"),\n", + " \",\".join((\"{}={}\".format(re.sub(\"(.)[^_]*_?\", r\"\\1\", k), v) for k, v in sorted(vars(args).items())))\n", + "))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "📊 Loading data\n" + ] + } + ], + "source": [ +
"print(\"πŸ“Š Loading data\")\n", + "\n", + "# Load the data. The individual examples are dictionaries with the keys:\n", + "# - \"image\", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range,\n", + "# - \"classes\", a `[num_digits]` vector with classes of image digits,\n", + "# - \"bboxes\", a `[num_digits, 4]` vector with bounding boxes of image digits.\n", + "svhn = SVHN()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'image': tensor([[[124, 93, 64],\n", + " [128, 93, 63],\n", + " [127, 92, 60],\n", + " ...,\n", + " [118, 82, 60],\n", + " [115, 80, 58],\n", + " [113, 78, 58]],\n", + " \n", + " [[126, 95, 64],\n", + " [129, 94, 62],\n", + " [128, 94, 59],\n", + " ...,\n", + " [118, 82, 58],\n", + " [115, 80, 58],\n", + " [113, 78, 58]],\n", + " \n", + " [[124, 93, 62],\n", + " [128, 94, 59],\n", + " [127, 93, 56],\n", + " ...,\n", + " [118, 82, 58],\n", + " [115, 81, 56],\n", + " [113, 78, 58]],\n", + " \n", + " ...,\n", + " \n", + " [[106, 74, 49],\n", + " [108, 74, 49],\n", + " [108, 74, 49],\n", + " ...,\n", + " [ 90, 72, 62],\n", + " [ 89, 72, 62],\n", + " [ 89, 72, 62]],\n", + " \n", + " [[106, 74, 49],\n", + " [108, 74, 49],\n", + " [108, 74, 49],\n", + " ...,\n", + " [ 95, 75, 66],\n", + " [ 95, 77, 67],\n", + " [ 96, 78, 68]],\n", + " \n", + " [[105, 73, 48],\n", + " [107, 73, 48],\n", + " [105, 71, 46],\n", + " ...,\n", + " [104, 79, 72],\n", + " [104, 81, 73],\n", + " [104, 81, 73]]], dtype=torch.uint8),\n", + " 'classes': array([4], dtype=int64),\n", + " 'bboxes': array([[ 5, 13, 32, 33]], dtype=int64)}" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "svhn.train[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "πŸ“Š Creating anchors\n" + ] + } + ], + "source": [ + "print(\"πŸ“Š 
Creating anchors\")\n", + "\n", + "anchors = get_anchors()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "📊 Transforming data\n" + ] + }, + { + "ename": "ValueError", + "evalue": "too many values to unpack (expected 3)", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[5], line 3\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m📊 Transforming data\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m----> 3\u001b[0m train_imgs, train_labels, train_weights \u001b[38;5;241m=\u001b[39m svhn\u001b[38;5;241m.\u001b[39mtrain\u001b[38;5;241m.\u001b[39mtransform(prepare_data)\n\u001b[0;32m 4\u001b[0m dev_imgs, dev_labels, dev_weights \u001b[38;5;241m=\u001b[39m svhn\u001b[38;5;241m.\u001b[39mdev\u001b[38;5;241m.\u001b[39mtransform(prepare_data)\n\u001b[0;32m 5\u001b[0m test_imgs, _, _ \u001b[38;5;241m=\u001b[39m svhn\u001b[38;5;241m.\u001b[39mtest\u001b[38;5;241m.\u001b[39mtransform(prepare_data)\n", + "\u001b[1;31mValueError\u001b[0m: too many values to unpack (expected 3)" + ] + } + ], + "source": [ + "print(\"📊 Transforming data\")\n", + "\n", + "train_imgs, train_labels, train_weights = svhn.train.transform(prepare_data)\n", + "dev_imgs, dev_labels, dev_weights = svhn.dev.transform(prepare_data)\n", + "test_imgs, _, _ = svhn.test.transform(prepare_data)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "print(\"📊 Creating efficientnet\")\n", + "# Load the EfficientNetV2-B0 model.
It assumes the input images are\n", + "# represented in the [0-255] range.\n", + "backbone = keras.applications.EfficientNetV2B0(include_top=False)\n", + "\n", + "print(\"📊 Instantiating model\")\n", + "# Extract features of different resolution. Assuming 224x224 input images\n", + "# (you can set this explicitly via `input_shape` of the above constructor),\n", + "# the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112.\n", + "backbone = keras.Model(\n", + " inputs=backbone.input,\n", + " outputs=[backbone.get_layer(layer).output for layer in [\n", + " \"top_activation\", \"block5e_add\", \"block3b_add\", \"block2b_add\", \"block1a_project_activation\"]]\n", + ")\n", + "\n", + "# TODO: Create the model and train it\n", + "backbone.trainable = False\n", + "inputs = keras.layers.Input(shape=(args.image_size,args.image_size, 3))\n", + "# backbone outputs, top to bottom: top, block5e, block3b, block2b, block1a\n", + "# shapes: 7x7x1280, 14x14x112, 28x28x48, 56x56x32, 112x112x16\n", + "top, block5e, block3b, block2b, block1a = backbone(inputs)\n", + "\n", + "print(\"📊 Preprocessing\")\n", + "\n", + "# use only the 14x14 `block5e` output, which matches the 14x14 anchor grid\n", + "feature = block5e\n", + "cls_output = heads(feature)\n", + "reg_output = heads(feature, \"regression\")\n", + "\n", + "print(\"📊 Creating model\")\n", + "\n", + "model = keras.Model(inputs, [cls_output, reg_output], name=\"baseline\")\n", + "model.summary()\n", + "\n", + "print(\"📊 Compiling\")\n", + "\n", + "model.compile(\n", + " optimizer=keras.optimizers.Adam(learning_rate=args.learning_rate),\n", + " loss=(\n", + " keras.losses.BinaryFocalCrossentropy(),\n", + " keras.losses.Huber()),\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "print(\"📊 Training\")\n", + "\n", + "model.fit(train_imgs, train_labels,\n", + " batch_size=args.batch_size, epochs=args.epochs,\n", + " validation_data=(dev_imgs, dev_labels),\n", + " sample_weight=train_weights,\n", + ")\n", +
"\n", + "# Generate test set annotations, but in `args.logdir` to allow parallel execution.\n", + "os.makedirs(args.logdir, exist_ok=True)\n", + "with open(os.path.join(args.logdir, \"svhn_competition.txt\"), \"w\", encoding=\"utf-8\") as predictions_file:\n", + " # TODO: Predict the digits and their bounding boxes on the test set.\n", + " # Assume that for a single test image we get\n", + " # - `predicted_classes`: a 1D array with the predicted digits,\n", + " # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes;\n", + " pre_classes, pre_rcnns = model.predict(test_imgs)\n", + " pre_bboxes = bboxes_utils.bboxes_from_rcnn(anchors, pre_rcnns)\n", + " for predicted_classes, predicted_bboxes in zip(pre_classes, pre_bboxes):\n", + " scores = np.max(predicted_classes, axis=-1)\n", + " chosen_bboxes = torchvision.ops.nms(predicted_bboxes, scores, args.iou_threshold)\n", + " print(chosen_bboxes.shape, test_imgs.shape)\n", + " output = []\n", + " for label, bbox in zip(predicted_classes, chosen_bboxes):\n", + " output += [label] + list(bbox)\n", + " print(*output, file=predictions_file)\n", + "\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "undefined.undefined.undefined" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/labs/06/svhn_competition.py b/labs/06/svhn_competition.py new file mode 100644 index 0000000..ef3e6d0 --- /dev/null +++ b/labs/06/svhn_competition.py @@ -0,0 +1,101 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch + +import 
bboxes_utils +from svhn_dataset import SVHN + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores. +parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") +parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") + + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if logs: + if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): + logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} + self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) + self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + + +def main(args: argparse.Namespace) -> None: + # Set the random seed and the number of threads. 
+ keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + # Load the data. The individual examples are dictionaries with the keys: + # - "image", a `[SIZE, SIZE, 3]` tensor of `torch.uint8` values in [0-255] range, + # - "classes", a `[num_digits]` vector with classes of image digits, + # - "bboxes", a `[num_digits, 4]` vector with bounding boxes of image digits. + svhn = SVHN() + + # Load the EfficientNetV2-B0 model. It assumes the input images are + # represented in the [0-255] range. + backbone = keras.applications.EfficientNetV2B0(include_top=False) + + # Extract features of different resolution. Assuming 224x224 input images + # (you can set this explicitly via `input_shape` of the above constructor), + # the below model returns five outputs with resolution 7x7, 14x14, 28x28, 56x56, 112x112. + backbone = keras.Model( + inputs=backbone.input, + outputs=[backbone.get_layer(layer).output for layer in [ + "top_activation", "block5e_add", "block3b_add", "block2b_add", "block1a_project_activation"]] + ) + + # TODO: Create the model and train it + model = ... + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "svhn_competition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the digits and their bounding boxes on the test set. 
+ # Assume that for a single test image we get + # - `predicted_classes`: a 1D array with the predicted digits, + # - `predicted_bboxes`: a [len(predicted_classes), 4] array with bboxes; + for predicted_classes, predicted_bboxes in ...: + output = [] + for label, bbox in zip(predicted_classes, predicted_bboxes): + output += [label] + list(bbox) + print(*output, file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/06/svhn_dataset.py b/labs/06/svhn_dataset.py new file mode 100644 index 0000000..b111b19 --- /dev/null +++ b/labs/06/svhn_dataset.py @@ -0,0 +1,253 @@ +import array +import os +import sys +import struct +from typing import Any, Callable, Sequence, TextIO, TypedDict +import urllib.request + +import numpy as np +import torch +import torchvision + + +class SVHN: + LABELS: int = 10 + + # Type alias for a bounding box -- a list of floats. + BBox = list[float] + + # The indices of the bounding box coordinates. 
+ TOP: int = 0 + LEFT: int = 1 + BOTTOM: int = 2 + RIGHT: int = 3 + + Element = TypedDict("Element", {"image": torch.Tensor, "classes": np.ndarray, "bboxes": np.ndarray}) + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/" + + class Dataset(torch.utils.data.Dataset): + def __init__(self, path: str, size: int, decode_on_demand: bool) -> None: + self._size = size + + arrays, indices = SVHN._load_data(path, size) + if decode_on_demand: + self._data, self._arrays, self._indices = None, arrays, indices + else: + self._data = [self._decode(arrays, indices, i) for i in range(size)] + + def __len__(self) -> int: + return self._size + + def __getitem__(self, index: int) -> "SVHN.Element": + if self._data: + return self._data[index] + return self._decode(self._arrays, self._indices, index) + + def transform(self, transform: Callable[["SVHN.Element"], Any]) -> "SVHN.TransformedDataset": + return SVHN.TransformedDataset(self, transform) + + def _decode(self, data: dict, indices: dict, index: int) -> "SVHN.Element": + return { + "image": torchvision.io.decode_image( + torch.frombuffer(data["image"], dtype=torch.uint8, offset=indices["image"][:-1][index], + count=indices["image"][1:][index] - indices["image"][:-1][index]), + torchvision.io.ImageReadMode.RGB).permute(1, 2, 0), + "classes": np.frombuffer( + data["classes"], dtype=np.int64, offset=indices["classes"][:-1][index] << 3, + count=indices["classes"][1:][index] - indices["classes"][:-1][index]), + "bboxes": np.frombuffer( + data["bboxes"], dtype=np.int64, offset=indices["bboxes"][:-1][index] << 3, + count=indices["bboxes"][1:][index] - indices["bboxes"][:-1][index]).reshape(-1, 4), + } + + class TransformedDataset(torch.utils.data.Dataset): + def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None: + self._dataset = dataset + self._transform = transform + + def __len__(self) -> int: + return len(self._dataset) + + def __getitem__(self, index: int) -> Any: 
+ item = self._dataset[index] + return self._transform(*item) if isinstance(item, tuple) else self._transform(item) + + def transform(self, transform: Callable[..., Any]) -> "SVHN.TransformedDataset": + return SVHN.TransformedDataset(self, transform) + + def __init__(self, decode_on_demand: bool = False) -> None: + for dataset, size in [("train", 10_000), ("dev", 1_267), ("test", 4_535)]: + path = "svhn.{}.tfrecord".format(dataset) + if not os.path.exists(path): + print("Downloading file {}...".format(path), file=sys.stderr) + urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + setattr(self, dataset, self.Dataset(path, size, decode_on_demand)) + + train: Dataset + dev: Dataset + test: Dataset + + # TFRecord loading + @staticmethod + def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]: + def get_value() -> np.int64: + nonlocal data, offset + value = np.int64(data[offset] & 0x7F); start = offset; offset += 1 + while data[offset - 1] & 0x80: + value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1 + return value + + def get_value_of_kind(kind: int) -> np.int64: + nonlocal data, offset + assert data[offset] == kind; offset += 1 + return get_value() + + arrays, indices = {}, {} + with open(path, "rb") as file: + for _ in range(items): + length = file.read(8); assert len(length) == 8 + length, = struct.unpack("> 2, offset).astype(np.float32).tobytes()); offset += length + else: + raise ValueError("Unsupported data tag {}".format(data[offset])) + indices[key].append(len(arrays[key])) + return arrays, indices + + # Evaluation infrastructure. 
+ @staticmethod + def evaluate( + gold_dataset: "SVHN.Dataset", predictions: Sequence[tuple[list[int], list[BBox]]], iou_threshold: float = 0.5, + ) -> float: + def bbox_iou(x: SVHN.BBox, y: SVHN.BBox) -> float: + def area(bbox: SVHN.BBox) -> float: + return max(bbox[SVHN.BOTTOM] - bbox[SVHN.TOP], 0) * max(bbox[SVHN.RIGHT] - bbox[SVHN.LEFT], 0) + intersection = [max(x[SVHN.TOP], y[SVHN.TOP]), max(x[SVHN.LEFT], y[SVHN.LEFT]), + min(x[SVHN.BOTTOM], y[SVHN.BOTTOM]), min(x[SVHN.RIGHT], y[SVHN.RIGHT])] + x_area, y_area, intersection_area = area(x), area(y), area(intersection) + return intersection_area / (x_area + y_area - intersection_area) + + gold = [(np.array(example["classes"]), np.array(example["bboxes"])) for example in gold_dataset] + + if len(predictions) != len(gold): + raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format( + len(predictions), len(gold))) + + correct = 0 + for (gold_classes, gold_bboxes), (prediction_classes, prediction_bboxes) in zip(gold, predictions): + if len(gold_classes) != len(prediction_classes): + continue + + used = [False] * len(gold_classes) + for cls, bbox in zip(prediction_classes, prediction_bboxes): + best = None + for i in range(len(gold_classes)): + if used[i] or gold_classes[i] != cls: + continue + iou = bbox_iou(bbox, gold_bboxes[i]) + if iou >= iou_threshold and (best is None or iou > best_iou): + best, best_iou = i, iou + if best is None: + break + used[best] = True + correct += all(used) + + return 100 * correct / len(gold) + + @staticmethod + def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float: + predictions = [] + for line in predictions_file: + values = line.split() + if len(values) % 5: + raise RuntimeError("Each prediction must contain multiple of 5 numbers, found {}".format(len(values))) + + predictions.append(([], [])) + for i in range(0, len(values), 5): + predictions[-1][0].append(int(values[i])) + predictions[-1][1].append([float(value) for value in 
values[i + 1:i + 5]]) + + return SVHN.evaluate(gold_dataset, predictions) + + # Visualization infrastructure. + @staticmethod + def visualize(image: np.ndarray, labels: list[Any], bboxes: list[BBox], show: bool): + """Visualize the given image plus recognized objects. + + Arguments: + - `image` is a NumPy input image with pixels in range [0-255]; + - `labels` is a list of labels to be shown using the `str` method; + - `bboxes` is a list of `BBox`es (four-tuples TOP, LEFT, BOTTOM, RIGHT); + - `show` controls whether to show the figure or return it: + - if `True`, the figure is shown using `plt.show()`; + - if `False`, the `plt.Figure` instance is returned; it can be saved + to TensorBoard using the `add_figure` method of a `SummaryWriter`. + """ + import matplotlib.pyplot as plt + + figure = plt.figure(figsize=(4, 4)) + plt.axis("off") + plt.imshow(np.asarray(image, np.uint8)) + for label, (top, left, bottom, right) in zip(labels, bboxes): + plt.gca().add_patch(plt.Rectangle( + [left, top], right - left, bottom - top, fill=False, edgecolor=[1, 0, 1], linewidth=2)) + plt.gca().text(left, top, str(label), bbox={"facecolor": [1, 0, 1], "alpha": 0.5}, + clip_box=plt.gca().clipbox, clip_on=True, ha="left", va="top") + + if show: + plt.show() + else: + return figure + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--visualize", default=None, type=str, help="Prediction file to visualize") + args = parser.parse_args() + + if args.evaluate: + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + accuracy = SVHN.evaluate_file(getattr(SVHN(decode_on_demand=True), args.dataset), predictions_file) + print("SVHN accuracy: {:.2f}%".format(accuracy)) + + if args.visualize: + with open(args.visualize, "r", 
encoding="utf-8-sig") as predictions_file: + for line, example in zip(predictions_file, getattr(SVHN(decode_on_demand=True), args.dataset)): + values = line.split() + classes, bboxes = [], [] + for i in range(0, len(values), 5): + classes.append(values[i]) + bboxes.append([float(value) for value in values[i + 1:i + 5]]) + SVHN.visualize(example["image"], classes, bboxes, show=True) diff --git a/labs/07/3d_recognition.py b/labs/07/3d_recognition.py new file mode 100644 index 0000000..fefe6d1 --- /dev/null +++ b/labs/07/3d_recognition.py @@ -0,0 +1,81 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re +os.environ.setdefault("KERAS_BACKEND", "torch") # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch + +from modelnet import ModelNet + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores. +parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=..., type=int, help="Batch size.") +parser.add_argument("--epochs", default=..., type=int, help="Number of epochs.") +parser.add_argument("--modelnet", default=20, type=int, help="ModelNet dimension.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") + + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + self._writers[writer] = torch.utils.tensorboard.SummaryWriter(os.path.join(self._path, writer)) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if 
logs: + if isinstance(getattr(self.model, "optimizer", None), keras.optimizers.Optimizer): + logs = logs | {"learning_rate": keras.ops.convert_to_numpy(self.model.optimizer.learning_rate)} + self.add_logs("train", {k: v for k, v in logs.items() if not k.startswith("val_")}, epoch + 1) + self.add_logs("val", {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, epoch + 1) + + +def main(args: argparse.Namespace) -> None: + # Set the random seed and the number of threads. + keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + # Load the data + modelnet = ModelNet(args.modelnet) + + # TODO: Create the model and train it + model = ... + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "3d_recognition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the probabilities on the test set + test_probabilities = model.predict(...) 
+ + for probs in test_probabilities: + print(np.argmax(probs), file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/07/modelnet.py b/labs/07/modelnet.py new file mode 100644 index 0000000..2e7d513 --- /dev/null +++ b/labs/07/modelnet.py @@ -0,0 +1,108 @@ +import os +import sys +from typing import Any, Callable, Sequence, TextIO, TypedDict +import urllib.request + +import numpy as np +import torch + + +class ModelNet: + # The D, H, W are set in the constructor depending + # on requested resolution and are only instance variables. + D: int + H: int + W: int + C: int = 1 + LABELS: list[str] = [ + "bathtub", "bed", "chair", "desk", "dresser", "monitor", "night_stand", "sofa", "table", "toilet", + ] + Element = TypedDict("Element", {"grid": np.ndarray, "label": np.ndarray}) + Elements = TypedDict("Elements", {"grids": np.ndarray, "labels": np.ndarray}) + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/modelnet{}.npz" + + class Dataset(torch.utils.data.Dataset): + def __init__(self, data: "ModelNet.Elements", seed: int = 42) -> None: + self._data = data + + @property + def data(self) -> "ModelNet.Elements": + return self._data + + def __len__(self) -> int: + return len(self._data["grids"]) + + def __getitem__(self, index: int) -> "ModelNet.Element": + return {key.removesuffix("s"): value[index] for key, value in self._data.items()} + + def transform(self, transform: Callable[["ModelNet.Element"], Any]) -> "ModelNet.TransformedDataset": + return ModelNet.TransformedDataset(self, transform) + + class TransformedDataset(torch.utils.data.Dataset): + def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None: + self._dataset = dataset + self._transform = transform + + def __len__(self) -> int: + return len(self._dataset) + + def __getitem__(self, index: int) -> Any: + item = self._dataset[index] + return 
self._transform(*item) if isinstance(item, tuple) else self._transform(item) + + def transform(self, transform: Callable[..., Any]) -> "ModelNet.TransformedDataset": + return ModelNet.TransformedDataset(self, transform) + + # The resolution parameter can be either 20 or 32. + def __init__(self, resolution: int) -> None: + assert resolution in [20, 32], "Only 20 or 32 resolution is supported" + + self.D = self.H = self.W = resolution + url = self._URL.format(resolution) + + path = os.path.basename(url) + if not os.path.exists(path): + print("Downloading {} dataset...".format(path), file=sys.stderr) + urllib.request.urlretrieve(url, filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + modelnet = np.load(path) + for dataset, _size in [("train", 3_718), ("dev", 273), ("test", 908)]: + data = dict((key[len(dataset) + 1:], modelnet[key]) for key in modelnet if key.startswith(dataset)) + setattr(self, dataset, self.Dataset(data)) + + train: Dataset + dev: Dataset + test: Dataset + + # Evaluation infrastructure. 
+ @staticmethod + def evaluate(gold_dataset: Dataset, predictions: Sequence[int]) -> float: + gold = gold_dataset.data["labels"] + + if len(predictions) != len(gold): + raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format( + len(predictions), len(gold))) + + correct = sum(gold[i] == predictions[i] for i in range(len(gold))) + return 100 * correct / len(gold) + + @staticmethod + def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float: + predictions = [int(line) for line in predictions_file] + return ModelNet.evaluate(gold_dataset, predictions) + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + parser.add_argument("--dim", default=20, type=int, help="ModelNet dimensionality to use") + args = parser.parse_args() + + if args.evaluate: + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + accuracy = ModelNet.evaluate_file(getattr(ModelNet(args.dim), args.dataset), predictions_file) + print("ModelNet accuracy: {:.2f}%".format(accuracy)) diff --git a/labs/08/morpho_analyzer.py b/labs/08/morpho_analyzer.py new file mode 100644 index 0000000..d0f0fa7 --- /dev/null +++ b/labs/08/morpho_analyzer.py @@ -0,0 +1,45 @@ +import os +import sys +import urllib.request +import zipfile + + +class MorphoAnalyzer: + """ Loads morphological analyses in a vertical format. + + The analyzer provides only a method `get(word: str)` returning a list + of analyses, each containing two fields `lemma` and `tag`. + If an analysis of the word is not found, an empty list is returned. 
+ """ + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/" + + class LemmaTag: + def __init__(self, lemma: str, tag: str) -> None: + self.lemma = lemma + self.tag = tag + + def __repr__(self) -> str: + return "(lemma: {}, tag: {})".format(self.lemma, self.tag) + + def __init__(self, dataset: str) -> None: + path = "{}.zip".format(dataset) + if not os.path.exists(path): + print("Downloading dataset {}...".format(dataset), file=sys.stderr) + urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + self.analyses = {} + with zipfile.ZipFile(path, "r") as zip_file: + with zip_file.open("{}.txt".format(dataset), "r") as analyses_file: + for line in analyses_file: + line = line.decode("utf-8").rstrip("\n") + columns = line.split("\t") + + analyses = [] + for i in range(1, len(columns) - 1, 2): + analyses.append(self.LemmaTag(columns[i], columns[i + 1])) + self.analyses[columns[0]] = analyses + + def get(self, word: str) -> list[LemmaTag]: + return self.analyses.get(word, []) diff --git a/labs/08/morpho_dataset.py b/labs/08/morpho_dataset.py new file mode 100644 index 0000000..5a47c41 --- /dev/null +++ b/labs/08/morpho_dataset.py @@ -0,0 +1,253 @@ +import os +import sys +from typing import Any, BinaryIO, Callable, Sequence, TextIO, TypedDict +import urllib.request +import zipfile + +import torch + + +# A class for managing mapping between strings and indices. 
+# It provides: +# - `__len__`: number of strings in the vocabulary +# - `string(index: int) -> str`: string for a given index to the vocabulary +# - `strings(indices: Sequence[int]) -> list[str]`: list of strings for given indices +# - `index(string: str) -> int`: index of a given string in the vocabulary +# - `indices(strings: Sequence[str]) -> list[int]`: list of indices for given strings +class Vocabulary: + PAD: int = 0 + UNK: int = 1 + + def __init__(self, strings: Sequence[str]) -> None: + self._strings = ["[PAD]", "[UNK]"] + self._strings.extend(strings) + self._string_map = {string: index for index, string in enumerate(self._strings)} + + def __len__(self) -> int: + return len(self._strings) + + def string(self, index: int) -> str: + return self._strings[index] + + def strings(self, indices: Sequence[int]) -> list[str]: + return [self._strings[index] for index in indices] + + def index(self, string: str) -> int: + return self._string_map.get(string, Vocabulary.UNK) + + def indices(self, strings: Sequence[str]) -> list[int]: + return [self._string_map.get(string, Vocabulary.UNK) for string in strings] + + +# Loads a morphological dataset in a vertical format. +# - The data consists of three datasets +# - `train` +# - `dev` +# - `test` +# - Each dataset is a `torch.utils.data.Dataset` providing +# - `__len__`: number of sentences in the dataset +# - `__getitem__`: return the requested sentence as an `Element` +# instance, which is a dictionary with keys "forms"/"lemmas"/"tags", +# each being a list of strings +# - `forms`, `lemmas`, `tags`: instances of type `Factor` containing +# the following fields: +# - `strings`: a Python list containing input sentences, each being +# a list of strings (forms/lemmas/tags) +# - `word_vocab`: a `Vocabulary` object capable of mapping words to +# indices. It is constructed on the train set and shared by the dev +# and test sets +# - `char_vocab`: a `Vocabulary` object capable of mapping characters +# to indices. 
It is constructed on the train set and shared by the dev +# and test sets +# - `cle_batch`: a method for creating inputs for character-level embeddings. +# It takes a list of sentences, each being a list of string forms, and produces +# a tuple of two tensors: +# - `unique_forms` with shape `[num_unique_forms, max_form_length]` containing +# each unique form as a sequence of character ids +# - `forms_indices` with shape `[num_sentences, max_sentence_length]` +# containing for every form its index in `unique_forms` +class MorphoDataset: + PAD: int = 0 + UNK: int = 1 + BOW: int = 2 + EOW: int = 3 + + _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/" + + Element = TypedDict("Element", {"forms": list[str], "lemmas": list[str], "tags": list[str]}) + + class Factor: + word_vocab: Vocabulary + char_vocab: Vocabulary + strings: list[list[str]] + + def __init__(self) -> None: + self.strings = [] + + def finalize(self, train: Any | None = None) -> None: + # Create vocabularies + if train: + self.word_vocab = train.word_vocab + self.char_vocab = train.char_vocab + else: + strings = sorted(set(string for sentence in self.strings for string in sentence)) + self.word_vocab = Vocabulary(strings) + + bow_eow = ["[BOW]", "[EOW]"] + self.char_vocab = Vocabulary(bow_eow + sorted(set(char for string in strings for char in string))) + + class Dataset(torch.utils.data.Dataset): + def __init__(self, data_file: BinaryIO, train: Any | None = None, max_sentences: int | None = None) -> None: + # Create factors + self._factors = (MorphoDataset.Factor(), MorphoDataset.Factor(), MorphoDataset.Factor()) + self._factors_tensors = None + + # Load the data + self._size = 0 + in_sentence = False + for line in data_file: + line = line.decode("utf-8").rstrip("\r\n") + if line: + if not in_sentence: + for factor in self._factors: + factor.strings.append([]) + self._size += 1 + + columns = line.split("\t") + assert len(columns) == len(self._factors) + for column, factor in 
zip(columns, self._factors): + factor.strings[-1].append(column) + + in_sentence = True + else: + in_sentence = False + if max_sentences is not None and self._size >= max_sentences: + break + + # Finalize the mappings + for i, factor in enumerate(self._factors): + factor.finalize(train._factors[i] if train else None) + + @property + def forms(self) -> "MorphoDataset.Factor": + return self._factors[0] + + @property + def lemmas(self) -> "MorphoDataset.Factor": + return self._factors[1] + + @property + def tags(self) -> "MorphoDataset.Factor": + return self._factors[2] + + def __len__(self) -> int: + return self._size + + def __getitem__(self, index: int) -> "MorphoDataset.Element": + return {"forms": self.forms.strings[index], + "lemmas": self.lemmas.strings[index], + "tags": self.tags.strings[index]} + + def transform(self, transform: Callable[["MorphoDataset.Element"], Any]) -> "MorphoDataset.TransformedDataset": + return MorphoDataset.TransformedDataset(self, transform) + + def cle_batch(self, forms: list[list[str]]) -> tuple[torch.Tensor, torch.Tensor]: + unique_strings = list(set(form for sentence in forms for form in sentence)) + unique_string_map = {form: index + 1 for index, form in enumerate(unique_strings)} + unique_forms = torch.nn.utils.rnn.pad_sequence( + [torch.tensor([MorphoDataset.UNK])] + + [torch.tensor(self.forms.char_vocab.indices(form)) for form in unique_strings], batch_first=True) + forms_indices = torch.nn.utils.rnn.pad_sequence( + [torch.tensor([unique_string_map[form] for form in sentence]) for sentence in forms], batch_first=True) + return unique_forms, forms_indices + + class TransformedDataset(torch.utils.data.Dataset): + def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None: + self._dataset = dataset + self._transform = transform + + def __len__(self) -> int: + return len(self._dataset) + + def __getitem__(self, index: int) -> Any: + item = self._dataset[index] + return self._transform(*item) if 
isinstance(item, tuple) else self._transform(item) + + def transform(self, transform: Callable[..., Any]) -> "MorphoDataset.TransformedDataset": + return MorphoDataset.TransformedDataset(self, transform) + + def __init__(self, dataset, max_sentences=None): + path = "{}.zip".format(dataset) + if not os.path.exists(path): + print("Downloading dataset {}...".format(dataset), file=sys.stderr) + urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + with zipfile.ZipFile(path, "r") as zip_file: + for dataset in ["train", "dev", "test"]: + with zip_file.open("{}_{}.txt".format(os.path.splitext(path)[0], dataset), "r") as dataset_file: + setattr(self, dataset, self.Dataset( + dataset_file, train=self.train if dataset != "train" else None, + max_sentences=max_sentences)) + + train: Dataset + dev: Dataset + test: Dataset + + # Evaluation infrastructure. + @staticmethod + def evaluate(gold_dataset: "MorphoDataset.Factor", predictions: Sequence[str]) -> float: + gold_sentences = gold_dataset.strings + + predicted_sentences, in_sentence = [], False + for line in predictions: + line = line.rstrip("\n") + if not line: + in_sentence = False + else: + if not in_sentence: + predicted_sentences.append([]) + in_sentence = True + predicted_sentences[-1].append(line) + + if len(predicted_sentences) != len(gold_sentences): + raise RuntimeError("The predictions contain different number of sentences than gold data: {} vs {}".format( + len(predicted_sentences), len(gold_sentences))) + + correct, total = 0, 0 + for i, (predicted_sentence, gold_sentence) in enumerate(zip(predicted_sentences, gold_sentences)): + if len(predicted_sentence) != len(gold_sentence): + raise RuntimeError("Predicted sentence {} has different number of words than gold: {} vs {}".format( + i + 1, len(predicted_sentence), len(gold_sentence))) + correct += sum(predicted == gold for predicted, gold in zip(predicted_sentence, gold_sentence)) 
+ total += len(predicted_sentence) + + return 100 * correct / total + + @staticmethod + def evaluate_file(gold_dataset: "MorphoDataset.Factor", predictions_file: TextIO) -> float: + predictions = predictions_file.readlines() + return MorphoDataset.evaluate(gold_dataset, predictions) + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--corpus", default="czech_pdt", type=str, help="The corpus to evaluate") + parser.add_argument("--dataset", default="dev", type=str, help="The dataset to evaluate (dev/test)") + parser.add_argument("--task", default="tagger", type=str, help="Task to evaluate (tagger/lemmatizer)") + args = parser.parse_args() + + if args.evaluate: + gold = getattr(MorphoDataset(args.corpus), args.dataset) + if args.task == "tagger": + gold = gold.tags + elif args.task == "lemmatizer": + gold = gold.lemmas + else: + raise ValueError("Unknown task '{}', valid values are only 'tagger' or 'lemmatizer'".format(args.task)) + + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + accuracy = MorphoDataset.evaluate_file(gold, predictions_file) + print("{} accuracy: {:.2f}%".format(args.task.title(), accuracy)) diff --git a/labs/08/sequence_classification.py b/labs/08/sequence_classification.py new file mode 100644 index 0000000..05a43bb --- /dev/null +++ b/labs/08/sequence_classification.py @@ -0,0 +1,222 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re + +os.environ.setdefault( + "KERAS_BACKEND", "torch" +) # Use PyTorch backend unless specified otherwise + +import keras +import numpy as np +import torch + +parser = argparse.ArgumentParser() +# These arguments will be set appropriately by ReCodEx, even if you change them. 
+parser.add_argument("--batch_size", default=16, type=int, help="Batch size.") +parser.add_argument( + "--clip_gradient", default=None, type=float, help="Norm for gradient clipping." +) +parser.add_argument("--epochs", default=20, type=int, help="Number of epochs.") +parser.add_argument( + "--hidden_layer", default=0, type=int, help="Additional hidden layer after RNN." +) +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) +parser.add_argument( + "--rnn", + default="LSTM", + choices=["LSTM", "GRU", "SimpleRNN"], + help="RNN layer type.", +) +parser.add_argument("--rnn_dim", default=10, type=int, help="RNN layer dimension.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument( + "--sequence_dim", default=1, type=int, help="Sequence element dimension." +) +parser.add_argument("--sequence_length", default=50, type=int, help="Sequence length.") +parser.add_argument( + "--test_sequences", default=1000, type=int, help="Number of testing sequences." +) +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) +parser.add_argument( + "--train_sequences", default=10000, type=int, help="Number of training sequences." +) +# If you add more arguments, ReCodEx will keep them with your default values. 
+ + +class TorchTensorBoardCallback(keras.callbacks.Callback): + def __init__(self, path): + self._path = path + self._writers = {} + + def writer(self, writer): + if writer not in self._writers: + import torch.utils.tensorboard + + self._writers[writer] = torch.utils.tensorboard.SummaryWriter( + os.path.join(self._path, writer) + ) + return self._writers[writer] + + def add_logs(self, writer, logs, step): + if logs: + for key, value in logs.items(): + self.writer(writer).add_scalar(key, value, step) + self.writer(writer).flush() + + def on_epoch_end(self, epoch, logs=None): + if logs: + if isinstance( + getattr(self.model, "optimizer", None), keras.optimizers.Optimizer + ): + logs = logs | { + "learning_rate": keras.ops.convert_to_numpy( + self.model.optimizer.learning_rate + ) + } + self.add_logs( + "train", + {k: v for k, v in logs.items() if not k.startswith("val_")}, + epoch + 1, + ) + self.add_logs( + "val", + {k[4:]: v for k, v in logs.items() if k.startswith("val_")}, + epoch + 1, + ) + + +# Dataset for generating sequences, with labels predicting whether the cumulative sum +# is odd/even. 
+class Dataset: + def __init__( + self, sequences_num: int, sequence_length: int, sequence_dim: int, seed: int + ) -> None: + sequences = np.zeros([sequences_num, sequence_length, sequence_dim], np.int32) + labels = np.zeros([sequences_num, sequence_length, 1], bool) + generator = np.random.RandomState(seed) + for i in range(sequences_num): + sequences[i, :, 0] = generator.randint( + 0, max(2, sequence_dim), size=[sequence_length] + ) + labels[i, :, 0] = np.bitwise_and(np.cumsum(sequences[i, :, 0]), 1) + if sequence_dim > 1: + sequences[i] = np.eye(sequence_dim)[sequences[i, :, 0]] + self._data = {"sequences": sequences.astype(np.float32), "labels": labels} + self._size = sequences_num + + @property + def data(self) -> dict[str, np.ndarray]: + return self._data + + @property + def size(self) -> int: + return self._size + + +class Model(keras.Model): + def __init__(self, args: argparse.Namespace) -> None: + # Construct the model. + sequences = keras.Input(shape=[args.sequence_length, args.sequence_dim]) + + # DO: Process the sequence using a RNN with type `args.rnn` and + # with dimensionality `args.rnn_dim`. Use `return_sequences=True` + # to get outputs for all sequence elements. + # + # Prefer `keras.layers.{LSTM,GRU,SimpleRNN}` to + # `keras.layers.RNN` wrapper with `keras.layers.{LSTM,GRU,SimpleRNN}Cell`, + # because the former is considerably faster (even if the GPU support in + # PyTorch is not optimal in the current Keras 3.2.1.) + + layer_type = ( + keras.layers.LSTM + if args.rnn == "LSTM" + else keras.layers.GRU + if args.rnn == "GRU" + else keras.layers.SimpleRNN + ) + + hidden = layer_type(units=args.rnn_dim, return_sequences=True)(sequences) + + # DO: If `args.hidden_layer` is nonzero, process the result using + # a ReLU-activated fully connected layer with `args.hidden_layer` units. 
+ + if args.hidden_layer: + hidden = keras.layers.Dense(args.hidden_layer, activation="relu")(hidden) + + # DO: Generate `predictions` using a fully connected layer + # with one output and sigmoid activation. + + predictions = keras.layers.Dense(1, activation="sigmoid")(hidden) + + super().__init__(inputs=sequences, outputs=predictions) + + self.compile( + # DO: Create an Adam optimizer, passing the option `clipnorm=args.clip_gradient` + # to clip the gradient, with `None` representing no clipping (the default). + optimizer=keras.optimizers.Adam(clipnorm=args.clip_gradient), + loss=keras.losses.BinaryCrossentropy(), + metrics=[keras.metrics.BinaryAccuracy("accuracy")], + ) + + self.tb_callback = TorchTensorBoardCallback(args.logdir) + + +def main(args: argparse.Namespace) -> dict[str, float]: + # Set the random seed and the number of threads. + keras.utils.set_random_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) + + # Create the data + train = Dataset( + args.train_sequences, args.sequence_length, args.sequence_dim, seed=42 + ) + test = Dataset( + args.test_sequences, args.sequence_length, args.sequence_dim, seed=43 + ) + + # Create the model and train + model = Model(args) + + logs = model.fit( + train.data["sequences"], + train.data["labels"], + batch_size=args.batch_size, + epochs=args.epochs, + validation_data=(test.data["sequences"], test.data["labels"]), + callbacks=[model.tb_callback], + ) + + # Return development metrics for ReCodEx to validate. 
+ return { + metric: values[-1] + for metric, values in logs.history.items() + if metric.startswith("val_") + } + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/08/tagger_cle.py b/labs/08/tagger_cle.py new file mode 100644 index 0000000..b2cda59 --- /dev/null +++ b/labs/08/tagger_cle.py @@ -0,0 +1,562 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re + +import numpy as np +import torch +import torchmetrics + +from morpho_dataset import MorphoDataset + +parser = argparse.ArgumentParser() +# These arguments will be set appropriately by ReCodEx, even if you change them. +parser.add_argument("--batch_size", default=10, type=int, help="Batch size.") +parser.add_argument("--cle_dim", default=32, type=int, help="CLE embedding dimension.") +parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.") +parser.add_argument( + "--max_sentences", + default=None, + type=int, + help="Maximum number of sentences to load.", +) +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) +parser.add_argument( + "--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type." +) +parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) +parser.add_argument("--we_dim", default=64, type=int, help="Word embedding dimension.") +parser.add_argument( + "--word_masking", + default=0.0, + type=float, + help="Mask words with the given probability.", +) +# If you add more arguments, ReCodEx will keep them with your default values. + + +class TrainableModule(torch.nn.Module): + """A simple Keras-like module for training with raw PyTorch. 
+ + The module provides fit/evaluate/predict methods, computes loss and metrics, + and generates both TensorBoard and console logs. By default, it uses GPU + if available, and CPU otherwise. Additionally, it offers a Keras-like + initialization of the weights. + + The current implementation supports models with either single input or + a tuple of inputs; however, only one output is currently supported. + """ + + from torch.utils.tensorboard import SummaryWriter as _SummaryWriter + from time import time as _time + from tqdm import tqdm as _tqdm + + def configure( + self, + *, + optimizer=None, + schedule=None, + loss=None, + metrics={}, + logdir=None, + device="auto", + ): + """Configure the module process. + + - `optimizer` is the optimizer to use for training; + - `schedule` is an optional learning rate scheduler used after every batch; + - `loss` is the loss function to minimize; + - `metrics` is a dictionary of additional metrics to compute; + - `logdir` is an optional directory where TensorBoard logs should be written; + - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise. 
+ """ + self.optimizer = optimizer + self.schedule = schedule + self.loss, self.loss_metric = loss, torchmetrics.MeanMetric() + self.metrics = torchmetrics.MetricCollection(metrics) + self.logdir, self._writers = logdir, {} + self.device = torch.device( + ("cuda" if torch.cuda.is_available() else "cpu") + if device == "auto" + else device + ) + self.to(self.device) + + def load_weights(self, path, device="auto"): + """Load the model weights from the given path.""" + self.device = torch.device( + ("cuda" if torch.cuda.is_available() else "cpu") + if device == "auto" + else device + ) + self.load_state_dict(torch.load(path, map_location=self.device)) + + def save_weights(self, path): + """Save the model weights to the given path.""" + state_dict = self.state_dict() + torch.save(state_dict, path) + + def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1): + """Train the model on the given dataset. + + - `dataloader` is the training dataset, each element a pair of inputs and an output; + the inputs can be either a single tensor or a tuple of tensors; + - `dev` is an optional development dataset; + - `epochs` is the number of epochs to train; + - `callbacks` is a list of callbacks to call after each epoch with + arguments `self`, `epoch`, and `logs`; + - `verbose` controls the verbosity: 0 for silent, 1 for persistent progress bar, + 2 for a progress bar only when writing to a console. 
+ """ + for epoch in range(epochs): + self.train() + self.loss_metric.reset() + self.metrics.reset() + start = self._time() + epoch_message = f"Epoch={epoch+1}/{epochs}" + data_and_progress = self._tqdm( + dataloader, + epoch_message, + unit="batch", + leave=False, + disable=None if verbose == 2 else not verbose, + ) + for xs, y in data_and_progress: + xs, y = ( + tuple( + x.to(self.device) + for x in (xs if isinstance(xs, tuple) else (xs,)) + ), + y.to(self.device), + ) + logs = self.train_step(xs, y) + message = [epoch_message] + [ + f"{k}={v:.{0 None: + super().__init__() + + # Create all needed layers. + # DID: Create a word masking layer `self.MaskElements` with the given + # `args.word_masking` probability and `MorphoDataset.UNK` as the masking value. + self._word_masking = self.MaskElements(args.word_masking, MorphoDataset.UNK) + + # DID: Create a `torch.nn.Embedding` layer for embedding the character ids + # from `train.forms.char_vocab` to dimensionality `args.cle_dim`. + self._char_embedding = torch.nn.Embedding( + len(train.forms.char_vocab), args.cle_dim + ) + + # DID: Create a `torch.nn.GRU` layer processing the character embeddings, + # producing output of dimensionality `args.cle_dim`, concatenating the + # outputs of forward and backward directions. Also pass `batch_first=True`. + self._char_rnn = torch.nn.GRU( + input_size=args.cle_dim, + hidden_size=args.cle_dim, + bidirectional=True, + batch_first=True, + ) + + # DO:(tagger_we) Create a `torch.nn.Embedding` layer, embedding the form ids + # from `train.forms.word_vocab` to dimensionality `args.we_dim`. + self._word_embedding = torch.nn.Embedding( + len(train.forms.word_vocab), args.we_dim + ) + + # DID: Create an RNN layer, either `torch.nn.LSTM` or `torch.nn.GRU` depending + # on `args.rnn`. The layer should be bidirectional (`bidirectional=True`), summing + # the outputs of forward and backward directions. 
The layer processes the above + # embeddings generated by the `self._word_embedding` layer, **now concatenated + # with the character-level embeddings**, and produces output of dimensionality + # `args.rnn_dim`; pass `batch_first=True` to the constructor. + self._word_rnn = (torch.nn.LSTM if args.rnn == "LSTM" else torch.nn.GRU)( + input_size=args.we_dim + 2 * args.cle_dim, + hidden_size=args.rnn_dim, + bidirectional=True, + batch_first=True, + ) + + # TODO(tagger_we): Create an output linear layer (`torch.nn.Linear`) processing the RNN output, + # producing logits for tag prediction; `train.tags.word_vocab` is the tag vocabulary. + self._output_layer = torch.nn.Linear(args.rnn_dim, len(train.tags.word_vocab)) + + # Initialize the layers using the Keras-inspired initialization. You can try + # removing this line to see how much worse the default PyTorch initialization is. + self.apply(self.keras_init) + + def forward( + self, + form_ids: torch.Tensor, + unique_forms: torch.Tensor, + form_indices: torch.Tensor, + ) -> torch.Tensor: + # DID: Mask the input `form_ids` using the `self._word_masking` layer. + form_ids = self._word_masking(form_ids) + + # DID(tagger_we): Embed the masked `form_ids` using the word embedding layer. + hidden = self._word_embedding(form_ids) + + # DID: Embed the `unique_forms` using the character embedding layer. + cle = self._char_embedding(unique_forms) + + # DID: Pass the character embeddings through the character-level RNN. + # As during word-level RNN, start by packing the input sequence. + packed = torch.nn.utils.rnn.pack_padded_sequence( + input=cle, + lengths=torch.sum(unique_forms != MorphoDataset.PAD, dim=-1).cpu(), + batch_first=True, + enforce_sorted=False, + ) + + # Pass the `PackedSequence` through the character RNN. Note that this time + # we are interested only in the second output (the last hidden state of the RNN). 
+ _, cle = self._char_rnn(packed) + + forward_pass = cle[0, :, :] + backward_pass = cle[1, :, :] + + # DID: Concatenate the states of the forward and backward directions. + cle = torch.cat((forward_pass, backward_pass), dim=-1) + + # DID: With `cle` being the character-level embeddings of the unique forms + # of shape `[num_unique_forms, 2 * cle_dim]`, create the representation of the + # (not necessarily unique) sentence forms by indexing the character-level + # embeddings with the `form_indices`. The result should have a shape + # `[batch_size, max_sentence_length, 2 * cle_dim]`. You can use for example + # the `torch.nn.functional.embedding` function. + cle = torch.nn.functional.embedding(form_indices, cle) + + # DID: Concatenate the word embeddings with the character-level embeddings (in this order). + hidden = torch.cat((hidden, cle), dim=-1) + + # DID(tagger_we): Process the embeddings through the RNN layer. Because the sentences + # have different length, you have to use `torch.nn.utils.rnn.pack_padded_sequence` + # to construct a variable-length `PackedSequence` from the input. You need to compute + # the length of each sentence in the batch (by counting non-`MorphoDataset.PAD` tokens); + # note that these lengths must be on CPU, so you might need to use the `.cpu()` method. + # Finally, also pass `batch_first=True` and `enforce_sorted=False` to the call. + packed = torch.nn.utils.rnn.pack_padded_sequence( + input=hidden, + lengths=torch.sum(form_ids != MorphoDataset.PAD, dim=-1).cpu(), + batch_first=True, + enforce_sorted=False, + ) + + # Pass the `PackedSequence` through the RNN. + hidden, _ = self._word_rnn(packed) + + # DID(tagger_we): Unpack the RNN output using the `torch.nn.utils.rnn.pad_packed_sequence` with + # `batch_first=True` argument. Then sum the outputs of forward and backward directions. 
+ stacked, _ = torch.nn.utils.rnn.pad_packed_sequence(hidden, batch_first=True) + + forward_pass, backward_pass = torch.chunk(stacked, 2, dim=-1) + + hidden = forward_pass + backward_pass + + # DID(tagger_we): Pass the RNN output through the output layer. Such an output has a shape + # `[batch_size, sequence_length, num_tags]`, but the loss and the metric expect + # the `num_tags` dimension to be in front (`[batch_size, num_tags, sequence_length]`), + # so you need to reorder the dimension. + + hidden = self._output_layer(hidden).permute(0, 2, 1) + + return hidden + + +def main(args: argparse.Namespace) -> dict[str, float]: + # Set the random seed and the number of threads. + np.random.seed(args.seed) + torch.manual_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) + + # Load the data + morpho = MorphoDataset("czech_cac", max_sentences=args.max_sentences) + + # Create the model and train + model = Model(args, morpho.train) + + def prepare_tagging_data(example): + # TODO(tagger_we): Construct a single example, each consisting of the following pair: + # - a PyTorch tensor of integer ids of input forms as input, + # - a PyTorch tensor of integer tag ids as targets. + # To create the ids, use `word_vocab` of `morpho.train.forms` and `morpho.train.tags`. 
+ form_ids = torch.tensor( + morpho.train.forms.word_vocab.indices(example["forms"]), + dtype=torch.int64, + ) + tag_ids = torch.tensor( + morpho.train.tags.word_vocab.indices(example["tags"]), + dtype=torch.int64, + ) + # Note that compared to `tagger_we`, we also return the original + # forms in order to be able to compute the character-level embeddings. + return form_ids, example["forms"], tag_ids + + train = morpho.train.transform(prepare_tagging_data) + dev = morpho.dev.transform(prepare_tagging_data) + + def prepare_batch(data): + # Construct a single batch, where `data` is a list of examples + # generated by `prepare_tagging_data`. + form_ids, forms, tag_ids = zip(*data) + # TODO(tagger_we): Combine `form_ids` into a single tensor, padding shorter + # sequences to length of the longest sequence in the batch with zeros + # using `torch.nn.utils.rnn.pad_sequence` with `batch_first=True` argument. + form_ids = torch.nn.utils.rnn.pad_sequence(form_ids, batch_first=True) + # DID: Create required inputs for the character-level embeddings using + # the provided `morpho.train.cle_batch` function on `forms`. The function + # returns a pair of two PyTorch tensors: + # - `unique_forms` with shape `[num_unique_forms, max_form_length]` containing + # each unique form as a sequence of character ids, + # - `forms_indices` with shape `[num_sentences, max_sentence_length]` + # containing for every form its index in `unique_forms`. + unique_forms, forms_indices = morpho.train.cle_batch(forms) + # TODO(tagger_we): Process `tag_ids` analogously to `form_ids`. 
+ tag_ids = torch.nn.utils.rnn.pad_sequence(tag_ids, batch_first=True) + return (form_ids, unique_forms, forms_indices), tag_ids + + train = torch.utils.data.DataLoader( + train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True + ) + dev = torch.utils.data.DataLoader( + dev, batch_size=args.batch_size, collate_fn=prepare_batch + ) + + model.configure( + # TODO(tagger_we): Create the optimizer by creating an instance of + # `torch.optim.Adam` which will train the `model.parameters()`. + optimizer=torch.optim.Adam(model.parameters()), + # TODO(tagger_we): Use `torch.nn.CrossEntropyLoss` to instantiate the loss function. + # Pass `ignore_index=morpho.PAD` to the constructor so that the padded + # tags are ignored during the loss computation. Note that the loss + # expects the input to be of shape `[batch_size, num_tags, sequence_length]`. + loss=torch.nn.CrossEntropyLoss( + ignore_index=morpho.PAD, + ), + # TODO(tagger_we): Create a `torchmetrics.Accuracy` metric, passing "multiclass" as + # the first argument, `num_classes` set to the number of unique tags, and + # again `ignore_index=morpho.PAD` to ignore the padded tags. + metrics={ + "accuracy": torchmetrics.Accuracy( + "multiclass", + num_classes=len(morpho.train.tags.word_vocab), + ignore_index=morpho.PAD, + ) + }, + logdir=args.logdir, + device="cpu", + ) + + logs = model.fit(train, dev=dev, epochs=args.epochs) + + # Return development metrics for ReCodEx to validate. 
+ return { + metric: value for metric, value in logs.items() if metric.startswith("dev_") + } + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/08/tagger_competition.py b/labs/08/tagger_competition.py new file mode 100644 index 0000000..ab8b80c --- /dev/null +++ b/labs/08/tagger_competition.py @@ -0,0 +1,297 @@ +#!/usr/bin/env python3 +import argparse +import datetime +import os +import re + +import numpy as np +import torch +import torchmetrics + +from morpho_analyzer import MorphoAnalyzer +from morpho_dataset import MorphoDataset + +from tagger_cle1 import Model +# from tagger_model import Model +# TODO: Always use masking!!! + +# TODO: Define reasonable defaults and optionally more parameters. +# Also, you can set the number of threads to 0 to use all your CPU cores. +parser = argparse.ArgumentParser() +parser.add_argument("--batch_size", default=64, type=int, help="Batch size.") +parser.add_argument("--cle_dim", default=32, type=int, help="CLE embedding dimension.") +parser.add_argument("--epochs", default=3, type=int, help="Number of epochs.") +parser.add_argument("--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type.") +parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument("--threads", default=1, type=int, help="Maximum number of threads to use.") +parser.add_argument("--we_dim", default=64, type=int, help="Word embedding dimension.") +parser.add_argument("--word_masking", default=0.05, type=float, help="Mask words with the given probability.") + + +class TrainableModule(torch.nn.Module): + """A simple Keras-like module for training with raw PyTorch. + + The module provides fit/evaluate/predict methods, computes loss and metrics, + and generates both TensorBoard and console logs. 
By default, it uses GPU + if available, and CPU otherwise. Additionally, it offers a Keras-like + initialization of the weights. + + The current implementation supports models with either single input or + a tuple of inputs; however, only one output is currently supported. + """ + from torch.utils.tensorboard import SummaryWriter as _SummaryWriter + from time import time as _time + from tqdm import tqdm as _tqdm + + def configure(self, *, optimizer=None, schedule=None, loss=None, metrics={}, logdir=None, device="auto"): + """Configure the module process. + + - `optimizer` is the optimizer to use for training; + - `schedule` is an optional learning rate scheduler used after every batch; + - `loss` is the loss function to minimize; + - `metrics` is a dictionary of additional metrics to compute; + - `logdir` is an optional directory where TensorBoard logs should be written; + - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise. + """ + self.optimizer = optimizer + self.schedule = schedule + self.loss, self.loss_metric = loss, torchmetrics.MeanMetric() + self.metrics = torchmetrics.MetricCollection(metrics) + self.logdir, self._writers = logdir, {} + self.device = torch.device(("cuda" if torch.cuda.is_available() else "cpu") if device == "auto" else device) + self.to(self.device) + + def load_weights(self, path, device="auto"): + """Load the model weights from the given path.""" + self.device = torch.device(("cuda" if torch.cuda.is_available() else "cpu") if device == "auto" else device) + self.load_state_dict(torch.load(path, map_location=self.device)) + + def save_weights(self, path): + """Save the model weights to the given path.""" + state_dict = self.state_dict() + torch.save(state_dict, path) + + def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1): + """Train the model on the given dataset. 
+ + - `dataloader` is the training dataset, each element a pair of inputs and an output; + the inputs can be either a single tensor or a tuple of tensors; + - `dev` is an optional development dataset; + - `epochs` is the number of epochs to train; + - `callbacks` is a list of callbacks to call after each epoch with + arguments `self`, `epoch`, and `logs`; + - `verbose` controls the verbosity: 0 for silent, 1 for persistent progress bar, + 2 for a progress bar only when writing to a console. + """ + for epoch in range(epochs): + self.train() + self.loss_metric.reset() + self.metrics.reset() + start = self._time() + epoch_message = f"Epoch={epoch+1}/{epochs}" + data_and_progress = self._tqdm( + dataloader, epoch_message, unit="batch", leave=False, disable=None if verbose == 2 else not verbose) + for xs, y in data_and_progress: + xs, y = tuple(x.to(self.device) for x in (xs if isinstance(xs, tuple) else (xs,))), y.to(self.device) + logs = self.train_step(xs, y) + message = [epoch_message] + [f"{k}={v:.{0 None: + # Set the random seed and the number of threads. + np.random.seed(args.seed) + torch.manual_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join("logs", "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join(("{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) for k, v in sorted(vars(args).items()))) + )) + + # Load the data. Using analyses is only optional. 
+ morpho = MorphoDataset("czech_pdt") + analyses = MorphoAnalyzer("czech_pdt_analyses") + + # TODO: Create the model and train it + model = Model(args, morpho.train) + + def prepare_tagging_data(example): + form_ids = torch.tensor(data=morpho.train.forms.word_vocab.indices(example["forms"]), dtype=torch.int64) + tag_ids = torch.tensor(data=morpho.train.tags.word_vocab.indices(example["tags"]), dtype=torch.int64) + return form_ids, example["forms"], tag_ids + train = morpho.train.transform(prepare_tagging_data) + dev = morpho.dev.transform(prepare_tagging_data) + + # Create a function that prepares test data + def prepare_testing_data(example): + form_ids = torch.tensor(data=morpho.test.forms.word_vocab.indices(example["forms"]), dtype=torch.int64) + return form_ids, example["forms"] + test = morpho.test.transform(prepare_testing_data) + + def prepare_batch(data): + form_ids, forms, tag_ids = zip(*data) + form_ids = torch.nn.utils.rnn.pad_sequence(sequences=form_ids, batch_first=True) + unique_forms, forms_indices = morpho.train.cle_batch(forms) + tag_ids = torch.nn.utils.rnn.pad_sequence(sequences=tag_ids, batch_first=True) + return (form_ids, unique_forms, forms_indices), tag_ids + train = torch.utils.data.DataLoader(train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True) + dev = torch.utils.data.DataLoader(dev, batch_size=args.batch_size, collate_fn=prepare_batch) + + # Create a function that creates test batches + def prepare_test_batch(data): + form_ids, forms = zip(*data) + form_ids = torch.nn.utils.rnn.pad_sequence(sequences=form_ids, batch_first=True) + unique_forms, forms_indices = morpho.train.cle_batch(forms) + return (form_ids, unique_forms, forms_indices) + test = torch.utils.data.DataLoader(test, batch_size=args.batch_size, collate_fn=prepare_test_batch) + + model.configure( + optimizer=torch.optim.Adam(model.parameters()), + loss=torch.nn.CrossEntropyLoss(ignore_index=morpho.PAD), + metrics={"accuracy": 
torchmetrics.Accuracy(task="multiclass", num_classes=len(morpho.train.tags.word_vocab), ignore_index=morpho.PAD)}, + logdir=args.logdir, + ) + + model.fit(train, dev=dev, epochs=args.epochs) + + # Save model + + model.save_weights(os.path.join(args.logdir, "model.pth")) + + # Generate test set annotations, but in `args.logdir` to allow parallel execution. + os.makedirs(args.logdir, exist_ok=True) + with open(os.path.join(args.logdir, "tagger_competition.txt"), "w", encoding="utf-8") as predictions_file: + # TODO: Predict the tags on the test set; update the following code + # if you use other output structure than in tagger_we. + predictions = model.predict(test) + + for predicted_tags, forms in zip(predictions, morpho.test.forms.strings): + for predicted_tag in np.argmax(predicted_tags[:, :len(forms)], axis=0): + print(morpho.train.tags.word_vocab.string(predicted_tag), file=predictions_file) + print(file=predictions_file) + + +if __name__ == "__main__": + args = parser.parse_args([] if "__file__" not in globals() else None) + main(args) diff --git a/labs/08/tagger_we.ps1 b/labs/08/tagger_we.ps1 new file mode 100644 index 0000000..de55694 --- /dev/null +++ b/labs/08/tagger_we.ps1 @@ -0,0 +1,10 @@ +"👉 TEST 1" +" Expected: Epoch=1/1 3.1s loss=2.3541 accuracy=0.3138 dev_loss=2.0320 dev_accuracy=0.3611" +# Actual: Epoch=1/1 3.6s loss=2.3641 accuracy=0.2857 dev_loss=2.0174 dev_accuracy=0.3669 +python ./labs/08/tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 + +"👉 TEST 2" +" Expected: Epoch=1/1 3.2s loss=2.1970 accuracy=0.4233 dev_loss=1.5569 dev_accuracy=0.5121" +# Actual: Epoch=1/1 3.5s loss=2.2395 accuracy=0.4611 dev_loss=1.5898 dev_accuracy=0.5481 +python ./labs/08/tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16 + diff --git a/labs/08/tagger_we.py b/labs/08/tagger_we.py new file mode 100644 index 0000000..f1b48ae --- /dev/null +++ b/labs/08/tagger_we.py @@ -0,0 +1,468 @@ +#!/usr/bin/env python3 +import argparse +import 
datetime +import os +import re + +import numpy as np +import torch +import torchmetrics + +from morpho_dataset import MorphoDataset + +parser = argparse.ArgumentParser() +# These arguments will be set appropriately by ReCodEx, even if you change them. +parser.add_argument("--batch_size", default=10, type=int, help="Batch size.") +parser.add_argument("--epochs", default=5, type=int, help="Number of epochs.") +parser.add_argument( + "--max_sentences", + default=None, + type=int, + help="Maximum number of sentences to load.", +) +parser.add_argument( + "--recodex", default=False, action="store_true", help="Evaluation in ReCodEx." +) +parser.add_argument( + "--rnn", default="LSTM", choices=["LSTM", "GRU"], help="RNN layer type." +) +parser.add_argument("--rnn_dim", default=64, type=int, help="RNN layer dimension.") +parser.add_argument("--seed", default=42, type=int, help="Random seed.") +parser.add_argument( + "--threads", default=1, type=int, help="Maximum number of threads to use." +) +parser.add_argument("--we_dim", default=128, type=int, help="Word embedding dimension.") +# If you add more arguments, ReCodEx will keep them with your default values. + + +class TrainableModule(torch.nn.Module): + """A simple Keras-like module for training with raw PyTorch. + + The module provides fit/evaluate/predict methods, computes loss and metrics, + and generates both TensorBoard and console logs. By default, it uses GPU + if available, and CPU otherwise. Additionally, it offers a Keras-like + initialization of the weights. + + The current implementation supports models with either single input or + a tuple of inputs; however, only one output is currently supported. + """ + + from torch.utils.tensorboard import SummaryWriter as _SummaryWriter + from time import time as _time + from tqdm import tqdm as _tqdm + + def configure( + self, + *, + optimizer=None, + schedule=None, + loss=None, + metrics={}, + logdir=None, + device="auto", + ): + """Configure the module process. 
+ + - `optimizer` is the optimizer to use for training; + - `schedule` is an optional learning rate scheduler used after every batch; + - `loss` is the loss function to minimize; + - `metrics` is a dictionary of additional metrics to compute; + - `logdir` is an optional directory where TensorBoard logs should be written; + - `device` is the device to use; when "auto", `cuda` is used when available, `cpu` otherwise. + """ + self.optimizer = optimizer + self.schedule = schedule + self.loss, self.loss_metric = loss, torchmetrics.MeanMetric() + self.metrics = torchmetrics.MetricCollection(metrics) + self.logdir, self._writers = logdir, {} + self.device = torch.device( + ("cuda" if torch.cuda.is_available() else "cpu") + if device == "auto" + else device + ) + self.to(self.device) + + def load_weights(self, path, device="auto"): + """Load the model weights from the given path.""" + self.device = torch.device( + ("cuda" if torch.cuda.is_available() else "cpu") + if device == "auto" + else device + ) + self.load_state_dict(torch.load(path, map_location=self.device)) + + def save_weights(self, path): + """Save the model weights to the given path.""" + state_dict = self.state_dict() + torch.save(state_dict, path) + + def fit(self, dataloader, epochs, dev=None, callbacks=[], verbose=1): + """Train the model on the given dataset. + + - `dataloader` is the training dataset, each element a pair of inputs and an output; + the inputs can be either a single tensor or a tuple of tensors; + - `dev` is an optional development dataset; + - `epochs` is the number of epochs to train; + - `callbacks` is a list of callbacks to call after each epoch with + arguments `self`, `epoch`, and `logs`; + - `verbose` controls the verbosity: 0 for silent, 1 for persistent progress bar, + 2 for a progress bar only when writing to a console. 
+ """ + for epoch in range(epochs): + self.train() + self.loss_metric.reset() + self.metrics.reset() + start = self._time() + epoch_message = f"Epoch={epoch+1}/{epochs}" + data_and_progress = self._tqdm( + dataloader, + epoch_message, + unit="batch", + leave=False, + disable=None if verbose == 2 else not verbose, + ) + for xs, y in data_and_progress: + xs, y = ( + tuple( + x.to(self.device) + for x in (xs if isinstance(xs, tuple) else (xs,)) + ), + y.to(self.device), + ) + logs = self.train_step(xs, y) + message = [epoch_message] + [ + f"{k}={v:.{0 None: + super().__init__() + + # Create all needed layers. + # DO: Create a `torch.nn.Embedding` layer, embedding the form ids + # from `train.forms.word_vocab` to dimensionality `args.we_dim`. + self._word_embedding = torch.nn.Embedding( + num_embeddings=len(train.forms.word_vocab), embedding_dim=args.we_dim + ) + + # DO: Create an RNN layer, either `torch.nn.LSTM` or `torch.nn.GRU` depending + # on `args.rnn`. The layer should be bidirectional (`bidirectional=True`), summing + # the outputs of forward and backward directions. The layer processes the word + # embeddings generated by the `self._word_embedding` layer and produces output + # of dimensionality `args.rnn_dim`. Finally, pass `batch_first=True` to the constructor. + self._word_rnn = ( + torch.nn.LSTM( + input_size=args.we_dim, + hidden_size=args.rnn_dim, + bidirectional=True, + batch_first=True, + ) + if args.rnn == "LSTM" + else torch.nn.GRU( + input_size=args.we_dim, + hidden_size=args.rnn_dim, + bidirectional=True, + batch_first=True, + ) + ) + + # DO: Create an output linear layer (`torch.nn.Linear`) processing the RNN output, + # producing logits for tag prediction; `train.tags.word_vocab` is the tag vocabulary. + self._output_layer = torch.nn.Linear(args.rnn_dim, len(train.tags.word_vocab)) + + # Initialize the layers using the Keras-inspired initialization. You can try + # removing this line to see how much worse the default PyTorch initialization is. 
+ self.apply(self.keras_init) + + def forward(self, form_ids: torch.Tensor) -> torch.Tensor: + # DO: Start by embedding the `form_ids` using the word embedding layer. + hidden = self._word_embedding(form_ids) + + # DO: Process the embedded forms through the RNN layer. Because the sentences + # have different length, you have to use `torch.nn.utils.rnn.pack_padded_sequence` + # to construct a variable-length `PackedSequence` from the input. You need to compute + # the length of each sentence in the batch (by counting non-`MorphoDataset.PAD` tokens); + # note that these lengths must be on CPU, so you might need to use the `.cpu()` method. + # Finally, also pass `batch_first=True` and `enforce_sorted=False` to the call. + packed = torch.nn.utils.rnn.pack_padded_sequence( + hidden, + form_ids.ne(MorphoDataset.PAD).sum(dim=1).cpu(), + batch_first=True, + enforce_sorted=False, + ) + + # Pass the `PackedSequence` through the RNN. + hidden, _ = self._word_rnn(packed) + + # DO: Unpack the RNN output using the `torch.nn.utils.rnn.pad_packed_sequence` with + # `batch_first=True` argument. Then sum the outputs of forward and backward directions. + stacked, _ = torch.nn.utils.rnn.pad_packed_sequence(hidden, batch_first=True) + + forward_pass, backward_pass = torch.chunk(stacked, 2, dim=-1) + + hidden = forward_pass + backward_pass + + # DO: Pass the RNN output through the output layer. Such an output has a shape + # `[batch_size, sequence_length, num_tags]`, but the loss and the metric expect + # the `num_tags` dimension to be in front (`[batch_size, num_tags, sequence_length]`), + # so you need to reorder the dimension. + hidden = self._output_layer(hidden).permute(0, 2, 1) + + return hidden + + +def main(args: argparse.Namespace) -> dict[str, float]: + # Set the random seed and the number of threads. 
+ np.random.seed(args.seed) + torch.manual_seed(args.seed) + if args.threads: + torch.set_num_threads(args.threads) + torch.set_num_interop_threads(args.threads) + + # Create logdir name + args.logdir = os.path.join( + "logs", + "{}-{}-{}".format( + os.path.basename(globals().get("__file__", "notebook")), + datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S"), + ",".join( + ( + "{}={}".format(re.sub("(.)[^_]*_?", r"\1", k), v) + for k, v in sorted(vars(args).items()) + ) + ), + ), + ) + + # Load the data + morpho = MorphoDataset("czech_cac", max_sentences=args.max_sentences) + + # Create the model and train + model = Model(args, morpho.train) + + def prepare_tagging_data(example): + # DO: Construct a single example, each consisting of the following pair: + # - a PyTorch tensor of integer ids of input forms as input, + # - a PyTorch tensor of integer tag ids as targets. + # To create the ids, use `word_vocab` of `morpho.train.forms` and `morpho.train.tags`. + + form_ids = torch.tensor( + morpho.train.forms.word_vocab.indices(example["forms"]), + dtype=torch.int64, + ) + tag_ids = torch.tensor( + morpho.train.tags.word_vocab.indices(example["tags"]), + dtype=torch.int64, + ) + return form_ids, tag_ids + + train = morpho.train.transform(prepare_tagging_data) + dev = morpho.dev.transform(prepare_tagging_data) + + def prepare_batch(data): + # Construct a single batch, where `data` is a list of examples + # generated by `prepare_tagging_data`. + form_ids, tag_ids = zip(*data) + # DO: Combine `form_ids` into a single tensor, padding shorter + # sequences to length of the longest sequence in the batch with zeros + # using `torch.nn.utils.rnn.pad_sequence` with `batch_first=True` argument. + form_ids = torch.nn.utils.rnn.pad_sequence(form_ids, batch_first=True) + # DO: Process `tag_ids` analogously to `form_ids`. 
+ tag_ids = torch.nn.utils.rnn.pad_sequence(tag_ids, batch_first=True) + return form_ids, tag_ids + + train = torch.utils.data.DataLoader( + train, batch_size=args.batch_size, collate_fn=prepare_batch, shuffle=True + ) + dev = torch.utils.data.DataLoader( + dev, batch_size=args.batch_size, collate_fn=prepare_batch + ) + + model.configure( + # DO: Create the optimizer by creating an instance of + # `torch.optim.Adam` which will train the `model.parameters()`. + optimizer=torch.optim.Adam(model.parameters()), + # DO: Use `torch.nn.CrossEntropyLoss` to instantiate the loss function. + # Pass `ignore_index=morpho.PAD` to the constructor so that the padded + # tags are ignored during the loss computation. Note that the loss + # expects the input to be of shape `[batch_size, num_tags, sequence_length]`. + loss=torch.nn.CrossEntropyLoss( + ignore_index=morpho.PAD, + ), + # DO: Create a `torchmetrics.Accuracy` metric, passing "multiclass" as + # the first argument, `num_classes` set to the number of unique tags, and + # again `ignore_index=morpho.PAD` to ignore the padded tags. + metrics={ + "accuracy": torchmetrics.Accuracy( + "multiclass", + num_classes=len(morpho.train.tags.word_vocab), + ignore_index=morpho.PAD, + ) + }, + logdir=args.logdir, + ) + + logs = model.fit(train, dev=dev, epochs=args.epochs) + + # Return development metrics for ReCodEx to validate. 
+    return {
+        metric: value for metric, value in logs.items() if metric.startswith("dev_")
+    }
+
+
+if __name__ == "__main__":
+    args = parser.parse_args([] if "__file__" not in globals() else None)
+    main(args)
diff --git a/labs/09/.gitignore b/labs/09/.gitignore
new file mode 100644
index 0000000..426a1aa
--- /dev/null
+++ b/labs/09/.gitignore
@@ -0,0 +1,2 @@
+/cs_lemma_20k/
+/en_lemma_20k/
diff --git a/labs/09/common_voice_cs.py b/labs/09/common_voice_cs.py
new file mode 100644
index 0000000..e90600e
--- /dev/null
+++ b/labs/09/common_voice_cs.py
@@ -0,0 +1,243 @@
+import array
+import os
+import struct
+import sys
+from typing import Any, Callable, Sequence, TextIO, TypedDict
+import urllib.request
+
+import numpy as np
+import torch
+import torchaudio
+import torchmetrics
+
+
+# A class for managing mapping between strings and indices.
+# It provides:
+# - `__len__`: number of strings in the vocabulary
+# - `string(index: int) -> str`: string for a given index to the vocabulary
+# - `strings(indices: Sequence[int]) -> list[str]`: list of strings for given indices
+# - `index(string: str) -> int`: index of a given string in the vocabulary
+# - `indices(strings: Sequence[str]) -> list[int]`: list of indices for given strings
+class Vocabulary:
+    def __init__(self, strings: Sequence[str]) -> None:
+        self._strings = list(strings)
+        self._string_map = {string: index for index, string in enumerate(self._strings)}
+
+    def __len__(self) -> int:
+        return len(self._strings)
+
+    def string(self, index: int) -> str:
+        return self._strings[index]
+
+    def strings(self, indices: Sequence[int]) -> list[str]:
+        return [self._strings[index] for index in indices]
+
+    def index(self, string: str) -> int:
+        return self._string_map[string]
+
+    def indices(self, strings: Sequence[str]) -> list[int]:
+        return [self._string_map[string] for string in strings]
+
+
+class CommonVoiceCs:
+    MFCC_DIM: int = 13
+
+    LETTERS: list[str] = [
+        " ", "a", "á", "ä", "b", "c", "č", "d", "ď", "e", "é", "è", "ě",
+        "f", "g", "h", "i", "í", "ï", "j", "k", "l", "m", "n", "ň", "o",
+        "ó", "ö", "p", "q", "r", "ř", "s", "š", "t", "ť", "u", "ú", "ů",
+        "ü", "v", "w", "x", "y", "ý", "z", "ž",
+    ]
+
+    Element = TypedDict("Element", {"mfccs": torch.Tensor, "sentence": str})
+
+    _URL: str = "https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/"
+
+    class Dataset(torch.utils.data.Dataset):
+        def __init__(self, path: str, size: int, decode_on_demand: bool) -> None:
+            self._size = size
+
+            arrays, indices = CommonVoiceCs._load_data(path, size)
+            if decode_on_demand:
+                self._data, self._arrays, self._indices = None, arrays, indices
+            else:
+                self._data = [self._decode(arrays, indices, i) for i in range(size)]
+
+        def __len__(self) -> int:
+            return self._size
+
+        def __getitem__(self, index: int) -> "CommonVoiceCs.Element":
+            if self._data:
+                return self._data[index]
+            return self._decode(self._arrays, self._indices, index)
+
+        def transform(self, transform: Callable[["CommonVoiceCs.Element"], Any]) -> "CommonVoiceCs.TransformedDataset":
+            return CommonVoiceCs.TransformedDataset(self, transform)
+
+        def _decode(self, data: dict, indices: dict, index: int) -> "CommonVoiceCs.Element":
+            return {
+                "mfccs": torch.frombuffer(
+                    data["mfccs"], dtype=torch.float32, offset=indices["mfccs"][:-1][index],
+                    count=indices["mfccs"][1:][index] - indices["mfccs"][:-1][index]).view(-1, CommonVoiceCs.MFCC_DIM),
+                "sentence": data["sentence"][
+                    indices["sentence"][index]:indices["sentence"][index + 1]].tobytes().decode("utf-8"),
+            }
+
+    class TransformedDataset(torch.utils.data.Dataset):
+        def __init__(self, dataset: torch.utils.data.Dataset, transform: Callable[..., Any]) -> None:
+            self._dataset = dataset
+            self._transform = transform
+
+        def __len__(self) -> int:
+            return len(self._dataset)
+
+        def __getitem__(self, index: int) -> Any:
+            item = self._dataset[index]
+            return self._transform(*item) if isinstance(item, tuple) else self._transform(item)
+
+        def 
transform(self, transform: Callable[..., Any]) -> "CommonVoiceCs.TransformedDataset": + return CommonVoiceCs.TransformedDataset(self, transform) + + def __init__(self, decode_on_demand: bool = False) -> None: + for dataset, size in [("train", 9_773), ("dev", 904), ("test", 3_240)]: + path = "common_voice_cs.{}.tfrecord".format(dataset) + if not os.path.exists(path): + print("Downloading file {}...".format(path), file=sys.stderr) + urllib.request.urlretrieve("{}/{}".format(self._URL, path), filename="{}.tmp".format(path)) + os.rename("{}.tmp".format(path), path) + + setattr(self, dataset, self.Dataset(path, size, decode_on_demand)) + + self._letters_vocab = Vocabulary(self.LETTERS) + + train: Dataset + dev: Dataset + test: Dataset + + @property + def letters_vocab(self) -> Vocabulary: + return self._letters_vocab + + # TFRecord loading + @staticmethod + def _load_data(path: str, items: int) -> tuple[dict[str, array.array], dict[str, array.array]]: + def get_value() -> np.int64: + nonlocal data, offset + value = np.int64(data[offset] & 0x7F); start = offset; offset += 1 + while data[offset - 1] & 0x80: + value |= (data[offset] & 0x7F) << (7 * (offset - start)); offset += 1 + return value + + def get_value_of_kind(kind: int) -> np.int64: + nonlocal data, offset + assert data[offset] == kind; offset += 1 + return get_value() + + arrays, indices = {}, {} + with open(path, "rb") as file: + for _ in range(items): + length = file.read(8); assert len(length) == 8 + length, = struct.unpack("> 2, offset).astype(np.float32).tobytes()); offset += length + else: + raise ValueError("Unsupported data tag {}".format(data[offset])) + indices[key].append(len(arrays[key])) + return arrays, indices + + # Methods for generating MFCCs. 
+ def load_audio(self, path: str, target_sample_rate: int | None = None) -> tuple[torch.Tensor, int]: + audio, sample_rate = torchaudio.load(path) + if target_sample_rate is not None and target_sample_rate != sample_rate: + audio = torchaudio.functional.resample(audio, sample_rate, target_sample_rate) + sample_rate = target_sample_rate + return torch.mean(audio, dim=0), sample_rate + + # Note that while the dataset MFCCs were generated using an implementation + # functionally equivalent to the following, different resampling was used, + # so the values are not exactly the same. + def mfcc_extract(self, audio: torch.Tensor, sample_rate: int = 16_000) -> torch.Tensor: + assert sample_rate == 16000, "Only 16k sample rate is supported" + + if not hasattr(self, "_mfcc_fn"): + # Compute a 1024-point STFT with frames of 64 ms and 75% overlap. + # Then warp the linear scale spectrograms into the mel-scale. + # Compute a stabilized log to get log-magnitude mel-scale spectrograms. + # Finally, compute MFCCs from log-mel-spectrograms and take the first + # `CommonVoiceCs.MFCC_DIM=13` of them. + self._mfcc_fn = torchaudio.transforms.MFCC( + sample_rate=16_000, n_mfcc=self.MFCC_DIM, log_mels=True, + melkwargs={"n_fft":1024, "win_length":1024, "hop_length":256, + "f_min": 80., "f_max": 7600., "n_mels": 80, "center": False} + ) + # Compute MFCCs of shape `[sequence_length, CommonVoiceCs.MFCC_DIM=13]`. 
+ mfccs = self._mfcc_fn(audio).permute(1, 0) + return mfccs + + # Torchmetric for computing mean edit distance + class EditDistanceMetric(torchmetrics.MeanMetric): + def update(self, pred: Sequence[Sequence[Any]], true: Sequence[Sequence[Any]]) -> None: + edit_distances = [] + for y_pred, y_true in zip(pred, true): + edit_distances.append(torchaudio.functional.edit_distance(y_pred, y_true) / len(y_true)) + return super().update(edit_distances) + + # Evaluation infrastructure + @staticmethod + def evaluate(gold_dataset: Dataset, predictions: Sequence[str]) -> float: + gold = [example["sentence"] for example in gold_dataset] + + if len(predictions) != len(gold): + raise RuntimeError("The predictions are of different size than gold data: {} vs {}".format( + len(predictions), len(gold))) + + edit_distance = CommonVoiceCs.EditDistanceMetric() + for gold_sentence, prediction in zip(gold, predictions): + edit_distance([prediction], [gold_sentence]) + return 100 * edit_distance.compute() + + @staticmethod + def evaluate_file(gold_dataset: Dataset, predictions_file: TextIO) -> float: + predictions = [] + for line in predictions_file: + predictions.append(line.rstrip("\n")) + return CommonVoiceCs.evaluate(gold_dataset, predictions) + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--evaluate", default=None, type=str, help="Prediction file to evaluate") + parser.add_argument("--dataset", default="dev", type=str, help="Gold dataset to evaluate") + args = parser.parse_args() + + if args.evaluate: + with open(args.evaluate, "r", encoding="utf-8-sig") as predictions_file: + edit_distance = CommonVoiceCs.evaluate_file(getattr(CommonVoiceCs(), args.dataset), predictions_file) + print("CommonVoiceCs edit distance: {:.2f}%".format(edit_distance)) diff --git a/labs/09/projector_export.py b/labs/09/projector_export.py new file mode 100644 index 0000000..babfabd --- /dev/null +++ b/labs/09/projector_export.py @@ -0,0 +1,33 @@ 
+#!/usr/bin/env python +import argparse +import os + +import numpy as np +import torch +import torch.utils.tensorboard + +if __name__ == "__main__": + # Parse arguments + parser = argparse.ArgumentParser() + parser.add_argument("input_embeddings", type=str, help="Embedding file to use.") + parser.add_argument("--elements", default=20000, type=int, help="Words to export.") + parser.add_argument("--output_dir", default="embeddings", type=str, help="Output directory.") + args = parser.parse_args([] if "__file__" not in globals() else None) + + # Generate the embeddings for the projector + with open(args.input_embeddings, "r") as embedding_file: + _, dim = map(int, embedding_file.readline().split()) + + embeddings = np.zeros([args.elements, dim], np.float32) + words = [] + for i, line in zip(range(args.elements), embedding_file): + word, *embedding = line.split() + words.append(word) + embeddings[i] = list(map(float, embedding)) + + # Save the embeddings + torch.utils.tensorboard.SummaryWriter(args.output_dir).add_embedding( + torch.tensor(embeddings), + metadata=words, + tag="embeddings", + ) diff --git a/lectures/lecture06.md b/lectures/lecture06.md new file mode 100644 index 0000000..3631cee --- /dev/null +++ b/lectures/lecture06.md @@ -0,0 +1,20 @@ +### Lecture: 6. 
Object Detection +#### Date: Mar 25 +#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?06 +#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-06.pdf, PDF Slides +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.mp4, CZ Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-czech.practicals.mp4, CZ Practicals +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.mp4, EN Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-06-english.practicals.mp4, EN Practicals +#### Questions: #lecture_6_questions +#### Lecture assignment: bboxes_utils +#### Lecture assignment: svhn_competition + +- R-CNN [[R-CNN](https://arxiv.org/abs/1311.2524)] +- Fast R-CNN [[Fast R-CNN](https://arxiv.org/abs/1504.08083)] +- Proposing RoIs using Faster R-CNN [[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)] +- Mask R-CNN [[Mask R-CNN](https://arxiv.org/abs/1703.06870)] +- Feature Pyramid Networks [[Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)] +- Focal Loss, RetinaNet [[Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002)] +- _EfficientDet [[EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)]_ +- Group Normalization [[Group Normalization](https://arxiv.org/abs/1803.08494)] diff --git a/lectures/lecture07.md b/lectures/lecture07.md new file mode 100644 index 0000000..e37111f --- /dev/null +++ b/lectures/lecture07.md @@ -0,0 +1,5 @@ +### Lecture: 7. 
Easter Monday
+#### Date: Apr 01
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-07-czech.practicals.mp4, CZ Practicals
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-07-english.practicals.mp4, EN Practicals
+#### Lecture assignment: 3d_recognition
diff --git a/lectures/lecture08.md b/lectures/lecture08.md
new file mode 100644
index 0000000..c5c8752
--- /dev/null
+++ b/lectures/lecture08.md
@@ -0,0 +1,26 @@
+### Lecture: 8. Recurrent Neural Networks
+#### Date: Apr 8
+#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?08
+#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-08.pdf, PDF Slides
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-czech.mp4, CZ Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.mp4, EN Lecture
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.practicals-svhn_competition.mp4, EN SVHN Competition
+#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-08-english.practicals.mp4, EN Practicals
+#### Questions: #lecture_8_questions
+#### Lecture assignment: sequence_classification
+#### Lecture assignment: tagger_we
+#### Lecture assignment: tagger_cle
+#### Lecture assignment: tagger_competition
+
+- Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
+- The challenge of long-term dependencies [Section 10.7 of DLB]
+- Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, _[Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory](http://www.bioinf.jku.at/publications/older/2604.pdf), [Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM](ftp://ftp.idsia.ch/pub/juergen/FgGates-NC.pdf)_]
+- Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, _[Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078)_]
+- Highway Networks [[Training Very Deep Networks](https://arxiv.org/abs/1507.06228)]
+- RNN Regularization
+  - Variational Dropout [[A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](https://arxiv.org/abs/1512.05287)]
+  - Layer Normalization [[Layer Normalization](https://arxiv.org/abs/1607.06450)]
+- Bidirectional RNN [Section 10.3 of DLB]
+- Word Embeddings [Section 14.2.4 of DLB]
+- Character-level embeddings using Recurrent neural networks [C2W model from [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](http://arxiv.org/abs/1508.02096)]
+- _Character-level embeddings using Convolutional neural networks [CharCNN from [Character-Aware Neural Language Models](https://arxiv.org/abs/1508.06615)]_
diff --git a/lectures/lecture09.md b/lectures/lecture09.md
new file mode 100644
index 0000000..395c6ca
--- /dev/null
+++ b/lectures/lecture09.md
@@ -0,0 +1,21 @@
+### Lecture: 9. 
Structured Prediction, CTC, Word2Vec +#### Date: Apr 15 +#### Slides: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides/?09 +#### Reading: https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/slides.pdf/npfl138-2324-09.pdf, PDF Slides +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-czech.mp4, CZ Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-czech.practicals.mp4, CZ Practicals +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-english.mp4, EN Lecture +#### Video: https://lectures.ms.mff.cuni.cz/video/rec/npfl138/2324/npfl138-2324-09-english.practicals.mp4, EN Practicals +#### Questions: #lecture_9_questions +#### Lecture assignment: tensorboard_projector +#### Lecture assignment: tagger_ner +#### Lecture assignment: ctc_loss +#### Lecture assignment: speech_recognition + +- Structured prediction +- Connectionist Temporal Classification (CTC) loss [[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.cs.toronto.edu/~graves/icml_2006.pdf)] +- `Word2vec` word embeddings, notably the CBOW and Skip-gram architectures [[Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/abs/1301.3781)] + - Hierarchical softmax [Section 12.4.3.2 of DLB or [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546)] + - Negative sampling [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546)] +- _Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram ([Charagram: Embedding Words and Sentences via Character n-grams](https://arxiv.org/abs/1607.02789)), Subword Information ([Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606) or SubGram ([SubGram: Extending Skip-Gram Word Representation 
with Substrings](http://link.springer.com/chapter/10.1007/978-3-319-45510-5_21))]_ +- _ELMO [[Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer: Deep contextualized word representations](https://arxiv.org/abs/1802.05365)]_ diff --git a/slides/06/06.md b/slides/06/06.md new file mode 100644 index 0000000..b795c5d --- /dev/null +++ b/slides/06/06.md @@ -0,0 +1,711 @@ +title: NPFL138, Lecture 6 +class: title, langtech, cc-by-sa + +# Object Detection + +## Milan Straka + +### March 25, 2024 + +--- +section: FastR-CNN +class: middle, center +# Beyond Image Classification + +# Beyond Image Classification + +--- +# Beyond Image Classification + +![w=70%,f=right](../01/object_detection.svgz) + +- Object detection (including location) +
+ +~~~ +![w=70%,f=right](../01/image_segmentation.svgz) + +- Image segmentation +
+
+~~~
+![w=70%,f=right](../01/human_pose_estimation.jpg)
+
+- Human pose estimation
+
+---
+# Beyond Image Classification
+
+![w=100%,v=middle](cv_tasks.jpg)
+
+---
+# Object Localization
+
+![w=100%](object_localization.png)
+
+We can perform object localization by jointly predicting the bounding box
+coordinates using regression.
+
+---
+# R-CNN
+
+![w=42%,f=right](roi_generation.jpg)
+
+To be able to recognize and localize _several_ objects, assume we were given
+multiple interesting regions of the image, called **regions of interest** (RoI).
+For each of them, we decide:
+- whether it contains an object;
+- the location of the object relative to the RoI.
+
+~~~
+![w=45%,f=right](rcnn_architecture.svgz)
+
+In R-CNN, we start with a network pre-trained on ImageNet (VGG-16 is used in the
+original paper), and we use it to process _every RoI_, rescaling every one of
+them to the size of $224×224$.
+
+~~~
+For every RoI, two sibling heads are added:
+- _classification head_ predicts either _background_ or one of $K$ object types
+  ($K+1$ in total),
+~~~
+- _bounding box regression head_ predicts 4 bounding box parameters relative
+  to RoI.
+
+---
+# R-CNN – Bounding Boxes
+
+A bounding box is parametrized as follows. Let $x_r, y_r, w_r, h_r$ be
+center coordinates and width and height of the RoI respectively, and let $x, y, w, h$ be
+the corresponding parameters of the bounding box. We represent the bounding box relative
+to the RoI as follows:
+$$\begin{aligned}
+t_x &= (x - x_r)/w_r, & t_y &= (y - y_r)/h_r, \\
+t_w &= \log (w/w_r), & t_h &= \log (h/h_r).
+\end{aligned}$$
+
+~~~
+In Fast R-CNN, the $\textrm{smooth}_{L_1}$ loss, or **Huber loss**, is employed for bounding box parameters:
+
+![w=19.5%,f=right](huber_loss.svgz)
+
+$$\textrm{smooth}_{L_1}(x) = \begin{cases}
+  0.5x^2 & \textrm{if }|x| < 1, \\
+  |x| - 0.5 & \textrm{otherwise}.
+\end{cases}$$
+
+~~~
+The complete loss is then ($λ=1$ is used in the Fast R-CNN paper)
+$$L(ĉ, t̂, c, t) = L_\textrm{cls}(ĉ, c) + λ ⋅ [c ≥ 1] ⋅
+  ∑\nolimits_{i ∈ \lbrace \mathrm{x, y, w, h}\rbrace} \textrm{smooth}_{L_1}(t̂_i - t_i).$$
+
+---
+# R-CNN – Bounding Boxes
+
+The described bounding box representation is usually called `CXCYWH`:
+
+![w=60%,h=center](bbox_representation_cxcywh.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+In the datasets, the bounding boxes are usually represented using `XYXY` format:
+
+![w=60%,h=center](bbox_representation_xyxy.webp)
+
+---
+# R-CNN – Bounding Boxes
+
+Finally, you could also come across the `XYWH` format:
+
+![w=60%,h=center](bbox_representation_xywh.webp)
+
+---
+# Fast R-CNN Architecture
+
+The R-CNN is slow, because it needs to process every RoI by the convolutional
+backbone. To speed it up, we might want to first process the whole image by the
+backbone and only then extract a fixed-size representation for every RoI.
+
+~~~
+
+We achieve that using **RoI pooling**, replacing the last max-pool $14×14 → 7×7$
+VGG layer.
+
+![w=50%](roi_projection.svgz)![w=50%,mw=50%,h=center](roi_pooling.svgz)
+
+During RoI pooling, we obtain a $7×7$ RoI representation by first projecting the
+RoI to the $14×14$ resolution and then computing each of the $7×7$ values by
+**max-pooling** the corresponding “pixels” of the convolutional image features.
+
+---
+# Fast R-CNN
+
+![w=85%,h=center](fast_rcnn_rumcajs.svgz)
+
+~~~
+![w=85%,h=center](fast_rcnn_vgg.png)
+
+---
+# Fast R-CNN and R-CNN Comparison
+
+![w=100%](fast_rcnn_architecture.svgz)
+
+---
+# Fast R-CNN Architecture
+
+![w=100%,v=middle](fast_rcnn.jpg)
+
+---
+# Fast R-CNN Training and Inference
+
+## Intersection over Union
+For two bounding boxes (or two masks) the _intersection over union_ (_IoU_)
+is the ratio of the area of the intersection of the boxes (or masks) to the
+area of the union of the boxes (or masks).
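The IoU definition above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `iou_xyxy` is hypothetical, and the boxes are assumed to be in the `XYXY` format with `x1 <= x2` and `y1 <= y2`.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (it may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Two degenerate (zero-area) boxes are treated as non-overlapping.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
    print(iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))
```

The same quantity for masks would replace the rectangle areas by pixel counts of the intersection and of the union.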
+
+~~~
+## Choosing RoIs for Training
+During training, we use 2 images with 64 RoIs each. The RoIs are selected
+so that 25% have intersection over union (IoU) overlap of at least 0.5
+with ground-truth boxes; the others are chosen to have the IoU in range $[0.1, 0.5)$,
+the so-called _hard examples_.
+
+~~~
+## Running Inference
+During inference, we utilize all RoIs, but a single object can be found in
+several of them. To choose the most salient prediction, we perform **non-maximum
+suppression** – we ignore predictions which have an overlap with a higher
+scoring prediction of the _same class_, where the overlap is computed using IoU
+(0.3 threshold is used in the paper). Higher scoring predictions are the ones
+with higher probability from the _classification head_.
+
+---
+# Object Detection Evaluation
+
+## Average Precision
+Evaluation is performed using _Average Precision_ ($\mathit{AP}$ or $\mathit{AP}_{50}$).
+
+We assume all bounding boxes (or masks) produced by a system have confidence
+values which can be used to rank them. Then, for a single class, we take the
+boxes (or masks) in the order of the ranks and generate a precision/recall curve,
+considering a bounding box correct if it has IoU at least 50% with any
+ground-truth box.
+
+![w=60%,mw=50%,h=center](precision_recall_person.svgz)![w=60%,mw=50%,h=center](precision_recall_bottle.svgz)
+
+---
+# Object Detection Evaluation – Average Precision
+
+The general idea of AP is to compute the area under the precision/recall curve.
+
+![w=80%,mw=49%,h=center](precision_recall_curve.png)
+
+~~~
+![w=80%,mw=49%,h=center](precision_recall_curve_interpolated.jpg)
+
+We start by interpolating the precision/recall curve, so that it is always
+nonincreasing.
+
+~~~
+![w=80%,mw=49%,h=center,f=right](average_precision.jpg)
+
+Finally, the average precision for a single class is the average of the
+interpolated precision at recall $0.0, 0.1, 0.2, …, 1.0$.
+
+~~~
+The final AP is the mean of the average precisions of all classes.
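The single-class procedure above (rank the predictions, build the precision/recall curve, interpolate it to be nonincreasing, average at fixed recall levels) can be sketched as follows. This is a simplified illustration, not a reference evaluator: the function name `average_precision` is hypothetical, and the correctness flags are assumed to be precomputed by an IoU test against the ground-truth boxes.

```python
def average_precision(correct, num_gt, recall_points=11):
    """Interpolated AP of one class.

    correct: booleans for the predictions sorted by decreasing confidence,
        True if the prediction is considered correct (assumed precomputed).
    num_gt: number of ground-truth boxes of the class.
    recall_points: number of equally spaced recall levels in [0, 1].
    """
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_correct in enumerate(correct, start=1):
        true_positives += int(is_correct)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)

    total = 0.0
    for i in range(recall_points):
        level = i / (recall_points - 1)
        # Interpolation: the best precision reachable at recall >= level,
        # which makes the curve nonincreasing; 0 if the recall is never reached.
        reachable = [p for p, r in zip(precisions, recalls) if r >= level - 1e-9]
        total += max(reachable, default=0.0)
    return total / recall_points


if __name__ == "__main__":
    # Four ranked predictions, two ground-truth boxes, only the first one found.
    print(average_precision([True, False, False, False], num_gt=2))
```

Passing `recall_points=101` corresponds to averaging over the recall levels $0.00, 0.01, …, 1.00$; the final AP would then be the mean of this value over all classes.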
+
+---
+class: tablewide
+style: table {line-height: 1}
+# Object Detection Evaluation – Average Precision
+
+For the COCO dataset, the AP is computed slightly differently. First, it is an
+average over 101 recall points $0.00, 0.01, 0.02, …, 1.00$.
+
+~~~
+In the original metric, IoU of 50% is enough to consider a prediction valid.
+We can generalize the definition to $\mathit{AP}_{t}$, where an object
+prediction is considered valid if IoU is at least $t$%.
+
+~~~
+The main COCO metric, denoted just $\mathit{AP}$, is the mean of
+$\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, …, \mathit{AP}_{95}$.
+
+~~~
+| Metric | Description |
+|:------:|:------------|
+| $\mathit{AP}$ | Mean of $\mathit{AP}_{50},\mathit{AP}_{55}, \mathit{AP}_{60}, \mathit{AP}_{65}, …, \mathit{AP}_{95}$ |
+| $\mathit{AP}_{50}$ | AP at IoU 50% |
+| $\mathit{AP}_{75}$ | AP at IoU 75% |
+~~~
+| $\mathit{AP}_{S}$ | AP for small objects: $\textit{area} < 32^2$ |
+| $\mathit{AP}_{M}$ | AP for medium objects: $32^2 < \textit{area} < 96^2$ |
+| $\mathit{AP}_{L}$ | AP for large objects: $96^2 < \textit{area}$ |
+
+
+---
+section: FasterR-CNN
+# Faster R-CNN
+
+![w=40%,f=right](fast_rcnn_speed.svgz)
+
+Even if Fast R-CNN is much faster than R-CNN, it can still be improved,
+considering that the most problematic and time-consuming part is generating the RoIs.
+
+
+~~~
+![w=30%,f=right](faster_rcnn_architecture.png)
+
+Faster R-CNN extends Fast R-CNN by including a **region proposal
+network (RPN)**, whose goal is to generate the RoIs automatically.
+
+~~~
+The region proposal network produces the so-called **region proposals**,
+which then play the role of RoIs in the rest of the pipeline (i.e.,
+the Fast R-CNN).
+
+~~~
+The region proposals are generated similarly to how predictions are generated
+in Fast R-CNN. We start with several **anchors** and from each anchor
+we generate either a single region proposal or nothing.
+
+---
+# Faster R-CNN – Anchors
+
+If we consider the $14×14$ VGG backbone output, each “pixel” corresponds
+to a region of size $16×16$ in the original image.
+
+![w=45%,h=center](anchor_net.svgz)
+
+~~~
+We can therefore interpret each value in the $14×14$ output as a representation
+of a part of the image _centered_ in the corresponding image region, and try
+predicting a region proposal from **every one** of them.
+
+~~~
+We call the dense grid of image regions from which we are predicting the
+proposals the **anchors**. They have fixed size, and in practice we use
+_several_ anchors per position.
+
+---
+# Faster R-CNN
+
+We classify every anchor into two classes (background, object)
+and also predict the region proposal bounding box relative to the anchor,
+exactly as in (Fast) R-CNN.
+
+~~~
+![w=58%,f=right](faster_rcnn_rpn.svgz)
+
+We perform the classification and the bounding box regression by first
+running a $3×3$ convolution followed by ReLU on the $14×14$ VGG output,
+and then attaching the two heads.
+~~~
+Assuming there are $A$ anchors on every position:
+- the classification head generates $2A$ outputs, performing $\softmax$ on every
+  2 of them;
+- the regression head generates $4A$ region proposal coordinates.
+
+~~~
+The authors consider 3 scales $(128^2, 256^2, 512^2)$ and 3 aspect ratios
+$(1:1, 1:2, 2:1)$.
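To make the anchor grid concrete, the following sketch enumerates the anchors for a $14×14$ feature map with stride 16, using the 3 scales and 3 aspect ratios mentioned above. It is a sketch under stated assumptions: the helper names `anchor_shapes` and `anchor_grid` are hypothetical, the ratio $a:b$ is interpreted as width:height (the paper's convention may differ), and every anchor is assumed to be centered in its $16×16$ image region.

```python
import math


def anchor_shapes(scales=(128, 256, 512), ratios=((1, 1), (1, 2), (2, 1))):
    """Return (width, height) pairs with area scale**2 and width:height = a:b."""
    shapes = []
    for scale in scales:
        for a, b in ratios:
            # width * height == scale**2 and width / height == a / b.
            shapes.append((scale * math.sqrt(a / b), scale * math.sqrt(b / a)))
    return shapes


def anchor_grid(feature_size=14, stride=16):
    """Return all anchors in the CXCYWH format for a square feature map."""
    anchors = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Center of the stride x stride image region of this "pixel".
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for width, height in anchor_shapes():
                anchors.append((cx, cy, width, height))
    return anchors


if __name__ == "__main__":
    anchors = anchor_grid()
    print(len(anchors))  # 14 * 14 positions times A = 9 anchors each
    print(anchors[0])
```

With $A = 9$ anchors per position, the classification head would produce $2A = 18$ outputs and the regression head $4A = 36$ outputs for every one of the $14×14$ positions.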
+
+---
+# Faster R-CNN
+
+During training, we generate
+- positive training examples for every anchor that has the highest IoU with
+  a ground-truth box;
+~~~
+- furthermore, a positive example is also any anchor with
+  IoU at least 0.7 for any ground-truth box;
+~~~
+- negative training examples for every anchor that has IoU at most 0.3 with all
+  ground-truth boxes;
+~~~
+- the positive and negative examples are generated with a ratio _up to_ 1:1
+  (less, if there are not enough positive examples; each minibatch consists of
+  a single image and 256 anchors).
+
+~~~
+During inference, we consider all predicted non-background regions, run
+non-maximum suppression on them using a 0.7 IoU threshold, and then take $N$
+top-scored regions (i.e., the ones with the highest probability from the
+classification head) – the paper uses 300 proposals, compared to 2000 in the Fast
+R-CNN.
+
+---
+# Faster R-CNN
+
+![w=94%,h=center](faster_rcnn_performance.svgz)
+
+---
+# Two-stage Detectors
+
+The Faster R-CNN is a so-called **two-stage** detector, where the regions are
+refined twice – once in the region proposal network, and then in the final
+bounding box regressor.
+
+~~~
+Several **single-stage** detector architectures have been proposed, mainly
+because they are faster and smaller, but until circa 2017 the two-stage
+detectors achieved better results.
+
+---
+section: MaskR-CNN
+# Mask R-CNN
+
+A straightforward extension of Faster R-CNN able to produce image segmentation
+(i.e., masks for every object).
+
+![w=100%,mh=80%,v=middle](../01/image_segmentation.svgz)
+
+---
+# Mask R-CNN – Architecture
+
+![w=100%,v=middle](mask_rcnn_architecture.png)
+
+---
+# Mask R-CNN – RoIAlign
+
+More precise alignment is required for the RoI in order to predict the masks.
+Instead of quantization and max-pooling in RoI pooling, **RoIAlign** uses bilinear
+interpolation of features at four regularly sampled locations in each RoI bin
+and averages them.
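The computation of a single RoIAlign bin described above can be sketched in plain Python. This is a simplified illustration only: the helper names `bilinear` and `roi_align_bin` are hypothetical, coordinates are assumed to be given directly in feature-map index units, and the four samples are assumed to lie at the centers of the four quadrants of the bin. Library implementations use their own coordinate conventions.

```python
def bilinear(features, y, x):
    """Bilinearly interpolate a 2D list of numbers at a real-valued (y, x)."""
    height, width = len(features), len(features[0])
    # Clamp the location to the valid index range.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = features[y0][x0] * (1 - dx) + features[y0][x1] * dx
    bottom = features[y1][x0] * (1 - dx) + features[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(features, y_start, x_start, y_end, x_end):
    """Average of four regularly spaced bilinear samples inside one RoI bin."""
    samples = []
    for fy in (0.25, 0.75):
        for fx in (0.25, 0.75):
            samples.append(bilinear(
                features,
                y_start + fy * (y_end - y_start),
                x_start + fx * (x_end - x_start),
            ))
    return sum(samples) / len(samples)


if __name__ == "__main__":
    features = [[0.0, 1.0], [2.0, 3.0]]
    print(roi_align_bin(features, 0.0, 0.0, 1.0, 1.0))
```

In contrast to RoI pooling, neither the bin boundaries nor the sampling locations are rounded to integer positions, which is the property that matters for predicting masks.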
+
+![w=68%,mw=50%,h=center](roi_pooling.svgz)![w=68%,mw=50%,h=center](mask_rcnn_roialign.svgz)
+
+~~~
+TorchVision provides `torchvision.ops.roi_align` and `torchvision.ops.roi_pool`.
+
+---
+# Mask R-CNN
+
+Masks are predicted in a third branch of the object detector.
+
+- Higher resolution of the mask is usually needed (at least $14×14$, or even more).
+- The masks are predicted for each class separately.
+- The masks are predicted using convolutions instead of fully connected layers
+  (the upscaling convolutions are $2×2$ with stride 2).
+
+![w=79%,h=center](mask_rcnn_heads.svgz)
+
+~~~
+Improvements from Nov 2021: all convs (except for the output layer) are followed
+by BN, the _class&bbox_ head uses 4 convs instead of 2 MLPs, RPN contains
+two convs instead of one.
+
+---
+# Mask R-CNN
+
+![w=100%,v=middle](mask_rcnn_ablation.svgz)
+
+---
+# Mask R-CNN – Human Pose Estimation
+
+![w=80%,h=center](../01/human_pose_estimation.jpg)
+
+~~~
+- Testing applicability of Mask R-CNN architecture.
+
+- Keypoints (e.g., left shoulder, right elbow, …) are detected
+  as independent one-hot masks of size $56×56$ with $\softmax$ output function.
+
+~~~
+![w=70%,h=center](mask_rcnn_hpe_performance.svgz)
+
+---
+section: FPN
+# Feature Pyramid Networks
+
+![w=85%,h=center](fpn_overview.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=62%,h=center](fpn_architecture.svgz)
+
+---
+# Feature Pyramid Networks
+
+![w=56%,h=center](fpn_architecture_detailed.svgz)
+
+---
+# Feature Pyramid Networks
+
+We employ FPN as a backbone in Faster R-CNN.
+
+~~~
+Assuming a ResNet-like network with $224×224$ input, we denote by $C_2, C_3, …, C_5$
+the image features of the last convolutional layer of size $56×56, 28×28, …,
+7×7$ (i.e., $C_i$ indicates a downscaling of $2^i$).
+~~~
+The FPN representations incorporating the smaller resolution features are
+denoted as $P_2, …, P_5$, each consisting of 256 channels; the classification
+heads are shared.
+
+~~~
+In both the RPN and the Fast R-CNN, authors utilize the $P_2, …, P_5$
+representations, considering single-size anchors for every $P_i$ (of size
+$32^2, 64^2, 128^2, 256^2$, respectively). However, three aspect ratios
+$(1:1, 1:2, 2:1)$ are still used.
+
+~~~
+![w=100%](fpn_results.svgz)
+
+---
+section: FocalLoss
+# Focal Loss
+
+![w=46%,f=right](fast_rcnn_rumcajs.svgz)
+
+For single-stage object detection architectures, _class imbalance_ has been
+identified as the main issue preventing obtaining performance comparable to
+two-stage detectors. In a single-stage detector, there can be tens of thousands
+of anchors, with only dozens of useful training examples.
+
+~~~
+![w=46%,f=right](focal_loss_graph.svgz)
+
+Cross-entropy loss is computed as
+$$𝓛_\textrm{cross-entropy} = -\log p_\textrm{model}(y | x).$$
+
+~~~
+Focal-loss (loss focused on hard examples) is proposed as
+$$𝓛_\textrm{focal-loss} = -(1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
+
+---
+# Focal Loss
+
+For $γ=0$, focal loss is equal to cross-entropy loss.
+
+~~~
+Authors reported that $γ=2$ worked best for them for training a single-stage
+detector.
+
+~~~
+![w=100%,mh=75%,v=bottom](focal_loss_cdf.svgz)
+
+---
+# Focal Loss and Class Imbalance
+
+Focal loss is connected to another solution to class imbalance – we might
+introduce a weighting factor $α ∈ (0, 1)$ for one class and $1 - α$ for the other
+class, arriving at
+$$ -α_y ⋅ \log p_\textrm{model}(y | x).$$
+
+~~~
+The weight $α$ might be set to the inverse class frequency or treated as
+a hyperparameter.
+
+~~~
+Even if weighting focuses more on the low-frequency class, it does not distinguish
+between easy and hard examples, contrary to focal loss.
+
+~~~
+In practice, the focal loss is usually used together with class weighting:
+$$ -α_y ⋅ (1 - p_\textrm{model}(y | x))^γ ⋅ \log p_\textrm{model}(y | x).$$
+For example, authors report that $α=0.25$ (weight of the rare class) works best with $γ=2$.
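The class-weighted focal loss above can be written out for a single binary prediction as follows. This is a minimal sketch: the function name `focal_loss` and its argument names are hypothetical, and it assumes that the rare class is "object", so $α$ weights the object class and $1 - α$ the background class.

```python
import math


def focal_loss(p_object, is_object, alpha=0.25, gamma=2.0):
    """Class-weighted focal loss of one object-vs-background prediction.

    p_object: predicted probability of the "object" class, in (0, 1).
    is_object: True if the gold class is "object", False for background.
    """
    # Probability and class weight assigned to the gold class y.
    p_gold = p_object if is_object else 1.0 - p_object
    alpha_gold = alpha if is_object else 1.0 - alpha
    return -alpha_gold * (1.0 - p_gold) ** gamma * math.log(p_gold)


if __name__ == "__main__":
    # A confidently classified (easy) example contributes far less
    # than an uncertain (hard) one.
    print(focal_loss(0.9, True), focal_loss(0.5, True))
```

Setting `gamma=0` recovers the class-weighted cross-entropy, which makes the role of the modulating factor $(1 - p)^γ$ easy to inspect in isolation.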
+ +--- +section: RetinaNet +# RetinaNet + +RetinaNet is a single-stage detector, using a feature pyramid network +architecture. Built on top of a ResNet architecture, the feature pyramid +contains levels $P_3$ through $P_7$, with each $P_l$ having 256 channels +and resolution $2^l$ times lower than the input. On each pyramid level $P_l$, +we consider 9 anchors for every position, with 3 different aspect ratios ($1:1$, $1:2$, $2:1$) +and with 3 different sizes $(\{2^0, 2^{1/3}, 2^{2/3}\} ⋅ 4 ⋅ 2^l)^2$. + +~~~ +Note that ResNet provides only $C_3$ to $C_5$ features. $C_6$ is computed +using a $3×3$ convolution with stride 2 on $C_5$, and $C_7$ is obtained +by applying ReLU followed by another $3×3$ stride-2 convolution. The $C_6$ and +$C_7$ are included to improve large object detection. + +--- +# RetinaNet – Architecture + +The classification head and the boundary regression heads are fully +convolutional and do not share parameters (but classification heads are shared +across levels, and so are the boundary regression heads), generating +$\mathit{anchors} ⋅ \mathit{classes}$ sigmoids and $\mathit{anchors}$ bounding +boxes per position. + +![w=100%](retinanet.svgz) + +--- +# RetinaNet + +During training, anchors are assigned to ground-truth object boxes if IoU is at +least 0.5; to background if IoU with any ground-truth region is at most 0.4 +(the remaining anchors are ignored during training). +~~~ +The classification head is trained using focal loss with $γ=2$ and $α=0.25$ (but +according to the paper, all values of $γ$ in $[0.5, 5]$ range work well); the +boundary regression head is trained using $\textrm{smooth}_{L_1}$ loss as in +Fast(er) R-CNN. + +~~~ +During inference, at most 1000 objects with at least 5% probability from all +pyramid levels are considered, and all of them are combined using non-maximum +suppression with a threshold of 0.5. Fixed-size training and testing is used, +with sizes 400, 500, …, 800 pixels.
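
The anchor assignment used during training can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the names `iou`, `assign_anchor`, `BACKGROUND` and `IGNORED` are hypothetical, and an anchor overlapping several gold boxes is assumed to go to the one with the largest IoU.

```python
BACKGROUND = -1  # hypothetical marker: anchor is trained as background
IGNORED = -2     # hypothetical marker: anchor does not contribute to the loss


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_anchor(anchor, gold_boxes, fg_threshold=0.5, bg_threshold=0.4):
    """Return the index of the assigned gold box, BACKGROUND, or IGNORED."""
    best_iou, best_index = 0.0, None
    for index, gold in enumerate(gold_boxes):
        overlap = iou(anchor, gold)
        if overlap > best_iou:
            best_iou, best_index = overlap, index
    if best_index is not None and best_iou >= fg_threshold:
        return best_index
    if best_iou <= bg_threshold:
        return BACKGROUND
    return IGNORED


gold_boxes = [(0.0, 0.0, 10.0, 10.0)]
print(assign_anchor((0.0, 0.0, 10.0, 5.0), gold_boxes))      # IoU 0.5
print(assign_anchor((0.0, 0.0, 10.0, 4.5), gold_boxes))      # IoU 0.45
print(assign_anchor((20.0, 20.0, 30.0, 30.0), gold_boxes))   # IoU 0
```

The three calls cover an anchor matched to the gold box, an anchor in the ignored band between the two thresholds, and a background anchor.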
+ +~~~ +![w=68%](retinanet_results.svgz)![w=32%](retinanet_graph.svgz) + +--- +# RetinaNet – Ablations + +Ablations use ResNet-50-FPN backbone trained and tested with 600-pixel images. + +![w=80%,h=center](retinanet_ablations.svgz) + +--- +section: EfficientDet +# EfficientDet – Architecture + +EfficientDet builds upon EfficientNet, and it delivered state-of-the-art performance +in Nov 2019 with minimum time and space requirements (however, its performance +has already been surpassed significantly). It is a single-stage detector similar +to RetinaNet, which: + +~~~ +- uses EfficientNet as a backbone; +~~~ +- employs compound scaling; +~~~ +- uses a newly proposed BiFPN, “efficient bidirectional cross-scale connections + and weighted feature fusion”. + +~~~ +![w=78%,h=center](efficientdet_architecture.svgz) + +--- +# EfficientDet – BiFPN + +In multi-scale fusion in FPN, information flows only from the pyramid levels +with smaller resolution to the levels with higher resolution. + +![w=80%,h=center](efficientdet_bifpn.svgz) + +~~~ +BiFPN consists of several rounds of bidirectional flows. Each bidirectional flow +employs residual connections and does not include nodes that have only one input +edge with no feature fusion. All operations are $3×3$ separable convolutions with +batch normalization and ReLU, upsampling is done by repeating rows and columns +and downsampling by max-pooling. + +--- +# EfficientDet – Weighted BiFPN + +When combining features with different resolutions, it is common to resize them +to the same resolution and sum them – therefore, all sets of features are +considered to be of the same importance. The authors however argue that features +from different resolutions contribute to the final result _unequally_ and propose +to combine them with trainable weights.
+ +~~~ +- **Softmax-based fusion**: In each BiFPN node, we create a trainable weight + $w_i$ for every input $⇶I_i$ and the final combination (after resize, before + a convolution) is + $$∑_i \frac{e^{w_i}}{∑\nolimits_j e^{w_j}} ⇶I_i.$$ + +~~~ +- **Fast normalized fusion**: Authors propose a simpler alternative of + weighting: + $$∑_i \frac{\ReLU(w_i)}{ε + ∑\nolimits_j \ReLU(w_j)} ⇶I_i.$$ + It uses $ε=0.0001$ for stability and is up to 30% faster on a GPU. + + +--- +# EfficientDet – Compound Scaling + +Similar to EfficientNet, authors propose to scale various dimensions of the +network, using a single compound coefficient $ϕ$. + +~~~ +After performing a grid search: +- the width of BiFPN is scaled as $W_\mathit{BiFPN} = 64 ⋅ 1.35^ϕ,$ +- the depth of BiFPN is scaled as $D_\mathit{BiFPN} = 3 + ϕ,$ +- the box/class predictor has the same width as BiFPN and depth $D_\mathit{class} = 3 + \lfloor ϕ/3 \rfloor,$ +- input image resolution increases according to $R_\mathit{image} = 512 + 128 ⋅ ϕ.$ + +![w=45%,h=center](efficientdet_scaling.svgz) + +--- +# EfficientDet – Results + +![w=50%](efficientdet_flops.svgz)![w=50%](efficientdet_size.svgz) + +--- +# EfficientDet – Results + +![w=83%,h=center](efficientdet_results.svgz) + +--- +# EfficientDet – Inference Latencies + +![w=100%](efficientdet_latency.svgz) + +--- +# EfficientDet – Ablations + +Given that EfficientDet employs both a powerful backbone and new BiFPN, authors +quantify the improvement of the individual components. + +![w=49%,h=center](efficientdet_ablations_backbone.svgz) + +~~~ +The comparison with previously used cross-scale fusion architectures is also +provided: + +![w=49%,h=center](efficientdet_ablations_fpn.svgz) + +--- +class: wide +# EfficientDet-D0 Example + +![w=98%,h=center](efficientdet_example.jpg) + +--- +section: GroupNorm +# Normalization + +## Batch Normalization + +Neuron value is normalized across the minibatch, and in case of CNN also across +all positions.
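
Which values share the normalization statistics in a CNN can be made explicit with a small pure-Python sketch. It is only an illustration under assumptions: the input layout `batch[example][position][channel]` and the function name `batch_norm` are chosen here, and the trainable scale and shift as well as the inference-time running statistics are deliberately omitted.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize every channel over all examples and all positions.

    Assumed layout: batch[example][position][channel]; training-time
    statistics only, without the trainable scale and shift.
    """
    channels = len(batch[0][0])
    # For every channel, gather its values from the whole minibatch and all positions.
    values = [[pos[c] for example in batch for pos in example] for c in range(channels)]
    means = [sum(v) / len(v) for v in values]
    variances = [sum((x - m) ** 2 for x in v) / len(v) for v, m in zip(values, means)]
    return [
        [[(pos[c] - means[c]) / math.sqrt(variances[c] + eps) for c in range(channels)]
         for pos in example]
        for example in batch
    ]


# Two examples, two positions each, two channels with very different scales.
batch = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 30.0], [4.0, 40.0]]]
normalized = batch_norm(batch)
print(normalized)
```

After normalization, both channels have approximately zero mean and unit variance, even though their original scales differ by a factor of ten.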
+ +~~~ +## Layer Normalization + +Neuron value is normalized across the layer. + +~~~ +![w=100%](normalizations.svgz) + +--- +# Group Normalization + +Group Normalization is analogous to Layer normalization, but the channels are +normalized in groups (by default, $G=32$). + +![w=40%,h=center](normalizations.svgz) + +~~~ +![w=40%,h=center](group_norm.svgz) + +--- +# Group Normalization + +![w=78%,h=center](group_norm_vs_batch_norm.svgz) + +--- +# Group Normalization + +![w=65%,h=center](group_norm_coco.svgz) diff --git a/slides/06/anchor_net.svgz b/slides/06/anchor_net.svgz new file mode 100644 index 0000000..a78b80f Binary files /dev/null and b/slides/06/anchor_net.svgz differ diff --git a/slides/06/anchor_net.svgz.ref b/slides/06/anchor_net.svgz.ref new file mode 100644 index 0000000..8473ea0 --- /dev/null +++ b/slides/06/anchor_net.svgz.ref @@ -0,0 +1 @@ +Adapted from slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/average_precision.jpg b/slides/06/average_precision.jpg new file mode 100644 index 0000000..aa92c3a Binary files /dev/null and b/slides/06/average_precision.jpg differ diff --git a/slides/06/average_precision.jpg.ref b/slides/06/average_precision.jpg.ref new file mode 100644 index 0000000..0bdfae7 --- /dev/null +++ b/slides/06/average_precision.jpg.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*naz02wO-XMywlwAdFzF-GA.jpeg diff --git a/slides/06/bbox_representation_cxcywh.webp b/slides/06/bbox_representation_cxcywh.webp new file mode 100644 index 0000000..745ad04 Binary files /dev/null and b/slides/06/bbox_representation_cxcywh.webp differ diff --git a/slides/06/bbox_representation_cxcywh.webp.ref b/slides/06/bbox_representation_cxcywh.webp.ref new file mode 100644 index 0000000..91b33ac --- /dev/null +++ b/slides/06/bbox_representation_cxcywh.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*Z80D7vwD-3UwP16asY-k6A.jpeg diff --git a/slides/06/bbox_representation_xywh.webp 
b/slides/06/bbox_representation_xywh.webp new file mode 100644 index 0000000..f82925e Binary files /dev/null and b/slides/06/bbox_representation_xywh.webp differ diff --git a/slides/06/bbox_representation_xywh.webp.ref b/slides/06/bbox_representation_xywh.webp.ref new file mode 100644 index 0000000..0e2a026 --- /dev/null +++ b/slides/06/bbox_representation_xywh.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*JLeFS2KIOzSTk6lUp1Ou2w.jpeg diff --git a/slides/06/bbox_representation_xyxy.webp b/slides/06/bbox_representation_xyxy.webp new file mode 100644 index 0000000..2f7d93b Binary files /dev/null and b/slides/06/bbox_representation_xyxy.webp differ diff --git a/slides/06/bbox_representation_xyxy.webp.ref b/slides/06/bbox_representation_xyxy.webp.ref new file mode 100644 index 0000000..7399ff7 --- /dev/null +++ b/slides/06/bbox_representation_xyxy.webp.ref @@ -0,0 +1 @@ +https://miro.medium.com/1*oZcZhzOWKb3kvBHPOHYfow.jpeg diff --git a/slides/06/cv_tasks.jpg b/slides/06/cv_tasks.jpg new file mode 100644 index 0000000..de4459b Binary files /dev/null and b/slides/06/cv_tasks.jpg differ diff --git a/slides/06/cv_tasks.jpg.ref b/slides/06/cv_tasks.jpg.ref new file mode 100644 index 0000000..1f5753a --- /dev/null +++ b/slides/06/cv_tasks.jpg.ref @@ -0,0 +1 @@ +https://www.implantology.or.kr/articles/xml/RvNO/ diff --git a/slides/06/efficientdet_ablations_backbone.svgz b/slides/06/efficientdet_ablations_backbone.svgz new file mode 100644 index 0000000..a73b0d0 Binary files /dev/null and b/slides/06/efficientdet_ablations_backbone.svgz differ diff --git a/slides/06/efficientdet_ablations_backbone.svgz.ref b/slides/06/efficientdet_ablations_backbone.svgz.ref new file mode 100644 index 0000000..8ea6795 --- /dev/null +++ b/slides/06/efficientdet_ablations_backbone.svgz.ref @@ -0,0 +1 @@ +Table 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_ablations_fpn.svgz 
b/slides/06/efficientdet_ablations_fpn.svgz new file mode 100644 index 0000000..ac3affa Binary files /dev/null and b/slides/06/efficientdet_ablations_fpn.svgz differ diff --git a/slides/06/efficientdet_ablations_fpn.svgz.ref b/slides/06/efficientdet_ablations_fpn.svgz.ref new file mode 100644 index 0000000..dd61bd6 --- /dev/null +++ b/slides/06/efficientdet_ablations_fpn.svgz.ref @@ -0,0 +1 @@ +Table 5 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_architecture.svgz b/slides/06/efficientdet_architecture.svgz new file mode 100644 index 0000000..dd376f1 Binary files /dev/null and b/slides/06/efficientdet_architecture.svgz differ diff --git a/slides/06/efficientdet_architecture.svgz.ref b/slides/06/efficientdet_architecture.svgz.ref new file mode 100644 index 0000000..66db1af --- /dev/null +++ b/slides/06/efficientdet_architecture.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_bifpn.svgz b/slides/06/efficientdet_bifpn.svgz new file mode 100644 index 0000000..bc694d3 Binary files /dev/null and b/slides/06/efficientdet_bifpn.svgz differ diff --git a/slides/06/efficientdet_bifpn.svgz.ref b/slides/06/efficientdet_bifpn.svgz.ref new file mode 100644 index 0000000..86130e9 --- /dev/null +++ b/slides/06/efficientdet_bifpn.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_example.jpg b/slides/06/efficientdet_example.jpg new file mode 100644 index 0000000..1f1aa1b Binary files /dev/null and b/slides/06/efficientdet_example.jpg differ diff --git a/slides/06/efficientdet_example.jpg.ref b/slides/06/efficientdet_example.jpg.ref new file mode 100644 index 0000000..2e9aaab --- /dev/null +++ b/slides/06/efficientdet_example.jpg.ref @@ -0,0 +1 @@ 
+https://github.com/google/automl/blob/master/efficientdet/g3doc/street.jpg diff --git a/slides/06/efficientdet_flops.svgz b/slides/06/efficientdet_flops.svgz new file mode 100644 index 0000000..24d9e8c Binary files /dev/null and b/slides/06/efficientdet_flops.svgz differ diff --git a/slides/06/efficientdet_flops.svgz.ref b/slides/06/efficientdet_flops.svgz.ref new file mode 100644 index 0000000..186b61d --- /dev/null +++ b/slides/06/efficientdet_flops.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_latency.svgz b/slides/06/efficientdet_latency.svgz new file mode 100644 index 0000000..0a5dd99 Binary files /dev/null and b/slides/06/efficientdet_latency.svgz differ diff --git a/slides/06/efficientdet_latency.svgz.ref b/slides/06/efficientdet_latency.svgz.ref new file mode 100644 index 0000000..bb23a56 --- /dev/null +++ b/slides/06/efficientdet_latency.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_results.svgz b/slides/06/efficientdet_results.svgz new file mode 100644 index 0000000..b2e4058 Binary files /dev/null and b/slides/06/efficientdet_results.svgz differ diff --git a/slides/06/efficientdet_results.svgz.ref b/slides/06/efficientdet_results.svgz.ref new file mode 100644 index 0000000..c4f6073 --- /dev/null +++ b/slides/06/efficientdet_results.svgz.ref @@ -0,0 +1 @@ +Table 2 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_scaling.svgz b/slides/06/efficientdet_scaling.svgz new file mode 100644 index 0000000..675dbb8 Binary files /dev/null and b/slides/06/efficientdet_scaling.svgz differ diff --git a/slides/06/efficientdet_scaling.svgz.ref b/slides/06/efficientdet_scaling.svgz.ref new file mode 100644 index 0000000..5f14bba --- /dev/null +++ 
b/slides/06/efficientdet_scaling.svgz.ref @@ -0,0 +1 @@ +Table 1 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/efficientdet_size.svgz b/slides/06/efficientdet_size.svgz new file mode 100644 index 0000000..f42947b Binary files /dev/null and b/slides/06/efficientdet_size.svgz differ diff --git a/slides/06/efficientdet_size.svgz.ref b/slides/06/efficientdet_size.svgz.ref new file mode 100644 index 0000000..bb23a56 --- /dev/null +++ b/slides/06/efficientdet_size.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "EfficientDet: Scalable and Efficient Object Detection", https://arxiv.org/abs/1911.09070 diff --git a/slides/06/fast_rcnn.jpg b/slides/06/fast_rcnn.jpg new file mode 100644 index 0000000..1803bb5 Binary files /dev/null and b/slides/06/fast_rcnn.jpg differ diff --git a/slides/06/fast_rcnn.jpg.ref b/slides/06/fast_rcnn.jpg.ref new file mode 100644 index 0000000..fbecdb1 --- /dev/null +++ b/slides/06/fast_rcnn.jpg.ref @@ -0,0 +1 @@ +Figure 1 of "Fast R-CNN", https://arxiv.org/abs/1504.08083 diff --git a/slides/06/fast_rcnn_architecture.svgz b/slides/06/fast_rcnn_architecture.svgz new file mode 100644 index 0000000..b7bda19 Binary files /dev/null and b/slides/06/fast_rcnn_architecture.svgz differ diff --git a/slides/06/fast_rcnn_architecture.svgz.ref b/slides/06/fast_rcnn_architecture.svgz.ref new file mode 100644 index 0000000..6efa2ff --- /dev/null +++ b/slides/06/fast_rcnn_architecture.svgz.ref @@ -0,0 +1 @@ +Slide 61 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/fast_rcnn_rumcajs.svgz b/slides/06/fast_rcnn_rumcajs.svgz new file mode 100644 index 0000000..c774a93 Binary files /dev/null and b/slides/06/fast_rcnn_rumcajs.svgz differ diff --git a/slides/06/fast_rcnn_rumcajs.svgz.ref b/slides/06/fast_rcnn_rumcajs.svgz.ref new file mode 100644 index 0000000..3ebdb63 --- /dev/null +++ b/slides/06/fast_rcnn_rumcajs.svgz.ref @@ -0,0 +1 @@ +https://commons.wikimedia.org/wiki/File:Tišnov,_Hajánky,_garážová_ozdoba_(6597).jpg diff --git a/slides/06/fast_rcnn_speed.svgz b/slides/06/fast_rcnn_speed.svgz new file mode 100644 index 0000000..9f24720 Binary files /dev/null and b/slides/06/fast_rcnn_speed.svgz differ diff --git a/slides/06/fast_rcnn_speed.svgz.ref b/slides/06/fast_rcnn_speed.svgz.ref new file mode 100644 index 0000000..436c3bf --- /dev/null +++ b/slides/06/fast_rcnn_speed.svgz.ref @@ -0,0 +1 @@ +Slide 76 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/fast_rcnn_vgg.png b/slides/06/fast_rcnn_vgg.png new file mode 100644 index 0000000..07cfbf0 Binary files /dev/null and b/slides/06/fast_rcnn_vgg.png differ diff --git a/slides/06/fast_rcnn_vgg.png.ref b/slides/06/fast_rcnn_vgg.png.ref new file mode 100644 index 0000000..62ac59b --- /dev/null +++ b/slides/06/fast_rcnn_vgg.png.ref @@ -0,0 +1 @@ +https://en.wikipedia.org/wiki/File:VGG_neural_network.png diff --git a/slides/06/faster_rcnn_architecture.png b/slides/06/faster_rcnn_architecture.png new file mode 100644 index 0000000..8464540 Binary files /dev/null and b/slides/06/faster_rcnn_architecture.png differ diff --git a/slides/06/faster_rcnn_architecture.png.ref b/slides/06/faster_rcnn_architecture.png.ref new file mode 100644 index 0000000..657ebdd --- /dev/null +++ b/slides/06/faster_rcnn_architecture.png.ref @@ -0,0 +1 @@ +Figure 2 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/faster_rcnn_performance.svgz
b/slides/06/faster_rcnn_performance.svgz new file mode 100644 index 0000000..f2ccc58 Binary files /dev/null and b/slides/06/faster_rcnn_performance.svgz differ diff --git a/slides/06/faster_rcnn_performance.svgz.ref b/slides/06/faster_rcnn_performance.svgz.ref new file mode 100644 index 0000000..8796742 --- /dev/null +++ b/slides/06/faster_rcnn_performance.svgz.ref @@ -0,0 +1 @@ +Tables 3 and 4 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/faster_rcnn_rpn.svgz b/slides/06/faster_rcnn_rpn.svgz new file mode 100644 index 0000000..b493b07 Binary files /dev/null and b/slides/06/faster_rcnn_rpn.svgz differ diff --git a/slides/06/faster_rcnn_rpn.svgz.ref b/slides/06/faster_rcnn_rpn.svgz.ref new file mode 100644 index 0000000..1fac88c --- /dev/null +++ b/slides/06/faster_rcnn_rpn.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497 diff --git a/slides/06/focal_loss_cdf.svgz b/slides/06/focal_loss_cdf.svgz new file mode 100644 index 0000000..403d6d5 Binary files /dev/null and b/slides/06/focal_loss_cdf.svgz differ diff --git a/slides/06/focal_loss_cdf.svgz.ref b/slides/06/focal_loss_cdf.svgz.ref new file mode 100644 index 0000000..0dd7c12 --- /dev/null +++ b/slides/06/focal_loss_cdf.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/focal_loss_graph.svgz b/slides/06/focal_loss_graph.svgz new file mode 100644 index 0000000..44ebdf2 Binary files /dev/null and b/slides/06/focal_loss_graph.svgz differ diff --git a/slides/06/focal_loss_graph.svgz.ref b/slides/06/focal_loss_graph.svgz.ref new file mode 100644 index 0000000..ccc201a --- /dev/null +++ b/slides/06/focal_loss_graph.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git 
a/slides/06/fpn_architecture.svgz b/slides/06/fpn_architecture.svgz new file mode 100644 index 0000000..af04b27 Binary files /dev/null and b/slides/06/fpn_architecture.svgz differ diff --git a/slides/06/fpn_architecture.svgz.ref b/slides/06/fpn_architecture.svgz.ref new file mode 100644 index 0000000..96d788c --- /dev/null +++ b/slides/06/fpn_architecture.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_architecture_detailed.svgz b/slides/06/fpn_architecture_detailed.svgz new file mode 100644 index 0000000..ff42dd0 Binary files /dev/null and b/slides/06/fpn_architecture_detailed.svgz differ diff --git a/slides/06/fpn_architecture_detailed.svgz.ref b/slides/06/fpn_architecture_detailed.svgz.ref new file mode 100644 index 0000000..bfb0bc8 --- /dev/null +++ b/slides/06/fpn_architecture_detailed.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_overview.svgz b/slides/06/fpn_overview.svgz new file mode 100644 index 0000000..c6c1574 Binary files /dev/null and b/slides/06/fpn_overview.svgz differ diff --git a/slides/06/fpn_overview.svgz.ref b/slides/06/fpn_overview.svgz.ref new file mode 100644 index 0000000..c00542b --- /dev/null +++ b/slides/06/fpn_overview.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/fpn_results.svgz b/slides/06/fpn_results.svgz new file mode 100644 index 0000000..02db310 Binary files /dev/null and b/slides/06/fpn_results.svgz differ diff --git a/slides/06/fpn_results.svgz.ref b/slides/06/fpn_results.svgz.ref new file mode 100644 index 0000000..8ced9a5 --- /dev/null +++ b/slides/06/fpn_results.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Feature Pyramid Networks for Object Detection", https://arxiv.org/abs/1612.03144 diff --git a/slides/06/group_norm.svgz 
b/slides/06/group_norm.svgz new file mode 100644 index 0000000..0be782b Binary files /dev/null and b/slides/06/group_norm.svgz differ diff --git a/slides/06/group_norm.svgz.ref b/slides/06/group_norm.svgz.ref new file mode 100644 index 0000000..6e47f02 --- /dev/null +++ b/slides/06/group_norm.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/group_norm_coco.svgz b/slides/06/group_norm_coco.svgz new file mode 100644 index 0000000..fe964af Binary files /dev/null and b/slides/06/group_norm_coco.svgz differ diff --git a/slides/06/group_norm_coco.svgz.ref b/slides/06/group_norm_coco.svgz.ref new file mode 100644 index 0000000..86ea266 --- /dev/null +++ b/slides/06/group_norm_coco.svgz.ref @@ -0,0 +1 @@ +Tables 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/group_norm_vs_batch_norm.svgz b/slides/06/group_norm_vs_batch_norm.svgz new file mode 100644 index 0000000..2c017ac Binary files /dev/null and b/slides/06/group_norm_vs_batch_norm.svgz differ diff --git a/slides/06/group_norm_vs_batch_norm.svgz.ref b/slides/06/group_norm_vs_batch_norm.svgz.ref new file mode 100644 index 0000000..e6c9431 --- /dev/null +++ b/slides/06/group_norm_vs_batch_norm.svgz.ref @@ -0,0 +1 @@ +Figures 4 and 5 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/huber_loss.py b/slides/06/huber_loss.py new file mode 100644 index 0000000..f6f93d2 --- /dev/null +++ b/slides/06/huber_loss.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python3 +import os + +import matplotlib +import matplotlib.pyplot as plt +import numpy as np + +matplotlib.rcParams["mathtext.fontset"] = "cm" + +xs = np.linspace(-3, 3, 51) +l2 = xs * xs / 2 +huber = np.where(np.abs(xs) <= 1, xs * xs / 2, np.abs(xs) - 0.5) +d_huber = np.where(np.abs(xs) <= 1, xs, np.sign(xs)) + +plt.figure(figsize=(5, 3.5)) +plt.plot(xs, l2, label="L2 loss $\\frac{1}{2} x^2$") +plt.plot(xs, huber, label="Huber loss") +plt.plot(xs, 
d_huber, label="Huber loss derivative") +plt.gca().set_aspect(1) +plt.grid(True) +plt.legend(loc="upper center") +plt.savefig("huber_loss.svg", bbox_inches="tight", transparent=True) diff --git a/slides/06/huber_loss.svgz b/slides/06/huber_loss.svgz new file mode 100644 index 0000000..a3362fa Binary files /dev/null and b/slides/06/huber_loss.svgz differ diff --git a/slides/06/huber_loss.svgz.ref b/slides/06/huber_loss.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/06/mask_rcnn_ablation.svgz b/slides/06/mask_rcnn_ablation.svgz new file mode 100644 index 0000000..1b6b8e2 Binary files /dev/null and b/slides/06/mask_rcnn_ablation.svgz differ diff --git a/slides/06/mask_rcnn_ablation.svgz.ref b/slides/06/mask_rcnn_ablation.svgz.ref new file mode 100644 index 0000000..8877b9d --- /dev/null +++ b/slides/06/mask_rcnn_ablation.svgz.ref @@ -0,0 +1 @@ +Table 2 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_architecture.png b/slides/06/mask_rcnn_architecture.png new file mode 100644 index 0000000..5b9e6ed Binary files /dev/null and b/slides/06/mask_rcnn_architecture.png differ diff --git a/slides/06/mask_rcnn_architecture.png.ref b/slides/06/mask_rcnn_architecture.png.ref new file mode 100644 index 0000000..2d5bd13 --- /dev/null +++ b/slides/06/mask_rcnn_architecture.png.ref @@ -0,0 +1 @@ +Figure 1 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_heads.svgz b/slides/06/mask_rcnn_heads.svgz new file mode 100644 index 0000000..f5c90b1 Binary files /dev/null and b/slides/06/mask_rcnn_heads.svgz differ diff --git a/slides/06/mask_rcnn_heads.svgz.ref b/slides/06/mask_rcnn_heads.svgz.ref new file mode 100644 index 0000000..5e303ff --- /dev/null +++ b/slides/06/mask_rcnn_heads.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_hpe_performance.svgz b/slides/06/mask_rcnn_hpe_performance.svgz new file mode 100644 index 
0000000..b79f401 Binary files /dev/null and b/slides/06/mask_rcnn_hpe_performance.svgz differ diff --git a/slides/06/mask_rcnn_hpe_performance.svgz.ref b/slides/06/mask_rcnn_hpe_performance.svgz.ref new file mode 100644 index 0000000..19c0665 --- /dev/null +++ b/slides/06/mask_rcnn_hpe_performance.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/mask_rcnn_roialign.svgz b/slides/06/mask_rcnn_roialign.svgz new file mode 100644 index 0000000..0cefb39 Binary files /dev/null and b/slides/06/mask_rcnn_roialign.svgz differ diff --git a/slides/06/mask_rcnn_roialign.svgz.ref b/slides/06/mask_rcnn_roialign.svgz.ref new file mode 100644 index 0000000..b4070e5 --- /dev/null +++ b/slides/06/mask_rcnn_roialign.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Mask R-CNN", https://arxiv.org/abs/1703.06870 diff --git a/slides/06/normalizations.svgz b/slides/06/normalizations.svgz new file mode 100644 index 0000000..6230387 Binary files /dev/null and b/slides/06/normalizations.svgz differ diff --git a/slides/06/normalizations.svgz.ref b/slides/06/normalizations.svgz.ref new file mode 100644 index 0000000..7b89167 --- /dev/null +++ b/slides/06/normalizations.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Group Normalization", https://arxiv.org/abs/1803.08494 diff --git a/slides/06/object_localization.png b/slides/06/object_localization.png new file mode 100644 index 0000000..a6d3c85 Binary files /dev/null and b/slides/06/object_localization.png differ diff --git a/slides/06/object_localization.png.ref b/slides/06/object_localization.png.ref new file mode 100644 index 0000000..b84eac5 --- /dev/null +++ b/slides/06/object_localization.png.ref @@ -0,0 +1 @@ +Slide 38 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/precision_recall_bottle.svgz b/slides/06/precision_recall_bottle.svgz new file mode 100644 index 0000000..41de99d Binary files /dev/null and b/slides/06/precision_recall_bottle.svgz differ diff --git a/slides/06/precision_recall_bottle.svgz.ref b/slides/06/precision_recall_bottle.svgz.ref new file mode 100644 index 0000000..5a828ee --- /dev/null +++ b/slides/06/precision_recall_bottle.svgz.ref @@ -0,0 +1 @@ +Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf diff --git a/slides/06/precision_recall_curve.png b/slides/06/precision_recall_curve.png new file mode 100644 index 0000000..13f8fb9 Binary files /dev/null and b/slides/06/precision_recall_curve.png differ diff --git a/slides/06/precision_recall_curve.png.ref b/slides/06/precision_recall_curve.png.ref new file mode 100644 index 0000000..fc537f8 --- /dev/null +++ b/slides/06/precision_recall_curve.png.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*VenTq4IgxjmIpOXWdFb-jg.png diff --git a/slides/06/precision_recall_curve_interpolated.jpg b/slides/06/precision_recall_curve_interpolated.jpg new file mode 100644 index 0000000..817eae0 Binary files /dev/null and b/slides/06/precision_recall_curve_interpolated.jpg differ diff --git a/slides/06/precision_recall_curve_interpolated.jpg.ref b/slides/06/precision_recall_curve_interpolated.jpg.ref new file mode 100644 index 0000000..9a840d2 --- /dev/null +++ b/slides/06/precision_recall_curve_interpolated.jpg.ref @@ -0,0 +1 @@ +https://miro.medium.com/max/1400/1*pmSxeb4EfdGnzT6Xa68GEQ.jpeg diff --git a/slides/06/precision_recall_person.svgz b/slides/06/precision_recall_person.svgz new file mode 100644 index 0000000..808dd55 Binary files /dev/null and b/slides/06/precision_recall_person.svgz differ diff --git a/slides/06/precision_recall_person.svgz.ref b/slides/06/precision_recall_person.svgz.ref new file mode 100644 index 0000000..5a828ee --- /dev/null +++ 
b/slides/06/precision_recall_person.svgz.ref @@ -0,0 +1 @@ +Figure 6 of "The PASCAL Visual Object Classes (VOC) Challenge", http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf diff --git a/slides/06/pyramidnet_architecture.svgz b/slides/06/pyramidnet_architecture.svgz new file mode 100644 index 0000000..d773f10 Binary files /dev/null and b/slides/06/pyramidnet_architecture.svgz differ diff --git a/slides/06/pyramidnet_architecture.svgz.ref b/slides/06/pyramidnet_architecture.svgz.ref new file mode 100644 index 0000000..321784e --- /dev/null +++ b/slides/06/pyramidnet_architecture.svgz.ref @@ -0,0 +1 @@ +Table 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_blocks.svgz b/slides/06/pyramidnet_blocks.svgz new file mode 100644 index 0000000..077785f Binary files /dev/null and b/slides/06/pyramidnet_blocks.svgz differ diff --git a/slides/06/pyramidnet_blocks.svgz.ref b/slides/06/pyramidnet_blocks.svgz.ref new file mode 100644 index 0000000..2fde23d --- /dev/null +++ b/slides/06/pyramidnet_blocks.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_cifar.svgz b/slides/06/pyramidnet_cifar.svgz new file mode 100644 index 0000000..4f2b985 Binary files /dev/null and b/slides/06/pyramidnet_cifar.svgz differ diff --git a/slides/06/pyramidnet_cifar.svgz.ref b/slides/06/pyramidnet_cifar.svgz.ref new file mode 100644 index 0000000..bc183f0 --- /dev/null +++ b/slides/06/pyramidnet_cifar.svgz.ref @@ -0,0 +1 @@ +Table 4 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_growth_rate.svgz b/slides/06/pyramidnet_growth_rate.svgz new file mode 100644 index 0000000..5474788 Binary files /dev/null and b/slides/06/pyramidnet_growth_rate.svgz differ diff --git a/slides/06/pyramidnet_growth_rate.svgz.ref b/slides/06/pyramidnet_growth_rate.svgz.ref new file mode 100644 index 
0000000..12ee550 --- /dev/null +++ b/slides/06/pyramidnet_growth_rate.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/pyramidnet_residuals.svgz b/slides/06/pyramidnet_residuals.svgz new file mode 100644 index 0000000..c4290c1 Binary files /dev/null and b/slides/06/pyramidnet_residuals.svgz differ diff --git a/slides/06/pyramidnet_residuals.svgz.ref b/slides/06/pyramidnet_residuals.svgz.ref new file mode 100644 index 0000000..b53108d --- /dev/null +++ b/slides/06/pyramidnet_residuals.svgz.ref @@ -0,0 +1 @@ +Figure 5 of "Deep Pyramidal Residual Networks", https://arxiv.org/abs/1610.02915 diff --git a/slides/06/rcnn_architecture.svgz b/slides/06/rcnn_architecture.svgz new file mode 100644 index 0000000..0a7cf0e Binary files /dev/null and b/slides/06/rcnn_architecture.svgz differ diff --git a/slides/06/rcnn_architecture.svgz.ref b/slides/06/rcnn_architecture.svgz.ref new file mode 100644 index 0000000..1a20f30 --- /dev/null +++ b/slides/06/rcnn_architecture.svgz.ref @@ -0,0 +1 @@ +Slide 54 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/06/retinanet.svgz b/slides/06/retinanet.svgz new file mode 100644 index 0000000..60fe4c1 Binary files /dev/null and b/slides/06/retinanet.svgz differ diff --git a/slides/06/retinanet.svgz.ref b/slides/06/retinanet.svgz.ref new file mode 100644 index 0000000..aab04d0 --- /dev/null +++ b/slides/06/retinanet.svgz.ref @@ -0,0 +1 @@ +Figure 3 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_ablations.svgz b/slides/06/retinanet_ablations.svgz new file mode 100644 index 0000000..aec5956 Binary files /dev/null and b/slides/06/retinanet_ablations.svgz differ diff --git a/slides/06/retinanet_ablations.svgz.ref b/slides/06/retinanet_ablations.svgz.ref new file mode 100644 index 0000000..1e51d14 --- /dev/null +++ b/slides/06/retinanet_ablations.svgz.ref @@ -0,0 +1 @@ +Table 1 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_graph.svgz b/slides/06/retinanet_graph.svgz new file mode 100644 index 0000000..299a928 Binary files /dev/null and b/slides/06/retinanet_graph.svgz differ diff --git a/slides/06/retinanet_graph.svgz.ref b/slides/06/retinanet_graph.svgz.ref new file mode 100644 index 0000000..b54356d --- /dev/null +++ b/slides/06/retinanet_graph.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/retinanet_results.svgz b/slides/06/retinanet_results.svgz new file mode 100644 index 0000000..80a5c4d Binary files /dev/null and b/slides/06/retinanet_results.svgz differ diff --git a/slides/06/retinanet_results.svgz.ref b/slides/06/retinanet_results.svgz.ref new file mode 100644 index 0000000..38a2dcf --- /dev/null +++ b/slides/06/retinanet_results.svgz.ref @@ -0,0 +1 @@ +Table 2 of "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002 diff --git a/slides/06/roi_generation.jpg b/slides/06/roi_generation.jpg new file mode 100644 index 
0000000..18f7350 Binary files /dev/null and b/slides/06/roi_generation.jpg differ diff --git a/slides/06/roi_generation.jpg.ref b/slides/06/roi_generation.jpg.ref new file mode 100644 index 0000000..fbb2b02 --- /dev/null +++ b/slides/06/roi_generation.jpg.ref @@ -0,0 +1 @@ +Slide 48 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. diff --git a/slides/06/roi_pooling.svgz b/slides/06/roi_pooling.svgz new file mode 100644 index 0000000..b5d6c0d Binary files /dev/null and b/slides/06/roi_pooling.svgz differ diff --git a/slides/06/roi_pooling.svgz.ref b/slides/06/roi_pooling.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/06/roi_projection.svgz b/slides/06/roi_projection.svgz new file mode 100644 index 0000000..a6aee2e Binary files /dev/null and b/slides/06/roi_projection.svgz differ diff --git a/slides/06/roi_projection.svgz.ref b/slides/06/roi_projection.svgz.ref new file mode 100644 index 0000000..1cc5acc --- /dev/null +++ b/slides/06/roi_projection.svgz.ref @@ -0,0 +1 @@ +Slide 65 of http://cs231n.stanford.edu/slides/2021/lecture_15.pdf. 
diff --git a/slides/08/08.md b/slides/08/08.md new file mode 100644 index 0000000..e8a103e --- /dev/null +++ b/slides/08/08.md @@ -0,0 +1,594 @@ +title: NPFL138, Lecture 8 +class: title, langtech, cc-by-sa +style: .algorithm { background-color: #eee; padding: .5em } + +# Recurrent Neural Networks + +## Milan Straka + +### April 8, 2024 + +--- +section: RNN +class: middle, center +# Recurrent Neural Networks + +# Recurrent Neural Networks + +--- +# Recurrent Neural Networks + +## Single RNN cell + +![w=17%,h=center](rnn_cell.svgz) + +~~~ + +## Unrolled RNN cells + +![w=60%,h=center](rnn_cell_unrolled.svgz) + +--- +# Basic RNN Cell + +![w=100%,h=center,mw=50%](rnn_cell_basic.svgz)![w=50%,h=center,mw=50%](rnn_cell_basic_as_cell.svgz) + +Given an input $→x^{(t)}$ and previous state $→h^{(t-1)}$, the new state is computed as +$$→h^{(t)} = f(→h^{(t-1)}, →x^{(t)}; →θ).$$ + +~~~ +One of the simplest possibilities (called `SimpleRNN` in Keras, `RNN` in PyTorch) is +$$→h^{(t)} = \tanh(⇉U→h^{(t-1)} + ⇉V→x^{(t)} + →b).$$ + +--- +# Basic RNN Cell + +Basic RNN cells suffer a lot from vanishing/exploding gradients (the so-called +**challenge of long-term dependencies**). + +~~~ +If we simplify the recurrence of states to just a linear approximation +$$→h^{(t)} ≈ ⇉U→h^{(t-1)},$$ + +~~~ +we get $→h^{(t)} ≈ ⇉U^t→h^{(0)}$. + +~~~ +If $⇉U$ has an eigenvalue decomposition of $⇉U = ⇉Q ⇉Λ ⇉Q^{-1}$, we get that +$$→h^{(t)} ≈ ⇉Q ⇉Λ^t ⇉Q^{-1} →h^{(0)}.$$ +The main problem is that the _same_ function is iteratively applied many times. + +~~~ +Several more complex RNN cell variants have been proposed, which alleviate +this issue to some degree, namely **LSTM** and **GRU**. + +--- +section: LSTM +# Long Short-Term Memory + +Hochreiter & Schmidhuber (1997) suggested that to enforce +_constant error flow_, we would like +$$f' = →1.$$ + +~~~ +They propose to achieve that by a _constant error carrousel_.
+ +![w=60%,h=center](lstm_cec_idea.svgz) + +~~~ ~~ +They propose to achieve that by a _constant error carrousel_. + +![w=60%,h=center](lstm_cec.svgz) + +--- +# Long Short-Term Memory + +They also propose **input** and **output** gates which control the flow +of information into and out of the carrousel (**memory cell** $→c_t$). + +![w=40%,f=right](lstm_input_output_gates.svgz) + +$$\begin{aligned} + \textcolor{blue} {→i_t} & ← σ(⇉W^i →x_t + ⇉V^i →h_{t-1} + →b^i) \\ + \textcolor{darkgreen}{→o_t} & ← σ(⇉W^o →x_t + ⇉V^o →h_{t-1} + →b^o) \\ + \textcolor{magenta} {→c_t} & ← →c_{t-1} + →i_t ⊙ \tanh(⇉W^y →x_t + ⇉V^y →h_{t-1} + →b^y) \\ + \textcolor{red} {→h_t} & ← →o_t ⊙ \tanh(→c_t) +\end{aligned}$$ + +--- +# Long Short-Term Memory + +Later, Gers, Schmidhuber & Cummins (1999) added the possibility to **forget** +information from memory cell $→c_t$. + +![w=40%,f=right](lstm_input_output_forget_gates.svgz) + +$$\begin{aligned} + \textcolor{blue} {→i_t} & ← σ(⇉W^i →x_t + ⇉V^i →h_{t-1} + →b^i) \\ + \textcolor{darkorange}{→f_t} & ← σ(⇉W^f →x_t + ⇉V^f →h_{t-1} + →b^f) \\ + \textcolor{darkgreen} {→o_t} & ← σ(⇉W^o →x_t + ⇉V^o →h_{t-1} + →b^o) \\ + \textcolor{magenta} {→c_t} & ← →f_t ⊙ →c_{t-1} + →i_t ⊙ \tanh(⇉W^y →x_t + ⇉V^y →h_{t-1} + →b^y) \\ + \textcolor{red} {→h_t} & ← →o_t ⊙ \tanh(→c_t) +\end{aligned}$$ + +~~~ +Note that since 2015, following the paper +- R. Jozefowicz et al.: _An Empirical Exploration of Recurrent Network Architectures_ + +the forget gate bias $→b^f$ is usually initialized to 1, so that the forget gate is closer +to 1 and the gradients can easily flow through multiple timesteps. +~~~ +(Gers et al. advocated this in the original paper already.) +~~~ +(BTW, I think 3 might be even better, as $σ(1) ≈ 0.731$, $σ(3) ≈ 0.953$.)
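
The LSTM update with the forget gate can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`sigmoid`, `affine`, `zeros`, `lstm_step`), the layout of `params` (one `(W, V, b)` triple per gate), and the toy sizes are assumptions made for this example. All weights are zero and only the forget gate bias is set to 1, so the memory cell is simply scaled by $σ(1)$.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, V, b, x, h):
    """Compute W x + V h + b, with matrices stored as lists of rows."""
    return [
        sum(w * a for w, a in zip(w_row, x))
        + sum(v * a for v, a in zip(v_row, h))
        + b_k
        for w_row, v_row, b_k in zip(W, V, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params[g] is a (W, V, b) triple for g in 'i', 'f', 'o', 'y'."""
    i = sigmoid(affine(*params["i"], x, h_prev))  # input gate
    f = sigmoid(affine(*params["f"], x, h_prev))  # forget gate
    o = sigmoid(affine(*params["o"], x, h_prev))  # output gate
    y = [math.tanh(a) for a in affine(*params["y"], x, h_prev)]  # candidate values
    # c_t = f_t * c_{t-1} + i_t * candidate, h_t = o_t * tanh(c_t)
    c = [f_k * c_k + i_k * y_k for f_k, c_k, i_k, y_k in zip(f, c_prev, i, y)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Hypothetical toy setup: input size 3, state size 2, all weights zero.
params = {g: (zeros(2, 3), zeros(2, 2), [0.0, 0.0]) for g in "ifoy"}
params["f"] = (zeros(2, 3), zeros(2, 2), [1.0, 1.0])  # forget gate bias set to 1

h, c = lstm_step([0.3, -0.1, 0.8], [0.0, 0.0], [1.0, -2.0], params)
print(c, h)
```

With a forget gate bias of 3 instead of 1, the same call would keep about 95% of the memory cell per step instead of about 73%.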
+ +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-SimpleRNN.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-chain.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-C-line.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-focus-i.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-focus-f.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-focus-C.png) + +--- +# Long Short-Term Memory +![w=100%,v=middle](LSTM3-focus-o.png) + +--- +section: GRU +# Gated Recurrent Unit + +**Gated recurrent unit (GRU)** was proposed by Cho et al. (2014) as +a simplification of LSTM. The main differences are + +![w=45%,f=right](gru.svgz) + +- no memory cell, +- forgetting and updating tied together. + +~~~ +$$\begin{aligned} + \textcolor{blue} {→r_t} & ← σ(⇉W^r →x_t + ⇉V^r →h_{t-1} + →b^r) \\ + \textcolor{darkgreen}{→u_t} & ← σ(⇉W^u →x_t + ⇉V^u →h_{t-1} + →b^u) \\ + \textcolor{magenta} {→ĥ_t} & ← \tanh(⇉W^h →x_t + ⇉V^h (→r_t ⊙ →h_{t-1}) + →b^h) \\ + \textcolor{red} {→h_t} & ← →u_t ⊙ →h_{t-1} + (1 - →u_t) ⊙ →ĥ_t +\end{aligned}$$ + +--- +# Gated Recurrent Unit +![w=100%,v=middle](LSTM3-var-GRU.png) + +--- +# GRU and LSTM Differences + +The main differences between GRU and LSTM: +~~~ +- GRU uses fewer parameters and less computation. + + - six matrices $⇉W$, $⇉V$ instead of eight +~~~ +- GRUs are easier to work with, because the state is just one tensor, while it is + a pair of tensors for LSTM. +~~~ +- In most tasks, LSTM and GRU give very similar results. +~~~ +- However, there are some tasks on which LSTM achieves (much) better results + than GRU. +~~~ + - For a demonstration of the difference in the expressive power of LSTM and GRU + (caused by the coupling of the forget and update gate), see the paper + - G.
Weiss et al.: _On the Practical Computational Power of Finite Precision + RNNs for Language Recognition_ https://arxiv.org/abs/1805.04908 +~~~ + - For a difference between LSTM and GRU on a real-world task, see for example + - T. Dozat et al.: _Deep Biaffine Attention for Neural Dependency Parsing_ + https://arxiv.org/abs/1611.01734 + +--- +# SimpleRNN, GRU, and LSTM Initialization + +Recall that when we approximate $→h^{(t)} ≈ ⇉U→h^{(t-1)}$, +assuming the eigenvalue decomposition of $⇉U = ⇉Q ⇉Λ ⇉Q^{-1}$, we get +$$→h^{(t)} ≈ ⇉Q ⇉Λ^t ⇉Q^{-1} →h^{(0)}.$$ + +~~~ +This motivated a specific initialization scheme for the $⇉U$ matrix – +this so-called **recurrent kernel** (the concatenation of all the $⇉V^i$, +$⇉V^f$, $⇉V^o$, $⇉V^y$ matrices) is initialized with a randomly generated +orthogonal matrix. + +~~~ +This **orthogonal** initialization is used for all RNN cells in Keras +(via the `recurrent_initializer='orthogonal'` parameter of `SimpleRNN`, `GRU`, +and `LSTM`). + +--- +section: HighwayNetworks +class: middle, center +# Highway Networks + +# Highway Networks + +--- +# Highway Networks + +For input $→x$, a fully connected layer computes +$$→y ← H(→x, ⇉W_H).$$ + +~~~ +Highway networks add a residual connection with gating: +$$→y ← H(→x, ⇉W_H) ⊙ T(→x, ⇉W_T) + →x ⊙ (1 - T(→x, ⇉W_T)).$$ + +~~~ +Usually, the gating is defined as +$$T(→x, ⇉W_T) ← σ(⇉W_T →x + →b_T).$$ + +~~~ +Note that the resulting update is very similar to a GRU cell with $→h_t$ removed; for a +fully connected layer $H(→x, ⇉W_H) = \tanh(⇉W_H →x + →b_H)$ it is exactly that, +apart from copying $→x$ instead of $→h_{t-1}$. + +~~~ +Analogously to LSTM, the transform gate bias $→b_T$ should be initialized to +a negative number.
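
To make the gating concrete, the highway update can be sketched in plain Python as follows. The function names (`linear`, `highway_layer`) and the toy weights are assumptions made for this example only. The demo uses an exaggerated transform gate bias of -30 to show why a negative bias makes the layer start close to the identity mapping.

```python
import math


def linear(W, b, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), with H = tanh(affine) and T = sigmoid(affine)."""
    H = [math.tanh(a) for a in linear(W_H, b_H, x)]
    T = [1.0 / (1.0 + math.exp(-a)) for a in linear(W_T, b_T, x)]
    return [h * t + x_k * (1.0 - t) for h, t, x_k in zip(H, T, x)]


# Hypothetical toy weights for a layer of width 2.
x = [0.5, -1.0]
W_H = [[1.0, 2.0], [-0.5, 0.3]]
b_H = [0.1, -0.2]
W_T = [[0.0, 0.0], [0.0, 0.0]]

# Strongly negative transform gate bias: T is nearly 0, so the input is carried through.
y_carry = highway_layer(x, W_H, b_H, W_T, b_T=[-30.0, -30.0])
# Strongly positive bias: T is nearly 1, so the layer behaves like the plain tanh layer.
y_transform = highway_layer(x, W_H, b_H, W_T, b_T=[30.0, 30.0])
print(y_carry, y_transform)
```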
+ +--- +# Highway Networks on MNIST + +![w=100%](highway_training.svgz) + +--- +# Highway Networks + +![w=90%,h=center](highway_activations.jpg) + +--- +# Highway Networks + +![w=95%,h=center](highway_leisoning.svgz) + +--- +section: RNNRegularization +# Regularizing RNNs + +## Dropout + +- Using dropout on hidden states interferes with long-term dependencies. + +~~~ + +- However, using dropout on the inputs and outputs works well and is used +frequently. +~~~ + - In case residual connections are present, the output dropout needs to be + applied before adding the residual connection. + +~~~ +- Several techniques were designed to allow using dropout on hidden states. + - Variational Dropout + - Recurrent Dropout + - Zoneout + +--- +# Regularizing RNNs + +## Variational Dropout + +![w=75%,h=center](variational_rnn.svgz) + +~~~ +To implement variational dropout on inputs in Keras, use `noise_shape` of +`keras.layers.Dropout` to force the same mask across time-steps. +The variational dropout on the hidden states can be implemented using +the `recurrent_dropout` argument of `keras.layers.{LSTM,GRU,SimpleRNN}{,Cell}`. + +--- +# Regularizing RNNs + +## Recurrent Dropout + +Apply dropout only to the candidate states (i.e., values added to the memory cell in LSTM and +previous state in GRU), independently in every time-step. + +~~~ +## Zoneout + +Randomly preserve hidden activations instead of dropping them. + +~~~ +## Batch Normalization + +![w=42%,f=right](recurrent_batch_normalization.svgz) + +Very fragile and sensitive to proper initialization – there were papers with +negative results (_Dario Amodei et al, 2015: Deep Speech 2_ or _Cesar Laurent et al, +2016: Batch Normalized Recurrent Neural Networks_) until people managed to make +it work (_Tim Cooijmans et al, 2016: Recurrent Batch Normalization_; +specifically, initializing $γ=0.1$ did the trick).
+ +--- +# Regularizing RNNs + +## Batch Normalization + +Neuron value is normalized across the minibatch, and in case of CNN also across +all positions. + +~~~ +## Layer Normalization + +Neuron value is normalized across the layer. + +~~~ +![w=100%](../06/normalizations.svgz) + +--- +# Layer Normalization + +Consider a hidden value $→x ∈ ℝ^D$. Layer normalization (both during training and +during inference) is performed as follows. + +
+ +**Inputs**: An example $→x ∈ ℝ^D$, $ε ∈ ℝ$ with default value 0.001
+**Parameters**: $→β ∈ ℝ^D$ initialized to $→0$, $→γ ∈ ℝ^D$ initialized to $→1$
+**Outputs**: Normalized example $→y$ + +~~~ +- $μ ← \frac{1}{D} ∑_{i = 1}^D x_i$ + +~~~ +- $σ^2 ← \frac{1}{D} ∑_{i = 1}^D (x_i - μ)^2$ +~~~ +- $→x̂ ← (→x - μ) / \sqrt{σ^2 + ε}$ +~~~ +- $→y ← →γ ⊙ →x̂ + →β$ +
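
The four steps of the algorithm above translate almost directly into code. The following plain-Python sketch is only illustrative: the function name `layer_norm` and the toy input are assumptions for this example, while the default `eps=1e-3` and the initial values of `gamma` and `beta` follow the algorithm.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-3):
    """Normalize a single example x over its D features, then scale and shift."""
    D = len(x)
    mu = sum(x) / D                                    # mean over the features
    var = sum((x_i - mu) ** 2 for x_i in x) / D        # (biased) variance over the features
    x_hat = [(x_i - mu) / math.sqrt(var + eps) for x_i in x]
    return [g * xh + b for g, xh, b in zip(gamma, x_hat, beta)]


# Toy example with D = 4; gamma initialized to ones and beta to zeros.
x = [1.0, 2.0, 3.0, 4.0]
y = layer_norm(x, gamma=[1.0] * 4, beta=[0.0] * 4)
print(y)
```

Because the statistics are computed per example, the same function is used unchanged during training and inference, and the result does not depend on the other examples in the batch.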
+ +--- +# Regularizing RNNs + +## Layer Normalization + +Much more stable than batch normalization for RNN regularization. + +![w=70%,h=center](layer_norm.svgz) + +~~~ +![w=85%,h=center](layer_norm_properties.svgz) + +--- +# Layer Normalization + +In an important recent architecture (namely Transformer), many fully +connected layers are used, with a residual connection and a layer normalization. + +![w=85%,h=center](layer_norm_residual.svgz) + +~~~ +This could be considered an alternative to highway networks, i.e., a suitable +residual connection for fully connected layers. +~~~ +Note the architecture can be considered as a variant of a mobile inverted +bottleneck $1×1$ convolution block. + +--- +section: RNNArchitectures +# Basic RNN Architectures and Tasks + +## Sequence Element Representation + +Create output for individual elements, for example for classification of the +individual elements. + +![w=70%,h=center](rnn_cell_unrolled.svgz) + +~~~ +## Sequence Representation + +Generate a single output for the whole sequence (either the last output or the +last state). + +--- +# Basic RNN Architectures and Tasks + +## Sequence Prediction + +During training, predict next sequence element. + +![w=75%,h=center](sequence_prediction_training.svgz) + +~~~ +During inference, use predicted elements as further inputs. + +![w=75%,h=center](sequence_prediction_inference.svgz) + +--- +# Multilayer RNNs + +We might stack several layers of recurrent neural networks. Usually using two or +three layers gives better results than just one. + +![w=75%,h=center](multilayer_rnn.svgz) + +--- +# Multilayer RNNs + +In case of multiple layers, residual connections usually improve results. +Because dimensionality has to be the same, they are usually applied from the +second layer.
+ +![w=75%,h=center](multilayer_rnn_residual.svgz) + +--- +# Bidirectional RNN + +To consider both the left and right contexts, a **bidirectional** RNN can be used, +which consists of parallel application of a **forward** RNN and a **backward** RNN. + +![w=80%,h=center](bidirectional_rnn.svgz) + +~~~ +The outputs of both directions can be either **added** or **concatenated**. Even +if adding them does not seem very intuitive, it does not increase +dimensionality and therefore allows residual connections to be used in case +of multilayer bidirectional RNN. + +--- +section: WE +# Word Embeddings + +We might represent **words** using one-hot encoding, considering all words to be +independent of each other. + +~~~ +However, words are not independent – some are more similar than others. + +~~~ +Ideally, we would like some kind of similarity in the space of the word +representations. + +~~~ +## Distributed Representation +The idea behind distributed representation is that objects can +be represented using a set of common underlying factors. + +~~~ +We therefore represent words as fixed-size **embeddings** into $ℝ^d$ space, +with the vector elements playing the role of the common underlying factors. + +~~~ +These embeddings are initialized randomly and trained together with the rest of +the network. + +--- +# Word Embeddings + +The word embedding layer is in fact just a fully connected layer on top of +one-hot encoding. However, it is not implemented in that way. + +~~~ +Instead, the so-called **embedding** layer is used, which is much more efficient. +When a matrix is multiplied by a one-hot encoded vector (all but one zeros +and exactly one 1), the row corresponding to that 1 is selected, so the +embedding layer can be implemented only as a simple lookup.
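
The equivalence between the one-hot product and the lookup can be checked on a tiny example. In this plain-Python sketch, the embedding matrix `E`, the vocabulary size 3, the embedding dimension 2, and the helper names are all made up for illustration.

```python
def one_hot(index, size):
    """Return a one-hot vector of the given size with a single 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(v, M):
    """Compute v^T M for a vector v (length V) and a matrix M (V rows, d columns)."""
    return [sum(v_k * row[j] for v_k, row in zip(v, M)) for j in range(len(M[0]))]


# Hypothetical embedding matrix: vocabulary of 3 words, embedding dimension 2.
E = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

word_index = 2
via_matmul = vector_times_matrix(one_hot(word_index, len(E)), E)  # O(V * d) work
via_lookup = E[word_index]                                        # O(d) work
print(via_matmul, via_lookup)
```

Both variants return the same row of `E`; the lookup just avoids multiplying by all the zeros.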
+ +~~~ +In Keras, the embedding layer is available as +```python +keras.layers.Embedding(input_dim, output_dim) +``` + +~~~ +In PyTorch, it is available as +```python +torch.nn.Embedding(input_dim, output_dim) +``` + +--- +# Word Embeddings + +Even if the embedding layer is just a fully connected layer on top of one-hot +encoding, it is important that this layer is _shared_ across +the whole network. + +~~~ +![w=37.5%](words_onehot.svgz) +~~~ +![w=60.5%](words_embeddings.svgz) + +--- +section: CLE +# Word Embeddings for Unknown Words + +![w=42%,f=right](cle_rnn.svgz) + +## Recurrent Character-level WEs + +In order to handle words not seen during training, we could find a way +to generate a representation from the word **characters**. + +~~~ +A possible way to compose the representation from individual characters +is to use RNNs – we embed _characters_ to get character representation, +and then use an RNN to produce the representation of a whole _sequence of +characters_. + +~~~ +Usually, both forward and backward directions are used, and the resulting +representations are concatenated/added. + +--- +# Word Embeddings for Unknown Words + +## Convolutional Character-level WEs + +![w=32%,f=right](cle_cnn.png) + +Alternatively, 1D convolutions might be used. + +~~~ +Assume we use a 1D convolution with kernel size 3. It produces a representation +for every input word trigram, but we need a representation of the whole word. +To that end, we use _global max-pooling_ – using it has an interpretable +meaning, where the kernel is a _pattern_ and the activation after the maximum +is the level of the highest match of the pattern anywhere in the word. + +~~~ +Kernels of varying sizes are usually used (because it makes sense to have +patterns for unigrams, bigrams, trigrams, …) – for example, 25 filters for every +kernel size $(1, 2, 3, 4, 5)$ might be used.
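
The "kernel as a pattern" interpretation of global max-pooling can be illustrated with a single hand-made filter. This plain-Python sketch is only a toy: the three-letter alphabet, the one-hot character "embeddings", the bigram kernel, and the function names are assumptions for the example, and words shorter than the kernel are not handled (no padding).

```python
ALPHABET = "abc"  # hypothetical toy alphabet


def encode(word):
    """Represent every character by a one-hot vector over ALPHABET."""
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in word]


def conv1d_global_max(char_vectors, kernel, bias=0.0):
    """Apply one 1D convolution filter to all character n-grams, then take the maximum.

    `kernel` is a list of k weight vectors, one per position in the n-gram.
    Assumes len(char_vectors) >= len(kernel).
    """
    k = len(kernel)
    activations = []
    for start in range(len(char_vectors) - k + 1):
        window = char_vectors[start:start + k]
        activations.append(
            bias + sum(w * c for k_vec, c_vec in zip(kernel, window)
                       for w, c in zip(k_vec, c_vec))
        )
    return max(activations)  # global max-pooling over all positions


# A kernel of size 2 acting as a detector of the bigram "ab".
kernel_ab = encode("ab")

match = conv1d_global_max(encode("cabc"), kernel_ab)     # the bigram "ab" occurs
no_match = conv1d_global_max(encode("ccc"), kernel_ab)   # no overlap with the pattern
print(match, no_match)
```

In a trained model the kernels are learned and the characters use dense embeddings, so the activations measure a soft rather than an exact match.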
+ +~~~ +Lastly, authors employed a highway layer after the convolutions, improving +the results (compared to not using any layer or using a fully connected one). + +--- +# Examples of Recurrent Character-level WEs + +![w=80%,h=center](cle_rnn_examples.svgz) + +--- +# Examples of Convolutional Character-level WEs + +![w=100%](cle_cnn_examples.svgz) + +--- +# Character-level WE Implementation + +## Training + +- Generate unique words per batch. + +~~~ +- Process the unique words in the batch. + +~~~ +- Copy the resulting embeddings suitably in the batch. + +~~~ +## Inference + +- We can cache character-level word embeddings during inference. + +--- +# NLP Processing with CLEs + +![w=100%,v=middle](cle_rnn_gru.png) + diff --git a/slides/08/LSTM3-C-line.png b/slides/08/LSTM3-C-line.png new file mode 100644 index 0000000..ce79157 Binary files /dev/null and b/slides/08/LSTM3-C-line.png differ diff --git a/slides/08/LSTM3-C-line.png.ref b/slides/08/LSTM3-C-line.png.ref new file mode 100644 index 0000000..bdccd03 --- /dev/null +++ b/slides/08/LSTM3-C-line.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-C-line.png diff --git a/slides/08/LSTM3-SimpleRNN.png b/slides/08/LSTM3-SimpleRNN.png new file mode 100644 index 0000000..9472592 Binary files /dev/null and b/slides/08/LSTM3-SimpleRNN.png differ diff --git a/slides/08/LSTM3-SimpleRNN.png.ref b/slides/08/LSTM3-SimpleRNN.png.ref new file mode 100644 index 0000000..5f038dd --- /dev/null +++ b/slides/08/LSTM3-SimpleRNN.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png diff --git a/slides/08/LSTM3-chain.png b/slides/08/LSTM3-chain.png new file mode 100644 index 0000000..e962a3c Binary files /dev/null and b/slides/08/LSTM3-chain.png differ diff --git a/slides/08/LSTM3-chain.png.ref b/slides/08/LSTM3-chain.png.ref new file mode 100644 index 0000000..5dc69bd --- /dev/null +++ b/slides/08/LSTM3-chain.png.ref @@ -0,0 +1 @@ 
+http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png diff --git a/slides/08/LSTM3-focus-C.png b/slides/08/LSTM3-focus-C.png new file mode 100644 index 0000000..7fc49f5 Binary files /dev/null and b/slides/08/LSTM3-focus-C.png differ diff --git a/slides/08/LSTM3-focus-C.png.ref b/slides/08/LSTM3-focus-C.png.ref new file mode 100644 index 0000000..a32d12a --- /dev/null +++ b/slides/08/LSTM3-focus-C.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png diff --git a/slides/08/LSTM3-focus-f.png b/slides/08/LSTM3-focus-f.png new file mode 100644 index 0000000..5808675 Binary files /dev/null and b/slides/08/LSTM3-focus-f.png differ diff --git a/slides/08/LSTM3-focus-f.png.ref b/slides/08/LSTM3-focus-f.png.ref new file mode 100644 index 0000000..827d9b7 --- /dev/null +++ b/slides/08/LSTM3-focus-f.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png diff --git a/slides/08/LSTM3-focus-i.png b/slides/08/LSTM3-focus-i.png new file mode 100644 index 0000000..d3d82f0 Binary files /dev/null and b/slides/08/LSTM3-focus-i.png differ diff --git a/slides/08/LSTM3-focus-i.png.ref b/slides/08/LSTM3-focus-i.png.ref new file mode 100644 index 0000000..3f83f87 --- /dev/null +++ b/slides/08/LSTM3-focus-i.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png diff --git a/slides/08/LSTM3-focus-o.png b/slides/08/LSTM3-focus-o.png new file mode 100644 index 0000000..40fc56b Binary files /dev/null and b/slides/08/LSTM3-focus-o.png differ diff --git a/slides/08/LSTM3-focus-o.png.ref b/slides/08/LSTM3-focus-o.png.ref new file mode 100644 index 0000000..d9ad766 --- /dev/null +++ b/slides/08/LSTM3-focus-o.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png diff --git a/slides/08/LSTM3-var-GRU.png b/slides/08/LSTM3-var-GRU.png new file mode 100644 index 0000000..6838a20 Binary files 
/dev/null and b/slides/08/LSTM3-var-GRU.png differ diff --git a/slides/08/LSTM3-var-GRU.png.ref b/slides/08/LSTM3-var-GRU.png.ref new file mode 100644 index 0000000..985df8d --- /dev/null +++ b/slides/08/LSTM3-var-GRU.png.ref @@ -0,0 +1 @@ +http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-var-GRU.png diff --git a/slides/08/bidirectional_rnn.ipe b/slides/08/bidirectional_rnn.ipe new file mode 100644 index 0000000..098dfa2 --- /dev/null +++ b/slides/08/bidirectional_rnn.ipe @@ -0,0 +1,456 @@ + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 
816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +32 768 m +64 768 l + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + + diff --git a/slides/08/bidirectional_rnn.svgz b/slides/08/bidirectional_rnn.svgz new file mode 100644 index 0000000..053e516 Binary files /dev/null and b/slides/08/bidirectional_rnn.svgz differ diff --git a/slides/08/bidirectional_rnn.svgz.ref b/slides/08/bidirectional_rnn.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/cle_cnn.png b/slides/08/cle_cnn.png new file mode 100644 index 0000000..286a13d Binary files /dev/null and b/slides/08/cle_cnn.png differ diff --git a/slides/08/cle_cnn.png.ref b/slides/08/cle_cnn.png.ref new file mode 100644 index 0000000..833534b --- /dev/null +++ b/slides/08/cle_cnn.png.ref @@ -0,0 +1 @@ +Figure 1 of "Character-Aware Neural Language Models", https://arxiv.org/abs/1508.06615 diff --git a/slides/08/cle_cnn_examples.svgz b/slides/08/cle_cnn_examples.svgz new file mode 100644 index 0000000..3cd8541 
Binary files /dev/null and b/slides/08/cle_cnn_examples.svgz differ diff --git a/slides/08/cle_cnn_examples.svgz.ref b/slides/08/cle_cnn_examples.svgz.ref new file mode 100644 index 0000000..5012fbf --- /dev/null +++ b/slides/08/cle_cnn_examples.svgz.ref @@ -0,0 +1 @@ +Table 6 of "Character-Aware Neural Language Models", https://arxiv.org/abs/1508.06615 diff --git a/slides/08/cle_rnn.svgz b/slides/08/cle_rnn.svgz new file mode 100644 index 0000000..46a7cfd Binary files /dev/null and b/slides/08/cle_rnn.svgz differ diff --git a/slides/08/cle_rnn.svgz.ref b/slides/08/cle_rnn.svgz.ref new file mode 100644 index 0000000..ee616f0 --- /dev/null +++ b/slides/08/cle_rnn.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation", https://arxiv.org/abs/1508.02096 diff --git a/slides/08/cle_rnn_examples.svgz b/slides/08/cle_rnn_examples.svgz new file mode 100644 index 0000000..cc3e0e7 Binary files /dev/null and b/slides/08/cle_rnn_examples.svgz differ diff --git a/slides/08/cle_rnn_examples.svgz.ref b/slides/08/cle_rnn_examples.svgz.ref new file mode 100644 index 0000000..9d722a3 --- /dev/null +++ b/slides/08/cle_rnn_examples.svgz.ref @@ -0,0 +1 @@ +Table 2 of "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation", https://arxiv.org/abs/1508.02096 diff --git a/slides/08/cle_rnn_gru.png b/slides/08/cle_rnn_gru.png new file mode 100644 index 0000000..bd87286 Binary files /dev/null and b/slides/08/cle_rnn_gru.png differ diff --git a/slides/08/cle_rnn_gru.png.ref b/slides/08/cle_rnn_gru.png.ref new file mode 100644 index 0000000..a45cbc4 --- /dev/null +++ b/slides/08/cle_rnn_gru.png.ref @@ -0,0 +1 @@ +Figure 1 of "Multi-Task Cross-Lingual Sequence Tagging from Scratch", https://arxiv.org/abs/1603.06270 diff --git a/slides/08/gru.ipe b/slides/08/gru.ipe new file mode 100644 index 0000000..c443153 --- /dev/null +++ b/slides/08/gru.ipe @@ -0,0 +1,412 @@ + + + + 
+\usepackage{bm} + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +96 768 m +96 624 l +352 624 l +352 768 l +h + + +96 672 m +252 672 l + + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +256 624 m +256 640 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm x_t +\bm h_{t-1} + +256 656 m +256 668 l + + +56 704 m +88 704 l + +\bm h_{t-1} + +96 720 m +172 720 l + + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +176 768 m +176 752 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm h_{t-1} + +176 736 m +176 724 l + +\bm x_t + +56 704 m +88 704 l + +\bm x_t + +116 664 m +116 648 l +152 648 l +152 664 l +h + +\tanh + +96 704 m +180 704 l +196 712 l + + +56 704 m +88 704 l + +\bm h_{t-1} +1- + 
+244 700 m +244 684 l +268 684 l +268 700 l +h + + +256 656 m +272 672 +256 684 c + + +256 700 m +256 708 l + + +232 712 m +252 712 l + + +260 712 m +300 688 l + + +260 672 m +300 688 l + + +308 688 m +352 688 l + + +352 656 m +384 656 l + +\bm h_t + +4 0 0 4 176 656 e + +\cdot + +180 720 m +196 712 l + + +4 0 0 4 176 656 e + +\cdot + +4 0 0 4 176 656 e + +\cdot + +4 0 0 4 176 656 e + ++ + + diff --git a/slides/08/gru.svgz b/slides/08/gru.svgz new file mode 100644 index 0000000..7ed69fa Binary files /dev/null and b/slides/08/gru.svgz differ diff --git a/slides/08/gru.svgz.ref b/slides/08/gru.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/highway_activations.jpg b/slides/08/highway_activations.jpg new file mode 100644 index 0000000..7989db3 Binary files /dev/null and b/slides/08/highway_activations.jpg differ diff --git a/slides/08/highway_activations.jpg.ref b/slides/08/highway_activations.jpg.ref new file mode 100644 index 0000000..91c9627 --- /dev/null +++ b/slides/08/highway_activations.jpg.ref @@ -0,0 +1 @@ +Figure 2 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228 diff --git a/slides/08/highway_leisoning.svgz b/slides/08/highway_leisoning.svgz new file mode 100644 index 0000000..1f37f1b Binary files /dev/null and b/slides/08/highway_leisoning.svgz differ diff --git a/slides/08/highway_leisoning.svgz.ref b/slides/08/highway_leisoning.svgz.ref new file mode 100644 index 0000000..3078441 --- /dev/null +++ b/slides/08/highway_leisoning.svgz.ref @@ -0,0 +1 @@ +Figure 4 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228 diff --git a/slides/08/highway_training.svgz b/slides/08/highway_training.svgz new file mode 100644 index 0000000..973bcf3 Binary files /dev/null and b/slides/08/highway_training.svgz differ diff --git a/slides/08/highway_training.svgz.ref b/slides/08/highway_training.svgz.ref new file mode 100644 index 0000000..997d2b8 --- /dev/null +++ b/slides/08/highway_training.svgz.ref @@ -0,0 +1 @@ 
+Figure 1 of "Training Very Deep Networks", https://arxiv.org/abs/1507.06228 diff --git a/slides/08/layer_norm.svgz b/slides/08/layer_norm.svgz new file mode 100644 index 0000000..1951e39 Binary files /dev/null and b/slides/08/layer_norm.svgz differ diff --git a/slides/08/layer_norm.svgz.ref b/slides/08/layer_norm.svgz.ref new file mode 100644 index 0000000..2ff61c5 --- /dev/null +++ b/slides/08/layer_norm.svgz.ref @@ -0,0 +1 @@ +Figure 2 of "Layer Normalization", https://arxiv.org/abs/1607.06450 diff --git a/slides/08/layer_norm_properties.svgz b/slides/08/layer_norm_properties.svgz new file mode 100644 index 0000000..aaa0044 Binary files /dev/null and b/slides/08/layer_norm_properties.svgz differ diff --git a/slides/08/layer_norm_properties.svgz.ref b/slides/08/layer_norm_properties.svgz.ref new file mode 100644 index 0000000..e0b63f6 --- /dev/null +++ b/slides/08/layer_norm_properties.svgz.ref @@ -0,0 +1 @@ +Table 1 of "Layer Normalization", https://arxiv.org/abs/1607.06450 diff --git a/slides/08/layer_norm_residual.ipe b/slides/08/layer_norm_residual.ipe new file mode 100644 index 0000000..4ecdd51 --- /dev/null +++ b/slides/08/layer_norm_residual.ipe @@ -0,0 +1,402 @@ + + + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 
l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +292 532 m +236 548 l + + +4 0 0 4 176 656 e + ++ + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Fully connected layer + +248 532 m +248 520 l +336 520 l +336 532 l +h + +ReLU + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Fully connected layer + +292 508 m +292 520 l + + +292 508 m +292 520 l + + +292 532 m +236 548 l + + +236 456 m +236 548 l + + +292 508 m +292 520 l + + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Layer normalization + +292 508 m +292 520 l + + +292 508 m +292 520 l + +For example 512 values +For example 2048 values +For example 512 values +\bf Original ``Post-LN'' configuration + +292 532 m +236 548 l + + +4 0 0 4 176 656 e + ++ + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Fully connected layer + +248 532 m +248 520 l +336 520 l +336 532 l +h + +ReLU + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Fully connected layer + +292 508 m +292 520 l + + +292 508 m +292 520 l + + +292 532 m +236 548 l + + +300 456 m +300 572 l + + +292 508 m +292 520 l + + +292 508 m +292 520 l + +For example 512 values +For example 2048 values +For example 512 values +\bf Improved ``Pre-LN`` configuration since 2020 + +248 532 m +248 520 l +336 520 l +336 532 l +h + +Layer normalization + +292 508 m +292 520 l + + + diff --git a/slides/08/layer_norm_residual.svgz b/slides/08/layer_norm_residual.svgz new file mode 100644 index 0000000..ed3dcde Binary files /dev/null and b/slides/08/layer_norm_residual.svgz differ diff --git 
a/slides/08/layer_norm_residual.svgz.ref b/slides/08/layer_norm_residual.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/lstm_cec.ipe b/slides/08/lstm_cec.ipe new file mode 100644 index 0000000..3f2f80e --- /dev/null +++ b/slides/08/lstm_cec.ipe @@ -0,0 +1,312 @@ + + +\usepackage{bm} + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +96 768 m +96 624 l +352 624 l +352 768 l +h + + +64 672 m +96 656 l + + +64 640 m +96 656 l + +\bm x_t +\bm h_{t-1} + +352 656 m +384 656 l + + +40 0 0 40 224 704 e + + +96 656 m +224 656 l + + +96 656 m +352 656 l + +\bm h_t + +96 736 m +352 736 l + + +352 656 m +384 656 l + +\bm c_t + +352 656 m +384 656 l + +\bm c_{t-1} + +184 736 m +264 656 l + + +184 656 m +264 736 l + 
+ + diff --git a/slides/08/lstm_cec.svgz b/slides/08/lstm_cec.svgz new file mode 100644 index 0000000..74d37da Binary files /dev/null and b/slides/08/lstm_cec.svgz differ diff --git a/slides/08/lstm_cec.svgz.ref b/slides/08/lstm_cec.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/lstm_cec_idea.ipe b/slides/08/lstm_cec_idea.ipe new file mode 100644 index 0000000..0c948c2 --- /dev/null +++ b/slides/08/lstm_cec_idea.ipe @@ -0,0 +1,290 @@ + + +\usepackage{bm} + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +96 768 m +96 624 l +352 624 l +352 768 l +h + + +64 672 m +96 656 l + + +64 640 m +96 656 l + +\bm x_t +\bm h_{t-1} + +352 656 m +384 656 l + + +40 0 0 40 224 704 e + + +96 656 m +224 656 l + + +224 
656 m +352 656 l + +\bm h_t + + diff --git a/slides/08/lstm_cec_idea.svgz b/slides/08/lstm_cec_idea.svgz new file mode 100644 index 0000000..a404dfa Binary files /dev/null and b/slides/08/lstm_cec_idea.svgz differ diff --git a/slides/08/lstm_cec_idea.svgz.ref b/slides/08/lstm_cec_idea.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/lstm_input_output_forget_gates.ipe b/slides/08/lstm_input_output_forget_gates.ipe new file mode 100644 index 0000000..e3e4c85 --- /dev/null +++ b/slides/08/lstm_input_output_forget_gates.ipe @@ -0,0 +1,444 @@ + + + + +\usepackage{bm} + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +176 624 m +176 632 l + + +160 592 
m +176 624 l + + +192 592 m +176 624 l + +\bm h_{t-1} +\bm x_t + +32 0 0 32 216 696 e + + +216 664 m +220 664 l + + +96 768 m +96 624 l +352 624 l +352 768 l +h + + +64 672 m +96 656 l + + +64 640 m +96 656 l + +\bm x_t +\bm h_{t-1} + +352 656 m +384 656 l + + +324 664 m +352 664 l + +\bm h_t + +116 664 m +116 648 l +152 648 l +152 664 l +h + +\tanh + +96 656 m +112 656 l + + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +176 624 m +176 632 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm x_t +\bm h_{t-1} + +176 648 m +176 660 l + + +116 664 m +116 648 l +152 648 l +152 664 l +h + +\tanh + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +148 656 m +172 656 l + + +176 624 m +176 632 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm x_t +\bm h_{t-1} + +96 728 m +172 728 l + + +352 656 m +384 656 l + +\bm c_t +\bm c_{t-1} + +352 656 m +384 656 l + + +4 0 0 4 176 656 e + +\cdot + +176 648 m +176 660 l + + +4 0 0 4 176 656 e + +\cdot + +148 656 m +172 656 l + + +4 0 0 4 176 656 e + ++ + +220 728 m +352 728 l + + +178.851 666.885 m +212.886 725.303 l + + +219.14 725.351 m +256 664 l + + +4 0 0 4 176 656 e + +\cdot + +180 728 m +212 728 l + + +176 744 m +176 732 l + + + diff --git a/slides/08/lstm_input_output_forget_gates.svgz b/slides/08/lstm_input_output_forget_gates.svgz new file mode 100644 index 0000000..6770692 Binary files /dev/null and b/slides/08/lstm_input_output_forget_gates.svgz differ diff --git a/slides/08/lstm_input_output_forget_gates.svgz.ref b/slides/08/lstm_input_output_forget_gates.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/lstm_input_output_gates.ipe b/slides/08/lstm_input_output_gates.ipe new file mode 100644 index 0000000..0170d2f --- /dev/null +++ b/slides/08/lstm_input_output_gates.ipe @@ -0,0 +1,411 @@ + + + + +\usepackage{bm} + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + 
+ +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +32 0 0 32 216 696 e + + +216 664 m +220 664 l + + +96 768 m +96 624 l +352 624 l +352 768 l +h + + +64 672 m +96 656 l + + +64 640 m +96 656 l + +\bm x_t +\bm h_{t-1} + +352 656 m +384 656 l + + +324 664 m +352 664 l + +\bm h_t + +116 664 m +116 648 l +152 648 l +152 664 l +h + +\tanh + +96 656 m +112 656 l + + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +176 624 m +176 632 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm x_t +\bm h_{t-1} + +176 648 m +176 660 l + + +116 664 m +116 648 l +152 648 l +152 664 l +h + +\tanh + +160 644 m +160 628 l +192 628 l +192 644 l +h + +\sigma + +148 656 m +172 656 l + + +176 624 m +176 632 l + + +160 592 m +176 624 l + + +192 592 m +176 624 l + +\bm x_t +\bm h_{t-1} + +96 728 m +212 728 l + + +352 656 m +384 656 l + +\bm c_t +\bm c_{t-1} + +352 656 m +384 
656 l + + +4 0 0 4 176 656 e + +\cdot + +176 648 m +176 660 l + + +4 0 0 4 176 656 e + +\cdot + +148 656 m +172 656 l + + +4 0 0 4 176 656 e + ++ + +220 728 m +352 728 l + + +178.851 666.885 m +212.886 725.303 l + + +219.14 725.351 m +256 664 l + +\phantom{\bm h_{t-1}} + + diff --git a/slides/08/lstm_input_output_gates.svgz b/slides/08/lstm_input_output_gates.svgz new file mode 100644 index 0000000..b26f64a Binary files /dev/null and b/slides/08/lstm_input_output_gates.svgz differ diff --git a/slides/08/lstm_input_output_gates.svgz.ref b/slides/08/lstm_input_output_gates.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/multilayer_rnn.ipe b/slides/08/multilayer_rnn.ipe new file mode 100644 index 0000000..1f85067 --- /dev/null +++ b/slides/08/multilayer_rnn.ipe @@ -0,0 +1,490 @@ + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +32 768 m +64 768 l + + +32 768 m +64 768 l + + + diff --git a/slides/08/multilayer_rnn.svgz b/slides/08/multilayer_rnn.svgz new file mode 100644 index 0000000..bbb50c0 Binary files /dev/null and b/slides/08/multilayer_rnn.svgz differ diff --git a/slides/08/multilayer_rnn.svgz.ref b/slides/08/multilayer_rnn.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git 
a/slides/08/multilayer_rnn_residual.ipe b/slides/08/multilayer_rnn_residual.ipe new file mode 100644 index 0000000..194b03a --- /dev/null +++ b/slides/08/multilayer_rnn_residual.ipe @@ -0,0 +1,550 @@ + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 
768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +80 816 m +80 784 l + + +32 768 m +64 768 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +16 0 0 16 80 768 e + + +32 768 m +64 768 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 816 m +80 784 l + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +80 800 m +32 768 +80 736 c + + +32 768 m +64 768 l + + +32 768 m +64 768 l + + +32 768 m +64 768 l + + + diff --git a/slides/08/multilayer_rnn_residual.svgz b/slides/08/multilayer_rnn_residual.svgz new file mode 100644 index 0000000..6abdf6a Binary files /dev/null and b/slides/08/multilayer_rnn_residual.svgz differ diff --git a/slides/08/multilayer_rnn_residual.svgz.ref b/slides/08/multilayer_rnn_residual.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/recurrent_batch_normalization.svgz b/slides/08/recurrent_batch_normalization.svgz new file mode 100644 index 0000000..b7e6c7d Binary files /dev/null and b/slides/08/recurrent_batch_normalization.svgz 
differ diff --git a/slides/08/recurrent_batch_normalization.svgz.ref b/slides/08/recurrent_batch_normalization.svgz.ref new file mode 100644 index 0000000..55024b1 --- /dev/null +++ b/slides/08/recurrent_batch_normalization.svgz.ref @@ -0,0 +1 @@ +Figure 1 of "Recurrent Batch Normalization", https://arxiv.org/abs/1603.09025 diff --git a/slides/08/rnn_cell.ipe b/slides/08/rnn_cell.ipe new file mode 100644 index 0000000..0d76d0c --- /dev/null +++ b/slides/08/rnn_cell.ipe @@ -0,0 +1,353 @@ + + + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +32 0 0 32 192 672 e + +\textit{input} + +192 752 m +192 704 l + + +192 640 m +192 584 l + +\textit{output} + +236 672 m +256 672 +256 628 +128 628 +124 672 +160 672 c + +\textit{state} + 
+216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +224 672 m +236 672 l + + + diff --git a/slides/08/rnn_cell.svgz b/slides/08/rnn_cell.svgz new file mode 100644 index 0000000..5ecff56 Binary files /dev/null and b/slides/08/rnn_cell.svgz differ diff --git a/slides/08/rnn_cell.svgz.ref b/slides/08/rnn_cell.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/rnn_cell_basic.ipe b/slides/08/rnn_cell_basic.ipe new file mode 100644 index 0000000..a97675e --- /dev/null +++ b/slides/08/rnn_cell_basic.ipe @@ -0,0 +1,288 @@ + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l 
+h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +80 800 m +80 736 l +96 736 l +96 800 l +h + + +128 768 m +128 704 l +144 704 l +144 768 l +h + + +80 800 m +80 736 l +96 736 l +96 800 l +h + +\textit{input} +\textit{previous state} + +128 776 m +160 736 l + + +128 696 m +160 736 l + +\textit{output~=~new state} + + diff --git a/slides/08/rnn_cell_basic.svgz b/slides/08/rnn_cell_basic.svgz new file mode 100644 index 0000000..f1ea259 Binary files /dev/null and b/slides/08/rnn_cell_basic.svgz differ diff --git a/slides/08/rnn_cell_basic.svgz.ref b/slides/08/rnn_cell_basic.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/08/rnn_cell_basic_as_cell.ipe b/slides/08/rnn_cell_basic_as_cell.ipe new file mode 100644 index 0000000..b0440ad --- /dev/null +++ b/slides/08/rnn_cell_basic_as_cell.ipe @@ -0,0 +1,433 @@ + + + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + +0.6 0 0 0.6 0 0 e + + + + + +0.5 0 0 0.5 0 0 e + + +0.6 0 0 0.6 0 0 e +0.4 0 0 0.4 0 0 e + + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h + + + + + +-0.5 -0.5 m +0.5 -0.5 l +0.5 0.5 l +-0.5 0.5 l +h + + +-0.6 -0.6 m +0.6 -0.6 l +0.6 0.6 l +-0.6 0.6 l +h +-0.4 -0.4 m +0.4 -0.4 l +0.4 0.4 l +-0.4 0.4 l +h + + + + + + +-0.43 -0.57 m +0.57 0.43 l +0.43 0.57 l +-0.57 -0.43 l +h + + +-0.43 0.57 m +0.57 -0.43 l +0.43 -0.57 l +-0.57 0.43 l +h + + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-0.8 0 l +-1 -0.333 l +h + + + + +0 0 m +-1 0.333 l 
+-0.8 0 l +-1 -0.333 l +h + + + + +-1 0.333 m +0 0 l +-1 -0.333 l + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + +0 0 m +-1 0.333 l +-1 -0.333 l +h +-1 0 m +-2 0.333 l +-2 -0.333 l +h + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +64 0 0 64 192 672 e + +\textit{input} + +192 752 m +192 704 l + + +192 576 m +192 512 l + +\textit{output} + +268 640 m +288 640 +288 556 +96 556 +96 640 +128 640 c + +\textit{state} + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + + +224 672 m +236 672 l + + +216 728 m +216 720 l +168 720 l +168 728 l +h + + +176 728 m +176 720 l + + +184 728 m +184 720 l + + +200 728 m +200 720 l + + +208 728 m +208 720 l + +\tanh + +176 640 m +256 640 l + + +128 640 m +176 640 l + + +184 648 m +8 0 0 8 192 648 192 640 a + + +192 672 m +8 0 0 8 192 680 200 680 a + + +160 672 m +8 0 0 8 160 664 152 664 a + + +192 704 m +192 680 l + + +184 672 m +168 672 l + + +160 664 m +160 648 l + + +184 648 m +8 0 0 8 192 648 192 640 a + + +192 672 m +8 0 0 8 192 680 200 680 a + + +160 672 m +8 0 0 8 160 664 152 664 a + + +192 704 m +192 680 l + + +184 672 m +168 672 l + + +160 664 m +160 648 l + + + diff --git a/slides/08/rnn_cell_basic_as_cell.svgz b/slides/08/rnn_cell_basic_as_cell.svgz new file mode 100644 index 0000000..7947878 Binary files /dev/null and b/slides/08/rnn_cell_basic_as_cell.svgz differ diff --git a/slides/08/rnn_cell_basic_as_cell.svgz.ref b/slides/08/rnn_cell_basic_as_cell.svgz.ref new file 
mode 100644
index 0000000..e69de29
diff --git a/slides/08/rnn_cell_unrolled.ipe b/slides/08/rnn_cell_unrolled.ipe
new file mode 100644
index 0000000..9063203
--- /dev/null
+++ b/slides/08/rnn_cell_unrolled.ipe
@@ -0,0 +1,606 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Text labels: \textit{input~1} to \textit{input~4}, \textit{output~1} to \textit{output~4}, \textit{state}.]
diff --git a/slides/08/rnn_cell_unrolled.svgz b/slides/08/rnn_cell_unrolled.svgz
new file mode 100644
index 0000000..1ac808c
Binary files /dev/null and b/slides/08/rnn_cell_unrolled.svgz differ
diff --git a/slides/08/rnn_cell_unrolled.svgz.ref b/slides/08/rnn_cell_unrolled.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/sequence_prediction_inference.ipe b/slides/08/sequence_prediction_inference.ipe
new file mode 100644
index 0000000..6514be8
--- /dev/null
+++ b/slides/08/sequence_prediction_inference.ipe
@@ -0,0 +1,371 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Text labels: \textit{BOS}, \hat x^{(0)}, \hat x^{(1)}, \hat x^{(2)}, \hat x^{(3)}, \textit{EOS}, \mathit{sequence}, \mathit{representation}.]
diff --git a/slides/08/sequence_prediction_inference.svgz b/slides/08/sequence_prediction_inference.svgz
new file mode 100644
index 0000000..bf4030b
Binary files /dev/null and b/slides/08/sequence_prediction_inference.svgz differ
diff --git a/slides/08/sequence_prediction_inference.svgz.ref b/slides/08/sequence_prediction_inference.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/sequence_prediction_training.ipe b/slides/08/sequence_prediction_training.ipe
new file mode 100644
index 0000000..a8a2922
--- /dev/null
+++ b/slides/08/sequence_prediction_training.ipe
@@ -0,0 +1,343 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Text labels: \textit{BOS}, \hat x^{(0)} to \hat x^{(3)}, x^{(0)} to x^{(3)}, \textit{EOS}, \mathit{sequence}, \mathit{representation}.]
diff --git a/slides/08/sequence_prediction_training.svgz b/slides/08/sequence_prediction_training.svgz
new file mode 100644
index 0000000..722547a
Binary files /dev/null and b/slides/08/sequence_prediction_training.svgz differ
diff --git a/slides/08/sequence_prediction_training.svgz.ref b/slides/08/sequence_prediction_training.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/variational_rnn.svgz b/slides/08/variational_rnn.svgz
new file mode 100644
index 0000000..7552c81
Binary files /dev/null and b/slides/08/variational_rnn.svgz differ
diff --git a/slides/08/variational_rnn.svgz.ref b/slides/08/variational_rnn.svgz.ref
new file mode 100644
index 0000000..51c0c59
--- /dev/null
+++ b/slides/08/variational_rnn.svgz.ref
@@ -0,0 +1 @@
+Figure 1 of "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks", https://arxiv.org/abs/1512.05287.pdf
diff --git a/slides/08/words_embeddings.ipe b/slides/08/words_embeddings.ipe
new file mode 100644
index 0000000..a776b44
--- /dev/null
+++ b/slides/08/words_embeddings.ipe
@@ -0,0 +1,316 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Text labels: Word in one-hot encoding, D, D_1, D_2, D_3, V.]
diff --git a/slides/08/words_embeddings.svgz b/slides/08/words_embeddings.svgz
new file mode 100644
index 0000000..59ab2f4
Binary files /dev/null and b/slides/08/words_embeddings.svgz differ
diff --git a/slides/08/words_embeddings.svgz.ref b/slides/08/words_embeddings.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/08/words_onehot.ipe b/slides/08/words_onehot.ipe
new file mode 100644
index 0000000..6ab09a2
--- /dev/null
+++ b/slides/08/words_onehot.ipe
@@ -0,0 +1,303 @@
+[Ipe drawing source; XML markup lost in extraction, style and marker definitions omitted.]
+[Remaining Ipe drawing content; XML markup lost in extraction, path coordinates omitted. Text labels: Word in one-hot encoding, V, D_1, D_2, D_3.]
diff --git a/slides/08/words_onehot.svgz b/slides/08/words_onehot.svgz
new file mode 100644
index 0000000..f283837
Binary files /dev/null and b/slides/08/words_onehot.svgz differ
diff --git a/slides/08/words_onehot.svgz.ref b/slides/08/words_onehot.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/09.md b/slides/09/09.md
new file mode 100644
index 0000000..a7bf206
--- /dev/null
+++ b/slides/09/09.md
@@ -0,0 +1,496 @@
+title: NPFL138, Lecture 9
+class: title, langtech, cc-by-sa
+style: .algorithm { background-color: #eee; padding: .5em }
+
+# Structured Prediction, CTC, Word2Vec
+
+## Milan Straka
+
+### April 15, 2024
+
+---
+section: Span Labeling
+class: middle, center
+# Structured Prediction
+
+# Structured Prediction
+
+---
+# Structured Prediction
+
+Consider generating a sequence of $y_1, \ldots, y_N ∈ 𝓨^N$ given input
+$→x_1, \ldots, →x_N$.
+
+~~~
+Predicting each sequence element independently models the distribution $P(y_i | ⇉X)$.
+
+![w=40%,h=center](labeling_independent.svgz)
+
+~~~
+However, there may be dependencies among the $y_i$ themselves, in the sense
+that not all sequences of $y_i$ are valid; but when generating each $y_i$
+independently, the model might not be capable of generating only valid
+sequences.
+
+---
+# Structured Prediction – Span Labeling
+
+Consider for example **named entity recognition**, whose goal is to locate
+_named entities_, which are single words or sequences of multiple words
+denoting real-world objects, concepts, and events.
+~~~
+The most common types of named entities include:
+- `PER`: _people_, including names of individuals, historical figures, and even
+  fictional characters;
+~~~
+- `ORG`: _organizations_, incorporating companies, government agencies,
+  educational institutions, and others;
+~~~
+- `LOC`: _locations_, encompassing countries, cities, geographical features,
+  addresses.
+
+~~~
+Compared to part-of-speech tagging, locating named entities is much more
+challenging – named entity mentions are generally multi-word spans, and
+an arbitrary number of named entities can appear in a sentence (consequently,
+we cannot use accuracy for evaluation; F1-score is commonly used).
+
+~~~
+Named entity recognition is an instance of a **span labeling** task, where
+the goal is to locate and classify spans in the input sequence.
+
+---
+# Span Labeling – BIO Encoding
+
+A possible approach to a span labeling task is to classify every sequence
+element using a specialized tag set. A common approach is to use the
+**BIO** encoding, which consists of
+~~~
+- `O`: _outside_, the given element is not part of any span;
+
+~~~
+- `B-PER`, `B-ORG`, `B-LOC`, …: _beginning_, the element is first in a new span;
+~~~
+- `I-PER`, `I-ORG`, `I-LOC`, …: _inside_, a continuation element of an existing
+  span.
+
+~~~
+(Formally, the described scheme is the IOB-2 format; there exist quite a few other
+possibilities like IOB-1, IEO, BILOU, …)
+
+~~~
+The described encoding can represent any set of contiguous typed spans (when no spans
+overlap, i.e., a single element can belong to at most one span).
+
+---
+# Span Labeling – BIO Encoding
+
+However, when predicting each of the element tags independently, invalid
+sequences might be created.
+
+~~~
+- We can decide to ignore it and use heuristics capable of recovering the spans
+  from invalid sequences of BIO tags.
+
+~~~
+- We can employ a decoding algorithm producing the most probable **valid
+  sequence** of tags during prediction.
+~~~
+  - However, during training we do not consider the validity of the BIO tags.
+
+~~~
+- We might use a different loss enabling the model to consider only
+  valid BIO tag sequences also during training.
+
+---
+# Span Labeling – Decoding Algorithm
+
+Let $→x_1, \ldots, →x_N$ be an input sequence.
+
+Our goal is to produce an output sequence $y_1, …, y_N$, where each $y_t ∈ 𝓨$
+with $Y$ classes.
+
+~~~
+Assume we have a model predicting $p(y_t = k | ⇉X; →θ)$, the probability that the
+$t$-th output element $y_t$ is the class $k$.
+
+~~~
+However, only some sequences $→y$ are valid.
+~~~
+We now make an assumption that the validity of a sequence depends only on the
+validity of **neighboring** output classes. In other words, if all neighboring
+pairs of output elements are valid, the whole sequence is.
+
+~~~
+- The validity of neighboring pairs can be described by a transition matrix $⇉A
+  ∈ \{0, 1\}^{Y×Y}$.
+~~~
+- Such an approach allows expressing the (in)validity of a BIO tag sequence.
+
+---
+# Span Labeling – Decoding Algorithm
+
+Let us denote $α_t(k)$ the log probability of the most probable output sequence
+of $t$ elements with the last one being $k$.
+
+~~~
+We can compute $α_t(k)$ efficiently using dynamic programming. The core idea is
+the following:
+
+![w=38%,h=center](crf_composability.svgz)
+
+~~~
+$$α_t(k) = \log p(y_t=k | ⇉X; →θ) + \max\nolimits_{j,\textrm{~such~that~}A_{j,k}\textrm{~is~valid}} α_{t-1}(j).$$
+
+~~~
+If we consider $\log A_{j,k}$ to be $-∞$ when $A_{j,k}=0$, we can rewrite the above as
+$$α_t(k) = \log p(y_t=k | ⇉X; →θ) + \max\nolimits_j \big(α_{t-1}(j) + \log A_{j,k}\big).$$
+
+~~~
+The resulting algorithm is also called the **Viterbi algorithm**, and it is also
+a search for the path of maximum length in an acyclic graph.
+
+---
+# Span Labeling – Decoding Algorithm
+
+
+
+**Inputs**: Input sequence of length $N$, tag set with $Y$ tags.
+**Inputs**: Model computing $p(y_t = k | ⇉X; →θ)$, the probability that $y_t$
+should have the class $k$.
+**Inputs**: Transition matrix $⇉A ∈ ℝ^{Y×Y}$ indicating _valid_ and _invalid_
+transitions.
+**Outputs**: The most probable sequence $→y$ consisting of valid transitions
+only.
+**Time Complexity**: $𝓞(N ⋅ Y^2)$ in the worst case.
+
+- For $t = 1, \ldots, N$:
+  - For $k = 1, \ldots, Y$:
+    - $α_t(k) ← \log p(y_t=k | ⇉X; →θ)$  _logits (unnormalized log probs) can also be used_
+    - If $t > 1$:
+      - $β_t(k) ← \argmax\nolimits_{j,\textrm{~such~that~}A_{j,k}\textrm{~is~valid}} α_{t-1}(j)$
+      - $α_t(k) ← α_t(k) + α_{t-1}\big(β_t(k)\big)$
+- The most probable sequence has the log probability $\max α_N$, and its
+  elements can be recovered by traversing $β$ from $t=N$ down to $t=1$.
+
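The decoding algorithm above can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: the function name `constrained_viterbi`, the list-of-lists inputs, and the three-tag example are assumptions made here. It also follows the algorithm as stated, so it constrains transitions only and does not forbid a sequence from starting with an `I-` tag.

```python
import math


def constrained_viterbi(log_probs, valid):
    """Return the most probable tag sequence that uses valid transitions only.

    log_probs[t][k] is log p(y_t = k | X); valid[j][k] is True when tag k
    may directly follow tag j. Assumes at least one valid sequence exists.
    """
    num_steps, num_tags = len(log_probs), len(log_probs[0])
    alpha = [list(log_probs[0])]      # alpha[t][k]: best log prob of a prefix ending in k
    beta = [[-1] * num_tags]          # beta[t][k]: predecessor of k on that best prefix
    for t in range(1, num_steps):
        alpha.append([-math.inf] * num_tags)
        beta.append([-1] * num_tags)
        for k in range(num_tags):
            allowed = [j for j in range(num_tags) if valid[j][k]]
            if not allowed:
                continue              # tag k cannot follow anything; keep -inf
            best = max(allowed, key=lambda j: alpha[t - 1][j])
            beta[t][k] = best
            alpha[t][k] = log_probs[t][k] + alpha[t - 1][best]
    # Backtrack from the best final tag, following beta towards the first step.
    tags = [max(range(num_tags), key=lambda k: alpha[-1][k])]
    for t in range(num_steps - 1, 0, -1):
        tags.append(beta[t][tags[-1]])
    tags.reverse()
    return tags, max(alpha[-1])


# Hypothetical example. Tags: 0 = O, 1 = B-PER, 2 = I-PER; only O -> I-PER is forbidden.
valid = [[True, True, False], [True, True, True], [True, True, True]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.8, 0.1, 0.1]]
log_probs = [[math.log(p) for p in row] for row in probs]
tags, score = constrained_viterbi(log_probs, valid)
```

In this example the independent per-position argmax would give `O, I-PER, O`, which is invalid; the constrained decoder instead returns `[0, 1, 0]`, that is `O, B-PER, O`, with probability 0.7 ⋅ 0.3 ⋅ 0.8 = 0.168.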
+
+---
+# Span Labeling – Other Approaches
+
+With deep learning models, constrained decoding is usually sufficient to deliver
+high performance.
+
+~~~
+Historically, there have also been other approaches:
+
+~~~
+- **Maximum Entropy Markov Models**
+
+  We might model the dependencies by explicitly conditioning on the previous
+  label:
+  $$P(y_i | ⇉X, y_{i-1}).$$
+
+~~~
+  Then, each label is predicted by a softmax from a hidden state and the
+  _previous label_.
+  ![w=35%,h=center](labeling_memm.svgz)
+
+~~~
+  The decoding can still be performed by a dynamic programming algorithm.
+
+---
+# Span Labeling – Other Approaches
+
+- **Conditional Random Fields (CRF)**
+
+  The simplest variant, the linear-chain CRF (usually abbreviated to just CRF),
+  can be considered an extension of softmax – instead of a sequence of
+  independent softmaxes, it is a sentence-level softmax, with additional weights
+  for neighboring sequence elements.
+
+~~~
+  We start by defining a score of a label sequence $→y$ as
+  $$s(⇉X, →y; →θ, ⇉A) = f(y_1 | ⇉X; →θ) + ∑\nolimits_{i=2}^N \big(⇉A_{y_{i-1}, y_i} + f(y_i | ⇉X; →θ)\big),$$
+~~~
+  and define the probability of a label sequence $→y$ using $\softmax$:
+  $$p(→y | ⇉X) = \softmax_{→z ∈ Y^N}\big(s(⇉X, →z)\big)_{→y}.$$
+
+~~~
+  The log probability $\log p(→y_\textrm{gold} | ⇉X)$ can be efficiently computed
+  using dynamic programming in a differentiable way, so it can be used in NLL
+  computation.
+
+~~~
+  For more details, see [Lecture 8 of NPFL114 2022/23 slides](https://ufal.mff.cuni.cz/~straka/courses/npfl114/2223/slides/?08).
+
+---
+section: CTC
+# Connectionist Temporal Classification
+
+Let us again consider generating a sequence of $y_1, \ldots, y_M$ given input
+$→x_1, \ldots, →x_N$, but this time $M ≤ N$, and there is no explicit alignment
+of $→x$ and $y$ in the gold data.
+
+~~~
+![w=100%,mh=90%,v=middle](ctc_example.svgz)
+
+---
+# Connectionist Temporal Classification
+
+We enlarge the set of the output labels with a – (**blank**), and perform a classification for every
+input element to produce an **extended labeling** (in contrast to the original **regular labeling**).
+We then post-process it by the following rules (denoted as $𝓑$):
+1. We collapse multiple neighboring occurrences of the same symbol into one.
+2. We remove the blank –.
+
+~~~
+Because the explicit alignment of inputs and labels is not known, we consider
+_all possible_ alignments.
+
+~~~
+Denoting the probability of label $l$ at time $t$ as $p_l^t$, we define
+$$α^t(s) ≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}}} ∏_{i=1}^t p_{π_i}^i.$$
+
+---
+# Connectionist Temporal Classification
+
+## Computation
+
+When aligning an extended labeling to a regular one, we need to consider
+whether the extended labeling ends with a _blank_ or not. We therefore define
+$$\begin{aligned}
+  α_-^t(s) &≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}, π_t=-}} ∏_{i=1}^t p_{π_i}^i \\
+  α_*^t(s) &≝ ∑_{\substack{\textrm{extended}\\\textrm{labelings~}→π:\\𝓑(→π_{1:t}) = →y_{1:s}, π_t≠-}} ∏_{i=1}^t p_{π_i}^i
+
+\end{aligned}$$
+and compute $α^t(s)$ as $α_-^t(s) + α_*^t(s)$.
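To make the definitions concrete, the post-processing $𝓑$ and the sum over all extended labelings can be written down directly. The sketch below is only illustrative: it enumerates every extended labeling, so it is exponential in the input length, and the names `collapse` and `brute_force_probability`, the `"-"` blank symbol, and the dictionary-based probabilities are assumptions made here.

```python
import itertools

BLANK = "-"


def collapse(extended):
    """The post-processing B: merge neighboring repeats, then drop blanks."""
    merged = [s for i, s in enumerate(extended) if i == 0 or s != extended[i - 1]]
    return [s for s in merged if s != BLANK]


def brute_force_probability(probs, target):
    """Sum prod_i p^i_{pi_i} over all extended labelings pi with B(pi) == target.

    probs[t] maps every symbol (including BLANK) to its probability at step t.
    The enumeration is exponential in len(probs); it only mirrors the definition.
    """
    symbols = list(probs[0])
    total = 0.0
    for pi in itertools.product(symbols, repeat=len(probs)):
        if collapse(pi) == list(target):
            p = 1.0
            for t, symbol in enumerate(pi):
                p *= probs[t][symbol]
            total += p
    return total


# Hypothetical two-step example with a single non-blank symbol "a".
probs = [{BLANK: 0.4, "a": 0.6}, {BLANK: 0.7, "a": 0.3}]
p_a = brute_force_probability(probs, "a")
```

Here the regular labeling `a` is produced by the extended labelings `a-`, `-a`, and `aa`, so `p_a` is 0.42 + 0.12 + 0.18 = 0.72.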
+
+---
+# Connectionist Temporal Classification
+
+## Computation – Initialization
+
+![w=35%,f=right](ctc_computation.svgz)
+
+We initialize $α^1$ as follows:
+- $α_-^1(0) ← p_-^1$
+- $α_*^1(1) ← p_{y_1}^1$
+- all other $α^1$ values are set to zero
+
+~~~
+## Computation – Induction Step
+
+We then proceed recurrently according to:
+- $α_-^t(s) ← p_-^t \big(α_*^{t-1}(s) + α_-^{t-1}(s)\big)$
+
+~~~
+- $α_*^t(s) ← \begin{cases}
+  p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + α_*^{t-1}(s-1)\big)\textrm{, if }y_s≠y_{s-1}\\
+  p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + \sout{α_*^{t-1}(s-1)}\big)\textrm{, if }y_s=y_{s-1}\\
+\end{cases}$
+
+~~~
+  We can write the update as $p_{y_s}^t\big(α_*^{t-1}(s) + α_-^{t-1}(s-1) + [y_s≠y_{s-1}] ⋅ α_*^{t-1}(s-1)\big)$.
+
+---
+section: CTCDecoding
+# CTC Decoding
+
+Unlike BIO-tag structured prediction, nobody knows how to perform CTC decoding
+optimally in polynomial time.
+
+~~~
+The key observation is that while an optimal extended labeling can be extended
+into an optimal labeling of a greater length, the same does not apply to
+a regular labeling. The problem is that a regular labeling corresponds to many
+extended labelings, each of which is modified in a different way during an
+extension of the regular labeling.
+
+~~~
+![w=75%,h=center](ctc_decoding.svgz)
+
+---
+# CTC Decoding
+
+## Beam Search
+
+~~~
+To perform a beam search, we keep the $k$ best **regular** (non-extended) labelings.
+Specifically, for each regular labeling $→y$ we keep both $α^t_-(→y)$ and
+$α^t_*(→y)$, which are probabilities of all (modulo beam search) extended
+labelings of length $t$ which produce the regular labeling $→y$; we therefore
+keep $k$ regular labelings with the highest $α^t_-(→y) + α^t_*(→y)$.
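Since the beam search maintains the same two quantities per labeling, it may help to see the exact recursion from the initialization and induction step written out for a fixed gold labeling. The following is a sketch under assumptions made here: the function name `ctc_probability`, the `"-"` blank symbol, and dictionary-based probabilities are illustrative, and it works with plain probabilities rather than the log probabilities a real loss would use for numerical stability.

```python
def ctc_probability(probs, target, blank="-"):
    """Probability that the extended labelings collapse to `target`.

    probs[t] maps every symbol (including the blank) to its probability at
    step t. a_blank[s] and a_sym[s] hold the mass of extended prefixes that
    collapse to target[:s] and end with a blank or a non-blank, respectively.
    """
    m = len(target)
    a_blank = [0.0] * (m + 1)
    a_sym = [0.0] * (m + 1)
    # Initialization: the first extended symbol is either a blank or y_1.
    a_blank[0] = probs[0][blank]
    if m > 0:
        a_sym[1] = probs[0][target[0]]
    for t in range(1, len(probs)):
        # Appending a blank keeps the regular prefix unchanged.
        new_blank = [probs[t][blank] * (a_sym[s] + a_blank[s]) for s in range(m + 1)]
        new_sym = [0.0] * (m + 1)
        for s in range(1, m + 1):
            # Moving from a prefix ending in y_{s-1} is allowed only if y_s differs.
            differs = s == 1 or target[s - 1] != target[s - 2]
            new_sym[s] = probs[t][target[s - 1]] * (
                a_sym[s] + a_blank[s - 1] + (a_sym[s - 1] if differs else 0.0))
        a_blank, a_sym = new_blank, new_sym
    return a_blank[m] + a_sym[m]


# Hypothetical two-step example with a single non-blank symbol "a".
example = [{"-": 0.4, "a": 0.6}, {"-": 0.7, "a": 0.3}]
p_a = ctc_probability(example, "a")
```

For this example the extended labelings `a-`, `-a`, and `aa` contribute 0.42 + 0.12 + 0.18, so `p_a` is 0.72. For the target `aa` over three steps, only `a-a` qualifies, because the repeated symbol must be separated by a blank.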
+
+~~~
+To compute the best regular labelings for a longer prefix of extended labelings,
+for each regular labeling in the beam we consider the following cases:
+~~~
+- adding a _blank_ symbol, i.e., contributing to $α^{t+1}_-(→y)$ both from
+  $α^t_-(→y)$ and $α^t_*(→y)$;
+~~~
+- adding a non-blank symbol, i.e., contributing to $α^{t+1}_*(•)$ from
+  $α^t_-(→y)$ and contributing to a possibly different $α^{t+1}_*(•)$ from
+  $α^t_*(→y)$.
+
+~~~
+Finally, we merge the resulting candidates according to their regular labeling, and
+keep only the $k$ best.
+
+---
+section: Word2Vec
+# Unsupervised Word Embeddings
+
+The embeddings can be trained for each task separately.
+
+~~~
+
+However, a method of precomputing word embeddings has been proposed, based on
+the _distributional hypothesis_:
+
+> **Words that are used in the same contexts tend to have similar meanings**.
+
+~~~
+The distributional hypothesis is usually attributed to Firth (1957):
+> _You shall know a word by the company it keeps._
+
+---
+# Word2Vec
+
+![w=70%,h=center](word2vec.svgz)
+
+Mikolov et al. (2013) proposed two very simple architectures for precomputing
+word embeddings, together with a multi-threaded C implementation `word2vec`.
+
+---
+# Word2Vec
+
+![w=100%](word2vec_composability.svgz)
+
+---
+# Word2Vec – SkipGram Model
+
+![w=50%,h=center,mh=64%](word2vec.svgz)
+
+Considering input word $w_i$ and output $w_o$, the Skip-gram model defines
+$$p(w_o | w_i) ≝ \frac{e^{⇉V_{w_i}^\top ⇉W_{w_o}}}{∑_w e^{⇉V_{w_i}^\top ⇉W_w}}.$$
+After training, the final embeddings are the rows of the $⇉V$ matrix.
+
+---
+# Word2Vec – Hierarchical Softmax
+
+Instead of a large softmax, we construct a binary tree over the words, with
+a sigmoid classifier for each node.
+
+If word $w$ corresponds to a path $n_1, n_2, \ldots, n_L$, we define
+$$p_\textrm{HS}(w | w_i) ≝ ∏_{j=1}^{L-1} σ(\textrm{[+1 if }n_{j+1}\textrm{ is right child else -1]} ⋅ ⇉V_{w_i}^\top ⇉W_{n_j}).$$
+
+---
+# Word2Vec – Negative Sampling
+
+Instead of a large softmax, we could train individual sigmoids for all words.
+
+~~~
+We could also only sample several _negative examples_. This gives rise to the
+following _negative sampling_ objective (instead of just summing all the
+sigmoidal losses):
+$$l_\textrm{NEG}(w_o, w_i) ≝ -\log σ(⇉V_{w_i}^\top ⇉W_{w_o}) - ∑_{j=1}^k 𝔼_{w_j ∼ P(w)} \log \big(1 - σ(⇉V_{w_i}^\top ⇉W_{w_j})\big).$$
+
+~~~
+The usual number of negative samples $k$ is 5, but it can be even 2 for extremely
+large corpora.
+
+~~~
+Each expectation in the loss is estimated using a single sample.
+
+~~~
+For $P(w)$, both the uniform and the unigram distribution $U(w)$ work, but
+$$U(w)^{3/4}$$
+outperforms them significantly (this fact has been reported in several papers by
+different authors).
+
+---
+section: CLEs
+# Recurrent Character-level WEs
+
+![w=80%,h=center](../08/cle_rnn_examples.svgz)
+
+---
+# Convolutional Character-level WEs
+
+![w=100%](../08/cle_cnn_examples.svgz)
+
+---
+section: Subword Embeddings
+# Character N-grams
+
+Another simple idea appeared in three nearly simultaneous
+publications as [Charagram](https://arxiv.org/abs/1607.02789), [Subword Information](https://arxiv.org/abs/1607.04606) or [SubGram](http://link.springer.com/chapter/10.1007/978-3-319-45510-5_21).
+
+A word embedding is the sum of an embedding of the word itself and the embeddings
+of its character _n_-grams. Such embeddings can be pretrained using the same
+algorithms as `word2vec`.
+
+~~~
+The implementation can be
+- dictionary based: only some number of frequent character _n_-grams is kept;
+~~~
+- hash-based: character _n_-grams are hashed into $K$ buckets
+  (usually $K ∼ 10^6$ is used).
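The hash-based variant can be sketched as follows. Several details are assumptions made here rather than taken from the slides: the `<` and `>` word-boundary markers and the _n_-gram lengths 3 to 6 follow the setup described in the Subword Information paper, `zlib.crc32` is only a stand-in hash function, and the names `char_ngrams`, `ngram_buckets`, and `embed` are hypothetical.

```python
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of the word wrapped in boundary markers < and >."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def ngram_buckets(word, num_buckets):
    """Hash every character n-gram of the word into one of num_buckets buckets."""
    return [zlib.crc32(g.encode("utf-8")) % num_buckets for g in char_ngrams(word)]


def embed(word, word_table, bucket_table):
    """Sum the word's own vector (zeros if unknown) and its n-gram bucket vectors."""
    dim = len(bucket_table[0])
    vector = list(word_table.get(word, [0.0] * dim))
    for bucket in ngram_buckets(word, len(bucket_table)):
        vector = [v + e for v, e in zip(vector, bucket_table[bucket])]
    return vector
```

With trigrams only, `char_ngrams("cat", 3, 3)` yields `<ca`, `cat`, and `at>`. Because unknown words still have character _n_-grams, `embed` returns a non-trivial vector even for a word missing from `word_table`, which is the main practical benefit of the approach.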
+
+---
+# Charagram WEs
+
+![w=100%,v=middle](cle_charagram_examples.svgz)
+
+---
+# Charagram WEs
+
+![w=48%,h=center](cle_charagram_ngrams.svgz)
+
+---
+# FastText
+
+The word2vec enriched with subword embeddings is implemented in the publicly
+available `fastText` library https://fasttext.cc/.
+
+~~~
+Pre-trained embeddings for 157 languages (including Czech) trained on
+Wikipedia and CommonCrawl are also available at
+https://fasttext.cc/docs/en/crawl-vectors.html.
+
+---
+section: ELMo
+# ELMo
+
+At the end of 2017, a new type of _deep contextualized_ word representations was
+proposed by Peters et al., called ELMo, **E**mbeddings from **L**anguage
+**Mo**dels.
+
+~~~
+The ELMo embeddings were based on a two-layer pre-trained LSTM language model,
+where a language model predicts the following word based on a sentence prefix.
+~~~
+Specifically, two such models were used, one for the forward direction and the
+other one for the backward direction.
+~~~
+
+![w=30%](elmo_language_model.png)![w=68%](elmo_bidirectional.png)
+
+---
+# ELMo
+
+To compute an embedding of a word in a sentence, the concatenation of the two
+language models' hidden states is used.
+
+![w=68%,h=center](elmo_embedding.png)
+
+~~~
+To be exact, the authors propose to take a (trainable) weighted combination of
+the input embeddings and outputs on the first and second LSTM layers.
+
+---
+# ELMo Results
+
+Pre-trained ELMo embeddings substantially improved results on several NLP tasks.
+
+![w=100%](elmo_results.svgz)
+
diff --git a/slides/09/cle_charagram_examples.svgz b/slides/09/cle_charagram_examples.svgz
new file mode 100644
index 0000000..c38515c
Binary files /dev/null and b/slides/09/cle_charagram_examples.svgz differ
diff --git a/slides/09/cle_charagram_examples.svgz.ref b/slides/09/cle_charagram_examples.svgz.ref
new file mode 100644
index 0000000..c7dc8bd
--- /dev/null
+++ b/slides/09/cle_charagram_examples.svgz.ref
@@ -0,0 +1 @@
+Table 7 of "Enriching Word Vectors with Subword Information", https://arxiv.org/abs/1607.04606
diff --git a/slides/09/cle_charagram_ngrams.svgz b/slides/09/cle_charagram_ngrams.svgz
new file mode 100644
index 0000000..e47c49a
Binary files /dev/null and b/slides/09/cle_charagram_ngrams.svgz differ
diff --git a/slides/09/cle_charagram_ngrams.svgz.ref b/slides/09/cle_charagram_ngrams.svgz.ref
new file mode 100644
index 0000000..534f5a7
--- /dev/null
+++ b/slides/09/cle_charagram_ngrams.svgz.ref
@@ -0,0 +1 @@
+Figure 2 of "Enriching Word Vectors with Subword Information", https://arxiv.org/abs/1607.04606
diff --git a/slides/09/crf_composability.ipe b/slides/09/crf_composability.ipe
new file mode 100644
index 0000000..0ad99fb
--- /dev/null
+++ b/slides/09/crf_composability.ipe
@@ -0,0 +1,279 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Text labels: j, k, t-1.]
diff --git a/slides/09/crf_composability.svgz b/slides/09/crf_composability.svgz
new file mode 100644
index 0000000..fa386b3
Binary files /dev/null and b/slides/09/crf_composability.svgz differ
diff --git a/slides/09/crf_composability.svgz.ref b/slides/09/crf_composability.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/ctc_computation.svgz b/slides/09/ctc_computation.svgz
new file mode 100644
index 0000000..e01d84d
Binary files /dev/null and b/slides/09/ctc_computation.svgz differ
diff --git a/slides/09/ctc_computation.svgz.ref b/slides/09/ctc_computation.svgz.ref
new file mode 100644
index 0000000..34bef71
--- /dev/null
+++ b/slides/09/ctc_computation.svgz.ref
@@ -0,0 +1 @@
+Figure 7.3 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/ctc_decoding.svgz b/slides/09/ctc_decoding.svgz
new file mode 100644
index 0000000..ed3162a
Binary files /dev/null and b/slides/09/ctc_decoding.svgz differ
diff --git a/slides/09/ctc_decoding.svgz.ref b/slides/09/ctc_decoding.svgz.ref
new file mode 100644
index 0000000..6ff181c
--- /dev/null
+++ b/slides/09/ctc_decoding.svgz.ref
@@ -0,0 +1 @@
+Figure 7.5 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/ctc_example.svgz b/slides/09/ctc_example.svgz
new file mode 100644
index 0000000..fbc207e
Binary files /dev/null and b/slides/09/ctc_example.svgz differ
diff --git a/slides/09/ctc_example.svgz.ref b/slides/09/ctc_example.svgz.ref
new file mode 100644
index 0000000..28ea3d5
--- /dev/null
+++ b/slides/09/ctc_example.svgz.ref
@@ -0,0 +1 @@
+Figure 7.1 of "Supervised Sequence Labelling with Recurrent Neural Networks" dissertation by Alex Graves
diff --git a/slides/09/elmo_bidirectional.png b/slides/09/elmo_bidirectional.png
new file mode 100644
index 0000000..a0cff36
Binary files /dev/null and b/slides/09/elmo_bidirectional.png differ
diff --git a/slides/09/elmo_bidirectional.png.ref b/slides/09/elmo_bidirectional.png.ref
new file mode 100644
index 0000000..78cce7a
--- /dev/null
+++ b/slides/09/elmo_bidirectional.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/elmo-forward-backward-language-model-embedding.png
diff --git a/slides/09/elmo_embedding.png b/slides/09/elmo_embedding.png
new file mode 100644
index 0000000..8b5c9d5
Binary files /dev/null and b/slides/09/elmo_embedding.png differ
diff --git a/slides/09/elmo_embedding.png.ref b/slides/09/elmo_embedding.png.ref
new file mode 100644
index 0000000..dd9c385
--- /dev/null
+++ b/slides/09/elmo_embedding.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/elmo-embedding.png
diff --git a/slides/09/elmo_language_model.png b/slides/09/elmo_language_model.png
new file mode 100644
index 0000000..8090b9e
Binary files /dev/null and b/slides/09/elmo_language_model.png differ
diff --git a/slides/09/elmo_language_model.png.ref b/slides/09/elmo_language_model.png.ref
new file mode 100644
index 0000000..180b431
--- /dev/null
+++ b/slides/09/elmo_language_model.png.ref
@@ -0,0 +1 @@
+http://jalammar.github.io/images/Bert-language-modeling.png
diff --git a/slides/09/elmo_results.svgz b/slides/09/elmo_results.svgz
new file mode 100644
index 0000000..e363363
Binary files /dev/null and b/slides/09/elmo_results.svgz differ
diff --git a/slides/09/elmo_results.svgz.ref b/slides/09/elmo_results.svgz.ref
new file mode 100644
index 0000000..226a9ba
--- /dev/null
+++ b/slides/09/elmo_results.svgz.ref
@@ -0,0 +1 @@
+Table 1 of "Deep contextualized word representations", https://arxiv.org/abs/1802.05365
diff --git a/slides/09/labeling_independent.ipe b/slides/09/labeling_independent.ipe
new file mode 100644
index 0000000..1b3f2af
--- /dev/null
+++ b/slides/09/labeling_independent.ipe
@@ -0,0 +1,292 @@
+[Ipe drawing source; XML markup lost in extraction, path coordinates omitted. Preamble: \usepackage{bm}. Text labels: \bm x_1, y_1, \bm x_2, y_2, \bm x_3, y_3, \cdots, \bm x_N, y_N.]
diff --git a/slides/09/labeling_independent.svgz b/slides/09/labeling_independent.svgz
new file mode 100644
index 0000000..0983a44
Binary files /dev/null and b/slides/09/labeling_independent.svgz differ
diff --git a/slides/09/labeling_independent.svgz.ref b/slides/09/labeling_independent.svgz.ref
new file mode 100644
index 0000000..e69de29
diff --git a/slides/09/labeling_memm.ipe b/slides/09/labeling_memm.ipe
new file mode 100644
index 0000000..4fbb212
--- /dev/null
+++ b/slides/09/labeling_memm.ipe
@@ -0,0 +1,312 @@
+[Ipe drawing source; XML markup lost in extraction, style and marker definitions omitted. Preamble: \usepackage{bm}.]
+\bm x_1
+
+132 728 m
+132
680 l + +y_1 +\bm x_2 + +132 728 m +132 680 l + +y_2 +\bm x_3 + +132 728 m +132 680 l + +y_3 +\cdots +\cdots +\cdots +\bm x_N + +132 728 m +132 680 l + +y_N + +132 680 m +180 680 l + + +132 680 m +180 680 l + + +228 680 m +248 680 l + + +272 680 m +292 680 l + + +248 680 m +272 680 l + + +120 752 m +120 656 l +312 656 l +312 752 l +h + + + diff --git a/slides/09/labeling_memm.svgz b/slides/09/labeling_memm.svgz new file mode 100644 index 0000000..4d8435c Binary files /dev/null and b/slides/09/labeling_memm.svgz differ diff --git a/slides/09/labeling_memm.svgz.ref b/slides/09/labeling_memm.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/09/word2vec.svgz b/slides/09/word2vec.svgz new file mode 100644 index 0000000..a1bb3bb Binary files /dev/null and b/slides/09/word2vec.svgz differ diff --git a/slides/09/word2vec.svgz.ref b/slides/09/word2vec.svgz.ref new file mode 100644 index 0000000..e69de29 diff --git a/slides/09/word2vec_composability.svgz b/slides/09/word2vec_composability.svgz new file mode 100644 index 0000000..e060d02 Binary files /dev/null and b/slides/09/word2vec_composability.svgz differ diff --git a/slides/09/word2vec_composability.svgz.ref b/slides/09/word2vec_composability.svgz.ref new file mode 100644 index 0000000..30785b2 --- /dev/null +++ b/slides/09/word2vec_composability.svgz.ref @@ -0,0 +1 @@ +Table 8 of "Efficient Estimation of Word Representations in Vector Space", https://arxiv.org/abs/1301.3781 diff --git a/tasks/3d_recognition.md b/tasks/3d_recognition.md new file mode 100644 index 0000000..eb3c23b --- /dev/null +++ b/tasks/3d_recognition.md @@ -0,0 +1,28 @@ +### Assignment: 3d_recognition +#### Date: Deadline: Apr 16, 22:00 +#### Points: 3 points+4 bonus + +Your goal in this assignment is to perform 3D object recognition. The input +is voxelized representation of an object, stored as a _3D grid_ of either empty +or occupied _voxels_, and your goal is to classify the object into one of +10 classes. 
The data is available in two resolutions, either as +[20×20×20 data](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/modelnet20.html) +or [32×32×32 data](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/modelnet32.html). +To load the dataset, use the +[modelnet.py](https://github.com/ufal/npfl138/tree/master/labs/07/modelnet.py) module. + +The official dataset offers only train and test sets, with the **test set having +a different distribution of labels**. Our dataset also contains a development +set, which has **nearly the same** label distribution as the test set. + +If you want, it is possible to use any model from `keras.applications` in +this assignment; however, the only way I know how to utilize such a pre-trained +model is to render the objects to a set of 2D images and classify them instead. + +The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution +achieving at least _88%_ test set accuracy gets 3 points; the remaining +4 bonus points are distributed depending on relative ordering of your solutions. + +You can start with the +[3d_recognition.py](https://github.com/ufal/npfl138/tree/master/labs/07/3d_recognition.py) +template, which among other things generates test set annotations in the required format. diff --git a/tasks/bboxes_utils.md b/tasks/bboxes_utils.md new file mode 100644 index 0000000..64c58c8 --- /dev/null +++ b/tasks/bboxes_utils.md @@ -0,0 +1,26 @@ +### Assignment: bboxes_utils +#### Date: Deadline: Apr 09, 22:00 +#### Points: 2 points + +This is a preparatory assignment for `svhn_competition`. The goal is to +implement several bounding box manipulation routines in the +[bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py) +module.
Notably, you need to implement the following methods: +- `bboxes_to_rcnn`: convert given bounding boxes to a R-CNN-like + representation relative to the given anchors; +- `bboxes_from_rcnn`: convert R-CNN-like representations relative to + given anchors back to bounding boxes; +- `bboxes_training`: given a list of anchors and gold objects, assign gold + objects to anchors and generate suitable training data (the exact algorithm + is described in the template). + +The [bboxes_utils.py](https://github.com/ufal/npfl138/tree/master/labs/06/bboxes_utils.py) +contains simple unit tests, which are evaluated when executing the module, +which you can use to check the validity of your implementation. Note that +the template does not contain type annotations because Python typing system is +not flexible enough to describe the tensor shape changes. + +When submitting to ReCodEx, the method `main` is executed, returning the +implemented `bboxes_to_rcnn`, `bboxes_from_rcnn` and `bboxes_training` +methods. These methods are then executed and compared to the reference +implementation. diff --git a/tasks/cags_classification.md b/tasks/cags_classification.md index eedad3a..42009a4 100644 --- a/tasks/cags_classification.md +++ b/tasks/cags_classification.md @@ -31,8 +31,8 @@ estimates on the batch) or in inference regime. There is one exception though inference regime even when `training == True`._ The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _93%_ test set accuracy will get 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _93%_ test set accuracy gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. 
You may want to start with the [cags_classification.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_classification.py) diff --git a/tasks/cags_segmentation.md b/tasks/cags_segmentation.md index 7669d70..d9677e5 100644 --- a/tasks/cags_segmentation.md +++ b/tasks/cags_segmentation.md @@ -18,8 +18,8 @@ module, which can also evaluate your predictions (either by running with `evaluate_segmentation_file` method). The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _87%_ test set IoU gets 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _87%_ test set IoU gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. You may want to start with the [cags_segmentation.py](https://github.com/ufal/npfl138/tree/master/labs/05/cags_segmentation.py) diff --git a/tasks/cifar_competition.md b/tasks/cifar_competition.md index ae455fb..9505c18 100644 --- a/tasks/cifar_competition.md +++ b/tasks/cifar_competition.md @@ -8,8 +8,9 @@ You can load the data using the module. Note that the test set is different than that of official CIFAR-10. The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits a solution -which achieves at least _70%_ test set accuracy will get 4 points; the rest -5 points will be distributed depending on relative ordering of your solutions. +achieving at least _70%_ test set accuracy gets 4 points; the remaining +5 bonus points are distributed depending on relative ordering of your solutions. + Note that my solutions usually need to achieve around ~85% on the development set to score 70% on the test set. 
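Note on `tasks/bboxes_utils.md` (added earlier in this diff): the conversion between bounding boxes and an R-CNN-like representation can be sketched as follows. This is only a minimal illustration for a single box and a single anchor, assuming the `[TOP, LEFT, BOTTOM, RIGHT]` box order used in `tasks/svhn_competition.md` and the usual Fast R-CNN parametrization (center offsets scaled by the anchor size, log-ratios of sizes). The function names `bbox_to_rcnn` and `bbox_from_rcnn` and the output order are hypothetical; the exact interface required by the assignment is defined in the template, not here.

```python
import math


def bbox_to_rcnn(anchor, bbox):
    """Encode one box relative to one anchor.

    Both arguments are (top, left, bottom, right) tuples.
    The output order (dy, dx, log_h, log_w) is an assumption of this sketch.
    """
    a_top, a_left, a_bottom, a_right = anchor
    b_top, b_left, b_bottom, b_right = bbox
    a_h, a_w = a_bottom - a_top, a_right - a_left
    b_h, b_w = b_bottom - b_top, b_right - b_left
    # Offsets of the box center from the anchor center, in units of the anchor size.
    dy = ((b_top + b_bottom) / 2 - (a_top + a_bottom) / 2) / a_h
    dx = ((b_left + b_right) / 2 - (a_left + a_right) / 2) / a_w
    # Log-ratios of the box size to the anchor size.
    return (dy, dx, math.log(b_h / a_h), math.log(b_w / a_w))


def bbox_from_rcnn(anchor, rcnn):
    """Invert bbox_to_rcnn: recover (top, left, bottom, right) from the encoding."""
    a_top, a_left, a_bottom, a_right = anchor
    dy, dx, log_h, log_w = rcnn
    a_h, a_w = a_bottom - a_top, a_right - a_left
    center_y = (a_top + a_bottom) / 2 + dy * a_h
    center_x = (a_left + a_right) / 2 + dx * a_w
    h, w = a_h * math.exp(log_h), a_w * math.exp(log_w)
    return (center_y - h / 2, center_x - w / 2, center_y + h / 2, center_x + w / 2)


if __name__ == "__main__":
    anchor, box = (0, 0, 10, 10), (5, 5, 10, 20)
    encoded = bbox_to_rcnn(anchor, box)
    print(encoded)
    print(bbox_from_rcnn(anchor, encoded))  # equals `box` up to rounding
```

A batched version would apply the same arithmetic to whole arrays of anchors and boxes at once; the per-box form is used here only to keep the formulas readable.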
diff --git a/tasks/cnn_manual.md b/tasks/cnn_manual.md index 6d1483a..06f3a83 100644 --- a/tasks/cnn_manual.md +++ b/tasks/cnn_manual.md @@ -12,9 +12,9 @@ activation and `valid` padding, specified in the `args.cnn` option. The `args.cnn` contains comma-separated layer specifications in the format `filters-kernel_size-stride`. -Of course, you cannot use any TensorFlow convolutional operation (instead, +Of course, you cannot use any PyTorch convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other -operations), nor the `tf.GradientTape` for gradient computation. +operations), nor the `.backward()` for gradient computation. To make debugging easier, the template supports a `--verify` option, which allows comparing the forward pass and the three gradients you compute in the diff --git a/tasks/ctc_loss.md b/tasks/ctc_loss.md new file mode 100644 index 0000000..a34cb03 --- /dev/null +++ b/tasks/ctc_loss.md @@ -0,0 +1,10 @@ +### Assignment: ctc_loss +#### Date: Deadline: Apr 30, 22:00 +#### Points: 2 points + +**The template is being finalized, final version will be released shortly.** + +This assignment is an extension of `tagger_we` task. Using the +`ctc_loss.py` template, manually implement the CTC loss computation +and also greedy CTC decoding. + diff --git a/tasks/mnist_ensemble.md b/tasks/mnist_ensemble.md index 9deb3d5..7bee385 100644 --- a/tasks/mnist_ensemble.md +++ b/tasks/mnist_ensemble.md @@ -8,7 +8,7 @@ Your goal in this assignment is to implement model ensembling. The [mnist_ensemble.py](https://github.com/ufal/npfl138/tree/master/labs/03/mnist_ensemble.py) template trains `args.models` individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all -models, and evaluate their accuracy on the test set. +models, and evaluate their accuracy on the development set. 
#### Tests Start: mnist_ensemble_tests _Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ diff --git a/tasks/sequence_classification.md b/tasks/sequence_classification.md new file mode 100644 index 0000000..96ce931 --- /dev/null +++ b/tasks/sequence_classification.md @@ -0,0 +1,127 @@ +### Assignment: sequence_classification +#### Date: Deadline: Apr 23, 22:00 +#### Points: 2 points +#### Tests: sequence_classification_tests +#### Examples: sequence_classification_examples + +The goal of this assignment is to introduce recurrent neural networks, show +their convergence speed, and illustrate the exploding gradient issue. The network +should process sequences of 50 small integers and compute parity for each prefix +of the sequence. The inputs are either 0/1, or vectors with a one-hot +representation of a small integer. + +Your goal is to modify the +[sequence_classification.py](https://github.com/ufal/npfl138/tree/master/labs/08/sequence_classification.py) +template and implement the following: +- Use the specified RNN type (`SimpleRNN`, `GRU`, or `LSTM`) and dimensionality. +- Process the sequence using the required RNN. +- Use an additional hidden layer on the RNN outputs if requested. +- Implement gradient clipping if requested. + +In addition to submitting the task in ReCodEx, please also run the following +variations and observe the results in TensorBoard.
+Concentrate on how the RNNs converge, their convergence speed, exploding +gradient issues, and how gradient clipping helps: +- `--rnn=SimpleRNN --sequence_dim=1`, `--rnn=GRU --sequence_dim=1`, `--rnn=LSTM --sequence_dim=1` +- the same as above but with `--sequence_dim=3` +- the same as above but with `--sequence_dim=10` +- `--rnn=SimpleRNN --hidden_layer=85 --rnn_dim=30 --sequence_dim=30` and the same with `--clip_gradient=1` +- the same as above but with `--rnn=GRU` with and without `--clip_gradient=1` +- the same as above but with `--rnn=LSTM` with and without `--clip_gradient=1` + +#### Tests Start: sequence_classification_tests +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +1. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=SimpleRNN --epochs=5` +``` +Epoch 1/5 accuracy: 0.4854 - loss: 0.7253 - val_accuracy: 0.5092 - val_loss: 0.6971 +Epoch 2/5 accuracy: 0.5101 - loss: 0.6944 - val_accuracy: 0.4990 - val_loss: 0.6914 +Epoch 3/5 accuracy: 0.5000 - loss: 0.6904 - val_accuracy: 0.5198 - val_loss: 0.6892 +Epoch 4/5 accuracy: 0.5200 - loss: 0.6887 - val_accuracy: 0.5328 - val_loss: 0.6875 +Epoch 5/5 accuracy: 0.5326 - loss: 0.6869 - val_accuracy: 0.5362 - val_loss: 0.6857 +``` + +2. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=GRU --epochs=5` +``` +Epoch 1/5 accuracy: 0.5277 - loss: 0.6925 - val_accuracy: 0.5217 - val_loss: 0.6921 +Epoch 2/5 accuracy: 0.5183 - loss: 0.6921 - val_accuracy: 0.5217 - val_loss: 0.6918 +Epoch 3/5 accuracy: 0.5185 - loss: 0.6919 - val_accuracy: 0.5217 - val_loss: 0.6914 +Epoch 4/5 accuracy: 0.5212 - loss: 0.6914 - val_accuracy: 0.5282 - val_loss: 0.6910 +Epoch 5/5 accuracy: 0.5320 - loss: 0.6904 - val_accuracy: 0.5355 - val_loss: 0.6905 +``` + +3.
`python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5` +``` +Epoch 1/5 accuracy: 0.5359 - loss: 0.6926 - val_accuracy: 0.5361 - val_loss: 0.6925 +Epoch 2/5 accuracy: 0.5358 - loss: 0.6925 - val_accuracy: 0.5333 - val_loss: 0.6923 +Epoch 3/5 accuracy: 0.5370 - loss: 0.6923 - val_accuracy: 0.5369 - val_loss: 0.6920 +Epoch 4/5 accuracy: 0.5342 - loss: 0.6919 - val_accuracy: 0.5366 - val_loss: 0.6917 +Epoch 5/5 accuracy: 0.5378 - loss: 0.6915 - val_accuracy: 0.5444 - val_loss: 0.6914 +``` + +4. `python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50` +``` +Epoch 1/5 accuracy: 0.5377 - loss: 0.6923 - val_accuracy: 0.5414 - val_loss: 0.6911 +Epoch 2/5 accuracy: 0.5465 - loss: 0.6902 - val_accuracy: 0.5577 - val_loss: 0.6878 +Epoch 3/5 accuracy: 0.5600 - loss: 0.6862 - val_accuracy: 0.5450 - val_loss: 0.6811 +Epoch 4/5 accuracy: 0.5491 - loss: 0.6783 - val_accuracy: 0.5590 - val_loss: 0.6707 +Epoch 5/5 accuracy: 0.5539 - loss: 0.6678 - val_accuracy: 0.5433 - val_loss: 0.6591 +``` + +5. 
`python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50 --clip_gradient=0.01` +``` +Epoch 1/5 accuracy: 0.5421 - loss: 0.6923 - val_accuracy: 0.5409 - val_loss: 0.6910 +Epoch 2/5 accuracy: 0.5504 - loss: 0.6900 - val_accuracy: 0.5511 - val_loss: 0.6875 +Epoch 3/5 accuracy: 0.5566 - loss: 0.6860 - val_accuracy: 0.5494 - val_loss: 0.6816 +Epoch 4/5 accuracy: 0.5504 - loss: 0.6788 - val_accuracy: 0.5398 - val_loss: 0.6721 +Epoch 5/5 accuracy: 0.5539 - loss: 0.6699 - val_accuracy: 0.5494 - val_loss: 0.6624 +``` +#### Tests End: +#### Examples Start: sequence_classification_examples +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +- `python3 sequence_classification.py --rnn=SimpleRNN --epochs=5` +``` +Epoch 1/5 accuracy: 0.4984 - loss: 0.7004 - val_accuracy: 0.5223 - val_loss: 0.6884 +Epoch 2/5 accuracy: 0.5198 - loss: 0.6862 - val_accuracy: 0.5117 - val_loss: 0.6794 +Epoch 3/5 accuracy: 0.5132 - loss: 0.6784 - val_accuracy: 0.5121 - val_loss: 0.6732 +Epoch 4/5 accuracy: 0.5160 - loss: 0.6723 - val_accuracy: 0.5191 - val_loss: 0.6683 +Epoch 5/5 accuracy: 0.5235 - loss: 0.6680 - val_accuracy: 0.5276 - val_loss: 0.6639 +``` + +- `python3 sequence_classification.py --rnn=GRU --epochs=5` +``` +Epoch 1/5 accuracy: 0.5109 - loss: 0.6929 - val_accuracy: 0.5128 - val_loss: 0.6915 +Epoch 2/5 accuracy: 0.5174 - loss: 0.6894 - val_accuracy: 0.5155 - val_loss: 0.6785 +Epoch 3/5 accuracy: 0.5446 - loss: 0.6630 - val_accuracy: 0.9538 - val_loss: 0.2142 +Epoch 4/5 accuracy: 0.9812 - loss: 0.1270 - val_accuracy: 0.9987 - val_loss: 0.0304 +Epoch 5/5 accuracy: 0.9985 - loss: 0.0270 - val_accuracy: 0.9995 - val_loss: 0.0135 +``` + +- `python3 sequence_classification.py --rnn=LSTM --epochs=5` +``` +Epoch 1/5 accuracy: 0.5131 - loss: 0.6930 - val_accuracy: 0.5187 - val_loss: 0.6918 +Epoch 2/5 accuracy: 0.5187 - loss: 0.6892 - val_accuracy: 0.5340 - val_loss: 0.6760 
+Epoch 3/5 accuracy: 0.6401 - loss: 0.5744 - val_accuracy: 1.0000 - val_loss: 0.0845 +Epoch 4/5 accuracy: 1.0000 - loss: 0.0585 - val_accuracy: 1.0000 - val_loss: 0.0194 +Epoch 5/5 accuracy: 1.0000 - loss: 0.0154 - val_accuracy: 1.0000 - val_loss: 0.0082 +``` + +- `python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=85` +``` +Epoch 1/5 accuracy: 0.5151 - loss: 0.6888 - val_accuracy: 0.5323 - val_loss: 0.6571 +Epoch 2/5 accuracy: 0.5387 - loss: 0.6497 - val_accuracy: 0.5575 - val_loss: 0.6321 +Epoch 3/5 accuracy: 0.5570 - loss: 0.6242 - val_accuracy: 0.6199 - val_loss: 0.5854 +Epoch 4/5 accuracy: 0.8367 - loss: 0.2854 - val_accuracy: 0.9897 - val_loss: 0.0503 +Epoch 5/5 accuracy: 0.9995 - loss: 0.0058 - val_accuracy: 0.9999 - val_loss: 0.0014 +``` + +- `python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=85 --clip_gradient=1` +``` +Epoch 1/5 accuracy: 0.5151 - loss: 0.6888 - val_accuracy: 0.5323 - val_loss: 0.6571 +Epoch 2/5 accuracy: 0.5387 - loss: 0.6497 - val_accuracy: 0.5582 - val_loss: 0.6321 +Epoch 3/5 accuracy: 0.5576 - loss: 0.6237 - val_accuracy: 0.6542 - val_loss: 0.5625 +Epoch 4/5 accuracy: 0.9033 - loss: 0.1909 - val_accuracy: 0.9999 - val_loss: 0.0014 +Epoch 5/5 accuracy: 0.9997 - loss: 0.0029 - val_accuracy: 1.0000 - val_loss: 4.4711e-04 +``` +#### Examples End: diff --git a/tasks/sgd_manual.md b/tasks/sgd_manual.md index fd268e7..d054d94 100644 --- a/tasks/sgd_manual.md +++ b/tasks/sgd_manual.md @@ -17,7 +17,7 @@ Start with the [sgd_manual.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_manual.py) template, which is based on [sgd_backpropagation.py](https://github.com/ufal/npfl138/tree/master/labs/02/sgd_backpropagation.py) -one. Be aware that these templates generates each a different output file. +one. Note that ReCodEx disables the PyTorch automatic differentiation during evaluation. 
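Note on `tasks/sequence_classification.md` (added earlier in this diff): the two ideas the assignment revolves around, prefix-parity targets and gradient clipping, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the template's implementation: it assumes that "parity for each prefix" means the running sum of the integers modulo 2, and that clipping is done by the global L2 norm of all gradients (in PyTorch, `torch.nn.utils.clip_grad_norm_` is the likely counterpart). The helper names `prefix_parities` and `clip_by_global_norm` are hypothetical.

```python
import math
from itertools import accumulate


def prefix_parities(sequence):
    """Return the parity (0 or 1) of the sum of every prefix of `sequence`."""
    return [total % 2 for total in accumulate(sequence)]


def clip_by_global_norm(gradients, max_norm):
    """Rescale a list of gradient vectors so that their joint L2 norm is at most `max_norm`.

    Returns the (possibly rescaled) gradients together with the original norm.
    """
    total_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # Gradients below the threshold are left untouched (scale 1.0).
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    return [[g * scale for g in grad] for grad in gradients], total_norm


if __name__ == "__main__":
    print(prefix_parities([1, 0, 1, 1]))  # [1, 1, 0, 1]
    clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
    print(norm, clipped)  # the original norm is 5.0; the clipped norm is about 1.0
```

Because every target depends on all preceding inputs, the error signal has to flow through many time steps, which is where exploding gradients show up and where rescaling by the global norm keeps the update direction while bounding its size.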
diff --git a/tasks/speech_recognition.md b/tasks/speech_recognition.md new file mode 100644 index 0000000..dc1587e --- /dev/null +++ b/tasks/speech_recognition.md @@ -0,0 +1,42 @@ +### Assignment: speech_recognition +#### Date: Deadline: Apr 30, 22:00 +#### Points: 5 points+5 bonus + +**The template is being finalized, the final version will be released shortly.** + +This assignment is a competition task in the area of speech recognition. Specifically, +your goal is to predict a sequence of letters given a spoken utterance. +We will be using Czech recordings from [Common Voice](https://commonvoice.mozilla.org/), +with input sound waves passed through the usual preprocessing – computing +[Mel-frequency cepstral coefficients (MFCCs)](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). +You can repeat this preprocessing on a given audio using the `load_audio` and +`mfcc_extract` methods from the +[common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/09/common_voice_cs.py) module. +This module can also load the dataset, downloading it when necessary (note that +it is 200MB, so it might take a while). Furthermore, you can listen to the +[development portion of the dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/common_voice_cs_dev.html). +Lastly, the whole dataset is available for +[download in MP3 format](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/datasets/common_voice_cs_mp3.tar) +(you are not expected to download it unless you would like to perform some +custom preprocessing). + +The following additional data can be used in this assignment: +- You can use any _unannotated_ text data (Wikipedia, Czech National Corpus, …), +  and also any pre-trained word embeddings or language models (assuming they +  were trained on plain texts). +- You can use any _unannotated_ speech data. + +The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions).
+The evaluation is performed by computing the edit distance to the gold letter +sequence, normalized by its length (a corresponding metric +`EditDistanceMetric` is provided by the [common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/09/common_voice_cs.py)). +Everyone who submits a solution with at most 50% test set edit distance +gets 5 points; the remaining 5 bonus points are distributed +depending on relative ordering of your solutions. Note that +you can evaluate the predictions as usual using the [common_voice_cs.py](https://github.com/ufal/npfl138/tree/master/labs/08/common_voice_cs.py) +module, either by running with `--evaluate=path` arguments, or using its +`evaluate_file` method. + +Start with the `speech_recognition.py` +template which contains instructions for using the CTC loss and the CTC decoder, +and it generates the test set annotation in the required format. diff --git a/tasks/svhn_competition.md b/tasks/svhn_competition.md new file mode 100644 index 0000000..902484f --- /dev/null +++ b/tasks/svhn_competition.md @@ -0,0 +1,44 @@ +### Assignment: svhn_competition +#### Date: Deadline: Apr 09, 22:00 +#### Points: 5 points+5 bonus + +The goal of this assignment is to implement a system performing object +recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone +(or any other model from `keras.applications`). + +The [Street View House Numbers (SVHN) dataset](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/svhn_train.html) +annotates for every photo all digits appearing on it, including their bounding +boxes. The dataset can be loaded using the [svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py) +module. 
Similarly to the `CAGS` dataset, the `train/dev/test` are PyTorch +`torch.utils.data.Dataset`s, and every element is a dictionary with the following keys: +- `"image"`: a square 3-channel image stored using PyTorch tensor of type `torch.uint8`, +- `"classes"`: a 1D `np.ndarray` with all digit labels appearing in the image, +- `"bboxes"`: a `[num_digits, 4]` 2D `np.ndarray` with bounding boxes of every + digit in the image, each represented as `[TOP, LEFT, BOTTOM, RIGHT]`. + +Each test set image annotation consists of a sequence of space separated +five-tuples _label top left bottom right_, and the annotation is considered +correct, if exactly the gold digits are predicted, each with IoU at least 0.5. +The whole test set score is then the prediction accuracy of individual images. +You can again evaluate your predictions using the +[svhn_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_dataset.py) +module, either by running with `--evaluate=path` arguments, or using its +`evaluate_file` method. + +The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). +Everyone who submits a solution achieving at least _20%_ test set accuracy gets +5 points; the remaining 5 bonus points are distributed depending on relative ordering +of your solutions. Note that I usually need at least _35%_ development set +accuracy to achieve the required test set performance. + +You should start with the +[svhn_competition.py](https://github.com/ufal/npfl138/tree/master/labs/06/svhn_competition.py) +template, which generates the test set annotation in the required format. + +_A baseline solution can use RetinaNet-like single stage detector, +using only a single level of convolutional features (no FPN) +with single-scale and single-aspect anchors. 
Focal loss is available +as [keras.losses.BinaryFocalCrossentropy](https://keras.io/api/losses/probabilistic_losses/#binaryfocalcrossentropy-class) +and non-maximum suppression as +[torchvision.ops.nms](https://pytorch.org/vision/main/generated/torchvision.ops.nms.html#nms) or +[torchvision.ops.batched_nms](https://pytorch.org/vision/main/generated/torchvision.ops.batched_nms.html#batched-nms)._ diff --git a/tasks/tagger_cle.md b/tasks/tagger_cle.md new file mode 100644 index 0000000..57bf95f --- /dev/null +++ b/tasks/tagger_cle.md @@ -0,0 +1,50 @@ +### Assignment: tagger_cle +#### Date: Deadline: Apr 23, 22:00 +#### Points: 3 points +#### Tests: tagger_cle_tests +#### Examples: tagger_cle_examples + +This assignment is a continuation of `tagger_we`. Using the +[tagger_cle.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_cle.py) +template, implement character-level word embedding computation using +a bidirectional character-level GRU. + +Once submitted to ReCodEx, you should experiment with the effect of CLEs +compared to a plain `tagger_we`, and the influence of their dimensionality. Note +that `tagger_cle` has by default smaller word embeddings so that the size +of word representation (64 + 32 + 32) is the same as in the `tagger_we` assignment. + +#### Tests Start: tagger_cle_tests +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +1. `python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16` +``` +Epoch=1/1 4.0s loss=2.2871 accuracy=0.2909 dev_loss=1.8784 dev_accuracy=0.4275 +``` + +2. 
`python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16 --word_masking=0.1` +``` +Epoch=1/1 4.4s loss=2.2846 accuracy=0.2875 dev_loss=1.8835 dev_accuracy=0.4289 +``` +#### Tests End: +#### Examples Start: tagger_cle_examples +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +- `python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32` +``` +Epoch=1/5 22.6s loss=1.0757 accuracy=0.6784 dev_loss=0.3678 dev_accuracy=0.8969 +Epoch=2/5 21.5s loss=0.1476 accuracy=0.9684 dev_loss=0.1978 dev_accuracy=0.9375 +Epoch=3/5 22.1s loss=0.0490 accuracy=0.9881 dev_loss=0.1722 dev_accuracy=0.9488 +Epoch=4/5 21.3s loss=0.0303 accuracy=0.9912 dev_loss=0.1651 dev_accuracy=0.9470 +Epoch=5/5 21.1s loss=0.0201 accuracy=0.9942 dev_loss=0.1630 dev_accuracy=0.9479 +``` + +- `python3 tagger_cle.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32 --word_masking=0.1` +``` +Epoch=1/5 22.2s loss=1.1264 accuracy=0.6594 dev_loss=0.3980 dev_accuracy=0.8977 +Epoch=2/5 21.4s loss=0.2340 accuracy=0.9408 dev_loss=0.2175 dev_accuracy=0.9377 +Epoch=3/5 24.1s loss=0.1163 accuracy=0.9690 dev_loss=0.1624 dev_accuracy=0.9525 +Epoch=4/5 26.6s loss=0.0852 accuracy=0.9745 dev_loss=0.1493 dev_accuracy=0.9560 +Epoch=5/5 24.9s loss=0.0718 accuracy=0.9778 dev_loss=0.1450 dev_accuracy=0.9563 +``` +#### Examples End: diff --git a/tasks/tagger_competition.md b/tasks/tagger_competition.md new file mode 100644 index 0000000..659c7d9 --- /dev/null +++ b/tasks/tagger_competition.md @@ -0,0 +1,32 @@ +### Assignment: tagger_competition +#### Date: Deadline: Apr 23, 22:00 +#### Points: 4 points+5 bonus + +In this assignment, you should extend `tagger_cle` +into a real-world Czech part-of-speech tagger. We will use +Czech PDT dataset loadable using the [morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py) +module. 
Note that the dataset contains more than 1500 unique POS tags and that +the POS tags have a fixed structure of 15 positions (so it is possible to +generate the POS tag characters independently). + +You can use the following additional data in this assignment: +- You can use outputs of a morphological analyzer loadable with + [morpho_analyzer.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_analyzer.py). + If a word form in train, dev or test PDT data is known to the analyzer, + all its _(lemma, POS tag)_ pairs are returned. +- You can use any _unannotated_ text data (Wikipedia, Czech National Corpus, …), + and also any pre-trained word embeddings (assuming they were trained on plain + texts). + +The task is a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). +Everyone who submits a solution with at least 92.5% label accuracy gets +4 points; the remaining 5 bonus points are distributed depending on relative ordering +of your solutions. Lastly, **3 bonus points** will be given to anyone surpassing +pre-neural-network state-of-the-art of **96.35%**. + +You can start with the +[tagger_competition.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_competition.py) +template, which among others generates test set annotations in the required format. Note that +you can evaluate the predictions as usual using the [morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py) +module, either by running with `--task=tagger --evaluate=path` arguments, or using its +`evaluate_file` method. diff --git a/tasks/tagger_ner.md b/tasks/tagger_ner.md new file mode 100644 index 0000000..a8dd777 --- /dev/null +++ b/tasks/tagger_ner.md @@ -0,0 +1,18 @@ +### Assignment: tagger_ner +#### Date: Deadline: Apr 30, 22:00 +#### Points: 2 points + +**The template is being finalized, final version will be released shortly.** + +This assignment is an extension of `tagger_we` task. 
Using the +`tagger_ner.py` +template, implement optimal decoding of named entity spans from +BIO-encoded tags. + +The evaluation is performed using the provided metric computing the F1 score of the +span prediction (i.e., a recognized possibly-multiword named entity is a true +positive if both the entity type and the span exactly match). + +In practice, character-level embeddings (and also pre-trained word embeddings) +would be used to obtain superior results. + diff --git a/tasks/tagger_we.md b/tasks/tagger_we.md new file mode 100644 index 0000000..648b30e --- /dev/null +++ b/tasks/tagger_we.md @@ -0,0 +1,56 @@ +### Assignment: tagger_we +#### Date: Deadline: Apr 23, 22:00 +#### Points: 3 points +#### Tests: tagger_we_tests +#### Examples: tagger_we_examples + +In this assignment you will create a simple part-of-speech tagger. For training +and evaluation, we will use a Czech dataset containing tokenized sentences, each +word annotated with a gold lemma and part-of-speech tag. The +[morpho_dataset.py](https://github.com/ufal/npfl138/tree/master/labs/08/morpho_dataset.py) +module (down)loads the dataset and provides mappings between strings and integers. + +Your goal is to modify the +[tagger_we.py](https://github.com/ufal/npfl138/tree/master/labs/08/tagger_we.py) +template and implement the following: +- Use the specified RNN layer type (`GRU` or `LSTM`) and dimensionality. +- Create word embeddings for the training vocabulary. +- Process the sentences using a bidirectional RNN. +- Predict part-of-speech tags. +Note that you need to properly handle sentences of different lengths in one +batch. + +#### Tests Start: tagger_we_tests +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +1. `python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16` +``` +Epoch=1/1 3.1s loss=2.3541 accuracy=0.3138 dev_loss=2.0320 dev_accuracy=0.3611 +``` + +2.
`python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16` +``` +Epoch=1/1 3.2s loss=2.1970 accuracy=0.4233 dev_loss=1.5569 dev_accuracy=0.5121 +``` +#### Tests End: +#### Examples Start: tagger_we_examples +_Note that your results may be slightly different, depending on your CPU type and whether you use a GPU._ + +- `python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=64` +``` +Epoch=1/5 21.1s loss=0.9776 accuracy=0.7080 dev_loss=0.3744 dev_accuracy=0.8814 +Epoch=2/5 19.2s loss=0.1060 accuracy=0.9736 dev_loss=0.2947 dev_accuracy=0.9013 +Epoch=3/5 19.4s loss=0.0291 accuracy=0.9921 dev_loss=0.2794 dev_accuracy=0.9057 +Epoch=4/5 19.7s loss=0.0166 accuracy=0.9960 dev_loss=0.2976 dev_accuracy=0.9015 +Epoch=5/5 19.7s loss=0.0096 accuracy=0.9978 dev_loss=0.3159 dev_accuracy=0.8957 +``` + +- `python3 tagger_we.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=64` +``` +Epoch=1/5 20.5s loss=0.7698 accuracy=0.7703 dev_loss=0.3432 dev_accuracy=0.8903 +Epoch=2/5 18.9s loss=0.0735 accuracy=0.9807 dev_loss=0.2999 dev_accuracy=0.8969 +Epoch=3/5 19.0s loss=0.0245 accuracy=0.9923 dev_loss=0.3244 dev_accuracy=0.8965 +Epoch=4/5 19.2s loss=0.0153 accuracy=0.9955 dev_loss=0.3302 dev_accuracy=0.8929 +Epoch=5/5 19.0s loss=0.0088 accuracy=0.9977 dev_loss=0.3641 dev_accuracy=0.8923 +``` +#### Examples End: diff --git a/tasks/tensorboard_projector.md b/tasks/tensorboard_projector.md new file mode 100644 index 0000000..824bf8b --- /dev/null +++ b/tasks/tensorboard_projector.md @@ -0,0 +1,13 @@ +### Assignment: tensorboard_projector + +You can try exploring the TensorBoard Projector with pre-trained embeddings +for the 20k most frequent lemmas in +[Czech](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/cs_lemma_20k.zip) +and [English](https://ufal.mff.cuni.cz/~straka/courses/npfl138/2324/demos/en_lemma_20k.zip) +– after extracting the archive, start +`tensorboard --logdir dir_where_the_archive_is_extracted`.
+ +In order to use the Projector tab yourself, you can take inspiration from the +[projector_export.py](https://github.com/ufal/npfl138/tree/master/labs/09/projector_export.py) +script, which was used to export the above pre-trained embeddings from the +Word2vec format. diff --git a/tasks/uppercase.md b/tasks/uppercase.md index 66288d4..5c9a9e4 100644 --- a/tasks/uppercase.md +++ b/tasks/uppercase.md @@ -15,8 +15,8 @@ only used to understand the approach you took, and to indicate teams). Explicitly, submit **exactly one .txt file** and **at least one .py/ipynb file**. The task is also a [_competition_](https://ufal.mff.cuni.cz/courses/npfl138/2324-summer#competitions). Everyone who submits -a solution which achieves at least _98.5%_ accuracy will get 4 basic points; the -5 bonus points will be distributed depending on relative ordering of your +a solution achieving at least _98.5%_ accuracy gets 4 basic points; the +remaining 5 bonus points are distributed depending on relative ordering of your solutions. The accuracy is computed per-character and can be evaluated by running [uppercase_data.py](https://github.com/ufal/npfl138/tree/master/labs/03/uppercase_data.py) with `--evaluate` argument, or using its `evaluate_file` method.