diff --git a/natural-language-processing/multinomial-logistic-regression.md b/natural-language-processing/multinomial-logistic-regression.md
index 365da5b3..56ebee06 100644
--- a/natural-language-processing/multinomial-logistic-regression.md
+++ b/natural-language-processing/multinomial-logistic-regression.md
@@ -1,3 +1,10 @@
+---
+title: Multinomial Logistic Regression
+category: natural language processing
+tags: classification, multinomial logistic regression, machine learning
+description: Explanation of multinomial logistic regression, a classification algorithm used in natural language processing.
+---
+
 # Multinomial Logistic Regression
 
 ## Classification
diff --git a/natural-language-processing/neural-networks.md b/natural-language-processing/neural-networks.md
new file mode 100644
index 00000000..70315c4c
--- /dev/null
+++ b/natural-language-processing/neural-networks.md
@@ -0,0 +1,220 @@
+---
+title: Feedforward Neural Networks
+category: natural language processing
+tags: neural networks, machine learning, natural language processing, deep learning, feedforward
+description: Overview of neural networks (feedforward), particularly in the context of natural language processing.
+source: https://web.stanford.edu/~jurafsky/slp3/7.pdf
+---
+
+# Neural Networks
+
+In contrast to multinomial logistic regression (MLR), neural networks are a more flexible model that can learn complex patterns in the data, even without hand-crafted features.
+
+
+## Activation Functions
+
+A single computational unit computes $z = w \cdot x + b$, a linear function of the input $x$ with weights $w$ and bias $b$. Its output is $y = f(z)$, where $f$ is a non-linear activation function (typically one of $\tanh$, $\text{ReLU}$, or $\sigma$). With the sigmoid:
+
+$$
+y = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}
+$$
+
+In practice, $\sigma$ is rarely the best choice, and $\tanh$ is similar yet almost always better. $\tanh$ is a rescaled and shifted sigmoid, $\tanh(z) = 2\sigma(2z) - 1$, that ranges from $-1$ to $1$.
+
+$$
+y = \tanh(w \cdot x + b) = \frac{e^{w \cdot x + b} - e^{-(w \cdot x + b)}}{e^{w \cdot x + b} + e^{-(w \cdot x + b)}}
+$$
+
+The simplest activation function is the Rectified Linear Unit (ReLU), which is $0$ for negative inputs and linear for positive inputs.
+
+$$
+y = \text{ReLU}(w \cdot x + b) = \max(0, w \cdot x + b)
+$$
+
+ReLU is cheap to compute, and because its derivative is exactly $1$ for positive inputs, it does not saturate there. This mitigates the vanishing gradient problem, i.e. gradients that shrink to $\approx 0$ as they are propagated back through the network, so that it stops learning.
+
+## The XOR Problem
+
+It can be shown that a single computational unit, such as a perceptron with the binary step output below, cannot compute XOR, because XOR is not linearly separable. However, a two-layer network can: its hidden layer learns a new representation of the input in which the problem becomes linearly separable.
+
+$$
+y = \begin{cases}
+1 & \text{if } w \cdot x + b > 0 \\
+0 & \text{otherwise}
+\end{cases}
+$$
+
+XOR turns out to be a simple example of a problem that is not linearly separable in the input space, since the inputs $(x_1, x_2) = (0, 0)$ and $(1, 1)$ are in the same class, while $(0, 1)$ and $(1, 0)$ are in the other class. It is not possible to draw a straight line that separates the two classes.
+
+## Feedforward Neural Networks
+
+A feedforward NN is a multi-layer network in which the output of each layer is the input to the next layer, with no cycles. Such networks are sometimes called multilayer perceptrons (MLPs), although the name is historical: a perceptron applies a step function to a linear combination of its inputs, whereas modern networks use non-linear activations such as $\tanh$ or ReLU.
+
+The network has three different types of nodes:
+
+### Input units
+
+The input units form the vector $x$, with one node for each feature of the input.
+
+### Hidden layers
+
+One or more layers of hidden units, each with a non-linear activation function. In the standard (fully connected) architecture, each node is connected to all nodes in the previous layer, so each hidden unit computes a weighted sum over all of its inputs.
+
+For a given hidden layer $h$, we combine the weights $w$ and bias $b$ of each computational unit into a weight matrix $W$ and a bias vector $b$. Each element $W_{ji}$ of the weight matrix is the weight from the $i$th input unit $x_i$ to the $j$th hidden unit $h_j$.
+
+Thus, the output for a given hidden layer with activation function $f$ is:
+
+$$
+h = f(W \cdot x + b)
+$$
+
+#### Dimensionality
+
+Referring to the input layer as layer $0$, and $n_0$ as the number of input units, we have an input $x \in \mathbb{R}^{n_0}$, i.e. a column vector with dimension $n_0 \times 1$.
+
+The first hidden layer $h^{(1)}$ has $n_1$ hidden units, so $W \in \mathbb{R}^{n_1 \times n_0}$, and $b \in \mathbb{R}^{n_1}$.
+
+$$
+h_j = f\left(\sum_{i=1}^{n_0} W_{ji} x_i + b_j\right)
+$$
+
+### Output units
+
+One or more output units. The output layer is the final layer of the network, and its output $y$, with $\dim(y) = n_{\text{output}}$, is an estimate of the probability distribution over the possible classes/outputs.
+
+#### Normalization
+
+In order to get that probability distribution, we normalize the output of the network using the softmax function.
+
+$$
+y = \text{softmax}(W \cdot h + b)
+$$
+
+$$
+\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}}, \quad 1 \le i \le n
+$$
+
+### Comparison with MLR
+
+A NN is like MLR, but with a few differences:
+
+- many layers, since a deep NN is like layer after layer of MLR classifiers
+- intermediate layers have non-linear activation functions. In fact, without these, the network would just be a linear classifier, since a composition of linear functions is still linear
+- instead of hand-designed features, earlier layers build up a representation of the input that is useful for the final layer
+
+### Details/Notation
+
+- $*^{[l]}$ denotes a quantity associated with the $l$th layer, e.g. $W^{[l]}$ is the weight matrix for the $l$th layer. Layers are numbered starting from $1$; the input is layer $0$.
+- $n_l$ is the number of units in layer $l$.
+- $g(\cdot)$ is the activation function, which tends to be $\tanh$ or ReLU for hidden layers, and softmax for the output layer.
+- $a^{[l]}$ is the output from layer $l$
+- $z^{[l]}$ is the input to the activation function in layer $l$, i.e. $z^{[l]} = W^{[l]} \cdot a^{[l-1]} + b^{[l]}$
+- $x = a^{[0]}$ is the input vector
+
+#### Example: 2-layer NN
+
+$$
+\begin{align*}
+z^{[1]} &= W^{[1]} \cdot a^{[0]} + b^{[1]} \\
+a^{[1]} &= g^{[1]}(z^{[1]}) \\
+z^{[2]} &= W^{[2]} \cdot a^{[1]} + b^{[2]} \\
+a^{[2]} &= g^{[2]}(z^{[2]}) \\
+\hat{y} &= a^{[2]}
+\end{align*}
+$$
+
+### Feedforward Computation
+
+$$
+\begin{align*}
+&a^{[0]} = x \\
+&\text{for } l = 1, \ldots, L: \\
+&\qquad z^{[l]} = W^{[l]} \cdot a^{[l-1]} + b^{[l]} \\
+&\qquad a^{[l]} = g^{[l]}(z^{[l]}) \\
+&\text{return } \hat{y} = a^{[L]}
+\end{align*}
+$$
+
+```python
+def feedforward(x, W, b, g):  # W, b, g: per-layer lists; index 0 is unused
+    a = x  # a^{[0]} = x
+    for l in range(1, len(W)):  # layers 1..L inclusive, since len(W) == L + 1
+        z = W[l] @ a + b[l]  # z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
+        a = g[l](z)  # a^{[l]} = g^{[l]}(z^{[l]})
+    return a  # y_hat = a^{[L]}
+```
+
+### Replacing the Bias
+
+Often, the bias term is folded into the weight matrix: add a dummy input $x_0 = 1$ to the input vector, and store each bias $b_j$ as the extra weight $W_{j0}$.
+
+Adding such a dummy unit with value $1$ to every layer ($a^{[l]}_0 = 1$, so $a^{[0]}_0 = x_0 = 1$ for the input), we can write $z^{[l]} = W^{[l]} \cdot a^{[l-1]}$.
+
+$$
+h_j = f\left(\sum_{i=0}^{n_0} W_{ji} x_i\right)
+$$
+
+## FF networks for NLP: Classification
+
+Instead of manually designed features, represent each word by its embedding (e.g. word2vec, GloVe). Using embeddings computed in advance like this is a form of "pre-training". One simple way to represent a sentence is to sum the embeddings of its words, or to average them.
+
+To classify many examples at once, pack the inputs into a single matrix $X$ where each row $i$ is an input vector $x^{(i)}$. If our input has $d$ features, then $X \in \mathbb{R}^{m \times d}$, where $m$ is the number of examples.
+
+$W \in \mathbb{R}^{d_h \times d}$ is the weight matrix for the hidden layer, and $b \in \mathbb{R}^{d_h}$ is the bias vector (added to every row). $U \in \mathbb{R}^{n_{\text{output}} \times d_h}$ is the weight matrix for the output layer, and $\hat{Y} \in \mathbb{R}^{m \times n_{\text{output}}}$ is the output matrix, with softmax applied to each row.
+
+$$
+\begin{align*}
+H &= f(X W^T + b) \\
+Z &= H U^T \\
+\hat{Y} &= \text{softmax}(Z)
+\end{align*}
+$$
+
+## Training Neural Nets
+
+We want to learn the parameters $W^{[i]}$ and $b^{[i]}$ for each layer $i$ that make $\hat{y}$ as close as possible to the true $y$.
+
+### Loss Function
+
+We use the same loss function as for MLR: the cross-entropy loss.
+
+For binary classification, the loss function is:
+
+$$
+L_{\text{CE}}(\hat{y}, y) = - \log p(y | x) = - \left [ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right ]
+$$
+
+For multi-class classification with $K$ classes, where $y$ is a one-hot vector and $c$ is the correct class ($y_c = 1$), the loss function is given below, first in terms of $\hat{y}$ and then in terms of the logits $z$ (with $\hat{y} = \text{softmax}(z)$):
+
+$$
+L_{\text{CE}}(\hat{y}, y) = - \sum_{i=1}^K y_i \log \hat{y}_i = - \log \hat{y}_c
+$$
+
+$$
+L_{\text{CE}}(\hat{y}, y) = -\log \frac{\exp(z_c)}{\sum_{j=1}^K \exp(z_j)}
+$$
+
+### Backpropagation
+
+To update the weights, we pass gradients of the loss backward through the network using the chain rule. Each node in a computation graph takes an **upstream** gradient and computes its **local** gradient, multiplying the two to get the **downstream** gradient. A node may have multiple local gradients, one for each incoming edge.
+
+#### A very simple example
+
+Consider the function $L(a, b, c) = c(a + 2b)$. Create a computation graph with nodes $a, b, c$ for the inputs, and $d = 2b$, $e = a + d$, $L = ce$ for the intermediate computations.
+
+```
+(a) ---------------- \
+                      (e) ------------ (L)
+                     /                /
+(b) --------(d)-----    /-------------
+                       /
+(c) -------------------
+```
+
+$$
+\begin{align*}
+\frac{\partial L}{\partial c} &= e = a + 2b \\
+\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial a} = c \\
+\frac{\partial L}{\partial b} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} = 2c
+\end{align*}
+$$
\ No newline at end of file diff --git a/site/categories/algorithm analysis.html b/site/categories/algorithm analysis.html index 398936e1..fd5f2a7a 100644 --- a/site/categories/algorithm analysis.html +++ b/site/categories/algorithm analysis.html @@ -183,7 +183,7 @@

Category: Algorithm Analysis

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/algorithms.html b/site/categories/algorithms.html index 4c856732..851ade7a 100644 --- a/site/categories/algorithms.html +++ b/site/categories/algorithms.html @@ -183,7 +183,7 @@

Category: algorithms

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/computer science.html b/site/categories/computer science.html index 017bc8ab..58068ad2 100644 --- a/site/categories/computer science.html +++ b/site/categories/computer science.html @@ -183,7 +183,7 @@

Category: Computer Science

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/database design.html b/site/categories/database design.html index 69fa0ab9..0f0ec73d 100644 --- a/site/categories/database design.html +++ b/site/categories/database design.html @@ -183,7 +183,7 @@

Category: Database Design

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/database systems.html b/site/categories/database systems.html index 8dc73529..9d32df42 100644 --- a/site/categories/database systems.html +++ b/site/categories/database systems.html @@ -183,7 +183,7 @@

Category: Database Systems

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/distributed systems.html b/site/categories/distributed systems.html index 2a015ff7..bcd740b0 100644 --- a/site/categories/distributed systems.html +++ b/site/categories/distributed systems.html @@ -183,7 +183,7 @@

Category: Distributed Systems

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/graph theory.html b/site/categories/graph theory.html index 5c33c735..8dfc3eb3 100644 --- a/site/categories/graph theory.html +++ b/site/categories/graph theory.html @@ -183,7 +183,7 @@

Category: Graph Theory

- Last modified: 2025-01-13 + Last modified: 2025-01-14
diff --git a/site/categories/index.html b/site/categories/index.html index 48b272f2..b95c6a24 100644 --- a/site/categories/index.html +++ b/site/categories/index.html @@ -182,7 +182,7 @@

Categories

- Last modified: 2025-01-13 + Last modified: 2025-01-14
@@ -196,7 +196,10 @@

Categories

  • Mathematics (1 pages)
  • Operations Research (1 pages)
  • Software Engineering (1 pages)
  • +
  • System Design (1 pages)
  • algorithms (7 pages)
  • +
  • natural language processing (2 pages)
  • +
  • networking (1 pages)
  • research (3 pages)
  • system-design (1 pages)
  • diff --git a/site/categories/mathematics.html b/site/categories/mathematics.html index 649cdb78..21b17d3a 100644 --- a/site/categories/mathematics.html +++ b/site/categories/mathematics.html @@ -183,7 +183,7 @@

    Category: Mathematics

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/categories/natural language processing.html b/site/categories/natural language processing.html new file mode 100644 index 00000000..32dcd6c3 --- /dev/null +++ b/site/categories/natural language processing.html @@ -0,0 +1,198 @@ + + + + + + Category: natural language processing + + + + + +
    + +

    Category: natural language processing

    +
    + Last modified: 2025-01-14 + +
    +
    +

    Category: natural language processing

    + +
    + +
    + + \ No newline at end of file diff --git a/site/categories/networking.html b/site/categories/networking.html new file mode 100644 index 00000000..e9c6c674 --- /dev/null +++ b/site/categories/networking.html @@ -0,0 +1,197 @@ + + + + + + Category: networking + + + + + +
    + +

    Category: networking

    +
    + Last modified: 2025-01-14 + +
    +
    +

    Category: networking

    + +
    + +
    + + \ No newline at end of file diff --git a/site/categories/operations research.html b/site/categories/operations research.html index c978ef55..8edd5fe2 100644 --- a/site/categories/operations research.html +++ b/site/categories/operations research.html @@ -183,7 +183,7 @@

    Category: Operations Research

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/categories/research.html b/site/categories/research.html index c7b381ef..305df947 100644 --- a/site/categories/research.html +++ b/site/categories/research.html @@ -183,7 +183,7 @@

    Category: research

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/categories/software engineering.html b/site/categories/software engineering.html index 199f6418..76bb5b27 100644 --- a/site/categories/software engineering.html +++ b/site/categories/software engineering.html @@ -183,7 +183,7 @@

    Category: Software Engineering

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/categories/system design.html b/site/categories/system design.html new file mode 100644 index 00000000..ccd34100 --- /dev/null +++ b/site/categories/system design.html @@ -0,0 +1,197 @@ + + + + + + Category: System Design + + + + + +
    + +

    Category: System Design

    +
    + Last modified: 2025-01-14 + +
    +
    +

    Category: System Design

    + +
    + +
    + + \ No newline at end of file diff --git a/site/categories/system-design.html b/site/categories/system-design.html index 16bb128a..eb5c14b6 100644 --- a/site/categories/system-design.html +++ b/site/categories/system-design.html @@ -183,7 +183,7 @@

    Category: system-design

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/index.html b/site/index.html index 6d476cb3..31eaceb4 100644 --- a/site/index.html +++ b/site/index.html @@ -182,22 +182,22 @@

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    - 151 + 154 Notes
    - 12 + 15 Categories
    - 88 + 100 Tags
    @@ -205,6 +205,31 @@

    Recent

    @@ -295,10 +295,22 @@

    Categories

    Software Engineering (1) +
  • + System Design + (1) +
  • algorithms (7)
  • +
  • + natural language processing + (2) +
  • +
  • + networking + (1) +
  • research (3) @@ -310,7 +322,7 @@

    Categories

  • Featured Tags

    - +
    diff --git a/site/natural-language-processing/multinomial-logistic-regression.html b/site/natural-language-processing/multinomial-logistic-regression.html index 45647c5e..1a442203 100644 --- a/site/natural-language-processing/multinomial-logistic-regression.html +++ b/site/natural-language-processing/multinomial-logistic-regression.html @@ -4,7 +4,7 @@ Multinomial Logistic Regression - + + + + +
    + +

    Feedforward Neural Networks

    +
    + Last modified: 2025-01-14 + Category: natural language processing +
    +
    +

    Neural Networks

    +

In contrast to multinomial logistic regression (MLR), neural networks are a more flexible model that can learn complex patterns in the data, even without hand-crafted features.

    +

    Activation Functions

    +

A single computational unit computes $z = w \cdot x + b$, a linear function of the input $x$ with weights $w$ and bias $b$. Its output is $y = f(z)$, where $f$ is a non-linear activation function (typically one of $\tanh$, $\text{ReLU}$, or $\sigma$). With the sigmoid:

    +

    $$ +y = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}} +$$

    +

In practice, $\sigma$ is rarely the best choice, and $\tanh$ is similar yet almost always better. $\tanh$ is a rescaled and shifted sigmoid, $\tanh(z) = 2\sigma(2z) - 1$, that ranges from $-1$ to $1$.

    +

    $$ +y = \tanh(w \cdot x + b) = \frac{e^{w \cdot x + b} - e^{-(w \cdot x + b)}}{e^{w \cdot x + b} + e^{-(w \cdot x + b)}} +$$

    +

    The simplest activation function is the Rectified Linear Unit (ReLU), which is $0$ for negative inputs and linear for positive inputs.

    +

    $$ +y = \text{ReLU}(w \cdot x + b) = \max(0, w \cdot x + b) +$$

    +

ReLU is cheap to compute, and because its derivative is exactly $1$ for positive inputs, it does not saturate there. This mitigates the vanishing gradient problem, i.e. gradients that shrink to $\approx 0$ as they are propagated back through the network, so that it stops learning.
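A minimal NumPy sketch of the three activation functions (the function names are illustrative, not from any library). It checks numerically that $\tanh(z) = 2\sigma(2z) - 1$, and compares the ReLU derivative, which is $1$ for every positive input, with the sigmoid derivative, which is at most $0.25$ and approaches $0$ for large $|z|$.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # output in (-1, 1); delegates to NumPy's implementation
    return np.tanh(z)

def relu(z):
    # 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def relu_grad(z):
    # derivative of ReLU: 1 where z > 0, else 0 (the value at z = 0 is a convention)
    return (z > 0).astype(float)

def sigmoid_grad(z):
    # sigma'(z) = sigma(z) (1 - sigma(z)): at most 0.25, near 0 for large |z|
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 11)
# tanh is a rescaled and shifted sigmoid
assert np.allclose(tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)
print(relu(z))
print(relu_grad(z))     # 1 for every positive input: no saturation
print(sigmoid_grad(z))  # shrinks toward 0 away from z = 0
```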

    +

    The XOR Problem

    +

It can be shown that a single computational unit, such as a perceptron with the binary step output below, cannot compute XOR, because XOR is not linearly separable. However, a two-layer network can: its hidden layer learns a new representation of the input in which the problem becomes linearly separable.

    +

$$ +y = \begin{cases} +1 & \text{if } w \cdot x + b > 0 \\ +0 & \text{otherwise} +\end{cases} +$$

    +

    XOR turns out to be a simple example of a problem that is not linearly separable in the input space, since the inputs $(x_1, x_2) = (0, 0)$ and $(1, 1)$ are in the same class, while $(0, 1)$ and $(1, 0)$ are in the other class. It is not possible to draw a straight line that separates the two classes.
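To make the XOR argument concrete, here is a two-layer ReLU network with hand-picked weights that computes XOR exactly. The values of `W`, `b` and `U` are one illustrative choice, not weights learned by training, and NumPy is assumed.

```python
import numpy as np

# Hidden layer: two ReLU units, both with weights (1, 1); biases 0 and -1.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
# Output unit: weights (1, -2), bias 0, no non-linearity needed here.
U = np.array([1.0, -2.0])

def xor_net(x):
    h = np.maximum(0.0, W @ x + b)  # h = ReLU(W x + b)
    return U @ h                     # y = U . h

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for x1, x2 in inputs:
    y = xor_net(np.array([x1, x2], dtype=float))
    print((x1, x2), "->", y)
```

In the hidden space, $(0, 0)$ maps to $h = (0, 0)$, both $(0, 1)$ and $(1, 0)$ map to $(1, 0)$, and $(1, 1)$ maps to $(2, 1)$, so a single line now separates the two classes.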

    +

    Feedforward Neural Networks

    +

A feedforward NN is a multi-layer network in which the output of each layer is the input to the next layer, with no cycles. Such networks are sometimes called multilayer perceptrons (MLPs), although the name is historical: a perceptron applies a step function to a linear combination of its inputs, whereas modern networks use non-linear activations such as $\tanh$ or ReLU.

    +

    The network has three different types of nodes:

    +

    Input units

    +

The input units form the vector $x$, with one node for each feature of the input.

    +

    Hidden layers

    +

One or more layers of hidden units, each with a non-linear activation function. In the standard (fully connected) architecture, each node is connected to all nodes in the previous layer, so each hidden unit computes a weighted sum over all of its inputs.

    +

For a given hidden layer $h$, we combine the weights $w$ and bias $b$ of each computational unit into a weight matrix $W$ and a bias vector $b$. Each element $W_{ji}$ of the weight matrix is the weight from the $i$th input unit $x_i$ to the $j$th hidden unit $h_j$.

    +

    Thus, the output for a given hidden layer with activation function $f$ is:

    +

    $$ +h = f(W \cdot x + b) +$$

    +

    Dimensionality

    +

Referring to the input layer as layer $0$, and $n_0$ as the number of input units, we have an input $x \in \mathbb{R}^{n_0}$, i.e. a column vector with dimension $n_0 \times 1$.

    +

    The first hidden layer $h^{(1)}$ has $n_1$ hidden units, so $W \in \mathbb{R}^{n_1 \times n_0}$, and $b \in \mathbb{R}^{n_1}$.

    +

    $$ +h_j = f\left(\sum_{i=1}^{n_0} W_{ji} x_i + b_j\right) +$$
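A small NumPy check, with arbitrary sizes and random weights, that the matrix form $h = f(W x + b)$ computes exactly the per-unit sums $h_j$ when $W \in \mathbb{R}^{n_1 \times n_0}$ is indexed as $W_{ji}$ (row $j$ is the hidden unit, column $i$ the input unit). NumPy indices start at $0$, unlike the sums in the formula, and $f = \tanh$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 4, 3                   # input size and hidden-layer size (arbitrary)
x = rng.normal(size=n0)         # x in R^{n0}
W = rng.normal(size=(n1, n0))   # W in R^{n1 x n0}: row j holds the weights into h_j
b = rng.normal(size=n1)         # b in R^{n1}
f = np.tanh                     # any activation works here

# Vectorized form: h = f(W x + b)
h = f(W @ x + b)

# Per-unit form: h_j = f(sum_i W_ji x_i + b_j)
h_loop = np.array([f(sum(W[j, i] * x[i] for i in range(n0)) + b[j]) for j in range(n1)])

assert h.shape == (n1,)
assert np.allclose(h, h_loop)
```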

    +

    Output units

    +

One or more output units. The output layer is the final layer of the network, and its output $y$, with $\dim(y) = n_{\text{output}}$, is an estimate of the probability distribution over the possible classes/outputs.

    +

    Normalization

    +

    In order to get that probability distribution, we normalize the output of the network using the softmax function.

    +

    $$ +y = \text{softmax}(W \cdot h + b) +$$

    +

$$ +\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}}, \quad 1 \le i \le n +$$
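A minimal NumPy version of softmax, applied element-wise as $\text{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. Subtracting $\max(z)$ first is a standard numerical-stability trick added here (it is not part of the formula in the notes); it does not change the result, because softmax is invariant to adding the same constant to every $z_i$. The input vector is made up.

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) leaves the result unchanged (softmax is shift-invariant)
    # but prevents overflow in exp for large logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([0.6, 1.1, -1.5, 1.2, 3.2, -1.1])
y = softmax(z)
print(y.round(3))
print(y.sum())
```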

    +

    Comparison with MLR

    +

A NN is like MLR, but with a few differences: +- many layers, since a deep NN is like layer after layer of MLR classifiers +- intermediate layers have non-linear activation functions. In fact, without these, the network would just be a linear classifier, since a composition of linear functions is still linear +- instead of hand-designed features, earlier layers build up a representation of the input that is useful for the final layer
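A quick numerical check of the point about non-linear activations, using arbitrary random weights: two linear layers with no activation in between collapse into one linear layer, since $W^{[2]}(W^{[1]}x + b^{[1]}) + b^{[2]} = (W^{[2]}W^{[1]})x + (W^{[2]}b^{[1]} + b^{[2]})$. NumPy and the layer sizes are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=4)
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)

# Two stacked layers with no activation function ...
two_layers = W2 @ (W1 @ x + b1) + b2
# ... equal a single linear layer with W = W2 W1 and b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
assert np.allclose(two_layers, one_layer)

# With a non-linearity in between, this collapse does not hold in general.
with_tanh = W2 @ np.tanh(W1 @ x + b1) + b2
print(np.allclose(with_tanh, one_layer))
```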

    +

    Details/Notation

    + +

    Example: 2-layer NN

    +

$$ +\begin{align*} +z^{[1]} &= W^{[1]} \cdot a^{[0]} + b^{[1]} \\ +a^{[1]} &= g^{[1]}(z^{[1]}) \\ +z^{[2]} &= W^{[2]} \cdot a^{[1]} + b^{[2]} \\ +a^{[2]} &= g^{[2]}(z^{[2]}) \\ +\hat{y} &= a^{[2]} +\end{align*} +$$

    +

    Feedforward Computation

    +

$$ +\begin{align*} +&a^{[0]} = x \\ +&\text{for } l = 1, \ldots, L: \\ +&\qquad z^{[l]} = W^{[l]} \cdot a^{[l-1]} + b^{[l]} \\ +&\qquad a^{[l]} = g^{[l]}(z^{[l]}) \\ +&\text{return } \hat{y} = a^{[L]} +\end{align*} +$$

    +
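def feedforward(x, W, b, g):  # W, b, g: per-layer lists; index 0 is unused
+    a = x  # a^{[0]} = x
+    for l in range(1, len(W)):  # layers 1..L inclusive, since len(W) == L + 1
+        z = W[l] @ a + b[l]  # z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
+        a = g[l](z)  # a^{[l]} = g^{[l]}(z^{[l]})
+    return a  # y_hat = a^{[L]}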
    +
    +
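A usage sketch for the feedforward loop, assuming NumPy, a $\tanh$ hidden layer and a softmax output, with random weights and arbitrary layer sizes. The function is restated here (looping over layers $1, \ldots, L$ inclusive) so the snippet runs on its own, and the result is checked against the explicit 2-layer equations for $z^{[1]}, a^{[1]}, z^{[2]}, a^{[2]}$. Storing `None` at index 0 is just a convenience so that `W[l]` is layer $l$.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def feedforward(x, W, b, g):
    # W, b, g are per-layer lists; index 0 is a placeholder so W[l] is layer l
    a = x                       # a^{[0]} = x
    for l in range(1, len(W)):  # l = 1, ..., L
        z = W[l] @ a + b[l]     # z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
        a = g[l](z)             # a^{[l]} = g^{[l]}(z^{[l]})
    return a                    # y_hat = a^{[L]}

rng = np.random.default_rng(2)
n0, n1, n2 = 4, 5, 3            # input, hidden and output sizes (arbitrary)
W = [None, rng.normal(size=(n1, n0)), rng.normal(size=(n2, n1))]
b = [None, rng.normal(size=n1), rng.normal(size=n2)]
g = [None, np.tanh, softmax]

x = rng.normal(size=n0)
y_hat = feedforward(x, W, b, g)

# Explicit 2-layer computation for comparison
a1 = np.tanh(W[1] @ x + b[1])
a2 = softmax(W[2] @ a1 + b[2])
assert np.allclose(y_hat, a2)
print(y_hat, y_hat.sum())
```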

    Replacing the Bias

    +

Often, the bias term is folded into the weight matrix: add a dummy input $x_0 = 1$ to the input vector, and store each bias $b_j$ as the extra weight $W_{j0}$.

    +

Adding such a dummy unit with value $1$ to every layer ($a^{[l]}_0 = 1$, so $a^{[0]}_0 = x_0 = 1$ for the input), we can write $z^{[l]} = W^{[l]} \cdot a^{[l-1]}$.

    +

$$ +h_j = f\left(\sum_{i=0}^{n_0} W_{ji} x_i\right) +$$
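A NumPy sketch of the bias trick with arbitrary sizes: prepend the dummy input $x_0 = 1$ and store $b$ as column $0$ of an augmented weight matrix, so the sum runs from $i = 0$ and no separate bias is needed. `x_aug` and `W_aug` are names invented for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n0, n1 = 4, 3
x = rng.normal(size=n0)
W = rng.normal(size=(n1, n0))
b = rng.normal(size=n1)

# Dummy input x_0 = 1 prepended to x; bias stored as column 0 of the weights,
# so W_aug[j, 0] = b_j and W_aug[j, i] = W[j, i - 1] for i >= 1.
x_aug = np.concatenate(([1.0], x))  # shape (n0 + 1,)
W_aug = np.column_stack((b, W))     # shape (n1, n0 + 1)

assert np.allclose(W @ x + b, W_aug @ x_aug)
```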

    +

    FF networks for NLP: Classification

    +

Instead of manually designed features, represent each word by its embedding (e.g. word2vec, GloVe). Using embeddings computed in advance like this is a form of "pre-training". One simple way to represent a sentence is to sum the embeddings of its words, or to average them.
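A sketch of the sum/average pooling idea, assuming NumPy and a made-up embedding table: the words and 3-dimensional vectors are hypothetical stand-ins for real word2vec or GloVe vectors, and unknown words are not handled.

```python
import numpy as np

# Hypothetical 3-dimensional "pretrained" embeddings; real word2vec or GloVe
# vectors would be loaded from a file and have hundreds of dimensions.
embeddings = {
    "the":   np.array([0.1, 0.0, 0.2]),
    "movie": np.array([0.5, 0.3, -0.1]),
    "was":   np.array([0.0, 0.1, 0.1]),
    "great": np.array([0.9, -0.2, 0.4]),
}

def sentence_vector(tokens, emb, pool="mean"):
    # Stack the embedding of each token into a (num_tokens, dim) matrix
    vectors = np.stack([emb[t] for t in tokens])
    # Sum or average over the tokens to get one fixed-size input vector x
    return vectors.sum(axis=0) if pool == "sum" else vectors.mean(axis=0)

x = sentence_vector(["the", "movie", "was", "great"], embeddings)
print(x)  # a single vector with the same dimension as each word embedding
```

Whatever the sentence length, the result has the embedding dimension, so it can be fed to a network with a fixed input size.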

    +

    To classify many examples at once, pack inputs into a single matrix $X$ where each row $i$ is an input vector $x^{(i)}$. If our input has $d$ features, then $X \in \mathbb{R}^{m \times d}$ where $m$ is the number of examples.

    +

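$W \in \mathbb{R}^{d_h \times d}$ is the weight matrix for the hidden layer, and $b \in \mathbb{R}^{d_h}$ is the bias vector (added to every row). $U \in \mathbb{R}^{n_{\text{output}} \times d_h}$ is the weight matrix for the output layer, and $\hat{Y} \in \mathbb{R}^{m \times n_{\text{output}}}$ is the output matrix, with softmax applied to each row.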

    +

$$ +\begin{align*} +H &= f(X W^T + b) \\ +Z &= H U^T \\ +\hat{Y} &= \text{softmax}(Z) +\end{align*} +$$
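A NumPy sketch of the batched equations, assuming $f = \tanh$ and arbitrary sizes. Here $U$ is taken to be the output-layer weight matrix of shape $n_{\text{output}} \times d_h$, and NumPy broadcasting adds $b$ to every row of $X W^T$. The loop at the end checks that the batched result matches running each example through the network separately.

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax: each row of Z holds one example's logits
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
m, d, d_h, n_out = 8, 5, 6, 3      # examples, features, hidden units, classes
X = rng.normal(size=(m, d))        # one example per row
W = rng.normal(size=(d_h, d))      # hidden-layer weights
b = rng.normal(size=d_h)           # broadcast across the m rows
U = rng.normal(size=(n_out, d_h))  # output-layer weights

H = np.tanh(X @ W.T + b)           # H: (m, d_h)
Z = H @ U.T                        # Z: (m, n_out)
Y_hat = softmax_rows(Z)            # Y_hat: (m, n_out)

# Same result as running each example through the network separately
for i in range(m):
    h_i = np.tanh(W @ X[i] + b)
    z_i = U @ h_i
    y_i = np.exp(z_i - z_i.max()) / np.exp(z_i - z_i.max()).sum()
    assert np.allclose(Y_hat[i], y_i)
```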

    +

    Training Neural Nets

    +

    We want to learn the parameters $W^{[i]}$ and $b^{[i]}$ for each layer $i$ that make $\hat{y}$ as close as possible to the true $y$.

    +

    Loss Function

    +

We use the same loss function as for MLR: the cross-entropy loss.

    +

    For binary classification, the loss function is: +$$ +L_{\text{CE}}(\hat{y}, y) = - \log p(y | x) = - \left [ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right ] +$$

    +

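For multi-class classification with $K$ classes, where $y$ is a one-hot vector and $c$ is the correct class ($y_c = 1$), the loss function is given below, first in terms of $\hat{y}$ and then in terms of the logits $z$ (with $\hat{y} = \text{softmax}(z)$):

+

$$ +L_{\text{CE}}(\hat{y}, y) = - \sum_{i=1}^K y_i \log \hat{y}_i = - \log \hat{y}_c +$$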

    +

$$ +L_{\text{CE}}(\hat{y}, y) = -\log \frac{\exp(z_c)}{\sum_{j=1}^K \exp(z_j)} +$$
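A minimal NumPy sketch checking, on a made-up 3-class example, that the forms of the multi-class loss agree. Computing it directly from the logits as $\log\sum_j \exp(z_j) - z_c$ with a max shift (the log-sum-exp trick) is an addition here, a common way to avoid overflow; it is algebraically the same as the logit form of the loss. The binary loss is included for comparison; all function names are illustrative.

```python
import numpy as np

def cross_entropy_from_logits(z, c):
    # L_CE = -log( exp(z_c) / sum_j exp(z_j) ) = log(sum_j exp(z_j)) - z_c,
    # computed with the max-shift (log-sum-exp) trick for numerical stability
    z_max = np.max(z)
    log_sum_exp = z_max + np.log(np.sum(np.exp(z - z_max)))
    return log_sum_exp - z[c]

def binary_cross_entropy(y_hat, y):
    # L_CE = -[y log y_hat + (1 - y) log(1 - y_hat)]
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z = np.array([2.0, 1.0, -1.0])       # logits for K = 3 classes
c = 0                                # index of the correct class
y = np.eye(3)[c]                     # one-hot target
y_hat = np.exp(z) / np.exp(z).sum()  # softmax

# The forms agree: -sum_i y_i log y_hat_i = -log y_hat_c = logit form
assert np.isclose(-np.sum(y * np.log(y_hat)), -np.log(y_hat[c]))
assert np.isclose(-np.log(y_hat[c]), cross_entropy_from_logits(z, c))
print(cross_entropy_from_logits(z, c))
print(binary_cross_entropy(0.9, 1), binary_cross_entropy(0.9, 0))
```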

    +

    Backpropagation

    +

To update the weights, we pass gradients of the loss backward through the network using the chain rule. Each node in a computation graph takes an upstream gradient and computes its local gradient, multiplying the two to get the downstream gradient. A node may have multiple local gradients, one for each incoming edge.

    +

    A very simple example

    +

    Consider the function $L(a, b, c) = c(a + 2b)$. Create a computation graph with nodes $a, b, c$ for the inputs, and $d = 2b, e = a + d, L = ce$ for the intermediate computations.

    +
    (a) ---------------- \
    +                      (e) ------------ (L)
    +                     /                /
    +(b) --------(d)-----    /-------------
    +                       /
    +(c) -------------------
    +
    +

$$ +\begin{align*} +\frac{\partial L}{\partial c} &= e = a + 2b \\ +\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial a} = c \\ +\frac{\partial L}{\partial b} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} = 2c +\end{align*} +$$
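A plain-Python sketch of the forward and backward pass through this graph, using the arbitrary values $a = 3$, $b = 1$, $c = -2$. Each backward step multiplies the upstream gradient by the node's local gradient, and a central finite-difference check independently confirms $\partial L/\partial a = c$, $\partial L/\partial b = 2c$ and $\partial L/\partial c = a + 2b$.

```python
def forward(a, b, c):
    d = 2 * b   # d = 2b
    e = a + d   # e = a + d
    L = c * e   # L = c e
    return d, e, L

def backward(a, b, c):
    d, e, L = forward(a, b, c)
    dL_dL = 1.0  # gradient of L with respect to itself
    # L = c * e: local gradients are dL/dc = e and dL/de = c
    dL_dc = dL_dL * e
    dL_de = dL_dL * c
    # e = a + d: local gradient is 1 for both inputs
    dL_da = dL_de * 1.0
    dL_dd = dL_de * 1.0
    # d = 2b: local gradient is 2
    dL_db = dL_dd * 2.0
    return dL_da, dL_db, dL_dc

a, b, c = 3.0, 1.0, -2.0
grads = backward(a, b, c)
print(forward(a, b, c)[2], grads)  # L and (dL/da, dL/db, dL/dc)

# Central finite differences as an independent check
eps = 1e-6
numeric = (
    (forward(a + eps, b, c)[2] - forward(a - eps, b, c)[2]) / (2 * eps),
    (forward(a, b + eps, c)[2] - forward(a, b - eps, c)[2]) / (2 * eps),
    (forward(a, b, c + eps)[2] - forward(a, b, c - eps)[2]) / (2 * eps),
)
assert all(abs(g - n) < 1e-5 for g, n in zip(grads, numeric))
```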

    +
    +
    Tags: deep learning, feedforward, machine learning, natural language processing, neural networks
    +
    + + \ No newline at end of file diff --git a/site/systems-research/end-to-end-arguments-in-sys-design.html b/site/systems-research/end-to-end-arguments-in-sys-design.html new file mode 100644 index 00000000..d66cf827 --- /dev/null +++ b/site/systems-research/end-to-end-arguments-in-sys-design.html @@ -0,0 +1,224 @@ + + + + + + End-to-End Arguments in System Design + + + + + +
    + +

    End-to-End Arguments in System Design

    +
    + Last modified: 2025-01-13 + Category: System Design +
    +
    +

    source

    +
    End-to-End Arguments in System Design
    +
    +

    What is the Problem?

    + +

    Summary

    +

    Low Level Functionality

    + +

    Weakness

    + +

    Open Questions

    +

    -

    +

    Further Reading

    +
    +
    Tags: design, end-to-end, networking, system design
    +
    + + \ No newline at end of file diff --git a/site/systems-research/hints-for-computer-system-design.html b/site/systems-research/hints-for-computer-system-design.html index 6e59ea4c..6a4476ff 100644 --- a/site/systems-research/hints-for-computer-system-design.html +++ b/site/systems-research/hints-for-computer-system-design.html @@ -183,7 +183,7 @@

    Hints for Computer System Design

    - Last modified: 2025-01-12 + Last modified: 2025-01-13 Category: system-design
    @@ -220,8 +220,76 @@

    Throw one away

  • Always be prepared to discard your prototype
  • Throw ideas at the wall and go with what sticks
  • -

    Open Questions

    -

    -

    +

    Interface Design

    +

    Conflicting requirements: +- Simple +- Complete +- Efficient

    +

In a way, it's a lot like programming language (PL) design: exposing new abstractions (objects and operations) and the ways to manipulate them.

    +

KISS: do one thing at a time, and do it well.

    + +

    Implementation

    +

    Plan to throw one away - learn from prototyping

    +

Keep secrets: keep implementation details hidden from clients. This can trade off against performance optimizations.

    +

    Handle normal and worst cases separately

    + +

    Efficiency

    + +

    Reliability

    + +

    Takeaways

    +

    Further Reading

    Tag: acyclic graphs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/algorithm-analysis.html b/site/tags/algorithm-analysis.html index 8472d43a..b4b34cc1 100644 --- a/site/tags/algorithm-analysis.html +++ b/site/tags/algorithm-analysis.html @@ -183,7 +183,7 @@

    Tag: algorithm-analysis

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/algorithm.html b/site/tags/algorithm.html index a182f6ae..a6e2e3af 100644 --- a/site/tags/algorithm.html +++ b/site/tags/algorithm.html @@ -183,7 +183,7 @@

    Tag: algorithm

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/algorithms.html b/site/tags/algorithms.html index 3c564314..d6bbdeca 100644 --- a/site/tags/algorithms.html +++ b/site/tags/algorithms.html @@ -183,7 +183,7 @@

    Tag: algorithms

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/approximation.html b/site/tags/approximation.html index 8d6bc669..56a9ba10 100644 --- a/site/tags/approximation.html +++ b/site/tags/approximation.html @@ -183,7 +183,7 @@

    Tag: approximation

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/asymptotic notation.html b/site/tags/asymptotic notation.html index 5150716b..bb330af6 100644 --- a/site/tags/asymptotic notation.html +++ b/site/tags/asymptotic notation.html @@ -183,7 +183,7 @@

    Tag: asymptotic notation

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/batch processing.html b/site/tags/batch processing.html index b5c62495..dc6f25b7 100644 --- a/site/tags/batch processing.html +++ b/site/tags/batch processing.html @@ -183,7 +183,7 @@

    Tag: batch processing

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/bipartite graphs.html b/site/tags/bipartite graphs.html index 81884ce9..b47c50c1 100644 --- a/site/tags/bipartite graphs.html +++ b/site/tags/bipartite graphs.html @@ -183,7 +183,7 @@

    Tag: bipartite graphs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/bipartite matching.html b/site/tags/bipartite matching.html index 79d825b8..59bc586a 100644 --- a/site/tags/bipartite matching.html +++ b/site/tags/bipartite matching.html @@ -183,7 +183,7 @@

    Tag: bipartite matching

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/breadth-first search.html b/site/tags/breadth-first search.html index 83628e45..368aee28 100644 --- a/site/tags/breadth-first search.html +++ b/site/tags/breadth-first search.html @@ -183,7 +183,7 @@

    Tag: breadth-first search

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/caching.html b/site/tags/caching.html index 1e5b8517..26a11b80 100644 --- a/site/tags/caching.html +++ b/site/tags/caching.html @@ -183,7 +183,7 @@

    Tag: caching

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/classification.html b/site/tags/classification.html new file mode 100644 index 00000000..92e49f28 --- /dev/null +++ b/site/tags/classification.html @@ -0,0 +1,197 @@ + + + + + + Tag: classification + + + + + +
    + +

    Tag: classification

    +
    + Last modified: 2025-01-14 + +
    +
    +

    Tag: classification

    + +
    + +
    + + \ No newline at end of file diff --git a/site/tags/column-oriented storage.html b/site/tags/column-oriented storage.html index 426197b1..c757240d 100644 --- a/site/tags/column-oriented storage.html +++ b/site/tags/column-oriented storage.html @@ -183,7 +183,7 @@

    Tag: column-oriented storage

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/compatibility.html b/site/tags/compatibility.html index e091142a..048c0853 100644 --- a/site/tags/compatibility.html +++ b/site/tags/compatibility.html @@ -183,7 +183,7 @@

    Tag: compatibility

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/complexity analysis.html b/site/tags/complexity analysis.html index 508b14d5..e0806720 100644 --- a/site/tags/complexity analysis.html +++ b/site/tags/complexity analysis.html @@ -183,7 +183,7 @@

    Tag: complexity analysis

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/complexity-analysis.html b/site/tags/complexity-analysis.html index aac14ff2..50d43b44 100644 --- a/site/tags/complexity-analysis.html +++ b/site/tags/complexity-analysis.html @@ -183,7 +183,7 @@

    Tag: complexity-analysis

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/connected components.html b/site/tags/connected components.html index 8c27f51e..1119da7a 100644 --- a/site/tags/connected components.html +++ b/site/tags/connected components.html @@ -183,7 +183,7 @@

    Tag: connected components

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/connected graphs.html b/site/tags/connected graphs.html index 38f072f1..f8d0a681 100644 --- a/site/tags/connected graphs.html +++ b/site/tags/connected graphs.html @@ -183,7 +183,7 @@

    Tag: connected graphs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data analysis.html b/site/tags/data analysis.html index 8e11cac4..c5e4b976 100644 --- a/site/tags/data analysis.html +++ b/site/tags/data analysis.html @@ -183,7 +183,7 @@

    Tag: data analysis

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data modeling.html b/site/tags/data modeling.html index 3d15c710..2ebeade9 100644 --- a/site/tags/data modeling.html +++ b/site/tags/data modeling.html @@ -183,7 +183,7 @@

    Tag: data modeling

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data replication.html b/site/tags/data replication.html index 0f2216e7..9e3ddf6f 100644 --- a/site/tags/data replication.html +++ b/site/tags/data replication.html @@ -183,7 +183,7 @@

    Tag: data replication

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data serialization.html b/site/tags/data serialization.html index da3631d4..c5a52987 100644 --- a/site/tags/data serialization.html +++ b/site/tags/data serialization.html @@ -183,7 +183,7 @@

    Tag: data serialization

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data structures.html b/site/tags/data structures.html index 7a08ed35..e91aba74 100644 --- a/site/tags/data structures.html +++ b/site/tags/data structures.html @@ -183,7 +183,7 @@

    Tag: data structures

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/data systems.html b/site/tags/data systems.html index 5f0e647c..6c07f78d 100644 --- a/site/tags/data systems.html +++ b/site/tags/data systems.html @@ -183,7 +183,7 @@

    Tag: data systems

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/deep learning.html b/site/tags/deep learning.html new file mode 100644 index 00000000..00cf7b6d --- /dev/null +++ b/site/tags/deep learning.html @@ -0,0 +1,197 @@ + + + + + + Tag: deep learning + + + + + +

    Tag: deep learning

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/depth first search.html b/site/tags/depth first search.html index 066b4158..8785e5a4 100644 --- a/site/tags/depth first search.html +++ b/site/tags/depth first search.html @@ -183,7 +183,7 @@

    Tag: depth first search

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/depth-first search.html b/site/tags/depth-first search.html index 4574c901..6b8ffeca 100644 --- a/site/tags/depth-first search.html +++ b/site/tags/depth-first search.html @@ -183,7 +183,7 @@

    Tag: depth-first search

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/design.html b/site/tags/design.html new file mode 100644 index 00000000..9047f0c0 --- /dev/null +++ b/site/tags/design.html @@ -0,0 +1,198 @@ + + + + + + Tag: design + + + + + +

    Tag: design

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/distributed filesystems.html b/site/tags/distributed filesystems.html index 05cfdfeb..6a4e96cc 100644 --- a/site/tags/distributed filesystems.html +++ b/site/tags/distributed filesystems.html @@ -183,7 +183,7 @@

    Tag: distributed filesystems

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/document databases.html b/site/tags/document databases.html index b19ce9a5..e327a9a1 100644 --- a/site/tags/document databases.html +++ b/site/tags/document databases.html @@ -183,7 +183,7 @@

    Tag: document databases

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/dynamic-programming.html b/site/tags/dynamic-programming.html index b02fb554..5ea9104d 100644 --- a/site/tags/dynamic-programming.html +++ b/site/tags/dynamic-programming.html @@ -183,7 +183,7 @@

    Tag: dynamic-programming

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/efficiency.html b/site/tags/efficiency.html index 90f54602..91592d58 100644 --- a/site/tags/efficiency.html +++ b/site/tags/efficiency.html @@ -183,7 +183,7 @@

    Tag: efficiency

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/encoding formats.html b/site/tags/encoding formats.html index 8146534c..b2ceae00 100644 --- a/site/tags/encoding formats.html +++ b/site/tags/encoding formats.html @@ -183,7 +183,7 @@

    Tag: encoding formats

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/end-to-end.html b/site/tags/end-to-end.html new file mode 100644 index 00000000..9b74a19d --- /dev/null +++ b/site/tags/end-to-end.html @@ -0,0 +1,197 @@ + + + + + + Tag: end-to-end + + + + + +

    Tag: end-to-end

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/etl.html b/site/tags/etl.html index ddd961a8..83e079d2 100644 --- a/site/tags/etl.html +++ b/site/tags/etl.html @@ -183,7 +183,7 @@

    Tag: etl

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/failover.html b/site/tags/failover.html index 91925ea4..9d18cc44 100644 --- a/site/tags/failover.html +++ b/site/tags/failover.html @@ -183,7 +183,7 @@

    Tag: failover

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/feedforward.html b/site/tags/feedforward.html new file mode 100644 index 00000000..41a5c62b --- /dev/null +++ b/site/tags/feedforward.html @@ -0,0 +1,197 @@ + + + + + + Tag: feedforward + + + + + +

    Tag: feedforward

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/ford-fulkerson algorithm.html b/site/tags/ford-fulkerson algorithm.html index 3b02b7eb..07f81b5a 100644 --- a/site/tags/ford-fulkerson algorithm.html +++ b/site/tags/ford-fulkerson algorithm.html @@ -183,7 +183,7 @@

    Tag: ford-fulkerson algorithm

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/gale-shapley.html b/site/tags/gale-shapley.html index ffe60f61..7fc79361 100644 --- a/site/tags/gale-shapley.html +++ b/site/tags/gale-shapley.html @@ -183,7 +183,7 @@

    Tag: gale-shapley

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph coloring.html b/site/tags/graph coloring.html index b8a9fd9b..e9291e3c 100644 --- a/site/tags/graph coloring.html +++ b/site/tags/graph coloring.html @@ -183,7 +183,7 @@

    Tag: graph coloring

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph databases.html b/site/tags/graph databases.html index fdfe813e..066bb7a4 100644 --- a/site/tags/graph databases.html +++ b/site/tags/graph databases.html @@ -183,7 +183,7 @@

    Tag: graph databases

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph fundamentals.html b/site/tags/graph fundamentals.html index 27078645..e6c43e02 100644 --- a/site/tags/graph fundamentals.html +++ b/site/tags/graph fundamentals.html @@ -183,7 +183,7 @@

    Tag: graph fundamentals

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph properties.html b/site/tags/graph properties.html index 68fe57cd..91c4765f 100644 --- a/site/tags/graph properties.html +++ b/site/tags/graph properties.html @@ -183,7 +183,7 @@

    Tag: graph properties

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph representation.html b/site/tags/graph representation.html index 7dc9b8ca..227c5c2e 100644 --- a/site/tags/graph representation.html +++ b/site/tags/graph representation.html @@ -183,7 +183,7 @@

    Tag: graph representation

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph theory.html b/site/tags/graph theory.html index e3b8bb85..797acef2 100644 --- a/site/tags/graph theory.html +++ b/site/tags/graph theory.html @@ -183,7 +183,7 @@

    Tag: graph theory

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph traversal.html b/site/tags/graph traversal.html index 48164278..bc23ebb0 100644 --- a/site/tags/graph traversal.html +++ b/site/tags/graph traversal.html @@ -183,7 +183,7 @@

    Tag: graph traversal

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph-theory.html b/site/tags/graph-theory.html index 56c359a3..9afc6579 100644 --- a/site/tags/graph-theory.html +++ b/site/tags/graph-theory.html @@ -183,7 +183,7 @@

    Tag: graph-theory

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph-traversal.html b/site/tags/graph-traversal.html index 70179d15..51a078bc 100644 --- a/site/tags/graph-traversal.html +++ b/site/tags/graph-traversal.html @@ -183,7 +183,7 @@

    Tag: graph-traversal

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/graph.html b/site/tags/graph.html index d51b4155..577068ce 100644 --- a/site/tags/graph.html +++ b/site/tags/graph.html @@ -183,7 +183,7 @@

    Tag: graph

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/greedy-algorithms.html b/site/tags/greedy-algorithms.html index 10d12996..00b29a01 100644 --- a/site/tags/greedy-algorithms.html +++ b/site/tags/greedy-algorithms.html @@ -183,7 +183,7 @@

    Tag: greedy-algorithms

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/independent set.html b/site/tags/independent set.html index 0e5a9ffe..5220da6d 100644 --- a/site/tags/independent set.html +++ b/site/tags/independent set.html @@ -183,7 +183,7 @@

    Tag: independent set

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/index.html b/site/tags/index.html index bb3e527d..7b52e87f 100644 --- a/site/tags/index.html +++ b/site/tags/index.html @@ -182,7 +182,7 @@

    Tags

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    @@ -198,6 +198,7 @@

    Tags

  • bipartite matching (1 pages)
  • breadth-first search (2 pages)
  • caching (1 pages)
  • +
  • classification (1 pages)
  • column-oriented storage (1 pages)
  • compatibility (1 pages)
  • complexity analysis (1 pages)
  • @@ -210,15 +211,19 @@

    Tags

  • data serialization (1 pages)
  • data structures (2 pages)
  • data systems (1 pages)
  • +
  • deep learning (1 pages)
  • depth first search (1 pages)
  • depth-first search (1 pages)
  • +
  • design (2 pages)
  • distributed filesystems (1 pages)
  • document databases (1 pages)
  • dynamic-programming (1 pages)
  • efficiency (1 pages)
  • encoding formats (1 pages)
  • +
  • end-to-end (1 pages)
  • etl (1 pages)
  • failover (1 pages)
  • +
  • feedforward (1 pages)
  • ford-fulkerson algorithm (1 pages)
  • gale-shapley (1 pages)
  • graph (1 pages)
  • @@ -236,16 +241,22 @@

    Tags

  • indexing (1 pages)
  • induction (1 pages)
  • induction proofs (1 pages)
  • +
  • internet (1 pages)
  • interval (1 pages)
  • leader-follower model (1 pages)
  • linear programs (1 pages)
  • linear systems (1 pages)
  • +
  • machine learning (2 pages)
  • maintainability (1 pages)
  • mapreduce (1 pages)
  • matching (1 pages)
  • max flow min cut (1 pages)
  • message passing (1 pages)
  • meta (3 pages)
  • +
  • multinomial logistic regression (1 pages)
  • +
  • natural language processing (1 pages)
  • +
  • networking (1 pages)
  • +
  • neural networks (1 pages)
  • odd cycles (1 pages)
  • oltp vs olap (1 pages)
  • optimization (3 pages)
  • @@ -270,7 +281,8 @@

    Tags

  • spanning trees (1 pages)
  • stable matching (1 pages)
  • synchronous vs asynchronous (1 pages)
  • -
  • systems (1 pages)
  • +
  • system design (1 pages)
  • +
  • systems (2 pages)
  • template (1 pages)
  • time complexity (1 pages)
  • trees (1 pages)
  • diff --git a/site/tags/indexing.html b/site/tags/indexing.html index 006c580e..71e2aaf4 100644 --- a/site/tags/indexing.html +++ b/site/tags/indexing.html @@ -183,7 +183,7 @@

    Tag: indexing

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/induction proofs.html b/site/tags/induction proofs.html index 1a5a1126..7bf11a9b 100644 --- a/site/tags/induction proofs.html +++ b/site/tags/induction proofs.html @@ -183,7 +183,7 @@

    Tag: induction proofs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/induction.html b/site/tags/induction.html index fe5dc95d..2d38e564 100644 --- a/site/tags/induction.html +++ b/site/tags/induction.html @@ -183,7 +183,7 @@

    Tag: induction

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/internet.html b/site/tags/internet.html new file mode 100644 index 00000000..6a231c7e --- /dev/null +++ b/site/tags/internet.html @@ -0,0 +1,197 @@ + + + + + + Tag: internet + + + + + +

    Tag: internet

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/interval.html b/site/tags/interval.html index b260a281..aab9998e 100644 --- a/site/tags/interval.html +++ b/site/tags/interval.html @@ -183,7 +183,7 @@

    Tag: interval

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/leader-follower model.html b/site/tags/leader-follower model.html index 5991d812..d6bfd410 100644 --- a/site/tags/leader-follower model.html +++ b/site/tags/leader-follower model.html @@ -183,7 +183,7 @@

    Tag: leader-follower model

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/linear programs.html b/site/tags/linear programs.html index fd0ad323..0d855ad3 100644 --- a/site/tags/linear programs.html +++ b/site/tags/linear programs.html @@ -183,7 +183,7 @@

    Tag: linear programs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/linear systems.html b/site/tags/linear systems.html index 607e77fb..2494e6dc 100644 --- a/site/tags/linear systems.html +++ b/site/tags/linear systems.html @@ -183,7 +183,7 @@

    Tag: linear systems

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/machine learning.html b/site/tags/machine learning.html new file mode 100644 index 00000000..434fd0e3 --- /dev/null +++ b/site/tags/machine learning.html @@ -0,0 +1,198 @@ + + + + + + Tag: machine learning + + + + + +

    Tag: machine learning

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/maintainability.html b/site/tags/maintainability.html index 5d3c8815..a33e9c29 100644 --- a/site/tags/maintainability.html +++ b/site/tags/maintainability.html @@ -183,7 +183,7 @@

    Tag: maintainability

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/mapreduce.html b/site/tags/mapreduce.html index 6f8e5a16..ac43f1f1 100644 --- a/site/tags/mapreduce.html +++ b/site/tags/mapreduce.html @@ -183,7 +183,7 @@

    Tag: mapreduce

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/matching.html b/site/tags/matching.html index 74a609bc..f911f493 100644 --- a/site/tags/matching.html +++ b/site/tags/matching.html @@ -183,7 +183,7 @@

    Tag: matching

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/max flow min cut.html b/site/tags/max flow min cut.html index a0cf3146..1ad1688d 100644 --- a/site/tags/max flow min cut.html +++ b/site/tags/max flow min cut.html @@ -183,7 +183,7 @@

    Tag: max flow min cut

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/message passing.html b/site/tags/message passing.html index 23e123f7..46cb07ec 100644 --- a/site/tags/message passing.html +++ b/site/tags/message passing.html @@ -183,7 +183,7 @@

    Tag: message passing

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/meta.html b/site/tags/meta.html index 022cc784..34cbf617 100644 --- a/site/tags/meta.html +++ b/site/tags/meta.html @@ -183,7 +183,7 @@

    Tag: meta

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/multinomial logistic regression.html b/site/tags/multinomial logistic regression.html new file mode 100644 index 00000000..b7b843c0 --- /dev/null +++ b/site/tags/multinomial logistic regression.html @@ -0,0 +1,197 @@ + + + + + + Tag: multinomial logistic regression + + + + + +

    Tag: multinomial logistic regression

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/natural language processing.html b/site/tags/natural language processing.html new file mode 100644 index 00000000..123bff91 --- /dev/null +++ b/site/tags/natural language processing.html @@ -0,0 +1,197 @@ + + + + + + Tag: natural language processing + + + + + +

    Tag: natural language processing

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/networking.html b/site/tags/networking.html new file mode 100644 index 00000000..3698361b --- /dev/null +++ b/site/tags/networking.html @@ -0,0 +1,197 @@ + + + + + + Tag: networking + + + + + +

    Tag: networking

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/neural networks.html b/site/tags/neural networks.html new file mode 100644 index 00000000..d8b6c777 --- /dev/null +++ b/site/tags/neural networks.html @@ -0,0 +1,197 @@ + + + + + + Tag: neural networks + + + + + +

    Tag: neural networks

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/odd cycles.html b/site/tags/odd cycles.html index 801b6e49..05783dab 100644 --- a/site/tags/odd cycles.html +++ b/site/tags/odd cycles.html @@ -183,7 +183,7 @@

    Tag: odd cycles

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/oltp vs olap.html b/site/tags/oltp vs olap.html index adac100c..94fd0a5f 100644 --- a/site/tags/oltp vs olap.html +++ b/site/tags/oltp vs olap.html @@ -183,7 +183,7 @@

    Tag: oltp vs olap

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/optimization.html b/site/tags/optimization.html index 1340bbe3..9b73bb7c 100644 --- a/site/tags/optimization.html +++ b/site/tags/optimization.html @@ -183,7 +183,7 @@

    Tag: optimization

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/paper.html b/site/tags/paper.html index 28fe8ed8..45881d3b 100644 --- a/site/tags/paper.html +++ b/site/tags/paper.html @@ -183,7 +183,7 @@

    Tag: paper

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/partitioning.html b/site/tags/partitioning.html index f7963faf..a451f52a 100644 --- a/site/tags/partitioning.html +++ b/site/tags/partitioning.html @@ -183,7 +183,7 @@

    Tag: partitioning

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/performance.html b/site/tags/performance.html index 9b3e06a5..02374769 100644 --- a/site/tags/performance.html +++ b/site/tags/performance.html @@ -183,7 +183,7 @@

    Tag: performance

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/pigeonhole principle.html b/site/tags/pigeonhole principle.html index b26d1219..1b5195ae 100644 --- a/site/tags/pigeonhole principle.html +++ b/site/tags/pigeonhole principle.html @@ -183,7 +183,7 @@

    Tag: pigeonhole principle

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/problem-solving.html b/site/tags/problem-solving.html index 40d3ccbc..bd5ad1d0 100644 --- a/site/tags/problem-solving.html +++ b/site/tags/problem-solving.html @@ -183,7 +183,7 @@

    Tag: problem-solving

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/proof techniques.html b/site/tags/proof techniques.html index f07aba02..36d9923c 100644 --- a/site/tags/proof techniques.html +++ b/site/tags/proof techniques.html @@ -183,7 +183,7 @@

    Tag: proof techniques

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/query languages.html b/site/tags/query languages.html index 43fb1d08..f381a941 100644 --- a/site/tags/query languages.html +++ b/site/tags/query languages.html @@ -183,7 +183,7 @@

    Tag: query languages

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/relational databases.html b/site/tags/relational databases.html index ec03e5d4..1cb7507a 100644 --- a/site/tags/relational databases.html +++ b/site/tags/relational databases.html @@ -183,7 +183,7 @@

    Tag: relational databases

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/reliability.html b/site/tags/reliability.html index 74b5ad94..f48ba7df 100644 --- a/site/tags/reliability.html +++ b/site/tags/reliability.html @@ -183,7 +183,7 @@

    Tag: reliability

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/replication logs.html b/site/tags/replication logs.html index 062d09c8..25f8d643 100644 --- a/site/tags/replication logs.html +++ b/site/tags/replication logs.html @@ -183,7 +183,7 @@

    Tag: replication logs

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/research.html b/site/tags/research.html index ad974361..282efe34 100644 --- a/site/tags/research.html +++ b/site/tags/research.html @@ -183,7 +183,7 @@

    Tag: research

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/review.html b/site/tags/review.html index 4b104d32..62523aa0 100644 --- a/site/tags/review.html +++ b/site/tags/review.html @@ -183,7 +183,7 @@

    Tag: review

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/scalability.html b/site/tags/scalability.html index fdfc4a25..6a7dad0b 100644 --- a/site/tags/scalability.html +++ b/site/tags/scalability.html @@ -183,7 +183,7 @@

    Tag: scalability

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/scaling.html b/site/tags/scaling.html index 1d07ca95..c7d6987a 100644 --- a/site/tags/scaling.html +++ b/site/tags/scaling.html @@ -183,7 +183,7 @@

    Tag: scaling

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/scheduling.html b/site/tags/scheduling.html index 0747f171..e7ad6d65 100644 --- a/site/tags/scheduling.html +++ b/site/tags/scheduling.html @@ -183,7 +183,7 @@

    Tag: scheduling

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/schema evolution.html b/site/tags/schema evolution.html index 4265b859..e4f627f1 100644 --- a/site/tags/schema evolution.html +++ b/site/tags/schema evolution.html @@ -183,7 +183,7 @@

    Tag: schema evolution

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/set cover.html b/site/tags/set cover.html index a55c0357..968574a4 100644 --- a/site/tags/set cover.html +++ b/site/tags/set cover.html @@ -183,7 +183,7 @@

    Tag: set cover

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/shortest-paths.html b/site/tags/shortest-paths.html index 3bc7439e..e69cc6e5 100644 --- a/site/tags/shortest-paths.html +++ b/site/tags/shortest-paths.html @@ -183,7 +183,7 @@

    Tag: shortest-paths

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/spanning trees.html b/site/tags/spanning trees.html index 717b8ed4..1c906dc9 100644 --- a/site/tags/spanning trees.html +++ b/site/tags/spanning trees.html @@ -183,7 +183,7 @@

    Tag: spanning trees

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/stable matching.html b/site/tags/stable matching.html index 37cbb409..5ae36719 100644 --- a/site/tags/stable matching.html +++ b/site/tags/stable matching.html @@ -183,7 +183,7 @@

    Tag: stable matching

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/synchronous vs asynchronous.html b/site/tags/synchronous vs asynchronous.html index 6ecf66af..2778259a 100644 --- a/site/tags/synchronous vs asynchronous.html +++ b/site/tags/synchronous vs asynchronous.html @@ -183,7 +183,7 @@

    Tag: synchronous vs asynchronous

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/system design.html b/site/tags/system design.html new file mode 100644 index 00000000..243b2fc3 --- /dev/null +++ b/site/tags/system design.html @@ -0,0 +1,197 @@ + + + + + + Tag: system design + + + + + +

    Tag: system design

    + Last modified: 2025-01-14 + +
    + + \ No newline at end of file diff --git a/site/tags/systems.html b/site/tags/systems.html index 3f567fa2..2fb06899 100644 --- a/site/tags/systems.html +++ b/site/tags/systems.html @@ -183,12 +183,13 @@

    Tag: systems

    - Last modified: 2025-01-13 + Last modified: 2025-01-14

    Tag: systems

    diff --git a/site/tags/template.html b/site/tags/template.html index 72dd07a7..2a08c432 100644 --- a/site/tags/template.html +++ b/site/tags/template.html @@ -183,7 +183,7 @@

    Tag: template

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/time complexity.html b/site/tags/time complexity.html index e6ce72cd..28258242 100644 --- a/site/tags/time complexity.html +++ b/site/tags/time complexity.html @@ -183,7 +183,7 @@

    Tag: time complexity

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/trees.html b/site/tags/trees.html index d37529b4..11bd7590 100644 --- a/site/tags/trees.html +++ b/site/tags/trees.html @@ -183,7 +183,7 @@

    Tag: trees

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/site/tags/vertex cover.html b/site/tags/vertex cover.html index 86ac4c4b..75e64ca5 100644 --- a/site/tags/vertex cover.html +++ b/site/tags/vertex cover.html @@ -183,7 +183,7 @@

    Tag: vertex cover

    - Last modified: 2025-01-13 + Last modified: 2025-01-14
    diff --git a/systems-research/end-to-end-arguments-in-sys-design.md b/systems-research/end-to-end-arguments-in-sys-design.md new file mode 100644 index 00000000..a9869d4b --- /dev/null +++ b/systems-research/end-to-end-arguments-in-sys-design.md @@ -0,0 +1,40 @@ +--- +title: End-to-End Arguments in System Design +category: System Design +tags: system design, end-to-end, design, networking +description: Paper review of "End-to-End Arguments in System Design" by Saltzer, Reed, and Clark +--- + +# [source](https://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf) + +###### End-to-End Arguments in System Design + +--- + +### What is the Problem? + +- Placing functions at lower levels of a system may not be beneficial + - The application at the endpoints generally knows best, so error checking at lower levels can be redundant + - Placing functions at a low level may be costly +- A correct comms system can only be built with the knowledge and help of the endpoints + - e.g. detecting crashes, delivering/sequencing messages, etc. + +### Summary + +#### Low Level Functionality + +- The paper argues that low-level functionality is mainly a performance optimization +- If the probability of an error is low, it doesn't make sense to add error checking in the middle of the system; instead, let the endpoints handle it + +### Weaknesses + +- Maintainability (checks missing in the middle) +- Certain systems ought to have intermediate checks (e.g. comms over lossy media) +- Catching errors can take longer, since an error must travel all the way to an endpoint before it is detected + +### Open Questions + +- +- + +### Further Reading diff --git a/systems-research/hints-for-computer-system-design.md b/systems-research/hints-for-computer-system-design.md index 19d3186b..04731f9b 100644 --- a/systems-research/hints-for-computer-system-design.md +++ b/systems-research/hints-for-computer-system-design.md @@ -49,11 +49,64 @@ However, more complicated applications of caching exist.
In real-time systems, y - Always be prepared to discard your prototype - Throw ideas at the wall and go with what sticks +#### Interface Design -### Open Questions +Conflicting requirements: +- Simple +- Complete +- Efficient -- -- +In a way it's a lot like PL design: exposing new abstractions (objects and operations) and ways to manipulate them. + +KISS: do one thing at a time, and do it well. + +- Don't over-promise +- Get it right, but beware the dangers of abstractions (especially their performance cost) +- Make it fast rather than general and complete. Keep the scope small so that it is easy to optimize and to compose with other systems/components +- Procedure arguments let you keep an interface general but extensible + - C function pointers, C++ function objects + - `LD_PRELOAD` trick: override library calls with a wrapper that calls the original function with some extra functionality around it +- Leave it to the client (see the Exokernel paper) + - Unix pipes +- Keep interfaces stable + - Counterexample: LLVM +- Keep a place to stand + - Virtualization + +#### Implementation + +Plan to throw one away: learn from prototyping + +Keep secrets: hide implementation details from clients. This can come at the cost of some performance optimizations + +Handle normal and worst cases separately + +- It might be OK to crash a few processes if that lets the system recover +- Caches in processors are optimized for the common case (principle of locality) +- Paging in virtual memory is optimized for the common case (principle of locality) + +#### Efficiency + +- Split resources + - It can be faster to allocate a new resource than to wait for a shared one to be freed + - Heterogeneous systems + - Specialized hardware (FPGAs, GPUs) for specialized tasks + - e.g. Google's TPU, Microsoft Azure FPGAs +- Use static analysis + +#### Reliability +- Log updates + - Enable recovery after a crash + - Append-only writes are efficient + - Can be used for replication +- Atomic transactions + - e.g.
ACID + +#### Takeaways + +- Most successful systems are built with particular themes, many of which are discussed in this paper +- When reading papers, look for what you can apply, and ignore irrelevant details. +- New hints can be added, e.g. approximation vs. precision ### Further Reading diff --git a/systems-research/internet-design-philosophy.md b/systems-research/internet-design-philosophy.md new file mode 100644 index 00000000..6b2092fe --- /dev/null +++ b/systems-research/internet-design-philosophy.md @@ -0,0 +1,97 @@ +--- +title: Design Philosophy of DARPA Internet Protocols +category: networking +tags: internet, design, systems +description: A summary of the design philosophy of the DARPA internet protocols. +--- + +# [source](http://ccr.sigcomm.org/archive/1995/jan95/ccr-9501-clark.pdf) + +###### Design Philosophy of the DARPA Internet Protocols + +--- + +### What is the Problem? + + +### Summary + +**Fundamental Goal**: develop an effective technique for **multiplexed** utilization of **interconnected** networks. + +**Multiplexing**: a single channel shared by many communicating parties +- Circuit-switching: a dedicated channel for each pair of communicating parties, i.e. point-to-point comms + - Predictable performance because resources are "reserved" for each connection + - Inefficient use of resources; the number of connections is limited by the number of channels. Connecting every pair of `N` parties needs `N(N-1)/2` channels. +- Packet-switching: packets from many parties share a single channel + - More efficient use of resources, but performance is less predictable + - Packets can be lost, delayed, or delivered out of order + +At a high level, packet switching is what lets the network take advantage of redundant paths between any two hosts: a transport-layer connection can be established and maintained regardless of the underlying network topology, as long as a path exists between the two hosts. + +Another fundamental goal: connecting heterogeneous networks.
The internet is a network of many different types of networks, each with its own protocols and addressing schemes. The internet protocols need to be able to connect these networks together/transmit data between them. + +Secondary Goals: + +- Continue despite failure +- Multiple types of communication +- Variety of networks +- Distributed management of resources + - Any centralized control would be a bottleneck + - Each network should be able to manage its own resources +- Cost effective +- Host attachment + - Hosts should be able to connect to the network without requiring changes to the network +- Accountability + - Hosts should be able to identify themselves to the network + - Quality of service should be able to be enforced + + +#### Datagrams + +- Connectionless service + - no state established ahead of time +- Key building block for switching +- UDP is app-level interface to datagram service of the internet + - building block for other protocols (TCP) +- Each packet is independent + +#### TCP vs. UDP + +- TCP: connection-oriented, reliable, in-order delivery +- UDP: connectionless, unreliable, unordered delivery (loss is OK) + - No QoS guarantees in lower-level + +#### Supporting Variety of Networks + +"Thin waist" of the internet/hourglass model: IP at the network layer, TCP/UDP at the transport layer. + +- IP: provides a common interface for all networks +- TCP/UDP: provides a common interface for all applications + +Abstraction hides details of lower layer, allowing whatever you want to happen at the lower level while the application remains unaware. + +Unfortunately, can also lead to some problems +- Can't use hints directly from lower level for optimizations + - Workarounds: ECN, dpdk + - Parallels in storage, e.g. direct access, spdk +- Can't evolve interface of IP without changing everything above it + +#### Fate Sharing + +Move state to endpoints for **survivability**. If a network fails, the endpoints can reestablish the connection. 
+
+### Strengths
+
+- Simple idea of datagrams
+- Scalable/distributed
+- It works!
+
+### Weaknesses
+
+- Narrow IP interface hurts innovation at IP level
+- Hiding secrets can hurt efficiency
+
+
+### Further Reading
+
+- [Principles of Computer System Design](https://ocw.mit.edu/courses/res-6-004-principles-of-computer-system-design-an-introduction-spring-2009/pages/online-textbook/)
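The Fate Sharing note in the `internet-design-philosophy.md` diff above says connection state belongs at the endpoints, not in the network. As a rough illustration only, here is a toy Python model under loudly stated assumptions: `Gateway` and `datagram_transfer` are hypothetical names, the sender is stop-and-wait (one datagram in flight), acks are instantaneous, and "routing" simply alternates between gateways. It is not a model of real TCP/IP; it only shows that when gateways hold no per-connection state, a gateway crash costs a datagram rather than the connection.

```python
from itertools import cycle


class Gateway:
    """Hypothetical stateless hop: it knows only whether it is up, not which connections use it."""

    def __init__(self) -> None:
        self.up = True

    def forward(self, datagram):
        # A crashed gateway drops traffic; it holds no per-connection state to lose.
        return datagram if self.up else None


def datagram_transfer(chunks, gateways, crash_at):
    """Toy stop-and-wait transfer (hypothetical model); gateways[0] crashes after `crash_at` sends.

    All connection state lives at the two endpoints: `next_seq` at the sender and
    `delivered` at the receiver. The gateways only forward or drop datagrams.
    """
    next_seq = 0    # sender endpoint state: next chunk awaiting an ack
    delivered = []  # receiver endpoint state: chunks accepted so far
    sends = 0
    for gw in cycle(gateways):  # toy "routing": alternate over all gateways
        if next_seq == len(chunks):
            return delivered
        if not any(g.up for g in gateways):
            raise ConnectionError("no path left between the endpoints")
        if sends == crash_at:
            gateways[0].up = False  # a gateway fails mid-transfer
        sends += 1
        datagram = gw.forward((next_seq, chunks[next_seq]))
        if datagram is None:
            continue  # lost in the network; the chunk stays unacked and is resent
        seq, payload = datagram
        delivered.append(payload)  # only one datagram is in flight, so it arrives in order
        next_seq = seq + 1         # the (instant) ack advances the sender's state
```

With two gateways, `datagram_transfer(list("abcde"), [Gateway(), Gateway()], crash_at=3)` still returns all five chunks: the one datagram routed through the crashed gateway is simply resent through the other. If the gateways instead stored per-connection state (as in a virtual-circuit design), that state would have died with the crashed gateway, which is exactly what fate sharing avoids.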