diff --git a/natural-language-processing/neural-networks.md b/natural-language-processing/neural-networks.md
index 70315c4..8c83693 100644
--- a/natural-language-processing/neural-networks.md
+++ b/natural-language-processing/neural-networks.md
@@ -217,4 +217,46 @@ $$
 \frac{\partial L}{\partial a} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial a} = c \\
 \frac{\partial L}{\partial b} &= \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} = 2c
 \end{align*}
-$$
\ No newline at end of file
+$$
+
+### Learning details
+
+Training a neural network is a non-convex optimization problem, so a few techniques are needed for it to work well (the first three are sketched in code below):
+
+- Initialize weights to small random values: all-zero weights give every hidden unit identical gradients, so they never learn distinct features (biases can safely start at zero)
+- Normalize input features to $\mu = 0, \sigma = 1$
+- Dropout: during training, randomly set each hidden unit to 0 with probability $p$ and scale the surviving units by $1/(1-p)$ so activations keep the same expected value; this prevents overfitting
+- Hyperparameters: learning rate, mini-batch size, number of hidden units, number of layers, choice of activation function, etc.
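+
+A minimal NumPy sketch of the first three techniques (the helper names `init_layer`, `normalize`, and `dropout` are illustrative, not a fixed API):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Small random initialization: breaks the symmetry that would make
+# every hidden unit compute the same function forever.
+def init_layer(n_in, n_out, scale=0.01):
+    W = rng.normal(0.0, scale, size=(n_in, n_out))
+    b = np.zeros(n_out)  # biases can safely start at zero
+    return W, b
+
+# Normalize each input feature to mean 0, standard deviation 1.
+def normalize(X, eps=1e-8):
+    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)
+
+# Inverted dropout: zero units with probability p during training and
+# scale the survivors by 1/(1-p), so no rescaling is needed at test time.
+def dropout(h, p=0.5, train=True):
+    if not train or p == 0.0:
+        return h
+    mask = rng.random(h.shape) >= p  # keep each unit with probability 1-p
+    return h * mask / (1.0 - p)
+
+# Tiny forward pass tying the pieces together.
+X = normalize(rng.normal(size=(32, 100)))  # mini-batch of 32 examples
+W, b = init_layer(100, 50)
+h = np.maximum(0, X @ W + b)               # ReLU hidden layer
+h = dropout(h, p=0.5)                      # regularize during training
+```
\ No newline at end of file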