- Linear function
- Activation function: to make the network non-linear and fit complex data
-
Internal Covariate Shift
- The change in the distribution of each layer's inputs during training as the weights of earlier layers are updated; batch normalization was proposed to reduce this effect.
-
Probably Use Before the Activation
- It may be more appropriate to use batch normalization after the activation function for s-shaped functions like the hyperbolic tangent and logistic function.
- It may be appropriate before the activation function for activations that may result in non-Gaussian distributions like the rectified linear activation function.
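The two placements can be sketched directly. This is a minimal numpy sketch, not a framework implementation: `batch_norm` here is a plain per-feature standardization without the learned scale and shift parameters of a real batch normalization layer.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Standardize each feature over the mini-batch dimension.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))   # mini-batch of 32 examples, 4 features
w = rng.normal(size=(4, 8))    # weights of a dense layer

z = x @ w                      # pre-activation output of the dense layer

# Placement 1: batch norm BEFORE the activation (suggested for ReLU)
h_before = np.maximum(batch_norm(z), 0.0)

# Placement 2: batch norm AFTER the activation (suggested for tanh/sigmoid)
h_after = batch_norm(np.tanh(z))
```

In both placements the normalized tensor has approximately zero mean and unit standard deviation per feature over the mini-batch; what differs is whether the activation sees standardized inputs or the next layer does.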
-
Use Large Learning Rates
- Using batch normalization makes the network more stable during training. This may allow the use of much larger than normal learning rates, which in turn may further speed up the learning process.
- The faster training also means that the decay rate used for the learning rate may be increased.
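The effect of normalization on tolerable learning rates can be demonstrated without any deep learning library. This is an illustrative sketch on a linear least-squares problem: standardizing the features (the core operation batch normalization performs) improves the conditioning of the loss, so a learning rate that diverges on the raw inputs converges on the normalized ones.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * np.array([1.0, 50.0])  # badly scaled features
y = X @ np.array([2.0, -0.5])

def fit(X, y, lr, steps=200):
    # Plain gradient descent on mean squared error.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        if not np.all(np.isfinite(w)):
            return None  # diverged
    return w

# Standardize each feature, as batch normalization would.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

print(fit(X, y, lr=0.1))   # raw features: weights blow up at this rate
print(fit(Xn, y, lr=0.1))  # normalized features: same rate is stable
```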
-
Alternative to Data Preparation
- If the mean and standard deviation of each input feature are calculated over the mini-batch instead of over the entire training dataset, then the batch size must be large enough to be representative of the range of each variable.
- It may not be appropriate for variables that have a data distribution that is highly non-Gaussian, in which case it might be better to perform data scaling as a pre-processing step.
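The representativeness requirement can be made concrete with a small numpy experiment: the mean estimated from a tiny mini-batch is a much noisier stand-in for the dataset mean than one estimated from a larger batch. The batch sizes and trial count below are illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=(10_000, 1))

def mean_error(batch_size, trials=200):
    # Average absolute gap between the mini-batch mean and the
    # full-dataset mean, over many randomly sampled batches.
    errs = []
    for _ in range(trials):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        errs.append(abs(data[idx].mean() - data.mean()))
    return float(np.mean(errs))

small = mean_error(batch_size=4)
large = mean_error(batch_size=256)
print(small, large)  # the tiny batch gives a far noisier estimate
```

The same argument applies to the standard deviation, and it is worse for highly non-Gaussian variables, which is why pre-processing the data may be the better choice in that case.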
Skip Connections
- A layer feeds into the next layer and also directly into layers about 2–3 hops away.
- They allow memory (or information) to flow from the initial layers to the last layers.
- The skip connections help to address the problem of vanishing and exploding gradients.
Multilayer Perceptrons (MLPs)
- Use for:
- Tabular datasets
- Classification prediction problems
- Regression prediction problems
- Try on:
- Image data
- Text data
- Time series data
- Other types of data
Convolutional Neural Networks (CNNs)
- Use for:
- Image data
- Classification prediction problems
- Regression prediction problems
- Try on:
- Text data
- Time series data
- Sequence input data
-
Some examples of sequence prediction problems include:
- One-to-Many: An observation as input mapped to a sequence with multiple steps as an output.
- Many-to-One: A sequence of multiple steps as input mapped to class or quantity prediction.
- Many-to-Many: A sequence of multiple steps as input mapped to a sequence with multiple steps as output.
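The three mappings above can be stated purely in terms of array shapes. This sketch uses a `(timesteps, features)` convention; the step counts are illustrative, not prescribed.

```python
import numpy as np

features = 3

# One-to-many: one observation in, a sequence of 5 steps out
one_to_many = (np.zeros(features), np.zeros((5, features)))

# Many-to-one: a 5-step sequence in, a single class/quantity out
many_to_one = (np.zeros((5, features)), np.zeros(1))

# Many-to-many: a 5-step sequence in, a 7-step sequence out
many_to_many = (np.zeros((5, features)), np.zeros((7, features)))

for name, (x, y) in [("one-to-many", one_to_many),
                     ("many-to-one", many_to_one),
                     ("many-to-many", many_to_many)]:
    print(name, "input", x.shape, "-> output", y.shape)
```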
-
Recurrent Neural Networks (RNNs)

Use for:
- Text data
- Speech data
- Classification prediction problems
- Regression prediction problems
- Generative models
-
Not for:
- Time series data
-
Try on:
- Time series data