
98_General_AI.md


NN

  • Linear function
  • Activation function: to make the network non-linear and fit complex data

Linear Function

  • Multiplication of the input data by the weight matrix and addition of the bias term for each layer: z = Wx + b.

Activation: e.g. ReLU, defined as ReLU(x) = max(0, x).
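
Below is a minimal NumPy sketch of one fully connected layer: the linear step (weight matrix times input plus bias) followed by a ReLU activation. The layer sizes and random values are illustrative assumptions, not from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: batch of 2 samples, 4 input features, 3 hidden units.
x = rng.normal(size=(2, 4))   # input mini-batch
W = rng.normal(size=(4, 3))   # weight matrix
b = np.zeros(3)               # bias term

z = x @ W + b                 # linear step: multiply by weights, add bias
a = np.maximum(0.0, z)        # ReLU activation: introduces non-linearity

print(z.shape, a.shape)       # (2, 3) (2, 3)
```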

Batch Normalization

Problem: The distribution of each layer's inputs changes during training.

Solution: Fix (normalize) the distribution of the layer inputs.

  • The problem is referred to as internal covariate shift.

  • Batch normalization standardizes the inputs to a layer for each mini-batch: subtract the mini-batch mean and divide by the mini-batch standard deviation, then scale and shift with learnable parameters (see the sketch after this list).

  • Probably Use Before the Activation

    • It may be more appropriate to use batch normalization after the activation function for s-shaped functions like the hyperbolic tangent and logistic function.
    • It may be appropriate before the activation function for activations that may result in non-Gaussian distributions like the rectified linear activation function.
  • Use Large Learning Rates

    • Using batch normalization makes the network more stable during training. This may allow the use of much larger than normal learning rates, which in turn may further speed up the learning process.
    • The faster training also means that the decay rate used for the learning rate may be increased.
  • Alternative to Data Preparation

    • If the mean and standard deviation of each input feature are calculated over the mini-batch instead of over the entire training dataset, then the batch size must be sufficiently representative of the range of each variable.
    • It may not be appropriate for variables that have a data distribution that is highly non-Gaussian, in which case it might be better to perform data scaling as a pre-processing step.
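
A minimal NumPy sketch of the batch-normalization step described above: each feature is standardized using the mean and standard deviation of the current mini-batch, then scaled and shifted by learnable parameters gamma and beta (initialized to 1 and 0 here purely for illustration).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized inputs
    return gamma * x_hat + beta            # learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4))   # mini-batch of 8, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 mean, ~1 std per feature
```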

Residual block

A layer feeds into the next layer and also directly into layers about 2–3 hops away.

  • Skip connections allow memory (or information) to flow from the initial layers to the later layers.
  • The skip connections help to address the problem of vanishing and exploding gradients.
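
A minimal NumPy sketch of a residual (skip) connection: the block's output is its own transformation of the input plus the input itself, so information can bypass the intermediate layers. The two-layer transformation and the sizes are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, b1, W2, b2):
    """Output = F(x) + x, where F is a small two-layer transformation."""
    f = relu(x @ W1 + b1) @ W2 + b2   # the block's own transformation F(x)
    return relu(f + x)                # skip connection: add the input back

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(2, d))
# Keep the same width on both layers so the skip connection can add x directly.
W1, W2 = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
b1, b2 = np.zeros(d), np.zeros(d)
print(residual_block(x, W1, b1, W2, b2).shape)   # (2, 4)
```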

WHEN/WHICH

MLPs: classical type of neural network

  • Use for:
    • Tabular datasets
    • Classification prediction problems
    • Regression prediction problems
  • Try on:
    • Image data
    • Text data
    • Time series data
    • Other types of data

CNN: designed to map image data to an output variable.

  • Use for:
    • Image data
    • Classification prediction problems
    • Regression prediction problems
  • Try on:
    • Text data
    • Time series data
    • Sequence input data

RNNs: designed to work with sequence prediction problems.

  • Some examples of sequence prediction problems include (input/output shapes are sketched after this section):

    • One-to-Many: An observation as input mapped to a sequence with multiple steps as an output.
    • Many-to-One: A sequence of multiple steps as input mapped to class or quantity prediction.
    • Many-to-Many: A sequence of multiple steps as input mapped to a sequence with multiple steps as output.
  • Use for:

    • Text data
    • Speech data
    • Classification prediction problems
    • Regression prediction problems
    • Generative models
  • Not for (by default):

    • Time series data (simple linear methods often outperform RNNs/LSTMs on standard forecasting problems)
  • Try on:

    • Time series data
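
The three sequence mappings above can be summarized by their input and output array shapes. A small NumPy sketch, with the batch size, step count, and feature/class counts chosen purely as illustrative assumptions:

```python
import numpy as np

batch, steps, features, classes = 2, 5, 3, 4   # illustrative sizes

# One-to-Many: one observation in, a sequence of steps out (e.g. captioning).
x_one, y_many = np.zeros((batch, features)), np.zeros((batch, steps, features))

# Many-to-One: a sequence of steps in, one class or quantity out (e.g. sentiment).
x_many, y_one = np.zeros((batch, steps, features)), np.zeros((batch, classes))

# Many-to-Many: a sequence in, a sequence out (e.g. translation).
x_seq, y_seq = np.zeros((batch, steps, features)), np.zeros((batch, steps, classes))

for name, (xs, ys) in {
    "one-to-many":  (x_one, y_many),
    "many-to-one":  (x_many, y_one),
    "many-to-many": (x_seq, y_seq),
}.items():
    print(name, xs.shape, "->", ys.shape)
```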