Table of Contents
- Linear
- Sigmoid
- Hyperbolic Tangent
- Rectified Linear Unit (ReLU)
- Leaky ReLU
- Softmax
- Softplus
Sigmoid
It squashes values into the range (0, 1). It is applied independently to each element of the input vector.
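For reference, the logistic sigmoid itself is
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$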
Gradient
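A well-known property (presumably what this refers to) is that the sigmoid's derivative can be written in terms of the sigmoid itself:
$$
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
$$
It approaches 0 when $\sigma(x)$ saturates near 0 or 1, which can slow gradient-based learning.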
Usage
- Binary Classification
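In binary classification the sigmoid output is typically read as the parameter of a Bernoulli distribution (the case that the softmax approach below generalizes); using notation that parallels the softmax linear layer further down, with a weight vector $\vec{w}$, bias $b$, and hidden activations $\vec{h}$:
$$
\hat{y} = P(y = 1 | \vec{x}) = \sigma(\vec{w}^T \vec{h} + b)
$$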
Hyperbolic Tangent
- Tanh is just a rescaled and shifted sigmoid (see the relation below)
- Tanh often performs well for deep nets
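Concretely, the rescaling is
$$
\tanh(x) = 2\,\sigma(2x) - 1
$$
so tanh is zero-centered with outputs in (-1, 1), which is often cited as a reason it trains better than the sigmoid in deep nets.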
A "softened" version of the arg max. A generalization of the sigmoid function. An exponential follow by normalization.
- Soft: continuous and differentiable
- Max: arg max (its result, represented as a one-hot vector, is not continuous or differentiable)
Purpose: To represent a probability distribution over a discrete variable with n possible values (over n different classes)
Requirements:
- Each element $\hat{y}_i$ must be between 0 and 1
- The entire vector must sum to 1 (so that it represents a valid probability distribution)
Approach: (the same approach that worked for the Bernoulli distribution generalizes to the multinoulli distribution)
- A linear layer predicts unnormalized log probabilities (predicting log probabilities keeps the model well-behaved for gradient-based optimization):
$$
\vec{z} = W^T \vec{h} + \vec{b}
$$
where $z_i = \log \tilde{P}(y = i | \vec{x})$
- Exponentiate and normalize $\vec{z}$ to obtain the desired $\hat{y}$:
$$
\operatorname{softmax}(\vec{z})_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}
$$
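A minimal NumPy sketch of these two steps (the weight, bias, and hidden-activation values are illustrative; subtracting the max is a standard numerical-stability trick, valid because softmax is shift-invariant):

```python
import numpy as np

def softmax(z):
    """Exponentiate and normalize the unnormalized log probabilities z."""
    z = z - np.max(z)            # shift-invariant; avoids overflow in exp
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # hypothetical weights: 4 hidden units, 3 classes
b = np.zeros(3)
h = rng.normal(size=4)           # hypothetical hidden activations
z = W.T @ h + b                  # linear layer: unnormalized log probabilities
y_hat = softmax(z)
print(y_hat, y_hat.sum())        # each entry in (0, 1), sum equals 1
```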
Derivatives:
$$
\frac{\partial \operatorname{softmax}(\vec{z})_i}{\partial z_j} = \operatorname{softmax}(\vec{z})_i \left( \delta_{ij} - \operatorname{softmax}(\vec{z})_j \right)
$$
where $\delta_{ij}$ is the Kronecker delta
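A small sketch that checks this Jacobian formula against central finite differences (the test point and tolerances are arbitrary choices):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(z):
    """Analytic Jacobian: J[i, j] = softmax(z)_i * (delta_ij - softmax(z)_j)."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
# Column j holds the numerical derivative of softmax(z) with respect to z_j
numeric = np.column_stack([
    (softmax(z + eps * np.eye(3)[j]) - softmax(z - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(softmax_jacobian(z), numeric, atol=1e-6))  # expect True
```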
Usage
- Multi-class Classification
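In multi-class classification the softmax output is typically paired with the negative log-likelihood loss; the log undoes the exponentiation, which is part of why predicting unnormalized log probabilities is convenient (cf. Ch. 6.2.2.3):
$$
-\log \operatorname{softmax}(\vec{z})_y = -z_y + \log \sum_j \exp(z_j)
$$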
Deep Learning
- Ch 6.2.2.3 Softmax Units for Multinoulli Output Distributions