Layers are the fundamental building blocks for NLP models. They can be used to
assemble new tf.keras
layers or models.
- MultiHeadAttention implements an optionally masked attention between query, key, and value tensors, as described in "Attention Is All You Need". If from_tensor and to_tensor are the same, this is self-attention (see the sketch after this list).
- CachedAttention implements an attention layer with a cache, used for auto-regressive decoding.
- MatMulWithMargin implements a matrix multiplication with margin layer used for training retrieval/ranking tasks, as described in "Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax".
- MultiChannelAttention implements a variant of multi-head attention that can be used to merge multiple streams for cross-attention.
- TalkingHeadsAttention implements talking-heads attention, as described in "Talking-Heads Attention".
- Transformer implements an optionally masked transformer as described in "Attention Is All You Need".
- TransformerDecoderBlock is made up of self multi-head attention, cross multi-head attention, and a feedforward network.
- RandomFeatureGaussianProcess implements a random-feature-based Gaussian process, as described in "Random Features for Large-Scale Kernel Machines" (see the sketch after this list).
- ReZeroTransformer implements a Transformer with ReZero, as described in "ReZero is All You Need: Fast Convergence at Large Depth".
- OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models (see the embedding sketch after this list).
- PositionalEmbedding creates a positional embedding as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
- SelfAttentionMask creates a 3D attention mask from a 2D tensor mask (see the sketch after this list).
- SpectralNormalization implements a tf.keras layer wrapper that applies spectral normalization regularization to the wrapped layer. See "Spectral Norm Regularization for Improving the Generalizability of Deep Learning".
- MaskedSoftmax implements a softmax with an optional masking input. If no mask is provided, it performs a standard softmax; if a mask tensor is provided (1 in positions where the data should be allowed through, 0 where it should be masked), the masked positions in the output are set to approximately zero (see the sketch after this list).
- MaskedLM implements a masked language model. It assumes the embedding table variable is passed to it.
- ClassificationHead implements a pooling head over a sequence of embeddings, commonly used for classification tasks.
- GaussianProcessClassificationHead implements a spectral-normalized neural Gaussian process (SNGP)-based classification head, as described in "Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness".
- GatedFeedforward implements the gated linear unit feedforward layer described in "GLU Variants Improve Transformer" (see the sketch after this list).
- MultiHeadRelativeAttention implements a variant of multi-head attention with support for relative position encodings, as described in "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (https://arxiv.org/abs/1901.02860). It also has extended support for segment-based attention, a re-parameterization introduced in "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237).
- TwoStreamRelativeAttention implements a variant of multi-head relative attention as described in "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237). It takes in a query stream and a content stream and applies self-attention.
- TransformerXL implements Transformer-XL, introduced in "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (https://arxiv.org/abs/1901.02860). This contains TransformerXLBlock, a block with either one- or two-stream relative self-attention and a subsequent feedforward network, and TransformerXL, which contains the attention biases and stacks multiple TransformerXLBlocks.
- MobileBertEmbedding and MobileBertTransformer implement the embedding layer and the transformer layer proposed in the MobileBERT paper.
- BertPackInputs, BertTokenizer, and SentencepieceTokenizer implement layers that tokenize raw text and pack it into the inputs expected by BERT models (see the sketch after this list).
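
The sketches below illustrate a few of the layers above. First, a minimal self-attention example. It uses the standard tf.keras.layers.MultiHeadAttention interface (query/value/key plus an optional attention_mask), which this library's attention layer is expected to mirror closely; all shapes and hyperparameters are illustrative.

```python
import tensorflow as tf

# Self-attention: query, key, and value all come from the same tensor.
batch, seq_len, hidden = 2, 16, 64
x = tf.random.normal([batch, seq_len, hidden])

mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=hidden // 8)

# Boolean attention mask of shape [batch, seq_len, seq_len];
# True marks positions that are allowed to attend to each other.
mask = tf.ones([batch, seq_len, seq_len], dtype=tf.bool)

output = mha(query=x, value=x, key=x, attention_mask=mask)
print(output.shape)  # (2, 16, 64)
```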
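
Next, a conceptual sketch of token plus learned position embeddings, in the spirit of OnDeviceEmbedding and PositionalEmbedding. It uses plain tf.keras.layers.Embedding for clarity; the library layers have their own constructor arguments, so this is not their API.

```python
import tensorflow as tf

# Conceptual token + position embedding lookup (illustrative sizes).
vocab_size, max_len, width = 30522, 128, 64

token_ids = tf.constant([[101, 2023, 2003, 102]])         # [batch, seq_len]
word_emb = tf.keras.layers.Embedding(vocab_size, width)   # token embedding table
pos_emb = tf.keras.layers.Embedding(max_len, width)       # learned position table

positions = tf.range(tf.shape(token_ids)[1])               # [0, 1, 2, 3]
embeddings = word_emb(token_ids) + pos_emb(positions)      # broadcast over batch
print(embeddings.shape)  # (1, 4, 64)
```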
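
The idea behind SelfAttentionMask is broadcasting a 2D padding mask of shape [batch, seq_len] into a 3D attention mask of shape [batch, seq_len, seq_len], so every query position may attend only to non-padding key positions. The sketch below reproduces that with plain TensorFlow ops, not the layer's exact implementation.

```python
import tensorflow as tf

# 2D padding mask: 1 = real token, 0 = padding.
padding_mask = tf.constant([[1, 1, 1, 0],
                            [1, 1, 0, 0]], dtype=tf.float32)   # [batch, seq_len]

# Tile the key-side mask across the query dimension.
seq_len = tf.shape(padding_mask)[1]
attention_mask = tf.tile(padding_mask[:, tf.newaxis, :], [1, seq_len, 1])
print(attention_mask.shape)  # (2, 4, 4)
```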
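
The masked-softmax idea can be written in a few lines of plain TensorFlow: a large negative number is added to masked positions so that, after the softmax, they receive approximately zero probability. This is a conceptual sketch, not the MaskedSoftmax layer's code.

```python
import tensorflow as tf

scores = tf.constant([[2.0, 1.0, 0.5, 3.0]])
mask = tf.constant([[1.0, 1.0, 1.0, 0.0]])   # 1 = keep, 0 = mask out

adder = (1.0 - mask) * -1e9                   # ~ -inf at masked positions
probs = tf.nn.softmax(scores + adder, axis=-1)
print(probs.numpy())                          # masked position is ~0
```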
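
The random-feature trick behind RandomFeatureGaussianProcess (Rahimi & Recht) can also be sketched directly: cosine features with random weights approximate an RBF kernel, and the GP layer then learns a linear output on top of those features. Dimensions and scales below are arbitrary, and this is a sketch of the underlying math, not the layer itself.

```python
import numpy as np

# Random Fourier features: phi(x) = sqrt(2/D) * cos(x @ W + b), with
# W ~ N(0, I) and b ~ Uniform(0, 2*pi), approximate exp(-||x - y||^2 / 2).
rng = np.random.default_rng(0)
d_in, num_features = 16, 4096

W = rng.normal(size=(d_in, num_features))
b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

def phi(x):
    return np.sqrt(2.0 / num_features) * np.cos(x @ W + b)

x = rng.normal(size=d_in)
y = x + 0.1 * rng.normal(size=d_in)

print(phi(x) @ phi(y))                       # approximate kernel value
print(np.exp(-0.5 * np.sum((x - y) ** 2)))   # exact RBF kernel value
```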
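
A gated feedforward block in the spirit of "GLU Variants Improve Transformer" multiplies an activated projection by a parallel linear "gate" projection before the output projection. The sketch below uses plain Keras Dense layers; names and sizes are illustrative and not the GatedFeedforward API.

```python
import tensorflow as tf

hidden, intermediate = 64, 256
x = tf.random.normal([2, 16, hidden])

gate = tf.keras.layers.Dense(intermediate, activation="gelu")(x)   # activated branch
linear = tf.keras.layers.Dense(intermediate)(x)                    # linear branch
output = tf.keras.layers.Dense(hidden)(gate * linear)              # output projection
print(output.shape)  # (2, 16, 64)
```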
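
Finally, a rough usage sketch of the tokenize-and-pack flow. The constructor arguments (vocab_file, lower_case, seq_length, special_tokens_dict) and the get_special_tokens_dict() call are assumed from the TensorFlow Models text examples and may differ across versions; "vocab.txt" is a placeholder path. Verify against the current API before relying on it.

```python
import tensorflow as tf
from official.nlp.modeling import layers

# "vocab.txt" is a placeholder path to a BERT WordPiece vocabulary file.
tokenizer = layers.BertTokenizer(vocab_file="vocab.txt", lower_case=True)
packer = layers.BertPackInputs(
    seq_length=128,
    special_tokens_dict=tokenizer.get_special_tokens_dict())

sentences = tf.constant(["hello world", "layers are building blocks"])
tokens = tokenizer(sentences)       # ragged token ids, one row per sentence
encoder_inputs = packer([tokens])   # dict: input_word_ids, input_mask, input_type_ids
```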