Gated Positional Self Attention

class vformer.attention.gated_positional.GatedPositionalSelfAttention(dim, num_heads=8, head_dim=64, p_dropout=0)[source]

Bases: VanillaSelfAttention

Implementation of the Gated Positional Self-Attention from the paper: “ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases”

Parameters
  • dim (int) – Dimension of the embedding

  • num_heads (int) – Number of attention heads, default is 8

  • head_dim (int) – Dimension of each head, default is 64

  • p_dropout (float) – Dropout probability, default is 0.0
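A minimal construction sketch follows; the module path is the one shown in the class reference above, while the embedding dimension of 192 is an illustrative choice, not part of the API:

from vformer.attention.gated_positional import GatedPositionalSelfAttention

# Instantiate with an illustrative embedding dimension of 192; the remaining
# arguments repeat the documented defaults.
attention = GatedPositionalSelfAttention(dim=192, num_heads=8, head_dim=64, p_dropout=0.0)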

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor of shape (batch_size, num_patches, dim)

Returns

Output tensor obtained by applying gated positional self-attention to the input tensor

Return type

torch.Tensor
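
Continuing the construction sketch above, a usage example for forward; the batch size and the 14 × 14 patch grid are assumptions for illustration (the ConViT-style positional term expects the patch tokens to form a square grid):

import torch

# Dummy input: batch of 4, 196 patch tokens (a 14 x 14 grid), embedding dimension 192.
x = torch.randn(4, 196, 192)

out = attention(x)   # output shape matches the input: torch.Size([4, 196, 192])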

rel_embedding(n)[source]
Computes the relative positional embedding used by the positional attention term for a sequence of n patch tokens.
training: bool