Gated Positional Self-Attention
- class vformer.attention.gated_positional.GatedPositionalSelfAttention(dim, num_heads=8, head_dim=64, p_dropout=0)[source]
Bases: VanillaSelfAttention
Implementation of gated positional self-attention (GPSA) from the paper “ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases”. A sketch of the gating mechanism follows the parameter list below.
- Parameters
dim (int) – Dimension of the embedding
num_heads (int) – Number of attention heads, default is 8
head_dim (int) – Dimension of each attention head, default is 64
p_dropout (float) – Dropout probability, default is 0.0
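In the ConViT paper, each attention head forms its attention map as a convex combination of a content term and a positional term, mixed by a learned per-head gating parameter lambda: A = (1 - sigmoid(lambda)) * softmax(Q Kᵀ / sqrt(d)) + sigmoid(lambda) * softmax(positional scores). A minimal sketch of this gating idea, assuming illustrative names (gated_attention, rel_scores, and gating_param are not part of the vformer API)::

    import torch
    import torch.nn.functional as F

    def gated_attention(q, k, rel_scores, gating_param):
        # q, k: (batch, heads, tokens, head_dim)
        # rel_scores: (heads, tokens, tokens) position-based scores
        # gating_param: (heads,) learned gate lambda, one per head
        d = q.shape[-1]
        content = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        positional = F.softmax(rel_scores, dim=-1).unsqueeze(0)
        gate = torch.sigmoid(gating_param).view(1, -1, 1, 1)
        # sigmoid(lambda) near 0 -> content attention dominates;
        # near 1 -> positional (convolution-like) attention dominates
        return (1.0 - gate) * content + gate * positional

Because both softmax terms are row-stochastic and the gate weights sum to 1, each row of the combined attention map still sums to 1 without extra normalization.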
- forward(x)[source]
- Parameters
x (torch.Tensor) – Input tensor
- Returns
Output tensor obtained by applying gated positional self-attention to the input tensor; see the usage sketch at the end of this entry
- Return type
torch.Tensor
- training: bool (inherited from torch.nn.Module)
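A hedged usage sketch: the import path matches the class entry above, the input follows the (batch_size, num_tokens, dim) convention of the VanillaSelfAttention base class, and the token count is chosen as a square (14 x 14) patch grid; the concrete sizes are arbitrary::

    import torch
    from vformer.attention.gated_positional import GatedPositionalSelfAttention

    attn = GatedPositionalSelfAttention(dim=192, num_heads=8, head_dim=64, p_dropout=0.1)
    x = torch.randn(2, 196, 192)  # (batch_size, num_tokens, embedding dim)
    out = attn(x)                 # gated positional self-attention over the tokens
    print(out.shape)              # expected: torch.Size([2, 196, 192])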