Vanilla Self-Attention O(n²)
- class vformer.attention.vanilla.VanillaSelfAttention(dim, num_heads=8, head_dim=64, p_dropout=0.0)[source]
Bases:
Module
Vanilla O(n²) self-attention, as introduced in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Parameters
dim (int) – Dimension of the embedding
num_heads (int) – Number of attention heads
head_dim (int) – Dimension of each head
p_dropout (float) – Dropout probability
- forward(x)[source]
- Parameters
x (torch.Tensor) – Input tensor
- Returns
Output tensor obtained by applying self-attention to the input tensor
- Return type
torch.Tensor
- training: bool
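The documented constructor and forward signature can be illustrated with a minimal sketch of standard QKV multi-head attention. This is an assumption-based illustration of the O(n²) mechanism, not vformer's actual implementation; the class name `VanillaSelfAttentionSketch` and the internal projection layout are hypothetical.

```python
import torch
import torch.nn as nn


class VanillaSelfAttentionSketch(nn.Module):
    """Illustrative O(n^2) multi-head self-attention mirroring the
    documented constructor (dim, num_heads, head_dim, p_dropout).
    Sketch only -- vformer's internals may differ."""

    def __init__(self, dim, num_heads=8, head_dim=64, p_dropout=0.0):
        super().__init__()
        inner_dim = num_heads * head_dim
        self.num_heads = num_heads
        self.scale = head_dim ** -0.5
        # Single linear layer producing queries, keys, and values.
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        # Project the concatenated heads back to the embedding dimension.
        self.to_out = nn.Sequential(nn.Linear(inner_dim, dim),
                                    nn.Dropout(p_dropout))

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        # Reshape each of q, k, v to (batch, heads, tokens, head_dim).
        q, k, v = (t.reshape(b, n, self.num_heads, -1).transpose(1, 2)
                   for t in qkv)
        # (b, h, n, n) attention matrix -- this is the O(n^2) step.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)


x = torch.randn(2, 16, 128)                 # (batch, tokens, dim)
layer = VanillaSelfAttentionSketch(dim=128)
y = layer(x)
print(y.shape)  # torch.Size([2, 16, 128]) -- same shape as the input
```

As documented, forward(x) returns a tensor with the same shape as its input, so the layer can be dropped into a residual transformer block.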