Vanilla O(n^2)

class vformer.attention.vanilla.VanillaSelfAttention(dim, num_heads=8, head_dim=64, p_dropout=0.0)

Bases: Module

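Vanilla O(n^2) self-attention: standard multi-head self-attention whose cost grows quadratically with the number of input tokens n.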
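To make the O(n^2) concrete, here is a minimal single-head sketch in NumPy. It is an illustrative stand-in, not vformer's code; the function name self_attention and the random weights are hypothetical. For n tokens it builds an n × n matrix of scores, one per token pair, which is where the quadratic time and memory cost comes from.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention on x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (n, d_h)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n, n): one score per token pair
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights                    # (n, d_h), (n, n)


rng = np.random.default_rng(0)
n, d, d_h = 6, 16, 8
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d_h)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (6, 8) (6, 6)
```

Doubling n quadruples the size of weights, whatever the embedding width.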

Parameters
  • dim (int) – Dimension of the token embeddings

  • num_heads (int) – Number of attention heads

  • head_dim (int) – Dimension of each attention head

  • p_dropout (float) – Dropout probability
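The constructor takes head_dim separately from dim, so the combined width of all heads is num_heads * head_dim and need not equal dim. The sketch below assumes the common ViT-style layout (one fused q/k/v projection, a 1/sqrt(head_dim) score scale, and an output projection back to dim); that layout is an assumption, and the helper attention_dims is hypothetical. It only derives the resulting sizes.

```python
import math


def attention_dims(dim, num_heads=8, head_dim=64):
    """Derived sizes for a ViT-style multi-head attention layer (assumed layout)."""
    inner_dim = num_heads * head_dim          # width of the concatenated heads
    return {
        "inner_dim": inner_dim,
        "qkv_out_features": 3 * inner_dim,    # one fused linear map yields q, k and v
        "scale": 1 / math.sqrt(head_dim),     # factor applied to the q·k^T scores
        "out_proj": (inner_dim, dim),         # maps concatenated heads back to dim
    }


dims = attention_dims(dim=384)
print(dims)
```

With the defaults, the heads span 8 × 64 = 512 features even though dim is 384 here.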

forward(x)
Parameters

x (torch.Tensor) – Input tensor

Returns

Output tensor obtained by applying self-attention to the input tensor

Return type

torch.Tensor
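A minimal PyTorch sketch of what forward(x) computes, assuming x has shape (batch, tokens, dim) and a ViT-style layout with a fused q/k/v projection and an output projection back to dim. The class SketchSelfAttention is hypothetical and is not vformer's source; for real use, import VanillaSelfAttention from vformer.attention.vanilla.

```python
import torch
from torch import nn


class SketchSelfAttention(nn.Module):
    """Hypothetical re-implementation for illustration; not vformer's source."""

    def __init__(self, dim, num_heads=8, head_dim=64, p_dropout=0.0):
        super().__init__()
        inner_dim = num_heads * head_dim
        self.num_heads = num_heads
        self.scale = head_dim ** -0.5
        self.to_qkv = nn.Linear(dim, 3 * inner_dim, bias=False)
        self.to_out = nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(p_dropout))

    def forward(self, x):
        b, n, _ = x.shape                                # x: (batch, tokens, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)        # each (b, n, inner_dim)
        # Split the inner dimension into heads: (b, heads, n, head_dim).
        q, k, v = (t.reshape(b, n, self.num_heads, -1).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (b, heads, n, n)
        out = attn @ v                                   # (b, heads, n, head_dim)
        out = out.transpose(1, 2).reshape(b, n, -1)      # concatenate the heads
        return self.to_out(out)                          # back to (b, n, dim)


layer = SketchSelfAttention(dim=128, num_heads=4, head_dim=32)
x = torch.randn(2, 10, 128)
y = layer(x)
print(y.shape)  # torch.Size([2, 10, 128])
```

Because the heads are projected back to dim, the output shape matches the input shape in this sketch, which lets such a layer be stacked inside residual transformer blocks.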
