Vanilla Transformer

class vformer.encoder.vanilla.VanillaEncoder(embedding_dim, depth, num_heads, head_dim, mlp_dim, p_dropout=0.0, attn_dropout=0.0, drop_path_rate=0.0, drop_path_mode='batch')

Parameters

embedding_dim (int) – Dimension of the embedding
depth (int) – Number of self-attention layers
num_heads (int) – Number of attention heads
head_dim (int) – Dimension of each head
mlp_dim (int) – Dimension of the hidden layer in the feed-forward layer
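p_dropout (float) – Dropout probability in the feed-forward layers. Defaults to 0.0
attn_dropout (float) – Dropout probability in the self-attention layers. Defaults to 0.0
drop_path_rate (float) – Stochastic drop path rate. Defaults to 0.0
drop_path_mode (str) – Stochastic drop path mode. Defaults to 'batch'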
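
The drop path parameters can be illustrated with a minimal pure-Python sketch of stochastic depth. It is not vformer's implementation: the function name drop_path, the list-of-lists input, and the rng argument are hypothetical, and the 'batch' and 'row' semantics are assumed to match those of torchvision's stochastic depth operator (one keep/drop decision for the whole batch versus one per sample). Whether VanillaEncoder delegates to that operator is an assumption.

```python
import random


def drop_path(rows, rate, mode="batch", training=True, rng=random):
    """Stochastic depth on a batch given as a list of per-sample feature lists.

    Hypothetical helper for illustration only. Each kept sample is scaled by
    1 / (1 - rate) so the expected value of the output matches the input.
    """
    if mode not in ("batch", "row"):
        raise ValueError(f"mode must be 'batch' or 'row', got {mode!r}")
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be in [0, 1], got {rate}")

    # At inference time, or with rate 0, the branch passes through unchanged.
    if not training or rate == 0.0:
        return [list(r) for r in rows]

    keep = 1.0 - rate
    if mode == "batch":
        # One random draw shared by every sample in the batch.
        kept = rng.random() < keep
        flags = [kept] * len(rows)
    else:
        # 'row': an independent draw for each sample.
        flags = [rng.random() < keep for _ in rows]

    scale = 1.0 / keep if keep > 0 else 0.0
    return [[v * scale if f else 0.0 for v in r] for r, f in zip(rows, flags)]


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0]]
    print(drop_path(batch, rate=0.5, mode="row", rng=random.Random(0)))
```

In a residual block, the dropped or rescaled branch output would then be added back to the block's input, so a dropped sample simply skips that layer.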

forward(x)

Parameters: x (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor
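
The shapes involved in one encoder layer can be sketched as below. This is shape bookkeeping only, under two assumptions that the reference above does not state: the input is laid out as (batch, seq_len, embedding_dim), and each layer uses standard multi-head self-attention with a final projection back to embedding_dim. The helper attention_shapes is hypothetical.

```python
def attention_shapes(batch, seq_len, embedding_dim, num_heads, head_dim):
    """Return the tensor shapes expected at each step of multi-head attention.

    Hypothetical helper: assumes standard multi-head self-attention, not a
    reading of the vformer source.
    """
    # head_dim is a separate argument, so the concatenated head width
    # does not have to equal embedding_dim.
    inner_dim = num_heads * head_dim
    return {
        "input": (batch, seq_len, embedding_dim),
        "q_k_v_each": (batch, num_heads, seq_len, head_dim),
        "attention_weights": (batch, num_heads, seq_len, seq_len),
        "merged_heads": (batch, seq_len, inner_dim),
        # Assumed output projection maps inner_dim back to embedding_dim.
        "output": (batch, seq_len, embedding_dim),
    }


shapes = attention_shapes(batch=8, seq_len=16, embedding_dim=64,
                          num_heads=4, head_dim=32)
for name, shape in shapes.items():
    print(f"{name:>18}: {shape}")
```

Under these assumptions the output has the same shape as the input, which is what allows depth such layers to be stacked.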