ConViT

class vformer.encoder.convit.ConViTEncoder(embedding_dim, depth, num_heads, head_dim, mlp_dim, p_dropout=0, attn_dropout=0, drop_path_rate=0, drop_path_mode='batch')
Parameters
  • embedding_dim (int) – Dimension of the embedding

  • depth (int) – Number of self-attention layers

  • num_heads (int) – Number of attention heads

  • head_dim (int) – Dimension of each head

  • mlp_dim (int) – Dimension of the hidden layer in the feed-forward layer

  • p_dropout (float) – Dropout probability

  • attn_dropout (float) – Dropout probability in the attention layers

  • drop_path_rate (float) – Stochastic depth (drop path) rate

  • drop_path_mode (str) – Mode of the stochastic drop path; defaults to 'batch'
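The parameter list does not spell out what drop_path_rate and drop_path_mode='batch' do. The following pure-Python sketch illustrates stochastic depth (drop path) under an assumption: that the modes follow the convention of torchvision.ops.stochastic_depth, where 'batch' makes one keep/drop decision for the whole batch and 'row' makes an independent decision per sample. The drop_path function and the 'row' mode here are hypothetical illustrations, not vformer's API.

```python
import random
from typing import List, Optional


def drop_path(x: List[List[float]], p: float, mode: str = "batch",
              training: bool = True,
              rng: Optional[random.Random] = None) -> List[List[float]]:
    """Hypothetical stochastic-depth sketch on a residual branch's output.

    x is a batch given as a list of per-sample feature lists.
    ASSUMPTION: mode semantics mirror torchvision.ops.stochastic_depth:
      - "batch": one keep/drop decision shared by the whole batch
      - "row":   an independent keep/drop decision per sample
    Kept outputs are scaled by 1 / (1 - p) so the expected value is unchanged.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"drop probability must be in [0, 1], got {p}")
    if mode not in ("batch", "row"):
        raise ValueError(f"mode must be 'batch' or 'row', got {mode!r}")
    # At inference time (or with p == 0) the branch passes through unchanged.
    if not training or p == 0.0:
        return [row[:] for row in x]

    rng = rng or random.Random()
    keep = 1.0 - p
    if mode == "batch":
        # A single Bernoulli draw decides for every sample in the batch.
        mask = [rng.random() < keep] * len(x)
    else:
        # One Bernoulli draw per sample ("row").
        mask = [rng.random() < keep for _ in x]
    # Dropped samples become zeros; kept samples are rescaled by 1 / keep.
    # When p == 1, keep == 0 and every mask entry is False, so no division occurs.
    return [[v / keep if m else 0.0 for v in row] for row, m in zip(x, mask)]
```

In the residual block, such a function would wrap the attention or feed-forward branch before it is added back to the skip connection, so a dropped branch reduces that layer to the identity for the affected samples.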