ConvVT

class vformer.encoder.convvt.ConvVTBlock(dim_in, dim_out, mlp_ratio=4.0, p_dropout=0.0, drop_path=0.0, drop_path_mode='batch', **kwargs)[source]

Implementation of an Attention MLP block in CVT

Parameters
  • dim_in (int) – Input dimensions

  • dim_out (int) – Output dimensions

  • num_heads (int) – Number of heads in attention

  • img_size (int) – Size of image

  • mlp_ratio (float) – Feature dimension expansion ratio in MLP, default is 4.0

  • p_dropout (float) – Probability of dropout in MLP, default is 0.0

  • attn_dropout (float) – Probability of dropout in attention, default is 0.0

  • drop_path (float) – Probability of drop path, default is 0.0

  • with_cls_token (bool) – Whether to include classification token, default is False
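
As a rough illustration of the mlp_ratio parameter above, the sketch below computes the hidden width an MLP would have under the common convention that the hidden layer is mlp_ratio times the block's output width. The helper name mlp_hidden_dim is hypothetical, and whether ConvVTBlock applies the ratio to dim_out in exactly this way is an assumption, not something this page confirms.

```python
def mlp_hidden_dim(dim_out: int, mlp_ratio: float = 4.0) -> int:
    """Hidden width of an MLP that expands dim_out by mlp_ratio.

    Assumed convention (hypothetical helper, not part of vformer).
    """
    return int(dim_out * mlp_ratio)


print(mlp_hidden_dim(64))        # 256
print(mlp_hidden_dim(192, 2.0))  # 384
```

With the default ratio of 4.0, a block with dim_out=64 would, under this assumption, use a 256-wide hidden layer.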

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
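
The note above can be illustrated with a toy class. The sketch below is a simplified, pure-Python stand-in for a hook-aware module, not torch.nn.Module itself; the class name TinyModule and its simplified hook signature are assumptions made for illustration. It shows why calling the instance runs registered hooks while calling forward directly skips them.

```python
class TinyModule:
    """Simplified stand-in for a hook-aware module (hypothetical, for illustration)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Store a callable to run after every call made through __call__.
        self._forward_hooks.append(hook)

    def forward(self, x):
        # The actual computation; subclasses would override this.
        return x * 2

    def __call__(self, x):
        # Calling the instance wraps forward() and then runs the hooks.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


calls = []
m = TinyModule()
m.register_forward_hook(lambda mod, inp, out: calls.append(out))

m(3)          # goes through __call__, so the hook fires
m.forward(3)  # bypasses __call__, so the hook is skipped
print(calls)  # [6]
```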

class vformer.encoder.convvt.ConvVTStage(patch_size=7, patch_stride=4, patch_padding=0, in_channels=3, embedding_dim=64, depth=1, p_dropout=0.0, drop_path_rate=0.0, with_cls_token=False, init='trunc_norm', **kwargs)[source]

Implementation of a Stage in CVT

Parameters
  • patch_size (int) – Size of patch, default is 7

  • patch_stride (int) – Stride of patch, default is 4

  • patch_padding (int) – Padding for patch, default is 0

  • in_channels (int) – Number of input channels in image, default is 3

  • img_size (int) – Size of the image, default is 224

  • embedding_dim (int) – Embedding dimensions, default is 64

  • depth (int) – Number of CVT Attention blocks in each stage, default is 1

  • num_heads (int) – Number of heads in attention, default is 6

  • mlp_ratio (float) – Feature dimension expansion ratio in MLP, default is 4.0

  • p_dropout (float) – Probability of dropout in MLP, default is 0.0

  • attn_dropout (float) – Probability of dropout in attention, default is 0.0

  • drop_path_rate (float) – Probability of drop path, default is 0.0

  • with_cls_token (bool) – Whether to include classification token, default is False

  • kernel_size (int) – Size of kernel, default is 3

  • padding_q (int) – Size of padding in q, default is 1

  • padding_kv (int) – Size of padding in kv, default is 2

  • stride_kv (int) – Stride in kv, default is 2

  • stride_q (int) – Stride in q, default is 1

  • init (str) – Initialization method, one of {trunc_norm, xavier}, default is trunc_norm
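
To give a feel for how the patch and projection parameters above interact, the sketch below applies the standard convolution output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, to the listed defaults. It assumes square inputs, a dilation of 1, and that the stage applies these values as ordinary convolution settings; the helper conv_out_size is hypothetical and not part of vformer.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1 (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


img_size = 224

# Patch embedding: patch_size=7, patch_stride=4, patch_padding=0
grid = conv_out_size(img_size, kernel=7, stride=4, padding=0)

# Query projection: kernel_size=3, stride_q=1, padding_q=1
q_grid = conv_out_size(grid, kernel=3, stride=1, padding=1)

# Key/value projection: kernel_size=3, stride_kv=2, padding_kv=2
kv_grid = conv_out_size(grid, kernel=3, stride=2, padding=2)

print(grid, q_grid, kv_grid)    # 55 55 29
print(q_grid ** 2)              # 3025 query tokens
print(kv_grid ** 2)             # 841 key/value tokens
```

Under these assumptions, the stride of 2 in the key/value projection cuts the number of key/value tokens to roughly a quarter of the query tokens. If with_cls_token is True, one additional token would be expected in the sequence.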

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.