ConvVT

class vformer.encoder.convvt.ConvVTBlock(dim_in, dim_out, mlp_ratio=4.0, p_dropout=0.0, drop_path=0.0, drop_path_mode='batch', **kwargs)[source]

Implementation of an Attention MLP block in CVT

dim_in: int

Input dimensions

dim_out: int

Output dimensions

num_heads: int

Number of heads in attention

img_size: int

Size of image

mlp_ratio: float

Feature dimension expansion ratio in MLP, default is 4.0

p_dropout: float

Probability of dropout in MLP, default is 0.0

attn_dropout: float

Probability of dropout in attention, default is 0.0

drop_path: float

Probability of droppath, default is 0.0

with_cls_token: bool

Whether to include classification token, default is False
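As a rough illustration of what mlp_ratio controls, the sketch below computes the width of the MLP hidden layer from a feature dimension and the ratio. The helper name mlp_hidden_dim is hypothetical and not part of vformer, and the assumption that the hidden width is the feature dimension multiplied by mlp_ratio and truncated to an integer follows the usual transformer convention rather than anything confirmed by this page.

```python
def mlp_hidden_dim(dim: int, mlp_ratio: float = 4.0) -> int:
    """Hypothetical helper (not a vformer function).

    Assumes the MLP expands `dim` features to `dim * mlp_ratio`
    hidden features, truncated to an integer.
    """
    return int(dim * mlp_ratio)


# With the documented default ratio of 4.0, 64 features expand to 256.
print(mlp_hidden_dim(64))        # 256
# A smaller ratio gives a narrower hidden layer.
print(mlp_hidden_dim(192, 2.0))  # 384
```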

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class vformer.encoder.convvt.ConvVTStage(patch_size=7, patch_stride=4, patch_padding=0, in_channels=3, embedding_dim=64, depth=1, p_dropout=0.0, drop_path_rate=0.0, with_cls_token=False, init='trunc_norm', **kwargs)[source]

Implementation of a Stage in CVT

Parameters
  • patch_size (int) – Size of patch, default is 7

  • patch_stride (int) – Stride of patch, default is 4

  • patch_padding (int) – Padding for patch, default is 0

  • in_channels (int) – Number of input channels in image, default is 3

  • img_size (int) – Size of the image, default is 224

  • embedding_dim (int) – Embedding dimensions, default is 64

  • depth (int) – Number of CVT Attention blocks in each stage, default is 1

  • num_heads (int) – Number of heads in attention, default is 6

  • mlp_ratio (float) – Feature dimension expansion ratio in MLP, default is 4.0

  • p_dropout (float) – Probability of dropout in MLP, default is 0.0

  • attn_dropout (float) – Probability of dropout in attention, default is 0.0

  • drop_path_rate (float) – Probability for droppath, default is 0.0

  • with_cls_token (bool) – Whether to include classification token, default is False

  • kernel_size (int) – Size of kernel, default is 3

  • padding_q (int) – Size of padding in q, default is 1

  • padding_kv (int) – Size of padding in kv, default is 2

  • stride_kv (int) – Stride in kv, default is 2

  • stride_q (int) – Stride in q, default is 1

  • init (str ('trunc_norm' or 'xavier')) – Initialization method, default is 'trunc_norm'
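To see how the patch and projection parameters above interact, the following sketch applies the standard convolution output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, to the documented defaults. It assumes the patch embedding and the q/kv projections behave like ordinary strided convolutions; conv_out_size is a hypothetical helper, not part of vformer, and the numbers are only as reliable as that assumption.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Hypothetical helper: spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Patch embedding with the signature defaults on a 224-pixel image:
# patch_size=7, patch_stride=4, patch_padding=0.
side = conv_out_size(224, kernel=7, stride=4, padding=0)

# Query projection: kernel_size=3, stride_q=1, padding_q=1 (keeps the size).
q_side = conv_out_size(side, kernel=3, stride=1, padding=1)

# Key/value projection: kernel_size=3, stride_kv=2, padding_kv=2 (downsamples).
kv_side = conv_out_size(side, kernel=3, stride=2, padding=2)

print(side, q_side, kv_side)                 # 55 55 29
print(q_side * q_side, kv_side * kv_side)    # 3025 841
```

Under these assumptions, attention in the stage runs with 3025 query tokens against 841 key/value tokens, which is where the strided kv projection saves computation.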

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.