ConvVT

class vformer.encoder.convvt.ConvVTBlock(dim_in, dim_out, mlp_ratio=4.0, p_dropout=0.0, drop_path=0.0, drop_path_mode='batch', **kwargs)

Implementation of an Attention MLP block in CVT

Parameters

dim_in (int) – Input dimensions
dim_out (int) – Output dimensions
num_heads (int) – Number of heads in attention
img_size (int) – Size of image
mlp_ratio (float) – Feature dimension expansion ratio in MLP, default is 4.0
p_dropout (float) – Probability of dropout in MLP, default is 0.0
attn_dropout (float) – Probability of dropout in attention, default is 0.0
drop_path (float) – Probability of droppath, default is 0.0
with_cls_token (bool) – Whether to include classification token, default is False
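
To make the drop_path parameter concrete, here is a minimal pure-Python sketch of drop path (stochastic depth) applied to the residual branch of a single sample. It assumes the usual formulation, where the branch is zeroed with probability p and otherwise rescaled by 1 / (1 - p); it is not taken from the vformer source, and the names drop_path and residual_block are hypothetical helpers for illustration only.

```python
import random


def drop_path(branch, p, training=True, rng=random):
    """Apply drop path to one sample's residual branch (a list of floats).

    Assumption: standard stochastic depth. The branch is dropped with
    probability p during training and rescaled by 1 / (1 - p) otherwise.
    """
    if not training or p == 0.0:
        return list(branch)
    if rng.random() < p:
        # Whole branch dropped: only the skip connection survives.
        return [0.0 for _ in branch]
    keep = 1.0 - p
    return [value / keep for value in branch]


def residual_block(x, branch_fn, p, training=True, rng=random):
    """Compute x + drop_path(branch_fn(x)) element-wise."""
    branch = drop_path(branch_fn(x), p, training, rng)
    return [a + b for a, b in zip(x, branch)]


def double(values):
    """Stand-in for the attention or MLP sub-layer."""
    return [2.0 * v for v in values]


always_kept = residual_block([1.0, 2.0], double, p=0.0)
always_dropped = residual_block([1.0, 2.0], double, p=1.0)
at_inference = residual_block([1.0, 2.0], double, p=0.5, training=False)
```

With p=0.0, or at inference time, the block behaves like an ordinary residual connection; with p=1.0 only the skip path remains.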

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this function, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.
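
The difference between calling the instance and calling forward() can be illustrated with a toy class. This is a simplified stand-in written for this example, not PyTorch's actual Module implementation; TinyModule and its internals are hypothetical.

```python
class TinyModule:
    """Toy stand-in for a hook-aware module (not the real torch.nn.Module)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Hooks are called as hook(module, input, output) after forward().
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        # Calling the instance runs forward() and then every registered hook.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


seen = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: seen.append(out))

via_call = module(3)             # forward runs, then the hook records the output
via_forward = module.forward(4)  # forward runs, but the hook is skipped
```

After both calls, seen holds only the output of the first one, because the direct forward() call bypassed the hook.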

class vformer.encoder.convvt.ConvVTStage(patch_size=7, patch_stride=4, patch_padding=0, in_channels=3, embedding_dim=64, depth=1, p_dropout=0.0, drop_path_rate=0.0, with_cls_token=False, init='trunc_norm', **kwargs)

Implementation of a Stage in CVT

Parameters

patch_size (int) – Size of patch, default is 7
patch_stride (int) – Stride of patch, default is 4
patch_padding (int) – Padding for patch, default is 0
in_channels (int) – Number of input channels in image, default is 3
img_size (int) – Size of the image, default is 224
embedding_dim (int) – Embedding dimensions, default is 64
depth (int) – Number of CVT Attention blocks in each stage, default is 1
num_heads (int) – Number of heads in attention, default is 6
mlp_ratio (float) – Feature dimension expansion ratio in MLP, default is 4.0
p_dropout (float) – Probability of dropout in MLP, default is 0.0
attn_dropout (float) – Probability of dropout in attention, default is 0.0
drop_path_rate (float) – Probability for droppath, default is 0.0
with_cls_token (bool) – Whether to include classification token, default is False
kernel_size (int) – Size of kernel, default is 3
padding_q (int) – Size of padding in q, default is 1
padding_kv (int) – Size of padding in kv, default is 2
stride_kv (int) – Stride in kv, default is 2
stride_q (int) – Stride in q, default is 1
init (str) – Initialization method, one of {trunc_norm, xavier}; default is trunc_norm
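
The patch and projection parameters above determine how many tokens a stage produces. The sketch below applies the standard convolution output-size formula to the documented defaults. It assumes square inputs, dilation 1, no classification token, and that the patch embedding and the q/kv projections behave like ordinary strided convolutions; conv_out_size and stage_token_counts are hypothetical helpers, not part of vformer.

```python
def conv_out_size(size, kernel, stride, padding):
    """Side length after a convolution with dilation 1 (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_token_counts(img_size=224, patch_size=7, patch_stride=4,
                       patch_padding=0, kernel_size=3, padding_q=1,
                       stride_q=1, padding_kv=2, stride_kv=2):
    """Estimate token counts for one stage from the documented defaults."""
    # Patch embedding turns the image into a side x side grid of tokens.
    side = conv_out_size(img_size, patch_size, patch_stride, patch_padding)
    # Query projection (stride 1 with padding 1 keeps the grid size).
    q_side = conv_out_size(side, kernel_size, stride_q, padding_q)
    # Key/value projection (stride 2 shrinks the grid).
    kv_side = conv_out_size(side, kernel_size, stride_kv, padding_kv)
    return {
        "grid_side": side,
        "q_tokens": q_side * q_side,
        "kv_tokens": kv_side * kv_side,
    }


counts = stage_token_counts()
# With the defaults: grid_side 55, q_tokens 3025, kv_tokens 841
```

Under these assumptions, the strided key/value projection reduces the number of keys and values from 3025 to 841, so each query attends over a shorter sequence.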

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined in this function, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.