class vformer.attention.convvt.ConvVTAttention(dim_in, dim_out, num_heads, img_size, attn_dropout=0.0, proj_dropout=0.0, method='dw_bn', kernel_size=3, stride_kv=1, stride_q=1, padding_kv=1, padding_q=1, with_cls_token=False)[source]

Bases: Module

Attention with Convolutional Projection

dim_in: int

Dimension of the input tensor

dim_out: int

Dimension of the output tensor

num_heads: int

Number of attention heads

img_size: int

Size of the input image

attn_dropout: float

Dropout probability applied to the attention weights

proj_dropout: float

Dropout probability used in the convolutional projection

method: str ('dw_bn' or 'avg')

Method of projection: 'dw_bn' applies a depth-wise convolution followed by batch normalization, 'avg' applies average pooling

kernel_size: int

Kernel size of the convolutional projection

stride_kv: int

Stride of the convolutional projection for keys and values

stride_q: int

Stride of the convolutional projection for queries

padding_kv: int

Padding of the convolutional projection for keys and values

padding_q: int

Padding of the convolutional projection for queries

with_cls_token: bool

Whether the input sequence includes a classification token
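
The snippet below sketches a typical instantiation and call. It is a minimal, hedged example: the argument values and the assumed input layout of (batch, img_size * img_size, dim_in) tokens are illustrative and not prescribed by this page.

import torch
from vformer.attention.convvt import ConvVTAttention

# Minimal sketch; the values and the assumed (batch, tokens, dim_in) input
# layout are illustrative, not taken from this reference.
attn = ConvVTAttention(
    dim_in=64,
    dim_out=64,
    num_heads=8,
    img_size=14,
    method="dw_bn",  # depth-wise convolution followed by batch norm
)

x = torch.randn(2, 14 * 14, 64)  # assumed shape: (batch, img_size**2, dim_in)
out = attn(x)  # call the module instance, not attn.forward(x); see below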


Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself rather than this method, since the instance call takes care of running the registered hooks while calling forward() directly silently ignores them.
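
The distinction is easy to demonstrate with a small sketch; a plain torch.nn.Linear is used here for brevity, and the same behavior applies to ConvVTAttention or any other Module.

import torch
import torch.nn as nn

# Calling the module instance dispatches through __call__, which runs any
# registered hooks; calling .forward() directly skips them silently.
layer = nn.Linear(4, 4)
layer.register_forward_hook(lambda module, args, output: print("hook fired"))

x = torch.randn(1, 4)
_ = layer(x)          # prints "hook fired"
_ = layer.forward(x)  # hook does not fire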

training: bool

Whether the module is in training mode (attribute inherited from torch.nn.Module)