class vformer.attention.convvt.ConvVTAttention(dim_in, dim_out, num_heads, img_size, attn_dropout=0.0, proj_dropout=0.0, method='dw_bn', kernel_size=3, stride_kv=1, stride_q=1, padding_kv=1, padding_q=1, with_cls_token=False)

Bases: Module

Attention with Convolutional Projection
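
To illustrate what a convolutional projection means here, the following is a minimal sketch of a depth-wise convolution followed by batch norm applied to a token sequence, assuming the tokens form a square grid. The class name `DepthwiseConvProjection` and its arguments are hypothetical and are not part of vformer; the library's own implementation may differ.

```python
import torch
import torch.nn as nn


class DepthwiseConvProjection(nn.Module):
    """Hypothetical helper: depth-wise conv + batch norm over a token grid."""

    def __init__(self, dim, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # groups=dim makes the convolution depth-wise (one filter per channel).
        self.conv = nn.Conv2d(dim, dim, kernel_size, stride=stride,
                              padding=padding, bias=False, groups=dim)
        self.bn = nn.BatchNorm2d(dim)

    def forward(self, x, height, width):
        # x: (batch, height * width, dim) token sequence.
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, height, width)
        x = self.bn(self.conv(x))
        # Flatten the (possibly downsampled) grid back into tokens.
        return x.flatten(2).transpose(1, 2)


tokens = torch.randn(2, 14 * 14, 64)
q = DepthwiseConvProjection(64, stride=1)(tokens, 14, 14)
kv = DepthwiseConvProjection(64, stride=2)(tokens, 14, 14)
print(q.shape, kv.shape)
```

With a stride of 2 the key/value sequence is shorter than the query sequence, which is one reason the query and key/value strides are configured separately.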

dim_in: int

Dimension of input tensor

dim_out: int

Dimension of output tensor

num_heads: int

Number of heads in attention

img_size: int

Size of image

attn_dropout: float

Probability of dropout in attention

proj_dropout: float

Probability of dropout in convolution projection

method: str ('dw_bn' for depth-wise convolution and batch norm, 'avg' for average pooling)

Method of projection

kernel_size: int

Size of kernel

stride_kv: int

Stride of the key and value projections

stride_q: int

Size of stride for query

padding_kv: int

Padding for the key and value projections

padding_q: int

Padding for query

with_cls_token: bool

Whether to include classification token
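
The kernel size, stride, and padding parameters together determine how many tokens each projection produces. The sketch below applies the standard convolution output-size formula, assuming a square token grid and ignoring any classification token; the helper names `conv_output_size` and `token_counts` are hypothetical and not part of vformer.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=1):
    """Side length after a convolution (or pooling) with the given settings."""
    return (size + 2 * padding - kernel_size) // stride + 1


def token_counts(grid_size, kernel_size=3, stride_q=1, stride_kv=1,
                 padding_q=1, padding_kv=1):
    """Number of query tokens and key/value tokens for a square grid."""
    q_side = conv_output_size(grid_size, kernel_size, stride_q, padding_q)
    kv_side = conv_output_size(grid_size, kernel_size, stride_kv, padding_kv)
    return q_side * q_side, kv_side * kv_side


# Default settings keep the grid size; stride_kv=2 halves each side.
same = token_counts(14)
reduced = token_counts(14, stride_kv=2)
print(same, reduced)
```

With the default `kernel_size=3`, stride 1, and padding 1, the grid size is preserved; raising `stride_kv` reduces only the key/value token count.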


