ConvVT
- class vformer.attention.convvt.ConvVTAttention(dim_in, dim_out, num_heads, img_size, attn_dropout=0.0, proj_dropout=0.0, method='dw_bn', kernel_size=3, stride_kv=1, stride_q=1, padding_kv=1, padding_q=1, with_cls_token=False)[source]
Bases: torch.nn.Module
Attention with Convolutional Projection
- dim_in: int
Dimension of input tensor
- dim_out: int
Dimension of output tensor
- num_heads: int
Number of heads in attention
- img_size: int
Size of image
- attn_dropout: float
Probability of dropout in attention
- proj_dropout: float
Probability of dropout in convolution projection
- method: str ('dw_bn' for depth-wise convolution and batch norm, 'avg' for average pooling)
Method of projection
- kernel_size: int
Size of the convolution kernel
- stride_kv: int
Stride of the convolutional projection for keys and values
- stride_q: int
Stride of the convolutional projection for queries
- padding_kv: int
Padding of the convolutional projection for keys and values
- padding_q: int
Padding of the convolutional projection for queries
- with_cls_token: bool
Whether to include a classification token
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
Whether the module is in training mode; attribute inherited from torch.nn.Module
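A minimal usage sketch, assuming the module accepts a flattened patch sequence of shape (batch_size, img_size * img_size, dim_in) when with_cls_token=False; the configuration values and tensor shapes below are illustrative assumptions, not values prescribed by the library.

```python
import torch

from vformer.attention.convvt import ConvVTAttention

# Hypothetical configuration, chosen only for illustration.
attn = ConvVTAttention(
    dim_in=64,
    dim_out=64,
    num_heads=4,
    img_size=14,        # assumed 14 x 14 patch grid
    attn_dropout=0.1,
    proj_dropout=0.1,
    method="dw_bn",     # depth-wise convolution and batch norm projection
)

# Assumed input layout: (batch_size, num_patches, dim_in) with
# num_patches = img_size ** 2, since with_cls_token=False here.
x = torch.randn(2, 14 * 14, 64)

# Call the module instance (not forward()) so registered hooks run.
out = attn(x)
print(out.shape)        # expected: torch.Size([2, 196, 64])
```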