class vformer.attention.spatial.SpatialAttention(dim, num_heads, sr_ratio=1, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0, linear=False, act_fn=<class 'torch.nn.modules.activation.GELU'>)[source]

Bases: Module

Spatial Reduction Attention: a linear-complexity attention layer

Parameters

  • dim (int) – Dimension of the input tensor

  • num_heads (int) – Number of attention heads

  • sr_ratio (int) – Spatial Reduction ratio

  • qkv_bias (bool, default is False) – If True, adds a learnable bias to the query, key and value projections.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set

  • attn_drop (float, optional) – Dropout rate applied to the attention weights, default is 0.0

  • proj_drop (float, optional) – Dropout rate applied to the output projection, default is 0.0

  • linear (bool) – Whether to use the linear variant of spatial attention, default is False

  • act_fn (nn.Module) – Activation function, default is nn.GELU

forward(x, H, W)[source]
  Parameters
  • x (torch.Tensor) – Input tensor

  • H (int) – Height of the image patch grid

  • W (int) – Width of the image patch grid


Returns

  Output tensor obtained by applying spatial attention to the input tensor

Return type

  torch.Tensor

training: bool
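To illustrate how spatial reduction cuts the attention cost, below is a minimal single-head NumPy sketch: queries keep full resolution while keys and values come from a token grid downsampled by `sr_ratio`, shrinking the attention matrix from N x N to N x (N / sr_ratio^2). This is a simplified stand-in, not the layer's implementation: the learned q/kv/output projections, multi-head split, and dropout are omitted, average pooling stands in for the layer's learned spatial reduction, and the function name `spatial_reduction_attention` is illustrative.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduction_attention(x, H, W, sr_ratio=2):
    """Single-head attention sketch with spatially reduced keys/values.

    x: (N, dim) token sequence with N == H * W.
    """
    N, dim = x.shape
    q = x  # queries keep full resolution: (N, dim)

    # Spatial reduction: average-pool the H x W token grid by sr_ratio
    # (the actual layer uses a learned reduction instead of pooling).
    grid = x.reshape(H, W, dim)
    Hr, Wr = H // sr_ratio, W // sr_ratio
    pooled = grid[:Hr * sr_ratio, :Wr * sr_ratio].reshape(
        Hr, sr_ratio, Wr, sr_ratio, dim).mean(axis=(1, 3))
    kv = pooled.reshape(Hr * Wr, dim)  # (N / sr_ratio^2, dim)

    # Attention matrix is N x (N / sr_ratio^2) instead of N x N
    attn = softmax(q @ kv.T / np.sqrt(dim))
    return attn @ kv  # (N, dim)
```

With H = W = 4 and sr_ratio = 2, each of the 16 query tokens attends to only 4 reduced tokens rather than all 16, which is the source of the complexity reduction controlled by `sr_ratio`.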