class vformer.attention.window.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_dropout=0.0, proj_dropout=0.0)

Bases: Module

  • dim (int) – Number of input channels.

  • window_size (int or tuple[int]) – The height and width of the window.

  • num_heads (int) – Number of attention heads.

  • qkv_bias (bool, default is True) – If True, add a learnable bias to query, key, value.

  • qk_scale (float, optional) – If set, overrides the default qk scale of head_dim ** -0.5.

  • attn_dropout (float, optional) – Dropout rate for the attention weights. Default is 0.0.

  • proj_dropout (float, optional) – Dropout rate for the output projection. Default is 0.0.
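
The default scale and the number of tokens in a window follow directly from the parameters above. The following is a minimal sketch in plain Python, not the library's own code. The helper names `to_pair`, `default_qk_scale`, and `tokens_per_window` are hypothetical, and the sketch assumes that head_dim equals dim // num_heads with dim divisible by num_heads.

```python
def to_pair(window_size):
    """Normalize an int or a (height, width) tuple to a (height, width) tuple."""
    if isinstance(window_size, int):
        return (window_size, window_size)
    height, width = window_size
    return (height, width)


def default_qk_scale(dim, num_heads, qk_scale=None):
    """Return qk_scale if given, otherwise head_dim ** -0.5."""
    # Assumption: the channels are split evenly across the heads.
    if dim % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    head_dim = dim // num_heads
    return head_dim ** -0.5 if qk_scale is None else qk_scale


def tokens_per_window(window_size):
    """Number of positions inside one attention window."""
    height, width = to_pair(window_size)
    return height * width
```

For example, dim=96 with num_heads=3 gives head_dim=32, so the default scale is 32 ** -0.5, and window_size=7 gives 7 * 7 positions per window.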

forward(x, mask=None)
  • x (torch.Tensor) – Input tensor.

  • mask (torch.Tensor, optional) – Attention mask used for shifted-window attention. If None, plain window attention is applied; otherwise the mask is taken into account when computing attention.


Returns the output tensor obtained by applying window attention or shifted-window attention to the input tensor.

Return type
  torch.Tensor

training: bool
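
One common way for a shifted-window mask to be taken into account is additively: the mask holds 0 for token pairs that may attend to each other and a large negative value for pairs that may not, and it is added to the attention scores before the softmax. The sketch below illustrates that idea for a single query row in plain Python. Whether this class applies its mask in exactly this way is an assumption, and the function name `masked_softmax` and the value -100.0 are illustrative only.

```python
import math


def masked_softmax(scores, mask=None):
    """Softmax over one row of attention scores, with an optional additive mask."""
    if mask is not None:
        # Blocked positions receive a large negative offset before normalization.
        scores = [s + m for s, m in zip(scores, mask)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 1.0, -0.3, 0.2]     # raw scores of one query against four keys
mask = [0.0, 0.0, -100.0, -100.0]  # the last two keys are blocked
weights = masked_softmax(scores, mask)
unmasked = masked_softmax(scores)
```

With the mask applied, the weights on the blocked keys become negligible and the remaining weight is shared by the first two keys; without it, all four keys receive a share.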