Swin Transformer

class vformer.models.classification.swin.SwinTransformer(img_size, patch_size, in_channels, n_classes, embedding_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=8, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, p_dropout=0.0, attn_dropout=0.0, drop_path_rate=0.1, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, ape=True, decoder_config=None, patch_norm=True)[source]

Implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (https://arxiv.org/abs/2103.14030v1)

  • img_size (int) – Size of the input image

  • patch_size (int) – Size of each patch

  • in_channels (int) – Number of input channels in the image, default is 3

  • n_classes (int) – Number of classes for classification

  • embedding_dim (int) – Patch embedding dimension, default is 96

  • depths (tuple[int]) – Number of transformer blocks in each layer, default is [2, 2, 6, 2]

  • num_heads (tuple[int]) – Number of attention heads in each transformer layer, default is [3, 6, 12, 24]

  • window_size (int) – Size of the attention window, default is 8

  • mlp_ratio (float) – Ratio of the MLP hidden dimension to the embedding dimension, default is 4.0

  • qkv_bias (bool) – Whether to add a learnable bias to the query, key and value projections, default is True

  • qk_scale (float, optional) – Overrides the default qk scale of head_dim ** -0.5 in window attention if set

  • p_dropout (float) – Dropout rate, default is 0.0

  • attn_dropout (float) – Attention dropout rate, default is 0.0

  • drop_path_rate (float) – Stochastic depth rate, default is 0.1

  • norm_layer (nn.Module) – Normalization layer, default is nn.LayerNorm

  • ape (bool, optional) – Whether to add an absolute position embedding to the patch embedding, default is True

  • decoder_config (int or tuple[int], optional) – Configuration of the decoder; if None, the default configuration is used

  • patch_norm (bool, optional) – Whether to apply a normalization layer in PatchEmbedding, default is True


x (torch.Tensor) – Input tensor

Returns
    Tensor of size n_classes

Return type
    torch.Tensor
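The hierarchical structure implied by the defaults above can be traced with simple arithmetic: patch embedding turns an img_size × img_size image into an (img_size / patch_size)² grid of embedding_dim-channel tokens, and the patch-merging step between successive stages halves the spatial resolution while doubling the channel dimension. The sketch below works through that geometry in plain Python; it is an illustrative calculation only, not part of the vformer API, and the example values (img_size=224, patch_size=4) are assumptions chosen to match the common Swin-T setup.

```python
def swin_stage_geometry(img_size, patch_size, embedding_dim, num_stages):
    """Compute the (feature resolution, channel dim) seen by each stage,
    assuming patch merging halves resolution and doubles channels
    between consecutive stages (as in the Swin paper)."""
    res = img_size // patch_size  # side length of the initial patch grid
    dim = embedding_dim
    stages = []
    for i in range(num_stages):
        stages.append((res, dim))
        if i < num_stages - 1:  # patch merging between stages
            res //= 2
            dim *= 2
    return stages

# Hypothetical img_size=224 with patch_size=4, embedding_dim=96 and
# four stages (one per entry of depths=[2, 2, 6, 2]):
print(swin_stage_geometry(224, 4, 96, 4))
# → [(56, 96), (28, 192), (14, 384), (7, 768)]
```

Note that each stage's feature map must be compatible with window_size for window attention to tile it evenly, which is one reason img_size and patch_size are usually chosen so the per-stage resolutions divide cleanly.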