Swin
- class vformer.encoder.swin.SwinEncoder(dim, input_resolution, depth, num_heads, window_size, mlp_ratio=4.0, qkv_bias=True, qkv_scale=None, p_dropout=0.0, attn_dropout=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False)
- Parameters
- dim: int
Number of input channels.
- input_resolution: tuple[int]
Input resolution.
- depth: int
Number of blocks.
- num_heads: int
Number of attention heads.
- window_size: int
Local window size.
- mlp_ratio: float
Ratio of MLP hidden dim to embedding dim.
- qkv_bias: bool, default is True
Whether to add a bias vector to the q, k, and v matrices.
- qkv_scale: float, optional
If set, overrides the default qk scale of head_dim ** -0.5 in window attention.
- p_dropout: float
Dropout rate.
- attn_dropout: float, optional
Attention dropout rate.
- drop_path: float or tuple[float]
Stochastic depth rate.
- norm_layer: nn.Module, default is nn.LayerNorm
Normalization layer.
- downsample: nn.Module, optional
Downsample layer (such as PatchMerging) at the end of the layer; default is None.
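- use_checkpoint: bool, default is False
Whether to use gradient checkpointing to save memory (assumed here to follow the reference Swin Transformer implementation).

A minimal usage sketch follows. It assumes the standard Swin convention that the encoder consumes flattened patch tokens of shape (batch, H * W, dim) for input_resolution (H, W); the concrete values are illustrative, not defaults.

```python
import torch
from vformer.encoder.swin import SwinEncoder

# One Swin stage of two blocks over a 56x56 grid of 96-dim patch tokens.
encoder = SwinEncoder(
    dim=96,
    input_resolution=(56, 56),
    depth=2,
    num_heads=3,    # head_dim = 96 / 3 = 32
    window_size=7,  # 7 divides 56, so windows tile the grid exactly
)

x = torch.randn(2, 56 * 56, 96)  # (batch, num_patches, dim)
out = encoder(x)
print(out.shape)  # torch.Size([2, 3136, 96]) since downsample is None
```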
- class vformer.encoder.swin.SwinEncoderBlock(dim, input_resolution, num_heads, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, p_dropout=0.0, attn_dropout=0.0, drop_path_rate=0.0, norm_layer=nn.LayerNorm, drop_path_mode='batch')
- Parameters
dim (int) – Number of input channels
input_resolution (int or tuple[int]) – Input resolution of patches
num_heads (int) – Number of attention heads
window_size (int) – Window size
shift_size (int) – Shift size for shifted window multi-head self-attention (SW-MSA)
mlp_ratio (float) – Ratio of MLP hidden dimension to embedding dimension
qkv_bias (bool, default is True) – Whether to add a bias vector to the q, k, and v matrices
qk_scale (float, optional) – If set, overrides the default qk scale of head_dim ** -0.5
p_dropout (float) – Dropout rate
attn_dropout (float) – Attention dropout rate
drop_path_rate (float) – Stochastic depth rate
norm_layer (nn.Module) – Normalization layer, default is nn.LayerNorm
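A short sketch of the usual W-MSA/SW-MSA pairing follows. It assumes the block preserves the token shape (batch, H * W, dim) and that consecutive blocks alternate shift_size between 0 and window_size // 2, as in the Swin Transformer paper; the values are illustrative.

```python
import torch
from vformer.encoder.swin import SwinEncoderBlock

common = dict(dim=96, input_resolution=(56, 56), num_heads=3, window_size=7)

block = SwinEncoderBlock(shift_size=0, **common)    # W-MSA: unshifted windows
shifted = SwinEncoderBlock(shift_size=3, **common)  # SW-MSA: shift = window_size // 2

x = torch.randn(2, 56 * 56, 96)  # (batch, num_patches, dim)
x = shifted(block(x))            # shape preserved: torch.Size([2, 3136, 96])
```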