Segmentation

class vformer.models.dense.PVT.segmentation.PVTSegmentation(img_size=224, patch_size=[7, 3, 3, 3], in_channels=3, embedding_dims=[64, 128, 256, 512], num_heads=[1, 2, 4, 8], mlp_ratio=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, p_dropout=0.0, attn_dropout=0.0, drop_path_rate=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], linear=False, out_channels=1, use_dwconv=False, ape=True, return_pyramid=False)[source]

Implementation of Pyramid Vision Transformer: https://arxiv.org/abs/2102.12122v1

Parameters
  • img_size (int) – Image size

  • patch_size (list(int)) – List of patch sizes

  • in_channels (int) – Input channels in image, default=3

  • embedding_dims (list(int)) – Patch embedding dimension in each stage

  • num_heads (tuple[int]) – Number of heads in each transformer layer

  • depths (tuple[int]) – Depth in each Transformer layer

  • mlp_ratio (list(int)) – Ratio of the MLP hidden dimension to the embedding dimension in each stage

  • qkv_bias (bool, default=False) – Adds a bias term to the QKV projections if True

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 in Spatial Attention if set

  • p_dropout (float) – Dropout rate, default is 0.0

  • attn_dropout (float) – Attention dropout rate, default is 0.0

  • drop_path_rate (float) – Stochastic depth rate, default is 0.0

  • sr_ratios (list(int)) – Spatial reduction ratio in each stage

  • linear (bool) – Whether to use linear spatial attention

  • use_dwconv (bool) – Whether to use Depth-wise convolutions in Overlap-patch embedding

  • ape (bool) – Whether to use absolute position embedding

  • return_pyramid (bool) – Whether to use all pyramid feature layers for up-sampling, default is False

forward(x)[source]

Parameters
  • x (torch.Tensor) – Input tensor

Returns
  Output tensor

Return type
  torch.Tensor
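
Example: a minimal usage sketch, assuming vformer and torch are installed and the class is importable under the path shown above. The exact spatial size of the output depends on the decoder configuration, so the shape in the comment is indicative only.

    import torch

    from vformer.models.dense.PVT.segmentation import PVTSegmentation

    # Build the model with the documented defaults; out_channels controls the
    # number of channels in the predicted segmentation map.
    model = PVTSegmentation(img_size=224, in_channels=3, out_channels=1)

    x = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
    mask = model(x)                  # per-pixel prediction map, e.g. (1, 1, H, W)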

class vformer.models.dense.PVT.segmentation.PVTSegmentationV2(img_size=224, patch_size=[7, 3, 3, 3], in_channels=3, embedding_dims=[64, 128, 256, 512], num_heads=[1, 2, 4, 8], mlp_ratio=[4, 4, 4, 4], qkv_bias=False, qk_scale=0.0, p_dropout=0.0, attn_dropout=0.0, drop_path_rate=0.0, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], ape=False, use_dwconv=True, linear=False, return_pyramid=False)[source]

Implementation of Pyramid Vision Transformer: https://arxiv.org/abs/2102.12122v1

Parameters
  • img_size (int) – Image size

  • patch_size (list(int)) – List of patch sizes

  • in_channels (int) – Input channels in image, default=3

  • embedding_dims (list(int)) – Patch embedding dimension in each stage

  • num_heads (tuple[int]) – Number of heads in each transformer layer

  • depths (tuple[int]) – Depth in each Transformer layer

  • mlp_ratio (list(int)) – Ratio of the MLP hidden dimension to the embedding dimension in each stage

  • qkv_bias (bool, default=False) – Adds a bias term to the QKV projections if True

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 in Spatial Attention if set

  • p_dropout (float) – Dropout rate, default is 0.0

  • attn_dropout (float) – Attention dropout rate, default is 0.0

  • drop_path_rate (float) – Stochastic depth rate, default is 0.0

  • sr_ratios (list(int)) – Spatial reduction ratio in each stage

  • linear (bool) – Whether to use linear spatial attention, default is False

  • use_dwconv (bool) – Whether to use Depth-wise convolutions in Overlap-patch embedding, default is True

  • ape (bool) – Whether to use absolute position embedding, default is False

  • return_pyramid (bool) – Whether to use all pyramid feature layers for up-sampling, default is False
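
Example: a minimal usage sketch, assuming the same import path and installation as above. The comment on the output shape is indicative only; the exact resolution depends on the decoder configuration.

    import torch

    from vformer.models.dense.PVT.segmentation import PVTSegmentationV2

    # V2 defaults differ from PVTSegmentation: depth-wise convolutions are used
    # in the overlapping patch embedding (use_dwconv=True) and the absolute
    # position embedding is disabled (ape=False).
    model = PVTSegmentationV2(img_size=224, in_channels=3)

    x = torch.randn(2, 3, 224, 224)  # (batch, channels, height, width)
    out = model(x)                   # per-pixel prediction map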