Vision Transformers for Dense Prediction

class vformer.models.dense.dpt.AddReadout(start_index=1)[source]

Handles the readout operation when the readout parameter is add. Removes the cls_token (or readout_token) from the tensor and adds it to each of the remaining tokens.
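As a rough illustration of the add readout, the sketch below uses plain Python lists in place of tensors, for a single sample. The helper name add_readout is hypothetical and not part of vformer. It assumes the default start_index=1, where token 0 is the cls_token and every later token is a patch token.

```python
def add_readout(tokens, start_index=1):
    """Add the readout (cls) token to every remaining token.

    tokens: list of token vectors for one sample; tokens[0] is assumed
    to be the cls_token. Hypothetical helper, not part of vformer.
    """
    readout = tokens[0]
    return [
        [value + r for value, r in zip(token, readout)]
        for token in tokens[start_index:]
    ]


tokens = [
    [1.0, 1.0],    # cls_token / readout token
    [10.0, 20.0],  # patch token 0
    [30.0, 40.0],  # patch token 1
]
fused = add_readout(tokens)
```

The result has one token fewer than the input, because the readout token is folded into the patch tokens rather than kept as a separate entry.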

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward-pass logic must be defined in this method, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

class vformer.models.dense.dpt.DPTDepth(backbone, in_channels=3, img_size=(384, 384), readout='project', hooks=(2, 5, 8, 11), channels_last=False, use_bn=False, enable_attention_hooks=False, non_negative=True, scale=1.0, shift=0.0, invert=False)[source]

Implementation of "Vision Transformers for Dense Prediction": https://arxiv.org/abs/2103.13413

Parameters
  • backbone (str) – Name of the ViT model to be used as backbone; must be one of {vitb16, vitl16, vit_tiny}

  • in_channels (int) – Number of channels in input image, default is 3

  • img_size (tuple[int]) – Input image size, default is (384,384)

  • readout (str) – Method used to handle the readout_token or cls_token; must be one of {add, ignore, project}, default is project

  • hooks (list[int]) – Indices of the encoder blocks on which hooks are registered. These hooks extract features (e.g. attention) from different ViT blocks; default is (2, 5, 8, 11)

  • channels_last (bool) – Alters the memory format used to store tensors, default is False. For more information, see the PyTorch memory format tutorial: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html

  • use_bn (bool) – If True, batch normalisation is used in FeatureFusionBlock_custom, default is False

  • enable_attention_hooks (bool) – If True, the get_attention hook is registered, default is False

  • non_negative (bool) – If True, a ReLU operation is applied in the DPTDepth.model.head block, default is True

  • invert (bool) – If True, the forward pass output of DPTDepth.model.head is transformed (inverted) according to the scale and shift parameters, default is False

  • scale (float) – Value by which the forward pass output of DPTDepth.model.head is multiplied, default is 1.0

  • shift (float) – Value added to the forward pass output of DPTDepth.model.head after scaling, default is 0.0
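The interaction of invert, scale and shift can be sketched in plain Python. This is an assumption-laden illustration, not the library code: the helper postprocess_depth is hypothetical, and it assumes the behaviour of the reference DPT implementation, where scale and shift take effect only when invert is True and the scaled value is clamped to a small positive eps before taking the reciprocal.

```python
def postprocess_depth(head_output, scale=1.0, shift=0.0, invert=False, eps=1e-8):
    """Hypothetical sketch of the invert/scale/shift post-processing.

    head_output: flat list of per-pixel values from the head.
    Assumption: scale/shift are applied only when invert is True,
    and eps guards against division by zero.
    """
    if not invert:
        return list(head_output)
    depth = []
    for value in head_output:
        affine = scale * value + shift          # scale first, then shift
        depth.append(1.0 / max(affine, eps))    # clamp, then invert
    return depth


raw = [0.5, 2.0]
unchanged = postprocess_depth(raw)
inverted = postprocess_depth(raw, scale=2.0, shift=0.0, invert=True)
```

With invert left at its default of False the head output is returned as is; with invert=True each value v becomes 1 / (scale * v + shift).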

forward(x)[source]

Forward pass of DPTDepth

Parameters

x (torch.Tensor) – Input image tensor

forward_vit(x)[source]

Performs forward pass on backbone ViT model and fetches output from different encoder blocks with the help of hooks

Parameters

x (torch.Tensor) – Input image tensor
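The hook mechanism that forward_vit relies on can be illustrated with a toy stand-in for encoder blocks. Everything in this sketch is hypothetical (ToyBlock, collect_features, the placeholder computation); it only mimics the PyTorch convention that calling a module instance runs its registered forward hooks, while calling forward() directly does not.

```python
class ToyBlock:
    """Minimal stand-in for an encoder block with forward hooks."""

    def __init__(self, index):
        self.index = index
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x + 1  # placeholder computation

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)  # hooks only run via __call__
        return out


def collect_features(blocks, hooks, x):
    """Register a hook on each selected block and record its output."""
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output
        return hook

    for index in hooks:
        blocks[index].register_forward_hook(make_hook(index))

    for block in blocks:
        x = block(x)  # call the instance so the hooks fire
    return x, activations


blocks = [ToyBlock(i) for i in range(12)]
final, features = collect_features(blocks, hooks=(2, 5, 8, 11), x=0)

# Calling forward() directly bypasses the hooks entirely.
features.clear()
blocks[2].forward(0)
hooks_fired_on_direct_call = bool(features)
```

Only the outputs of the blocks listed in hooks are captured, which is how intermediate features from several depths of the backbone can be gathered in a single pass.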

class vformer.models.dense.dpt.FeatureFusionBlock_custom(features, activation, deconv=False, bn=False, expand=False, align_corners=True)[source]

Feature fusion block.

forward(*xs)[source]

Forward pass

class vformer.models.dense.dpt.Interpolate(scale_factor, mode, align_corners=False)[source]

Interpolation module

Parameters
  • scale_factor (float) – Scaling factor used in interpolation

  • mode (str) – Interpolation mode

  • align_corners (bool) – Whether to align corners in Interpolation operation
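To make the effect of align_corners concrete, here is a one-dimensional linear interpolation sketch in plain Python. The function interpolate_linear_1d is hypothetical and uses the usual coordinate mappings for the two settings; it is meant as an illustration of the idea, and details for non-integer scale factors may differ from the library.

```python
import math


def interpolate_linear_1d(values, scale_factor, align_corners=False):
    """Hypothetical 1D linear interpolation showing align_corners."""
    n_in = len(values)
    n_out = int(math.floor(n_in * scale_factor))
    out = []
    for i in range(n_out):
        if align_corners:
            # First and last output samples land exactly on the
            # first and last input samples.
            src = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        else:
            # Samples are treated as pixel centres; clamp at the left edge.
            src = max((i + 0.5) * n_in / n_out - 0.5, 0.0)
        lo = min(int(math.floor(src)), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        frac = src - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


signal = [0.0, 10.0]
centred = interpolate_linear_1d(signal, 2, align_corners=False)
aligned = interpolate_linear_1d(signal, 2, align_corners=True)
```

With align_corners=True the interior samples are spaced evenly between the endpoints, whereas with align_corners=False the samples near the borders are pulled toward the edge values.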

forward(x)[source]

Forward pass

class vformer.models.dense.dpt.ProjectReadout(in_features, start_index=1)[source]

Handles the readout operation when the readout parameter is project.
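A plain-Python sketch of a project-style readout follows. It assumes the scheme used by the reference DPT implementation: the readout token is concatenated to every patch token, and the result is mapped back to in_features dimensions by a linear layer followed by GELU. The helper project_readout and the weight and bias values are hypothetical, chosen only to keep the example small.

```python
import math


def gelu(v):
    """Exact GELU using the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def project_readout(tokens, weight, bias, start_index=1):
    """Hypothetical project readout for one sample.

    tokens: list of token vectors, tokens[0] assumed to be the cls_token.
    weight: in_features rows, each of length 2 * in_features.
    bias:   list of length in_features.
    """
    readout = tokens[0]
    projected = []
    for token in tokens[start_index:]:
        features = token + readout  # list concatenation: 2 * in_features
        row = [
            gelu(sum(w * f for w, f in zip(w_row, features)) + b)
            for w_row, b in zip(weight, bias)
        ]
        projected.append(row)
    return projected


tokens = [[1.0, 1.0], [10.0, 20.0], [30.0, 40.0]]
# Illustrative weights: sum each patch feature with the matching cls feature.
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.0]
projected = project_readout(tokens, weight, bias)
```

Unlike the add readout, the way the readout token is mixed into each patch token is learned through the projection weights.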

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward-pass logic must be defined in this method, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

class vformer.models.dense.dpt.ResidualConvUnit_custom(features, activation=<class 'torch.nn.modules.activation.GELU'>, bn=True)[source]

Residual convolution module

Parameters
  • features (int) – Number of features

  • activation (nn.Module) – Activation module, default is nn.GELU

  • bn (bool) – Whether to use batch normalisation

forward(x)[source]

Forward pass

class vformer.models.dense.dpt.Slice(start_index=1)[source]

Handles the readout operation when the readout parameter is ignore. Removes the cls_token (or readout_token) by index slicing.
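The ignore readout amounts to dropping the leading special token(s). A minimal sketch with plain Python lists for a single sample is shown below; slice_readout is a hypothetical name, and token 0 is assumed to be the cls_token.

```python
def slice_readout(tokens, start_index=1):
    """Drop the first start_index tokens (hypothetical helper)."""
    return tokens[start_index:]


tokens = [
    [1.0, 1.0],    # cls_token / readout token
    [10.0, 20.0],  # patch token 0
    [30.0, 40.0],  # patch token 1
]
patches_only = slice_readout(tokens)
```

The patch tokens are returned unchanged, so no information from the readout token reaches the later stages.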

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward-pass logic must be defined in this method, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

class vformer.models.dense.dpt.Transpose(dim0, dim1)[source]

Transposes dimensions dim0 and dim1 of the input tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward-pass logic must be defined in this method, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.