Vision Transformers for Dense Prediction

class vformer.models.dense.dpt.AddReadout(start_index=1)

Handles the readout operation when the readout parameter is add: the cls_token (or readout_token) is removed from the token sequence and added to each of the remaining tokens.

forward(x)

Forward pass. Returns the tokens from start_index onward, each with the readout token added to it.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
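The add readout can be illustrated with a minimal sketch. It is not vformer's source: AddReadoutSketch is a hypothetical name, and the sketch assumes tokens shaped (batch, 1 + num_patches, dim) with a single readout token at index 0.

```python
import torch
import torch.nn as nn


class AddReadoutSketch(nn.Module):
    """Hypothetical re-implementation of the 'add' readout, for illustration only."""

    def __init__(self, start_index: int = 1):
        super().__init__()
        self.start_index = start_index

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, start_index + num_patches, dim); token 0 is the readout token
        readout = x[:, 0]  # (batch, dim)
        patches = x[:, self.start_index:]  # drop the readout token
        # Broadcast-add the readout token to every patch token
        return patches + readout.unsqueeze(1)


# 384x384 image with 16x16 patches -> 24 * 24 = 576 patch tokens plus 1 readout token
tokens = torch.randn(2, 1 + 576, 768)
out = AddReadoutSketch()(tokens)
print(out.shape)  # torch.Size([2, 576, 768])
```

The output contains only patch tokens, so it can be reshaped into a 2-D feature map in the same way regardless of which readout method is chosen.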

class vformer.models.dense.dpt.DPTDepth(backbone, in_channels=3, img_size=(384, 384), readout='project', hooks=(2, 5, 8, 11), channels_last=False, use_bn=False, enable_attention_hooks=False, non_negative=True, scale=1.0, shift=0.0, invert=False)

Depth estimation model based on "Vision Transformers for Dense Prediction" (https://arxiv.org/abs/2103.13413).

Parameters

backbone (str) – Name of the ViT model to use as the backbone; must be one of {vitb16, vitl16, vit_tiny}
in_channels (int) – Number of channels in the input image, default is 3
img_size (tuple[int]) – Input image size, default is (384, 384)
readout (str) – Method for handling the readout_token (cls_token); must be one of {add, ignore, project}, default is project
hooks (tuple[int]) – Indices of the encoder blocks on which hooks are registered. These hooks extract the outputs of the selected ViT blocks, default is (2, 5, 8, 11)
channels_last (bool) – If True, tensors are stored in the channels-last memory format, default is False. For more information, see the PyTorch memory format tutorial: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
use_bn (bool) – If True, batch normalisation is used in FeatureFusionBlock_custom, default is False
enable_attention_hooks (bool) – If True, the get_attention hook is registered, default is False
non_negative (bool) – If True, a ReLU activation is applied in the DPTDepth.model.head block, default is True
scale (float) – Value by which the output of DPTDepth.model.head is multiplied, default is 1.0
shift (float) – Value added to the output of DPTDepth.model.head after scaling, default is 0.0
invert (bool) – If True, the output of DPTDepth.model.head is transformed with scale and shift and then inverted, default is False
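The interaction of non_negative, scale, shift and invert can be sketched as a small post-processing function. This assumes the behaviour of the reference DPT depth model, where scale and shift are applied only when invert is True and the result is clamped before the reciprocal is taken; the function name postprocess_depth and the 1e-8 floor are assumptions, so check vformer's source before relying on them.

```python
import torch


def postprocess_depth(head_out: torch.Tensor, non_negative: bool = True,
                      scale: float = 1.0, shift: float = 0.0,
                      invert: bool = False, eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical sketch of DPTDepth output handling (not vformer's code)."""
    # non_negative: a ReLU at the end of the head clips negative predictions
    out = torch.relu(head_out) if non_negative else head_out
    if not invert:
        return out
    # Assumed reference behaviour: affine transform, clamp, then reciprocal
    depth = scale * out + shift
    depth = depth.clamp(min=eps)  # avoid division by zero
    return 1.0 / depth


pred = torch.tensor([[-1.0, 0.5, 2.0]])
print(postprocess_depth(pred))  # tensor([[0.0000, 0.5000, 2.0000]])
print(postprocess_depth(pred, scale=2.0, shift=1.0, invert=True))  # tensor([[1.0000, 0.5000, 0.2000]])
```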

forward(x)

Forward pass of DPTDepth

Parameters: x (torch.Tensor) – Input image tensor

forward_vit(x)

Runs the forward pass of the backbone ViT and collects the outputs of the encoder blocks selected by hooks.

Parameters: x (torch.Tensor) – Input image tensor
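The hook mechanism can be illustrated with PyTorch's register_forward_hook on a toy stack of blocks. The 12 nn.Linear layers stand in for ViT encoder blocks, and collecting outputs into a dictionary is an assumption modelled on the reference DPT code, not vformer's exact implementation.

```python
import torch
import torch.nn as nn

# Toy stand-in for a 12-block ViT encoder (each "block" keeps the token shape)
blocks = nn.Sequential(*[nn.Linear(64, 64) for _ in range(12)])
hooks = (2, 5, 8, 11)
activations = {}


def save_output(name):
    # Forward hooks receive (module, inputs, output); only the output is kept
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


handles = [blocks[i].register_forward_hook(save_output(f"block{i}")) for i in hooks]

tokens = torch.randn(2, 577, 64)  # (batch, 1 + num_patches, dim)
_ = blocks(tokens)
print(list(activations))  # ['block2', 'block5', 'block8', 'block11']

for h in handles:
    h.remove()  # detach the hooks once they are no longer needed
```

A single forward pass through the backbone therefore yields features at four depths, which the decoder can then reassemble at different resolutions.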

class vformer.models.dense.dpt.FeatureFusionBlock_custom(features, activation, deconv=False, bn=False, expand=False, align_corners=True)

Feature fusion block.

forward(*xs)[source]: Forward pass

class vformer.models.dense.dpt.Interpolate(scale_factor, mode, align_corners=False)

Interpolation module

Parameters

scale_factor (float) – Scaling factor used in interpolation
mode (str) – Interpolation mode
align_corners (bool) – Whether to align corners in the interpolation operation, default is False

forward(x)[source]: Forward pass

class vformer.models.dense.dpt.ProjectReadout(in_features, start_index=1)

Handles the readout operation when the readout parameter is project.

forward(x)

Forward pass. Applies the project readout to the token sequence x.
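A minimal sketch of a project readout, assuming the approach of the reference DPT code: the readout token is concatenated to every patch token and the result is projected back to the original width with a linear layer followed by GELU. ProjectReadoutSketch is a hypothetical name and this is not vformer's source.

```python
import torch
import torch.nn as nn


class ProjectReadoutSketch(nn.Module):
    """Hypothetical 'project' readout: concat readout to each token, then project."""

    def __init__(self, in_features: int, start_index: int = 1):
        super().__init__()
        self.start_index = start_index
        # Maps the concatenated [patch, readout] vector (2 * dim) back to dim
        self.project = nn.Sequential(nn.Linear(2 * in_features, in_features), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        patches = x[:, self.start_index:]  # (batch, num_patches, dim)
        readout = x[:, 0].unsqueeze(1).expand_as(patches)  # copy readout per patch
        return self.project(torch.cat((patches, readout), dim=-1))


tokens = torch.randn(2, 1 + 576, 768)
print(ProjectReadoutSketch(768)(tokens).shape)  # torch.Size([2, 576, 768])
```

Unlike the add readout, this variant learns how much of the global readout information to mix into each patch token.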

class vformer.models.dense.dpt.ResidualConvUnit_custom(features, activation=torch.nn.GELU, bn=True)

Residual convolution module

Parameters

features (int) – Number of features
activation (nn.Module) – Activation module, default is nn.GELU
bn (bool) – Whether to use batch normalisation, default is True

forward(x)[source]: forward pass

class vformer.models.dense.dpt.Slice(start_index=1)

Handles the readout operation when the readout parameter is ignore: the cls_token (or readout_token) is removed by index slicing.

forward(x)

Forward pass. Returns the tokens of x from start_index onward, discarding the readout token.
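The ignore readout is plain slicing along the token dimension. In this hypothetical SliceSketch, the input is assumed to be shaped (batch, tokens, dim) with the readout token first, so the call below keeps tokens 1 and 2 and drops token 0.

```python
import torch
import torch.nn as nn


class SliceSketch(nn.Module):
    """Hypothetical 'ignore' readout: drop the readout token(s) by slicing."""

    def __init__(self, start_index: int = 1):
        super().__init__()
        self.start_index = start_index

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep only the patch tokens along dimension 1
        return x[:, self.start_index:]


tokens = torch.arange(6.0).reshape(1, 3, 2)  # token 0 is the readout token
print(SliceSketch()(tokens))
```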

class vformer.models.dense.dpt.Transpose(dim0, dim1)

Module that swaps dimensions dim0 and dim1 of its input tensor.

forward(x)

Forward pass. Returns x with dimensions dim0 and dim1 swapped.
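In the reference DPT code, a transpose module is used to turn token sequences back into feature maps: tokens shaped (batch, num_patches, dim) are transposed to (batch, dim, num_patches) and then unflattened into a 2-D grid. The sketch below reproduces that reshaping with a hypothetical TransposeSketch and nn.Unflatten, assuming a 384x384 input with 16x16 patches (a 24x24 grid).

```python
import torch
import torch.nn as nn


class TransposeSketch(nn.Module):
    """Hypothetical module wrapper around Tensor.transpose(dim0, dim1)."""

    def __init__(self, dim0: int, dim1: int):
        super().__init__()
        self.dim0 = dim0
        self.dim1 = dim1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.transpose(self.dim0, self.dim1)


# Tokens -> channels-first feature map: (B, N, C) -> (B, C, N) -> (B, C, 24, 24)
to_map = nn.Sequential(TransposeSketch(1, 2), nn.Unflatten(2, (24, 24)))
tokens = torch.randn(2, 576, 768)
print(to_map(tokens).shape)  # torch.Size([2, 768, 24, 24])
```

Packaging the transpose as a module is what allows it to be chained with nn.Unflatten inside nn.Sequential.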