Convvt

class vformer.encoder.embedding.convvt.ConvEmbedding(patch_size=7, in_channels=3, embedding_dim=64, stride=4, padding=2)[source]

This class converts an input image into a sequence of patch embeddings by applying a strided convolution.

Parameters
  • patch_size (int, default is 7) – Size of a patch

  • in_channels (int, default is 3) – Number of input channels

  • embedding_dim (int, default is 64) – Dimension of the patch embedding

  • stride (int or tuple, default is 4) – Stride of the convolution operation

  • padding (int, default is 2) – Padding to all sides of the input
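The patch_size, stride, and padding parameters determine the size of the resulting patch grid via the standard Conv2d output-size arithmetic. A small sketch of that calculation with the defaults above (the helper name `out_size` is ours, not part of vformer):

```python
def out_size(n, patch_size=7, stride=4, padding=2):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - patch_size) // stride + 1

print(out_size(224))  # a 224x224 image yields a 56x56 grid of patch embeddings
```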

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor

Returns

Output tensor (embedding) obtained by applying a convolution operation to the input tensor

Return type

torch.Tensor
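A minimal sketch of how such a convolutional patch embedding typically works, assuming the usual Conv2d-then-flatten-then-LayerNorm pattern used by convolutional vision transformers; this is an illustration, not the vformer source:

```python
import torch
import torch.nn as nn

class ConvEmbeddingSketch(nn.Module):
    """Hypothetical re-implementation of a convolutional patch embedding."""

    def __init__(self, patch_size=7, in_channels=3, embedding_dim=64, stride=4, padding=2):
        super().__init__()
        # Strided convolution projects overlapping patches to embedding_dim channels
        self.proj = nn.Conv2d(
            in_channels, embedding_dim,
            kernel_size=patch_size, stride=stride, padding=padding,
        )
        self.norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        x = self.proj(x)                  # (B, embedding_dim, H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, H' * W', embedding_dim)
        return self.norm(x)

emb = ConvEmbeddingSketch()
out = emb(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 3136, 64]); 3136 = 56 * 56 patches
```

With the default patch_size=7, stride=4, padding=2, each spatial side shrinks by a factor of four, so a 224x224 input produces 56x56 = 3136 patch tokens of dimension 64.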