Convvt
- class vformer.encoder.embedding.convvt.ConvEmbedding(patch_size=7, in_channels=3, embedding_dim=64, stride=4, padding=2)[source]
Projects image patches into an embedding space using a convolutional layer.
- Parameters
patch_size (int, default is 7) – Size of a patch (used as the convolution kernel size)
in_channels (int, default is 3) – Number of input channels
embedding_dim (int, default is 64) – Dimension of the embedding space (number of output channels of the convolution)
stride (int or tuple, default is 4) – Stride of the convolution operation
padding (int, default is 2) – Padding added to all sides of the input
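The behaviour of these parameters can be sketched with a minimal re-implementation. Note this is an illustrative sketch built on `torch.nn.Conv2d`, not the vformer source: the class name `ConvEmbeddingSketch` and the flatten-and-transpose output layout are assumptions here. With the defaults above, a 224×224 input yields a 56×56 grid of patches, since (224 + 2·2 − 7) // 4 + 1 = 56.

```python
import torch
import torch.nn as nn

class ConvEmbeddingSketch(nn.Module):
    # Hypothetical sketch of a convolutional patch embedding,
    # mirroring the documented parameters and their defaults.
    def __init__(self, patch_size=7, in_channels=3, embedding_dim=64,
                 stride=4, padding=2):
        super().__init__()
        self.proj = nn.Conv2d(
            in_channels, embedding_dim,
            kernel_size=patch_size, stride=stride, padding=padding,
        )

    def forward(self, x):
        x = self.proj(x)                     # (B, embedding_dim, H', W')
        return x.flatten(2).transpose(1, 2)  # (B, H' * W', embedding_dim)

emb = ConvEmbeddingSketch()
out = emb(torch.randn(1, 3, 224, 224))
# With the defaults, H' = W' = (224 + 2*2 - 7) // 4 + 1 = 56,
# so out has shape (1, 3136, 64).
```

Because stride (4) is smaller than patch_size (7), neighbouring patches overlap, which distinguishes this embedding from a plain non-overlapping patchify step.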