Video Patch Embeddings

class vformer.encoder.embedding.video_patch_embeddings.LinearVideoEmbedding(embedding_dim, patch_height, patch_width, patch_dim)[source]
Parameters
  • embedding_dim (int) – Dimension of the resultant embedding

  • patch_height (int) – Height of the patch

  • patch_width (int) – Width of the patch

  • patch_dim (int) – Dimension of the patch

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor

Returns

Patch embeddings of size embedding_dim

Return type

torch.Tensor
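The shape arithmetic behind a linear patch embedding can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper name linear_patch_shapes is hypothetical, the clip layout (frames, channels, height, width) is assumed, patches are assumed not to overlap, and reading patch_dim as the flattened patch length channels * patch_height * patch_width is a guess, since the documentation above does not define it.

```python
def linear_patch_shapes(frames, channels, height, width,
                        patch_height, patch_width, embedding_dim):
    """Return (tokens_per_frame, flattened_patch_len, output_shape) for one clip.

    Hypothetical helper: assumes non-overlapping patches and a clip laid out
    as (frames, channels, height, width). Neither is confirmed by the docs.
    """
    if height % patch_height or width % patch_width:
        raise ValueError("frame size must be divisible by the patch size")
    # Each frame is cut into a grid of patches; every patch becomes one token.
    tokens_per_frame = (height // patch_height) * (width // patch_width)
    # Flattening a patch concatenates all of its pixels across channels.
    # This is presumably the value expected for patch_dim (assumption).
    flattened_patch_len = channels * patch_height * patch_width
    # A linear layer would then map each flattened patch to embedding_dim.
    output_shape = (frames, tokens_per_frame, embedding_dim)
    return tokens_per_frame, flattened_patch_len, output_shape


tokens, patch_len, out_shape = linear_patch_shapes(
    frames=8, channels=3, height=224, width=224,
    patch_height=16, patch_width=16, embedding_dim=192,
)
print(tokens, patch_len, out_shape)  # 196 768 (8, 196, 192)
```

Under these assumptions, an 8-frame 224x224 RGB clip with 16x16 patches yields 196 tokens per frame, each flattened to 768 values before projection.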

class vformer.encoder.embedding.video_patch_embeddings.TubeletEmbedding(embedding_dim, tubelet_t, tubelet_h, tubelet_w, in_channels)[source]
Parameters
  • embedding_dim (int) – Dimension of the resultant embedding

  • tubelet_t (int) – Temporal length of a single tube/patch

  • tubelet_h (int) – Height of a single tube/patch

  • tubelet_w (int) – Width of a single tube/patch

  • in_channels (int) – Number of input channels

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor
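To make the idea of a tubelet concrete, here is a small dependency-free sketch of how a clip might be cut into tubelet_t x tubelet_h x tubelet_w blocks and projected. It is not the class's actual code: the function names extract_tubelets and project are hypothetical, the video is a single-channel nested list indexed as video[t][y][x], tubelets are assumed not to overlap, and the projection weights are hand-picked for readability where a real module would presumably learn them.

```python
def extract_tubelets(video, tubelet_t, tubelet_h, tubelet_w):
    """Split video[t][y][x] into flattened, non-overlapping tubelets.

    Hypothetical helper; single channel only, to keep the example short.
    """
    n_t, n_h, n_w = len(video), len(video[0]), len(video[0][0])
    if n_t % tubelet_t or n_h % tubelet_h or n_w % tubelet_w:
        raise ValueError("video size must be divisible by the tubelet size")
    tubelets = []
    # Walk over the top-left-front corner of every tubelet.
    for t0 in range(0, n_t, tubelet_t):
        for y0 in range(0, n_h, tubelet_h):
            for x0 in range(0, n_w, tubelet_w):
                # Flatten the block in (time, row, column) order.
                tubelets.append([
                    video[t][y][x]
                    for t in range(t0, t0 + tubelet_t)
                    for y in range(y0, y0 + tubelet_h)
                    for x in range(x0, x0 + tubelet_w)
                ])
    return tubelets


def project(tubelets, weight):
    """Bias-free linear projection; weight has one row per embedding dimension."""
    return [[sum(w * v for w, v in zip(row, tube)) for row in weight]
            for tube in tubelets]


# A 2-frame, 2x4 clip whose pixel values are simply their flat index.
video = [[[t * 8 + y * 4 + x for x in range(4)] for y in range(2)]
         for t in range(2)]
tubelets = extract_tubelets(video, tubelet_t=2, tubelet_h=2, tubelet_w=2)
print(tubelets)  # [[0, 1, 4, 5, 8, 9, 12, 13], [2, 3, 6, 7, 10, 11, 14, 15]]

# embedding_dim = 2: the first row sums the tubelet, the second picks its first value.
weight = [[1] * 8, [1, 0, 0, 0, 0, 0, 0, 0]]
embeddings = project(tubelets, weight)
print(embeddings)  # [[52, 0], [68, 2]]
```

Each tubelet spans both frames here, so one token carries temporal as well as spatial information, which is the main difference from embedding every frame's patches separately.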