Video Patch Embeddings

class vformer.encoder.embedding.video_patch_embeddings.LinearVideoEmbedding(embedding_dim, patch_height, patch_width, patch_dim)[source]

Parameters

embedding_dim (int) – Dimension of the resultant embedding
patch_height (int) – Height of the patch
patch_width (int) – Width of the patch
patch_dim (int) – Dimension of the patch

forward(x)[source]

Parameters: x (torch.Tensor) – Input tensor
Returns: Patch embeddings of size embedding_dim
Return type: torch.Tensor
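
The following is a minimal sketch of how a linear video patch embedding of this kind can work. It is not the vformer implementation. The class name LinearVideoPatchSketch is hypothetical, and the sketch assumes an input layout of (batch, frames, channels, height, width) and a patch_dim equal to channels * patch_height * patch_width. Neither assumption is stated in the reference above.

```python
import torch
from torch import nn


class LinearVideoPatchSketch(nn.Module):
    """Hypothetical stand-in for illustration; not the vformer class."""

    def __init__(self, embedding_dim, patch_height, patch_width, patch_dim):
        super().__init__()
        self.patch_height = patch_height
        self.patch_width = patch_width
        # Assumption: patch_dim == channels * patch_height * patch_width.
        self.proj = nn.Linear(patch_dim, embedding_dim)

    def forward(self, x):
        # Assumed input layout: (batch, frames, channels, height, width).
        b, t, c, h, w = x.shape
        ph, pw = self.patch_height, self.patch_width
        # Split height and width into (grid, patch) pairs.
        x = x.reshape(b, t, c, h // ph, ph, w // pw, pw)
        # Reorder to (b, t, h/ph, w/pw, ph, pw, c).
        x = x.permute(0, 1, 3, 5, 4, 6, 2)
        # Flatten each patch: (b, t, patches_per_frame, ph * pw * c).
        x = x.reshape(b, t, (h // ph) * (w // pw), ph * pw * c)
        # Project every flattened patch to embedding_dim.
        return self.proj(x)


video = torch.randn(2, 4, 3, 32, 32)
embed = LinearVideoPatchSketch(
    embedding_dim=64, patch_height=8, patch_width=8, patch_dim=3 * 8 * 8
)
tokens = embed(video)
print(tokens.shape)
```

Under these assumptions each frame is cut into a 4 x 4 grid of 8 x 8 patches, so the last dimension of the output equals embedding_dim, as the Returns entry describes.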

class vformer.encoder.embedding.video_patch_embeddings.TubeletEmbedding(embedding_dim, tubelet_t, tubelet_h, tubelet_w, in_channels)[source]

Parameters

embedding_dim (int) – Dimension of the resultant embedding
tubelet_t (int) – Temporal length of single tube/patch
tubelet_h (int) – Height of single tube/patch
tubelet_w (int) – Width of single tube/patch
in_channels (int) – Number of input channels

forward(x)[source]

Parameters: x (torch.Tensor) – Input tensor
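
As a rough illustration of the tubelet idea, the sketch below groups tubelet_t consecutive frames and a tubelet_h x tubelet_w spatial window into one token, then applies a linear projection. The class name TubeletSketch is hypothetical, and the input layout (batch, frames, channels, height, width) and output layout are assumptions. The actual vformer class may differ.

```python
import torch
from torch import nn


class TubeletSketch(nn.Module):
    """Hypothetical stand-in for illustration; not the vformer class."""

    def __init__(self, embedding_dim, tubelet_t, tubelet_h, tubelet_w, in_channels):
        super().__init__()
        self.size = (tubelet_t, tubelet_h, tubelet_w)
        # Each tubelet flattens to in_channels * t * h * w values.
        tubelet_dim = in_channels * tubelet_t * tubelet_h * tubelet_w
        self.proj = nn.Linear(tubelet_dim, embedding_dim)

    def forward(self, x):
        # Assumed input layout: (batch, frames, channels, height, width).
        b, t, c, h, w = x.shape
        pt, ph, pw = self.size
        # Split time, height, and width into (grid, tubelet) pairs.
        x = x.reshape(b, t // pt, pt, c, h // ph, ph, w // pw, pw)
        # Reorder to (b, t/pt, h/ph, w/pw, pt, ph, pw, c).
        x = x.permute(0, 1, 4, 6, 2, 5, 7, 3)
        # Flatten each tubelet: (b, t/pt, tubes_per_slab, pt * ph * pw * c).
        x = x.reshape(b, t // pt, (h // ph) * (w // pw), pt * ph * pw * c)
        # Project every flattened tubelet to embedding_dim.
        return self.proj(x)


clip = torch.randn(2, 8, 3, 32, 32)
tubelets = TubeletSketch(
    embedding_dim=128, tubelet_t=2, tubelet_h=8, tubelet_w=8, in_channels=3
)
tube_tokens = tubelets(clip)
print(tube_tokens.shape)
```

With these settings, 8 frames collapse into 4 temporal steps because each tubelet spans 2 frames, and each step holds a 4 x 4 grid of tubelets.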