Compact Convolutional Transformer

class vformer.models.classification.cct.CCT(img_size=224, patch_size=4, in_channels=3, seq_pool=True, embedding_dim=768, num_layers=1, head_dim=96, num_heads=1, mlp_ratio=4.0, n_classes=1000, p_dropout=0.1, attn_dropout=0.1, drop_path=0.1, positional_embedding='learnable', decoder_config=(768, 1024), pooling_kernel_size=3, pooling_stride=2, pooling_padding=1)

Implementation of "Escaping the Big Data Paradigm with Compact Transformers" (https://arxiv.org/abs/2104.05704)

img_size: int

Size of the input image

patch_size: int

Size of a single patch in the image

in_channels: int

Number of input channels in image

seq_pool: bool

Whether to use sequence pooling in place of a class token (see the sketch after this parameter list)

embedding_dim: int

Patch embedding dimension

num_layers: int

Number of transformer encoder layers

head_dim: int

Dimension of each attention head

num_heads: int

Number of attention heads in each transformer layer

mlp_ratio: float

Ratio of the MLP hidden dimension to the embedding dimension

n_classes: int

Number of classes for classification

p_dropout: float

Dropout probability

attn_dropout: float

Dropout probability of the attention layers

drop_path: float

Stochastic depth rate, default is 0.1

positional_embedding: str

One of the string values {'learnable', 'sine', 'None'}; default is 'learnable'

decoder_config: tuple(int) or int

Configuration of the decoder. If None, the default configuration is used.

pooling_kernel_size: int or tuple(int)

Size of the kernel in the MaxPooling operation

pooling_stride: int or tuple(int)

Stride of the MaxPooling operation

pooling_padding: int

Padding in the MaxPooling operation
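When seq_pool=True, the paper's sequence pooling (SeqPool) replaces the class token: a linear layer scores each output token, a softmax over the sequence turns the scores into weights, and the weighted sum of tokens becomes the representation fed to the classifier. Below is a minimal sketch of that idea; the SeqPool class and attention_pool attribute names are illustrative, not vformer's internal API.

    import torch
    import torch.nn as nn

    class SeqPool(nn.Module):
        """Attention-based sequence pooling from the CCT paper (illustrative sketch)."""

        def __init__(self, embedding_dim: int):
            super().__init__()
            # Maps each d-dimensional token to a scalar importance score.
            self.attention_pool = nn.Linear(embedding_dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, embedding_dim)
            weights = self.attention_pool(x).softmax(dim=1)  # (batch, seq_len, 1)
            # Weighted sum over the sequence replaces a [CLS] token.
            return (weights.transpose(1, 2) @ x).squeeze(1)  # (batch, embedding_dim)

    pool = SeqPool(embedding_dim=768)
    tokens = torch.randn(2, 196, 768)  # e.g. 14x14 grid of patch tokens
    assert pool(tokens).shape == (2, 768)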

forward(x)
Parameters

x (torch.Tensor) – Input image tensor of shape (batch_size, in_channels, img_size, img_size)

Returns

Logits tensor of shape (batch_size, n_classes)

Return type

torch.Tensor
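A minimal usage sketch, assuming vformer is installed and importing the class from the module path shown in the signature above; the chosen argument values are illustrative, all other arguments keep their documented defaults:

    import torch
    from vformer.models.classification.cct import CCT

    # CCT for 224x224 RGB images with 10 target classes.
    model = CCT(img_size=224, in_channels=3, n_classes=10)

    images = torch.randn(4, 3, 224, 224)  # (batch_size, in_channels, img_size, img_size)
    logits = model(images)                # (batch_size, n_classes)
    print(logits.shape)                   # torch.Size([4, 10])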