Compact Convolutional Transformer

class vformer.models.classification.cct.CCT(img_size=224, patch_size=4, in_channels=3, seq_pool=True, embedding_dim=768, num_layers=1, head_dim=96, num_heads=1, mlp_ratio=4.0, n_classes=1000, p_dropout=0.1, attn_dropout=0.1, drop_path=0.1, positional_embedding='learnable', decoder_config=(768, 1024), pooling_kernel_size=3, pooling_stride=2, pooling_padding=1)

Implementation of "Escaping the Big Data Paradigm with Compact Transformers"

Parameters

  • img_size (int) – Size of the image

  • patch_size (int) – Size of a single patch in the image

  • in_channels (int) – Number of channels in the input image

  • seq_pool (bool) – Whether to use sequence pooling, i.e. attention-weighted pooling over the encoder output tokens

  • embedding_dim (int) – Patch embedding dimension

  • num_layers (int) – Number of encoder layers in the encoder block

  • head_dim (int) – Dimension of each attention head

  • num_heads (int) – Number of attention heads in each transformer layer

  • mlp_ratio (float) – Ratio of the MLP hidden dimension to the embedding dimension

  • n_classes (int) – Number of classes for classification

  • p_dropout (float) – Dropout probability

  • attn_dropout (float) – Dropout probability in the attention layers

  • drop_path (float) – Stochastic depth rate, default is 0.1

  • positional_embedding (str) – One of 'learnable', 'sine' or None; default is 'learnable'.

  • decoder_config (tuple(int) or int) – Configuration of the decoder. If None, the default configuration is used.

  • pooling_kernel_size (int or tuple(int)) – Kernel size of the max pooling operation

  • pooling_stride (int or tuple(int)) – Stride of the max pooling operation

  • pooling_padding (int) – Padding of the max pooling operation
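With seq_pool=True, the encoder output is reduced to a single vector by sequence pooling, which the CCT paper describes as a learned, softmax-weighted average of the output tokens. The sketch below illustrates that idea in plain Python for a single sample. The scoring weights w and b stand in for a linear layer and, together with the helper names seq_pool and softmax, are hypothetical; they are not part of the vformer API.

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def seq_pool(tokens, w, b=0.0):
    """Attention-weighted pooling over a sequence of token vectors (sketch).

    tokens: list of n token vectors, each of length d (encoder output)
    w, b:   weights and bias of a HYPOTHETICAL linear scoring layer d -> 1
    Returns one pooled vector of length d.
    """
    # One scalar importance score per token
    scores = [sum(wi * ti for wi, ti in zip(w, t)) + b for t in tokens]
    weights = softmax(scores)
    d = len(tokens[0])
    # Weighted sum of the tokens, dimension by dimension
    return [sum(a * t[j] for a, t in zip(weights, tokens)) for j in range(d)]


# Two tokens with equal scores are simply averaged
print(seq_pool([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0]))  # [0.5, 0.5]
```

Because the weights come from a softmax, they always sum to 1, so the pooled vector stays on the same scale as the individual tokens.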

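For positional_embedding='sine', a common choice is the fixed sinusoidal table from the original Transformer, in which each pair of dimensions (2k, 2k+1) shares one frequency and uses sine and cosine respectively. The following is a minimal sketch of that standard formulation, assuming vformer's 'sine' option follows it; the function name sinusoidal_embedding is hypothetical.

```python
import math


def sinusoidal_embedding(seq_len, dim):
    """Return a seq_len x dim table of fixed sinusoidal position embeddings.

    ASSUMPTION: standard Transformer formulation; vformer's exact
    implementation of the 'sine' option is not shown in this reference.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k/dim)
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even dimensions use sine, odd dimensions use cosine
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_embedding(seq_len=4, dim=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Unlike the 'learnable' option, a table like this has no trainable parameters; it is computed once and added to the token embeddings.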

forward(x)

Parameters

x (torch.Tensor) – Input image tensor of shape (batch_size, in_channels, img_size, img_size)


Returns

Tensor of shape (batch_size, n_classes), one score per class

Return type

torch.Tensor
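The shapes involved in forward(x) can be traced with the usual convolution output-size formula, which Conv2d and MaxPool2d share. The sketch below assumes a single convolutional tokenizer stage with kernel size patch_size, stride max(1, patch_size // 2 - 1), padding max(1, patch_size // 2) and embedding_dim output channels, as in the reference CCT code, followed by the max pooling configured through the pooling_* arguments. These tokenizer details are assumptions, and conv_out and cct_shapes are hypothetical helpers.

```python
def conv_out(size, kernel, stride, padding):
    # Output-size formula for Conv2d / MaxPool2d with dilation 1
    return (size + 2 * padding - kernel) // stride + 1


def cct_shapes(batch, img_size=224, patch_size=4, in_channels=3,
               embedding_dim=768, n_classes=1000, pooling_kernel_size=3,
               pooling_stride=2, pooling_padding=1):
    """Trace tensor shapes through a CCT-style model (HYPOTHETICAL helper)."""
    # ASSUMPTION: one conv tokenizer stage with these stride/padding rules
    stride = max(1, patch_size // 2 - 1)
    padding = max(1, patch_size // 2)
    side = conv_out(img_size, patch_size, stride, padding)
    # Max pooling shrinks the feature map before it is flattened into tokens
    side = conv_out(side, pooling_kernel_size, pooling_stride, pooling_padding)
    seq_len = side * side
    return {
        "input": (batch, in_channels, img_size, img_size),
        "tokens": (batch, seq_len, embedding_dim),
        "pooled": (batch, embedding_dim),  # after sequence pooling
        "logits": (batch, n_classes),
    }


print(cct_shapes(2)["tokens"])  # (2, 12769, 768)
```

Under these assumptions the default arguments yield 113 * 113 = 12769 tokens per image; since self-attention cost grows quadratically with sequence length, patch_size and the pooling arguments have a large effect on compute at this resolution.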