Compact Convolutional Transformer
- class vformer.models.classification.cct.CCT(img_size=224, patch_size=4, in_channels=3, seq_pool=True, embedding_dim=768, num_layers=1, head_dim=96, num_heads=1, mlp_ratio=4.0, n_classes=1000, p_dropout=0.1, attn_dropout=0.1, drop_path=0.1, positional_embedding='learnable', decoder_config=(768, 1024), pooling_kernel_size=3, pooling_stride=2, pooling_padding=1)
Implementation of Escaping the Big Data Paradigm with Compact Transformers: https://arxiv.org/abs/2104.05704
- img_size: int
Size of the image
- patch_size: int
Size of the single patch in the image
- in_channels: int
Number of input channels in image
- seq_pool: bool
Whether to use sequence pooling
- embedding_dim: int
Patch embedding dimension
- num_layers: int
Number of encoder layers in the transformer
- head_dim: int
Dimension of each attention head
- num_heads: int
Number of attention heads in each transformer layer
- mlp_ratio: float
Ratio of the MLP hidden dimension to the embedding dimension
- n_classes: int
Number of classes for classification
- p_dropout: float
Dropout probability
- attn_dropout: float
Dropout probability in the attention layers
- drop_path: float
Stochastic depth rate, default is 0.1
- positional_embedding: str
One of the string values {'learnable', 'sine', 'None'}, default is 'learnable'
- decoder_config: tuple(int) or int
Configuration of the decoder. If None, the default configuration is used.
- pooling_kernel_size: int or tuple(int)
Size of the kernel in MaxPooling operation
- pooling_stride: int or tuple(int)
Stride of MaxPooling operation
- pooling_padding: int
Padding in MaxPooling operation