Vision Transformer (ViT)

Transformer-based architecture applied to images by splitting them into patches and processing the patches as a sequence, often matching or outperforming CNNs when pretrained on large datasets.

Advanced vision transformer architecture

Full definition

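A Vision Transformer (ViT) applies the Transformer architecture to images. The image is split into fixed-size patches, each patch is flattened and linearly projected to an embedding, and the resulting sequence of patch embeddings is processed by a standard Transformer encoder.

ViTs are image classification models that relate image regions through self-attention instead of convolutions. When pretrained on large datasets, they often match or outperform CNNs.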
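
The patch-and-sequence idea can be illustrated with a minimal NumPy sketch. It is not a full ViT: it omits the class token, position embeddings, multi-head attention, layer normalization, and the MLP blocks. The function names `patchify` and `self_attention`, the 32x32 input, the patch size of 8, and the embedding width of 64 are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))             # hypothetical RGB input
tokens = patchify(image, patch_size=8)      # 16 patches, 8 * 8 * 3 = 192 values each

d_model = 64                                # illustrative embedding width
embed = rng.normal(scale=0.02, size=(tokens.shape[1], d_model))
x = tokens @ embed                          # linear patch embedding

w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one patch attends to every other patch, which is how the model relates distant image regions without convolutions.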