An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

I reviewed the ViT paper, which introduces a model that applies the Transformer to vision tasks.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Notion (ruddy-sheet-75d.notion.site)

If you find any errors in my summary, I would appreciate it if you let me know in the comments!

