Unsupervised Visual Representation Learning with SwAV
Unsupervised visual representation learning is progressing at an exceptionally fast pace. Most modern frameworks in this area (SimCLR[1], BYOL[2], MoCo v2[3]) pre-train a model with a self-supervised objective, typically a contrastive one. To say these frameworks compare well with supervised pre-training would be an understatement, as the figure below shows -
Figure 1: Top-1 accuracy of linear classifiers trained on the frozen features of different self-supervised methods, compared with fully supervised training (Source: SwAV [4]).
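The linear-evaluation protocol behind Figure 1 is simple: freeze the pre-trained encoder and train only a linear classifier on top of its features. Below is a minimal NumPy sketch of that idea; the `encoder` here is a hypothetical stand-in (a fixed random projection plus `tanh`) for what would really be a frozen ResNet-50 backbone, and the toy dataset stands in for ImageNet features and labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pre-trained encoder: a fixed random
# projection followed by a tanh non-linearity. In the real protocol this
# would be a frozen ResNet-50 trained with a self-supervised objective.
W_enc = 0.3 * rng.normal(size=(10, 16))

def encoder(x):
    # "Frozen": W_enc is never updated during linear evaluation.
    return np.tanh(x @ W_enc)

# Toy binary dataset with a linear labeling rule.
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train ONLY a linear head (logistic regression) on the frozen features.
feats = encoder(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid predictions
    w -= 0.5 * feats.T @ (p - y) / len(y)       # gradient step on weights
    b -= 0.5 * (p - y).mean()                   # gradient step on bias

acc = (((feats @ w + b) > 0).astype(float) == y).mean()
print(f"linear-eval accuracy on frozen features: {acc:.2f}")
```

The point of the protocol is that the encoder's weights receive no gradient at all: a high accuracy therefore reflects the quality of the frozen representation, not of the classifier.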
Moreover, the features learned by these self-supervised methods perform remarkably well even when fine-tuned with as little as 1% or 10% of the labeled training data -
Figure 2: Performance of different semi-supervised and self-supervised frameworks on fine-tuning with very little labeled data (Source: SwAV [4]).
From the two figures above, it is clear that SwAV (published in July 2020) leads these benchmarks and is, at the time of writing, the state of the art in self-supervised learning for visual recognition. This report discusses the novel components that make SwAV such a powerful self-supervised method, along with short code walkthroughs.
We assume you are already familiar with how self-supervised learning works at a high level. If not, this blog post by Jeremy Howard is a good starting point.