Unsupervised Visual Representation Learning with SwAV

Unsupervised visual representation learning is progressing at an exceptionally fast pace. Most modern frameworks in this area (SimCLR [1], BYOL [2], MoCo v2 [3]) pre-train a model with a self-supervised objective, typically a contrastive one. Saying that these frameworks now rival supervised pre-training would be an understatement, as is evident from the figure below -
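To make "contrastive learning objective" concrete, here is a minimal NumPy sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR. This is an illustrative toy implementation, not the code from any of the papers above; the function name and the `temperature` default are our own choices.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Toy NT-Xent contrastive loss.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each sample's positive is its other view; all remaining 2N - 2
    samples in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    z = np.concatenate([z1, z2], axis=0)                  # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z1.shape[0]

    sim = z @ z.T / temperature                           # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                        # exclude self-similarity

    # The positive for row i is its other augmented view: index (i + n) mod 2N.
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Row-wise cross-entropy: -log softmax(sim)[i, target_i], averaged.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return np.mean(logsumexp - sim[np.arange(2 * n), targets])
```

Intuitively, the loss is low when the two views of the same image embed close together and far from every other image in the batch.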

Figure 1: Top-1 accuracy of linear classifiers trained with the frozen features of different self-supervised methods w.r.t the fully supervised methods (Source: SwAV [4]).

Moreover, when the features learned by these different self-supervised methods are fine-tuned with as little as 1% or 10% of the labeled training data, they show tremendous performance -

Figure 2: Performance of different semi-supervised and self-supervised frameworks on fine-tuning with very little labeled data (Source: SwAV [4]).

From the two figures above, it is clear that SwAV (published in July 2020) prevails in these results and is the current SoTA in self-supervised learning for visual recognition. This report discusses the novel components that make SwAV such a powerful self-supervised method, along with short code walkthroughs.

We expect that you are already familiar with how self-supervised learning works at a high level. If not, this blog post by Jeremy Howard can help you get started.

😼 Check out the GitHub repo here.