Metric Learning for Image Search

Metric learning is a broad field with many definitions to define it. Primarily, it aims to measure the similarity among data samples and to learn embedding models. In a familiar classification setting, we give our model some X and learn to predict its class.

In the context of metric learning to learn embedding models, the motivation is to embed X′s in an embedding space such that similar X′s are close together in that space while dissimilar ones are far away. We are often not interested in how the embedding space looks as long as the X′s we want to be close together(similar) form a cluster in that space.

Euclidean distance is a popular distance metric. One can argue that given images, we can represent it into vectors(abstract features) using a pre-trained image classifier and use euclidean distance to separate features. However, most practical data is not linear and requires task and dataset-specific distance metric. Thus metric learning aims at automatically constructing task-specific distance metrics.

The field of metric learning is incredibly important and useful because the distance metric/embeddings learned can be useful for many downstream tasks. In literature, metric learning can be tied to model pre-training.

Metric learning falls under three categories:

  • Supervised learning: The metric learning algorithm has access to a set of data points, each of them belonging to a class (label) as in a standard classification problem. This setting aims to learn distance metrics to put points in the same label close together.

  • Weakly supervised learning: The metric learning algorithm has access to a set of data points with supervision only at the tuple level - typically pairs, triplets, or quadruplets of data points. Siamese network with triplet loss is a popular example in this setting.

  • Unsupervised learning: The metric learning algorithm in this setting has only access to X′s. In recent times, the contrastive loss has gained much traction to learn the state of the art embeddings for downstream tasks. The recent developments in unsupervised visual representation can be tied to the success of metric learning.

In this report, we explore supervised metric learning and extend the same for image search.

🤠 Read the full report here.

​😼 Check out the Colab notebook here.

Last updated