Simple Ways to Tackle Class Imbalance

In this report, we explore several methods for countering class imbalance in image classification. To keep the study intuitive, we restrict ourselves to binary classification. Instead of using a standard dataset with inherent class imbalance, we built a “synthetic” (not to be confused with GAN-generated) two-class dataset out of CIFAR-10. We chose car and plane as the two classes. There is no particular reason for the choice, other than us being lazy 😴 and those two being the first two classes in the dataset.

In the CIFAR-10 dataset, each class has 5000 samples in the training set. We call our two-class dataset CIFAR-2, for obvious reasons. For this study, CIFAR-2 needs a stark class imbalance.

We opted for the following data distribution: plane with 5000 samples (majority) and car with 50 samples (minority), a 100:1 imbalance.
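As a rough illustration of how such a subset could be carved out, here is a minimal sketch that works on labels only. It assumes the usual CIFAR-10 label order (0 for plane, 1 for car) and uses a stand-in label list instead of downloading the dataset; the helper name `make_imbalanced_indices` is hypothetical and is not taken from our notebook.

```python
import random
from collections import Counter

def make_imbalanced_indices(labels, keep_per_class, seed=0):
    """Return shuffled indices, keeping at most keep_per_class[c] samples of each listed class.

    Classes that are not listed in keep_per_class are dropped entirely.
    """
    rng = random.Random(seed)
    selected = []
    for cls, keep in keep_per_class.items():
        cls_indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(cls_indices)            # pick a random subset of this class
        selected.extend(cls_indices[:keep])
    rng.shuffle(selected)                   # mix the two classes together
    return selected

# Stand-in for the CIFAR-10 training labels: 10 classes x 5000 samples each.
labels = [cls for cls in range(10) for _ in range(5000)]

PLANE, CAR = 0, 1  # assumed label ids (standard CIFAR-10 ordering)
cifar2_indices = make_imbalanced_indices(labels, {PLANE: 5000, CAR: 50})

counts = Counter(labels[i] for i in cifar2_indices)
print("plane:", counts[PLANE], "car:", counts[CAR])
```

With real data, the same indices would be used to select both the images and their labels.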

Here’s a quick overview of the methods that we have used to counter class imbalance:

  • Class Weighting

  • Oversampling

  • Undersampling

  • Two-Phase Learning (Experimental Trial) – Oversampling and Undersampling
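To make the first three ideas concrete, here is a small sketch on a 5000/50 label list. It is only one plausible reading of each method: the class weights use the common "balanced" heuristic (total samples divided by number of classes times class count), oversampling draws minority indices with replacement, and undersampling draws majority indices without replacement. All function names are hypothetical, and the report may use different settings.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_total / (n_classes * n_class), so rarer classes count more."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}

def _indices_by_class(labels):
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class

def oversample(labels, seed=0):
    """Repeat minority indices (with replacement) until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = max(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)                                    # keep every original sample
        out.extend(rng.choices(idx, k=target - len(idx)))  # add random duplicates
    rng.shuffle(out)
    return out

def undersample(labels, seed=0):
    """Keep a random subset (without replacement) of each class, sized to the smallest one."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    target = min(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(rng.sample(idx, target))
    rng.shuffle(out)
    return out

# CIFAR-2 style labels: 0 = plane (majority), 1 = car (minority).
labels = [0] * 5000 + [1] * 50

weights = balanced_class_weights(labels)
over_idx = oversample(labels)
under_idx = undersample(labels)

print("class weights:", weights)
print("oversampled:", Counter(labels[i] for i in over_idx))
print("undersampled:", Counter(labels[i] for i in under_idx))
```

The trade-off is visible in the sizes: oversampling keeps all the plane images but shows each car image many times, while undersampling discards most of the plane images to reach a balanced but tiny set.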

🔥 Check out the report here.

💪 Check out the Colab Notebook here.

This was co-written and implemented with Aritra.
