Contrastive Learning is a self-supervised learning paradigm that teaches a model to distinguish between similar and dissimilar data points. Instead of training a model to map an image to a fixed label (e.g., "Dog"), contrastive learning trains the model to ensure that two different views of the same dog are represented by similar vectors, while a view of a cat is represented by a distant vector.
The Objective: Similarity as Distance
The core goal is to learn an embedding function that maps raw data into a high-dimensional space where distance correlates with semantic similarity.
If two inputs $x_i$ and $x_j$ are similar, the cosine similarity of their embeddings $f(x_i)$ and $f(x_j)$ should be close to 1. If they are dissimilar, it should be close to 0.
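As a minimal NumPy sketch of this idea, the snippet below compares embeddings with cosine similarity. The vectors and names (`dog_view_a`, `cat_view`, etc.) are invented for illustration; a real model would produce much higher-dimensional embeddings.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings: two views of the same dog vs. an unrelated cat.
dog_view_a = np.array([0.9, 0.1, 0.2])
dog_view_b = np.array([0.85, 0.15, 0.25])
cat_view = np.array([0.1, 0.9, 0.3])

sim_pos = cosine_similarity(dog_view_a, dog_view_b)  # close to 1
sim_neg = cosine_similarity(dog_view_a, cat_view)    # much lower
```

A trained encoder is exactly what makes such geometry emerge: the loss below rewards high `sim_pos` and low `sim_neg`.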
InfoNCE and the Contrastive Loss
The most common loss function used is InfoNCE (Information Noise-Contrastive Estimation). It treats the problem as a multi-class classification task where, given an anchor, the model must identify the single positive sample among many negatives:

$$
\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{N} \mathbb{1}_{[k \neq i]} \exp\!\big(\mathrm{sim}(z_i, z_k)/\tau\big)}
$$

Here, $\tau$ is a temperature parameter that controls the "sharpness" of the distribution. By minimizing this loss, the model effectively "pulls" the positive pair together and "pushes" the negatives away in the latent space.
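The loss above can be sketched directly in NumPy. This is a didactic, single-anchor version rather than a production implementation; the function name and the default temperature of 0.1 are our own choices.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: cross-entropy where the positive must be
    picked out of 1 + len(negatives) candidates by cosine similarity."""
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # Similarities scaled by temperature; the positive sits at index 0.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    # Negative log-softmax of the positive's logit, computed stably.
    logits -= logits.max()
    return float(-logits[0] + np.log(np.exp(logits).sum()))
```

When the positive is near the anchor and the negatives are far, the loss is near zero; swapping the roles of a positive and a negative makes it large, which is exactly the "pull/push" behaviour described above.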
Applications in Multimodal AI
Contrastive learning is the foundation of models like CLIP (Contrastive Language-Image Pre-training). In CLIP, the model is given a batch of image-text pairs. It uses a text encoder and an image encoder to project both into a shared space, then uses a contrastive loss to ensure the correct text matches the correct image.
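A CLIP-style objective can be sketched as a symmetric cross-entropy over the batch's image-text similarity matrix. This is a simplified illustration, not CLIP's actual code; the function name and the temperature of 0.07 are assumptions for the sketch.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, text) pairs.
    Row i of each matrix is assumed to be a matching pair; every other
    entry in the batch serves as an in-batch negative."""
    # L2-normalize so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    labels = np.arange(len(logits))     # correct match is the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average of image-to-text and text-to-image classification losses.
    return 0.5 * (cross_entropy(logits, labels) +
                  cross_entropy(logits.T, labels))
```

The symmetric form matters: each image must pick out its caption among all captions in the batch, and each caption must pick out its image among all images.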
This approach is powerful because it allows models to learn from raw, unlabelled data found on the internet (e.g., images with captions) rather than requiring manually annotated datasets. It creates a "universal" understanding that can be applied to zero-shot classification and search.
"Contrastive learning doesn't predict labels; it optimizes the geometry of the embedding space so that semantically related items cluster together."
Frequently Asked Questions
What is an 'Anchor' in contrastive learning?
The anchor is the reference sample in each training comparison. The model compares candidates against it, pulling the positive (another view of the same underlying item) toward the anchor and pushing negatives away.

How are negative samples chosen?
Most commonly they are drawn "in-batch": for a given anchor, every other sample in the same mini-batch is treated as a negative, as in CLIP's image-text similarity matrix.
The author of this article utilized generative AI (Google Gemini 3.1 Pro) to assist in part of the drafting and editing process.