Momentum Contrast for Unsupervised Visual Representation Learning
@article{He2019MomentumCF, title={Momentum Contrast for Unsupervised Visual Representation Learning}, author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross B. Girshick}, journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2020}, pages={9726-9735}, url={https://api.semanticscholar.org/CorpusID:207930212} }
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
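The two mechanisms the abstract names, the momentum (moving-averaged) key encoder and the queue-based dictionary, are compact enough to sketch. Below is a minimal PyTorch rendering following the structure of the paper's Algorithm 1; the toy linear encoder, the batch shapes, and the `moco_step` helper are illustrative stand-ins, not the paper's setup (the paper uses a ResNet-50 with K=65536, m=0.999, t=0.07).

```python
import torch
import torch.nn.functional as F

# Illustrative sizes; the paper uses K=65536, m=0.999, t=0.07 with a ResNet-50 encoder.
K, m, t, dim = 4096, 0.999, 0.07, 128

# Toy stand-ins for the query encoder f_q and key encoder f_k (any matching backbones work).
f_q = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, dim))
f_k = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, dim))
f_k.load_state_dict(f_q.state_dict())      # initialize the key encoder from the query encoder
for p in f_k.parameters():
    p.requires_grad = False                # keys receive no gradient

queue = F.normalize(torch.randn(dim, K), dim=0)   # the dictionary: a queue of K key vectors
opt = torch.optim.SGD(f_q.parameters(), lr=0.03)

def moco_step(x_q, x_k):
    """One step on two augmented views (x_q, x_k) of the same minibatch."""
    global queue
    q = F.normalize(f_q(x_q), dim=1)                    # queries: N x dim
    with torch.no_grad():
        k = F.normalize(f_k(x_k), dim=1)                # keys: N x dim, detached
    l_pos = (q * k).sum(dim=1, keepdim=True)            # positive logits: N x 1
    l_neg = q @ queue                                   # negative logits: N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    labels = torch.zeros(len(q), dtype=torch.long)      # the positive key sits at index 0
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()        # gradients update f_q only
    with torch.no_grad():
        for pk, pq in zip(f_k.parameters(), f_q.parameters()):
            pk.mul_(m).add_(pq, alpha=1 - m)            # momentum update of the key encoder
        queue = torch.cat([k.t(), queue], dim=1)[:, :K] # enqueue new keys, drop the oldest
    return loss.item()
```

A training loop would call `moco_step` on two independently augmented views of each minibatch. Because only the query encoder receives gradients, the dictionary size K is decoupled from the batch size, which is the point of the queue; the momentum update keeps the queued keys consistent with each other.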
Topics
Momentum Contrast, Unsupervised Visual Representation Learning, Dynamic Dictionary, Supervised Pre-training, Contrastive Learning, Momentum Update, Key Encoder, Moving-averaged Encoder, Memory Bank Mechanism
11,065 Citations
Unsupervised Learning of Dense Visual Representations
- 2020
Computer Science
View-Agnostic Dense Representation (VADeR) is proposed for unsupervised learning of dense, pixelwise representations by forcing local features to remain constant over different viewing conditions through pixel-level contrastive learning.
When Does Contrastive Visual Representation Learning Work?
- 2022
Computer Science
Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on…
A Simple Framework for Contrastive Learning of Visual Representations
- 2020
Computer Science
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning.
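The "learnable nonlinear transformation" this summary mentions is SimCLR's projection head, paired with the NT-Xent contrastive loss. A minimal sketch of both follows; the layer widths and temperature are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over N paired views; z1[i] and z2[i] come from the same image."""
    n = len(z1)
    z = F.normalize(torch.cat([z1, z2]), dim=1)              # 2N x d
    sim = (z @ z.t()) / tau                                  # cosine-similarity logits
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # each row's positive index
    return F.cross_entropy(sim, targets)

# The "learnable nonlinear transformation": a small MLP between the backbone
# representation h and the space where the loss is applied (widths illustrative).
projection_head = torch.nn.Sequential(
    torch.nn.Linear(2048, 512), torch.nn.ReLU(), torch.nn.Linear(512, 128))
```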
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
- 2021
Computer Science
It is shown that learning additional invariances (through multi-scale cropping, stronger augmentations, and nearest neighbors) improves the representations, and it is observed that MoCo learns spatially structured representations when trained with a multi-crop strategy.
Rethinking Image Mixture for Unsupervised Visual Representation Learning
- 2020
Computer Science
Despite its conceptual simplicity, it is shown empirically that the simple solution of image mixture yields more robust visual representations from the transformed input, and that the benefits of the representations learned in this space carry over to linear classification and downstream tasks.
Self-Supervised Visual Representation Learning from Hierarchical Grouping
- 2020
Computer Science
We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into…
Efficient Visual Pretraining with Contrastive Detection
- 2021
Computer Science
This work introduces a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations, leading to state-of-the-art transfer accuracy on a variety of downstream tasks, while requiring up to 10× less pretraining.
Weakly Supervised Contrastive Learning
- 2021
Computer Science
A weakly supervised contrastive learning framework (WCL) based on two projection heads is proposed: one performs the regular instance-discrimination task, while the other uses a graph-based method to find similar samples and generate a weak label that pulls similar images closer together.
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
- 2020
Computer Science
This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, using a swapped prediction mechanism in which the cluster assignment of one view is predicted from the representation of another view.
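A sketch of the swapped prediction idea: each view's cluster-assignment target is matched against the other view's scores over a set of shared prototypes, so no pairwise feature comparisons are needed. In SwAV the targets come from a Sinkhorn-Knopp equal-partition step; a plain softmax stands in for it here, so this is a structural illustration rather than the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, tau=0.1):
    """Swapped prediction over two views: view 1 predicts view 2's cluster
    assignment against shared prototypes (K x d), and vice versa. No pairwise
    feature comparisons are computed."""
    c = F.normalize(prototypes, dim=1)
    p1 = F.normalize(z1, dim=1) @ c.t() / tau        # view-1 scores vs prototypes: N x K
    p2 = F.normalize(z2, dim=1) @ c.t() / tau
    with torch.no_grad():
        # SwAV derives the targets q1, q2 with a Sinkhorn-Knopp equal-partition
        # step; a plain softmax stands in for it in this sketch.
        q1, q2 = F.softmax(p1, dim=1), F.softmax(p2, dim=1)
    return -0.5 * ((q2 * F.log_softmax(p1, dim=1)).sum(1)
                   + (q1 * F.log_softmax(p2, dim=1)).sum(1)).mean()
```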
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
- 2021
Computer Science
The pixel-level pretext tasks are found to be effective for pre-training not only regular backbone networks but also head networks used for dense downstream tasks, and are complementary to instance-level contrastive methods.
65 References
A Simple Framework for Contrastive Learning of Visual Representations
- 2020
Computer Science
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning.
Unsupervised Visual Representation Learning by Context Prediction
- 2015
Computer Science
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Unsupervised Representation Learning by Predicting Image Rotations
- 2018
Computer Science
This work proposes to learn image features by training ConvNets to recognize the 2D rotation applied to their input image, and demonstrates both qualitatively and quantitatively that this apparently simple task provides a very powerful supervisory signal for semantic feature learning.
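The pretext task is simple enough to show in a few lines: each image is rotated by 0, 90, 180, and 270 degrees, and a classifier is trained to predict which rotation was applied. This sketch only builds the rotated batch and its labels; the backbone and training loop (the `model` in the usage comment is hypothetical) are assumed.

```python
import torch

def rotation_batch(x):
    """Build the 4-way rotation pretext task from a batch x of shape N x C x H x W:
    rotate every image by 0/90/180/270 degrees and label it with the rotation index."""
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(len(x))   # [0]*N + [1]*N + [2]*N + [3]*N
    return torch.cat(views), labels

# Usage with any image classifier `model` that outputs 4 logits (hypothetical):
#   images_rot, labels = rotation_batch(images)
#   loss = torch.nn.functional.cross_entropy(model(images_rot), labels)
```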
Learning Features by Watching Objects Move
- 2017
Computer Science
Inspired by the human visual system, this work uses low-level motion-based grouping cues to learn an effective visual representation that significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
Data-Efficient Image Recognition with Contrastive Predictive Coding
- 2020
Computer Science
This work revisits and improves Contrastive Predictive Coding, an unsupervised objective that learns representations which make the variability in natural signals more predictable, and produces features that support state-of-the-art linear classification accuracy on the ImageNet dataset.
Revisiting Self-Supervised Visual Representation Learning
- 2019
Computer Science
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights, showing that standard recipes for CNN design do not always translate to self-supervised representation learning.
Local Aggregation for Unsupervised Learning of Visual Embeddings
- 2019
Computer Science
This work describes a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate.
Representation Learning with Contrastive Predictive Coding
- 2018
Computer Science, Mathematics
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Multi-task Self-Supervised Visual Learning
- 2017
Computer Science
The results show that deeper networks work better, and that combining tasks, even via a naïve multi-head architecture, always improves performance.
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
- 2016
Computer Science
A novel unsupervised learning approach to building features suitable for object detection and classification is introduced: the context-free network (CFN), a siamese-ennead convolutional neural network designed to facilitate the transfer of features to other tasks.