A Method for Unsupervised Learning of Invariant Representations

Technology #16258

Image Gallery
Performance of the Inventors' recent model on Labeled Faces in the Wild, a same/different person task for faces seen in different poses and in the presence of clutter. A layer which builds invariance to translation, scaling, and limited in-plane rotation is followed by another which pools over variability induced by other transformations.
Professor Tomaso Poggio
Department of Brain and Cognitive Sciences, MIT
External Link (poggio-lab.mit.edu)
Joel Leibo
Department of Brain and Cognitive Sciences, MIT
Managed By
Daniel Dardani
MIT Technology Licensing Officer
Patent Protection

Methods and apparatus for learning representations

US Patent Pending 2015-0278635
Unsupervised learning of invariant representations
Theoretical Computer Science, June 2016, 633, pp. 112-121

Applications

The Inventors have developed new machine learning methods that significantly extend current convolutional deep learning networks. These networks are widely used commercially, including for image classification/artificial vision and speech recognition.

Problem Addressed  

Animal brains are far superior to present supervised learning algorithms in their ability to learn novel items and develop recognition from just a few labeled examples. A new frontier for machine learning and its applications is to form invariant representations of images and sounds (e.g. speech). Such representations remain unchanged under transformations (e.g. translation, scaling, and rotation), which simplifies learning and subsequent classification/recognition. The Inventors have developed a novel approach for automatic, unsupervised learning of transformations from unlabeled signals such as images and sounds.

Technology

This method is based on collecting, for each of k arbitrary objects/images/sounds/signals (called templates), a sequence of n frames (called an orbit) of that template undergoing a transformation. At run time, each signal to be recognized is represented by a signature, a vector built from the dot products of the signal with the n points of each of the k orbits. The entries of the signature may be moments of the empirical distribution of these dot products or other nonparametric estimates of it. In this way, a signal’s signature is selective and invariant with respect to a group of transformations.
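The signature computation described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the Inventors' implementation: it assumes 1-D signals, takes cyclic shifts as the transformation group, and uses the first three moments as the summary of each empirical distribution. The names `make_orbits` and `signature` are hypothetical.

```python
import numpy as np

def make_orbits(templates):
    """Stack every cyclic shift of each template: shape (k, n, d), with n == d here.

    Assumption: the transformation group is the set of cyclic shifts.
    """
    d = templates.shape[1]
    return np.array([[np.roll(t, s) for s in range(d)] for t in templates])

def signature(x, orbits, n_moments=3):
    """Summarise the dot products of x with each orbit by their first moments."""
    # dots[j, i] = dot product of x with the i-th point of the orbit of template j
    dots = orbits @ x                                   # shape (k, n)
    # One set of moments per template; transforming x only permutes each row
    # of dots, so these averages do not change.
    moments = [np.mean(dots ** p, axis=1) for p in range(1, n_moments + 1)]
    return np.concatenate(moments)                      # shape (k * n_moments,)

rng = np.random.default_rng(0)
templates = rng.standard_normal((5, 16))   # k = 5 unlabeled templates, d = 16
orbits = make_orbits(templates)

x = rng.standard_normal(16)
sig_x = signature(x, orbits)
sig_shifted = signature(np.roll(x, 7), orbits)           # same signal, transformed
sig_other = signature(rng.standard_normal(16), orbits)   # a different signal

print(np.allclose(sig_x, sig_shifted))   # invariance to the transformation
print(np.allclose(sig_x, sig_other))     # selectivity between signals
```

In this toy setting the signature of a shifted signal matches that of the original up to floating-point error, because shifting the signal merely reorders its dot products with a complete orbit.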

For each template, the system can store all of its transformations and later obtain an invariant signature for new images without any explicit knowledge of the transformations. This implicit knowledge of the transformations allows the system to become automatically invariant to those transformations for new inputs and to compute an invariant signature for a new object seen only once. The technique therefore allows recognition from very few labeled examples and brings machine learning algorithms closer to human-like learning.
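As a rough illustration of recognition from a single labeled example, the sketch below stores template orbits without labels, then labels exactly one example per class and classifies transformed inputs by nearest signature. Everything here is an assumption made for illustration: cyclic shifts as the transformations, sorted dot products as the nonparametric summary, a nearest-neighbor rule, and the hypothetical names `signature`, `gallery`, and `classify`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 32, 4   # signal length and number of templates (arbitrary choices)

# Unsupervised stage: store the orbit (all cyclic shifts) of each unlabeled template.
templates = rng.standard_normal((k, d))
orbits = np.array([[np.roll(t, s) for s in range(d)] for t in templates])  # (k, d, d)

def signature(x):
    # Sorted dot products per template, i.e. the empirical quantiles of their
    # distribution; sorting removes the ordering that a shift of x would change.
    return np.sort(orbits @ x, axis=1).ravel()

# Supervised stage: a single labeled example per class.
examples = {"A": rng.standard_normal(d), "B": rng.standard_normal(d)}
gallery = {label: signature(x) for label, x in examples.items()}

def classify(x):
    """Return the label whose stored signature is closest to the signature of x."""
    sig = signature(x)
    return min(gallery, key=lambda label: np.linalg.norm(sig - gallery[label]))

# Shifted copies of each example, none of which was ever labeled.
predictions = [classify(np.roll(examples[label], s))
               for label in ("A", "B") for s in (3, 17)]
print(predictions)
```

The labeled examples are never shown in shifted form; invariance comes entirely from the stored template orbits, which require no labels.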

Advantages

  • Saves the labor cost of manually labeling the large data sets needed to train supervised learning algorithms
  • Drastically reduces the number of labeled examples required