An Effective Coreset Compression Algorithm for Large Scale Sensor Networks

Technology #15559

Questions about this technology? Ask a Technology Manager

Download Printable PDF

Categories
Inventors
Professor Daniela Rus
CSAIL, MIT
External Link (danielarus.csail.mit.edu)
Professor Dan Feldman
Department of Computer Science, University of Haifa
External Link (sites.hevra.haifa.ac.il)
Managed By
Daniel Dardani
MIT Technology Licensing Officer
Patent Protection

Data coreset compression

US Patent 9,286,312
Publications
An effective coreset compression algorithm for large scale sensor networks
Information Processing in Sensor Networks , April 2012, pp. 257-268
The single pixel GPS: learning big data signals from tiny coresets
Proceedings of the 20th International Conference on Advances in Geographic Information Systems, November 2016, pp. 23-32
Coresets for k-segmentation of streaming data
Advances in Neural Information Processing Systems, 2014, pp. 559-567

Applications

The Inventors present a data compression algorithm that employs a small coreset of a large sensor data set to yield a highly efficient compression system. This technology can be used in any applications where signal compression and simplification is desired, including services that handle video streams from wearable cameras, mobile sensors, GPS-based location and mapping, financial data and biological signals.

Problem Addressed  

Field-deployed sensor networks are collecting massive amounts of data in real-time across a wide range of applications, resulting in a great demand for systems that can support streaming and performing computation on this high frequency data. Methods that first store all the data from a given fielded sensor network struggle to analyze data in real time. Additionally, most analysis tools today are based on data mining algorithms that can only handle blocks of static data on the order of a few gigabytes, that fit in the internal memory. The Inventors’ approach employs coresets, which are small sets that approximately represents the original data, to achieve an efficient method of data compression. Running queries or fitting models on the coreset will yield similar results when applied to the original data set.

Technology

The Inventors use techniques from computational geometry to solve problems in data and signal compression using coresets. The input to the coreset algorithm is a constant e > 0, and a set P of n points in representing n signals from d sensors) that can be approximated by k segments. The algorithm returns a coreset of O(k) points, whose size is independent of the original input n, such that the Hausdorff Euclidean distance from P to any given set of points is preserved up to an additive error of e. This compression guarantees an approximation error for any query.

These coresets can be constructed in parallel, as well as in a streaming setting where data points arrive one by one, in which it would be otherwise impossible to remember the entire data set due to memory constraints. Additionally, they can be computed at one pass over streaming data and when combined with map-reduce techniques can yield a system capable of compressing a stream of O(n) points using only O(log n) space and update time.

Advantages

  • Insertion time per new observation in coreset and required memory is only linear in both the dimension of the data, and the number k of segments.
  • Approach achieves dimensionality reduction of very large scale sparse data sets by solving time and space limitations