Wednesday, November 21, 2007

How To: Advanced Queryable Compression (Continued from previous post)

Reducing I/O in a processing pipeline is critical when you are handling high volumes of data. By reducing your I/O you reduce the load on the network, the hard drives, the CPU, and memory.

The following is a research paper by Vijayshankar Raman and Garret Swart (VLDB 2006) titled:
“How to Wring a Table Dry: Entropy of Relations and Querying of Compressed Relations”. http://portal.acm.org/citation.cfm?id=1164201

Here is Vijayshankar Raman's website: http://domino.research.ibm.com/comm/research_people.nsf/pages/rshankar.index.html

Summary Coverage:
•Numeric Column compression
•Skew compression
•Correlating columns within a Tuple
•Incidental Tuple Ordering
•Tuple Correlations
•Delta Coding
•Huffman Code Scan Operations
•Segregated Coding
•Query Short Circuiting
•Results in 8x – 40x compression that can still be queried without being decompressed
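
To make a few of these ideas concrete, here is a minimal Python sketch of my own, not the paper's implementation. It shows skew compression with Huffman codes, an equality scan that compares codes instead of decoded values, a simple form of query short circuiting, and delta coding of a sorted numeric column. All function names (build_huffman_codes, scan_count_equal, delta_encode, delta_decode) are hypothetical, and the "bits" are kept as a Python string of '0'/'1' characters purely for readability; a real system would pack them into machine words.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(values):
    """Return {value: bitstring}; frequent values get shorter codes."""
    freq = Counter(values)
    if len(freq) == 1:
        # A single distinct value still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {v: ""}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in left.items()}
        merged.update({v: "1" + c for v, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def scan_count_equal(bits, codes, literal):
    """Count rows equal to `literal` by comparing codes, never decoding values."""
    target = codes.get(literal)
    if target is None:
        return 0  # short circuit: the literal does not occur in this column
    code_set = set(codes.values())
    matches = 0
    current = ""
    for bit in bits:
        current += bit
        # Huffman codes are prefix-free, so the first hit is a complete code.
        if current in code_set:
            if current == target:
                matches += 1
            current = ""
    return matches


def delta_encode(sorted_values):
    """Store each value as the difference from its predecessor."""
    deltas = []
    previous = 0
    for value in sorted_values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for delta in deltas:
        running += delta
        values.append(running)
    return values


if __name__ == "__main__":
    column = ["US"] * 6 + ["DE"] * 2 + ["FR"]
    codes = build_huffman_codes(column)
    bits = "".join(codes[v] for v in column)
    print(codes, len(bits), "bits")
    print(scan_count_equal(bits, codes, "US"))
    print(delta_encode([100, 103, 103, 110]))
```

The point of the sketch is that the predicate literal is encoded once and the scan then works only on codes. Small deltas between sorted neighbors are themselves skewed toward small numbers, which is why delta coding and Huffman coding combine well.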

This is a continuation from a previous post:
http://dataglass.blogspot.com/2007/08/query-able-compression-no-need-to.html
