Paper:Super-scalar RAM-CPU cache compression

TODO: Create a Stub-like template which suggests people read the Corresponding Talk page as well.

TODO: Create an infobox for a publication (or just an article) and apply it to this paper.

A paper about the lightweight compression schemes used in Actian Vector (then MonetDB/X100). The schemes allow the compression scheme to be chosen adaptively, keep compression and decompression efficient on pipelined CPUs, and offer other useful features.

WRITEME: Describe context for authoring this.

Take-home messages

 * Don't store raw DB data on disk, store the compressed form.
 * Don't decompress entire pages into memory; only decompress small working sets into CPU cache.
 * Don't compress an entire column; compress chunks of it independently, in order to (1) avoid global dictionary overflow and (2) adapt compression to local features.
 * Exception patching allows (1) accounting for distribution outliers and (2) decoding in tight loops with no branching.
 * Fast compression is also useful, not just fast decompression.
 * With appropriate compression schemes, TPC-H scale factors as much as 25 times higher can be held in memory.
 * The exceptional values mechanism is usable as a skip-list into the compressed data.
 * Compression schemes should (and can) allow random access by index into the compressed data.
 * Sampling can be used to choose a compression scheme for a chunk of column data.
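
The points about per-chunk compression and sampling can be illustrated with a minimal sketch. It is not the paper's selection algorithm: the cost model (bits per value, ignoring exceptions and headers), the function names, the sample size and the chunk size are all assumptions made for illustration.

```python
import random

def bits_needed(n):
    """Bits required to store a non-negative integer n (at least 1)."""
    return max(1, n.bit_length())

def estimate_bits_per_value(sample):
    """Rough per-value cost of each scheme on a sample (assumed cost model)."""
    deltas = [b - a for a, b in zip(sample, sample[1:])] or [0]
    return {
        "FOR": bits_needed(max(sample) - min(sample)),
        "DELTA": bits_needed(max(deltas) - min(deltas)),
        "DICT": bits_needed(len(set(sample)) - 1),
    }

def choose_scheme(chunk, sample_size=64, seed=0):
    """Pick the cheapest scheme for one chunk, judged on a contiguous sample.

    The sample is contiguous so that the DELTA estimate sees real neighbours.
    """
    rng = random.Random(seed)
    start = rng.randrange(max(1, len(chunk) - sample_size + 1))
    sample = chunk[start:start + sample_size]
    costs = estimate_bits_per_value(sample)
    return min(costs, key=costs.get)

def choose_per_chunk(column, chunk_size=1024):
    """Choose a scheme independently for every chunk of the column."""
    return [choose_scheme(column[i:i + chunk_size])
            for i in range(0, len(column), chunk_size)]

# A column whose first chunk is sorted and whose second chunk has two distinct values.
column = list(range(1024)) + [7, 1000000] * 512
schemes = choose_per_chunk(column)
```

Because each chunk is judged on its own sample, a sorted region and a low-cardinality region of the same column can end up with different schemes.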

Concepts discussed
TODO: Create a glossary template instead of using plain lists here.

DBMS data Compression schemes:
 * FOR:     Frame of reference; encode difference to constant value
 * DICT:    Dictionary; encode indices into a list-of-values
 * DELTA:   Differences; encode current value minus previous value
 * PFOR:    Patched FOR - like FOR, but with the decode result 'patched' with an exceptions pass
 * PDICT:   Patched DICT (see PFOR)
 * PDELTA:  Patched DELTA
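
The schemes listed above can be sketched in a few lines each. This is a simplified illustration, not the paper's implementation: codes are kept as plain Python integers rather than bit-packed, exceptions are stored as explicit (position, value) pairs, and all function names are hypothetical.

```python
def for_encode(values, base):
    """FOR: store each value as its difference to a constant base."""
    return [v - base for v in values]

def for_decode(codes, base):
    return [base + c for c in codes]

def delta_encode(values):
    """DELTA: store each value minus its predecessor (the first minus 0)."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(codes):
    total, out = 0, []
    for c in codes:
        total += c
        out.append(total)
    return out

def dict_encode(values):
    """DICT: store indices into a list of the distinct values."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]

def pfor_encode(values, base, bits):
    """PFOR: FOR codes limited to `bits` bits; values that do not fit become exceptions."""
    limit = 1 << bits
    codes, exceptions = [], []
    for pos, v in enumerate(values):
        if 0 <= v - base < limit:
            codes.append(v - base)
        else:
            codes.append(0)              # placeholder, overwritten by the patch pass
            exceptions.append((pos, v))
    return codes, exceptions

def pfor_decode(codes, exceptions, base):
    out = [base + c for c in codes]      # pass 1: decode every slot, no per-value test
    for pos, v in exceptions:            # pass 2: patch in the exceptional values
        out[pos] = v
    return out

values = [100, 101, 103, 900, 102]
codes, exceptions = pfor_encode(values, base=100, bits=2)
restored = pfor_decode(codes, exceptions, base=100)
```

The outlier 900 does not widen the code width for the whole chunk; it is handled in the second pass, which keeps the first decoding loop free of data-dependent branches. PDICT and PDELTA would apply the same two-pass idea on top of `dict_decode` and `delta_decode`.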

DBMSes discussed
The paper mentions the compression schemes known to be used in some DBMSes at the time (2008):
 * IBM DB2: Drops pointer prefixes in B-trees
 * Teradata: Dictionary compression for columns
 * Oracle: Dictionary compression for disk storage blocks
 * Sybase IQ:  Multi-scheme compression, each 'page' compressed separately with its own scheme