TODO: Create a Stub-like template which suggests people read the Corresponding Talk page as well.
TODO: Create an infobox for a publication (or just an article) and apply it to this paper.
A paper about the lightweight compression schemes used in Actian Vector (then MonetDB/X100). These schemes allow the compression scheme to be adapted to the data, support CPU-pipeline-friendly compression and decompression, and offer other useful features.
WRITEME: Describe context for authoring this.
Take-home messages
- Don't store raw DB data on disk, store the compressed form.
- Don't decompress entire pages into memory; only decompress small working sets into CPU cache.
- Don't compress an entire column; compress chunks of it independently, to (1) avoid overflowing a global dictionary and (2) adapt the compression to local features of the data.
- Exception patching allows (1) accounting for outliers in the value distribution and (2) decoding in tight loops with no branching.
- Fast compression is also useful, not just fast decompression.
- With appropriate compression schemes, TPC-H scale factors as much as 25 times higher can be held in memory.
- The exceptional values mechanism is usable as a skip-list into the compressed data.
- Compression schemes should (and can) allow for random-access by index into the compressed data.
- Sampling can be used to choose a compression scheme for a chunk of column data.
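The exception-patching idea from the list above can be illustrated with a minimal Python sketch. This is a simplification, not the paper's actual layout: the function names `pfor_encode` and `pfor_decode` are hypothetical, codes are kept as plain integers instead of being bit-packed, and exception positions are held in a separate list for clarity. The point it shows is that the first decoding pass treats every slot the same way, with no per-value test for exceptions, and a second short pass overwrites the outliers.

```python
def pfor_encode(values, base, bits):
    """Encode values as small offsets from `base`.

    Values whose offset does not fit in `bits` bits are stored as
    exceptions (assumed layout: separate position and value lists).
    """
    limit = 1 << bits
    codes, exc_pos, exc_val = [], [], []
    for i, v in enumerate(values):
        offset = v - base
        if 0 <= offset < limit:
            codes.append(offset)
        else:
            codes.append(0)      # placeholder code, patched on decode
            exc_pos.append(i)
            exc_val.append(v)
    return codes, exc_pos, exc_val


def pfor_decode(codes, exc_pos, exc_val, base):
    """Two-pass decode: decode every slot, then patch the exceptions."""
    # Pass 1: every slot is decoded identically, no exception check.
    out = [base + c for c in codes]
    # Pass 2: overwrite the slots that held exceptions.
    for pos, val in zip(exc_pos, exc_val):
        out[pos] = val
    return out


if __name__ == "__main__":
    data = [100, 103, 101, 9000, 102, 107, 5]
    codes, exc_pos, exc_val = pfor_encode(data, base=100, bits=3)
    print(codes, exc_pos, exc_val)
    print(pfor_decode(codes, exc_pos, exc_val, base=100) == data)
```

With 3-bit codes only the two outliers (9000 and 5) are stored at full width; the other values cost 3 bits each once bit-packed, which this sketch does not do.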
Concepts discussed
TODO: Create a glossary template instead of using plain lists here.
DBMS data compression schemes:
- FOR: Frame of reference; encode difference to constant value
- DICT: Dictionary; encode indices into a list-of-values
- DELTA: Differences; encode current value minus previous value
- PFOR: Patched FOR - like FOR, but with the decode result 'patched' with an exceptions pass
- PDICT: Patched DICT (see PFOR)
- PDELTA: Patched DELTA
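The three base schemes listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the FOR base is assumed to be the chunk minimum, the dictionary is assumed to be sorted, and DELTA is assumed to start from an implicit previous value of 0. None of these choices is taken from the paper, and real implementations would also bit-pack the resulting codes.

```python
def for_encode(values):
    """FOR: store each value as its difference from a constant base."""
    base = min(values)           # assumption: use the chunk minimum as base
    return base, [v - base for v in values]


def for_decode(base, codes):
    return [base + c for c in codes]


def dict_encode(values):
    """DICT: store each value as an index into a list of distinct values."""
    dictionary = sorted(set(values))   # assumption: sorted dictionary
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def delta_encode(values):
    """DELTA: store each value minus the previous one (first minus 0)."""
    codes, prev = [], 0
    for v in values:
        codes.append(v - prev)
        prev = v
    return codes


def delta_decode(codes):
    out, running = [], 0
    for c in codes:
        running += c             # running sum restores the original value
        out.append(running)
    return out


if __name__ == "__main__":
    print(for_encode([1003, 1000, 1007]))
    print(dict_encode(["NL", "US", "NL", "DE"]))
    print(delta_encode([10, 12, 15, 15]))
```

The patched variants (PFOR, PDICT, PDELTA) keep these same codes for the common values and handle the values that do not fit in a separate exceptions pass.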
DBMSes discussed
The paper mentions the compression schemes known to be used in some DBMSes at the time (2008).