Re: Compression and on-disk sorting

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Andrew Piskorski <atp(at)piskorski(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-17 03:48:21
Message-ID: 871wuts456.fsf@stark.xeocode.com
Lists: pgsql-hackers pgsql-patches

Andrew Piskorski <atp(at)piskorski(dot)com> writes:

> The main tricks seem to be: One, EXTREMELY lightweight compression
> schemes - basically table lookups designed to be as CPU-friendly as
> possible. Two, keep the data compressed in RAM as well so that you can
> also cache more of the data, and indeed keep it compressed until
> as late in the CPU processing pipeline as possible.
>
> A corollary of that is to forget compression schemes like gzip - it
> reduces data size nicely but is far too slow on the CPU to be
> particularly useful in improving overall throughput rates.

There are some very fast decompression algorithms:

http://www.oberhumer.com/opensource/lzo/

I think most of the mileage from "lookup tables" would be better obtained
at a higher level, by giving data modellers tools that let them achieve
denser data representations: things like convenient enum data types, 1-bit
boolean data types, short integer data types, etc.
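
To make the density argument concrete, here is a rough sketch (not
PostgreSQL code, and not how any backend stores tuples) comparing a "wide"
row layout with a "dense" one that uses an enum code, bit-packed booleans
and a short integer. The Status enum, the field layout and all function
names are hypothetical, chosen only for illustration.

```python
import struct
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical enum standing in for a 'convenient enum data type'."""
    PENDING = 0
    SHIPPED = 1
    DELIVERED = 2


# Wide layout (assumed): 10-byte status text, 8 one-byte booleans, 4-byte int.
WIDE = struct.Struct("<10s8?i")
# Dense layout (assumed): 1-byte enum code, 8 booleans in 1 byte, 2-byte short.
DENSE = struct.Struct("<BBh")


def pack_flags(flags):
    """Pack up to 8 booleans into one byte, lowest bit first."""
    byte = 0
    for i, flag in enumerate(flags):
        if flag:
            byte |= 1 << i
    return byte


def unpack_flags(byte, count=8):
    """Inverse of pack_flags: recover `count` booleans from one byte."""
    return [bool((byte >> i) & 1) for i in range(count)]


def encode_wide(status, flags, quantity):
    """Store the status as padded text and each boolean as a full byte."""
    return WIDE.pack(status.name.encode("ascii"), *flags, quantity)


def encode_dense(status, flags, quantity):
    """Store the status as a 1-byte code and the booleans as single bits."""
    return DENSE.pack(int(status), pack_flags(flags), quantity)


def decode_dense(buf):
    """Recover (status, flags, quantity) from a dense row."""
    code, flag_byte, quantity = DENSE.unpack(buf)
    return Status(code), unpack_flags(flag_byte), quantity


if __name__ == "__main__":
    row = (Status.DELIVERED, [True, False] * 4, 1200)
    print(len(encode_wide(*row)), "bytes wide")
    print(len(encode_dense(*row)), "bytes dense")
```

Under these assumed layouts the dense row is a fraction of the size of the
wide one with no compression step at all, which is the kind of saving the
data modeller gets "for free" just by picking narrower types.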

--
greg
