Re: Compression and on-disk sorting

From: Andrew Piskorski <atp(at)piskorski(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Greg Stark <gsstark(at)mit(dot)edu>
Subject: Re: Compression and on-disk sorting
Date: 2006-05-17 08:52:30
Message-ID: 20060517085230.GA53017@tehun.pair.com
Lists: pgsql-hackers pgsql-patches

On Tue, May 16, 2006 at 11:48:21PM -0400, Greg Stark wrote:

> There are some very fast decompression algorithms:
>
> http://www.oberhumer.com/opensource/lzo/

Sure, and for some tasks in PostgreSQL perhaps it would be useful.
But at least as of July 2005, Sandor Heman, one of the MonetDB guys,
had looked at zlib, bzip2, LZRW, and LZO, and claimed that:

"... in general, it is very unlikely that we could achieve any
bandwidth gains with these algorithms. LZRW and LZO might increase
bandwidth on relatively slow disk systems, with bandwidths up to
100MB/s, but this would induce high processing overheads, which
interferes with query execution. On a fast disk system, such as our
350MB/s 12 disk RAID, all the generic algorithms will fail to achieve
any speedup."

http://www.google.com/search?q=MonetDB+LZO+Heman&btnG=Search
http://homepages.cwi.nl/~heman/downloads/msthesis.pdf
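
To see why a fast decompressor does not automatically buy a speedup, here is a deliberately simplified throughput model. It is an assumption for illustration, not the model from Heman's thesis. If the disk delivers `disk_mb_s` of compressed data and the data compresses by `ratio`, a scan receives at most `disk_mb_s * ratio` MB/s of uncompressed data. It can never receive more than the CPU can decompress (`decompress_mb_s`, measured as uncompressed output). The model assumes I/O and decompression overlap perfectly. It also ignores the CPU time the query itself needs, which is exactly the overhead the quote warns about. The 2:1 ratio and 300 MB/s decompressor below are made-up figures. Only the 100 and 350 MB/s disk speeds come from the quote.

```python
def effective_scan_mb_s(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Uncompressed MB/s delivered to the query under a simple pipelined model.

    disk_mb_s       -- raw disk bandwidth (compressed MB per second)
    ratio           -- compression ratio (uncompressed size / compressed size)
    decompress_mb_s -- decompressor output rate (uncompressed MB per second)
    """
    # The disk could feed ratio * disk_mb_s of logical data, but the CPU caps it.
    return min(disk_mb_s * ratio, decompress_mb_s)


def speedup(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Speedup relative to reading the same data uncompressed at disk_mb_s."""
    return effective_scan_mb_s(disk_mb_s, ratio, decompress_mb_s) / disk_mb_s


if __name__ == "__main__":
    # Hypothetical figures: 2:1 compression, decompressor producing 300 MB/s.
    for disk in (100.0, 350.0):
        print(f"disk {disk:.0f} MB/s -> speedup {speedup(disk, 2.0, 300.0):.2f}x")
```

With these invented numbers, the 100 MB/s disk gets a 2x gain. The 350 MB/s array ends up slower than reading uncompressed data (about 0.86x), because the decompressor, not the disk, becomes the bottleneck.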

> I think most of the mileage from "lookup tables" would be better implemented
> at a higher level by giving tools to data modellers that let them achieve
> denser data representations. Things like convenient enum data types, 1-bit
> boolean data types, short integer data types, etc.

Things like enums and 1-bit booleans certainly could be useful, but
they cannot take advantage of duplicate values across multiple rows at
all, even if 1000 rows have the exact same value in their "date"
column and are all in the same disk block, right?

Thus I suspect that the exact opposite is true: a good table
compression scheme would render special denser data types largely
redundant and obsolete.
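
Here is a minimal sketch of the kind of block-level scheme described above, assuming a simple per-block dictionary encoding. The function names are hypothetical and chosen for illustration. This is not how PostgreSQL or Oracle actually lays out pages. Each distinct column value in a block is stored once, and every row stores only a small index into that dictionary. A dense per-row type still costs one value per row, whereas here 1000 identical dates collapse to one stored value and zero bits per row.

```python
import datetime
from typing import Dict, Hashable, List, Sequence, Tuple


def dict_encode_block(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Dictionary-encode one column of one (hypothetical) disk block.

    Returns (dictionary, codes): each distinct value appears once in the
    dictionary, and codes[i] is the dictionary index for row i.
    """
    dictionary: List[Hashable] = []
    index_of: Dict[Hashable, int] = {}  # value -> position in dictionary
    codes: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def dict_decode_block(dictionary: Sequence[Hashable], codes: Sequence[int]) -> List[Hashable]:
    """Rebuild the original column values from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


def bits_per_code(dictionary_size: int) -> int:
    """Bits each row needs to address dictionary_size entries.

    Zero when the block holds a single distinct value.
    """
    return (dictionary_size - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical block: 1000 rows that all share the same "date" value.
    block = [datetime.date(2006, 5, 17)] * 1000
    dictionary, codes = dict_encode_block(block)
    assert dict_decode_block(dictionary, codes) == block
    print(len(dictionary), "distinct value(s),", bits_per_code(len(dictionary)), "bits per row")
```

A scheme like this makes reads cheap but complicates in-place updates, since writing a new value may require extending the block's dictionary and, if the code width grows, re-encoding every row in the block.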

Good table compression might be a lot harder to do, of course.
Certainly Oracle's implementation of it had some bugs which made it
difficult to use reliably in practice (in certain circumstances
updates could fail or, if they did not fail, perhaps show pathological
performance). Those bugs are supposed to be fixed in 10.2.0.2, which
was only released within the last few months.

--
Andrew Piskorski <atp(at)piskorski(dot)com>
http://www.piskorski.com/
