Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, lar(at)quicklz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date: 2009-01-05 23:05:34
Message-ID: 871vvhjqv5.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:

> It seems to me that transparent file system compression doesn't have limits
> like "files must be less than 1 Mbyte to be compressed". They don't exhibit
> poor file system performance.

Well, I imagine those implementations are more complex than toast is. I'm not
sure what lessons we can learn from their behaviour directly.

> I remember back in the 386/486 days, that I would always DriveSpace compress
> everything, because hard disks were so slow then that DriveSpace would
> actually increase performance.

Surely this depends on whether your machine was CPU-starved or disk-starved?
Do you happen to recall which camp these anecdotal machines from that era fell
into?

> The toast tables already give a sort of block-addressable scheme.
> Compression can be on a per block or per set of blocks basis allowing for
> seek into the block,

The current toast architecture is that we compress the whole datum, then store
it either inline or externally using the same chunking mechanism we use when
not compressing. So this doesn't fit at all.
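
To spell out the current shape of things, here is a toy standalone sketch --
not the actual tuptoaster code: zlib stands in for pglz, the thresholds are
invented, and store_inline/store_external_chunk are just printf stubs for the
real storage paths (compile with -lz):

    /* The whole datum is compressed first; only then do we decide whether
     * the result stays inline or gets cut into fixed-size chunks for the
     * toast table. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    #define INLINE_LIMIT 2000   /* invented "fits in the main tuple" limit */
    #define CHUNK_SIZE   1994   /* invented toast chunk size */

    static void store_inline(const unsigned char *p, size_t n)
    { printf("inline: %zu compressed bytes\n", n); }
    static void store_external_chunk(const unsigned char *p, size_t n)
    { printf("chunk:  %zu bytes\n", n); }

    static void toast_datum(const unsigned char *raw, uLong rawlen)
    {
        uLongf clen = compressBound(rawlen);
        unsigned char *cbuf = malloc(clen);

        /* one compression pass over the entire value */
        if (compress2(cbuf, &clen, raw, rawlen, 6) != Z_OK || clen >= rawlen)
        {
            memcpy(cbuf, raw, rawlen);          /* no gain: keep it raw */
            clen = rawlen;
        }

        if (clen <= INLINE_LIMIT)
            store_inline(cbuf, clen);
        else                                    /* chunk the *compressed* bytes */
            for (uLongf off = 0; off < clen; off += CHUNK_SIZE)
                store_external_chunk(cbuf + off,
                    clen - off < CHUNK_SIZE ? clen - off : CHUNK_SIZE);
        free(cbuf);
    }

    int main(void)
    {
        unsigned char raw[100000];
        memset(raw, 'x', sizeof raw);           /* highly compressible */
        toast_datum(raw, sizeof raw);
        return 0;
    }

The point is that compression happens once over the whole value before any
chunking, so no individual external chunk can be decompressed on its own.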

It does seem like an interesting idea to have toast chunks that are compressed
individually. Each chunk could be, say, an 8kB chunk of plaintext, stored at
whatever size it ends up being after compression. That would allow us to do
random access into external chunks as well as overlap the CPU cost of
decompression with the I/O cost. It would get a lower compression ratio than
compressing the whole object together, but we would have to experiment to see
how big a problem that is.
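
Roughly what I have in mind, as another toy sketch -- again zlib stands in for
pglz, an in-memory array stands in for the toast table, and there's no error
handling (compile with -lz):

    /* Per-chunk compression: the plaintext is cut into fixed 8kB chunks
     * *before* compression, each chunk is compressed on its own, and a
     * substring read only fetches and decompresses the chunks it touches. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    #define RAW_CHUNK 8192                     /* plaintext bytes per chunk */

    struct chunk { unsigned char *data; uLong clen; uLong rawlen; };
    static struct chunk chunks[1024];          /* stand-in for the toast table */
    static int nchunks;

    /* write path: one independent compress per 8kB of plaintext */
    static void store(const unsigned char *raw, uLong rawlen)
    {
        for (uLong off = 0; off < rawlen; off += RAW_CHUNK)
        {
            uLong  n = rawlen - off < RAW_CHUNK ? rawlen - off : RAW_CHUNK;
            uLongf clen = compressBound(n);
            unsigned char *cbuf = malloc(clen);

            compress2(cbuf, &clen, raw + off, n, 6);
            chunks[nchunks].data = cbuf;
            chunks[nchunks].clen = clen;
            chunks[nchunks].rawlen = n;
            nchunks++;
        }
    }

    /* read path: chunk boundaries sit at fixed plaintext offsets, so a
     * byte range maps straight to chunk numbers -- nothing before the
     * requested offset needs to be decompressed */
    static void fetch(unsigned char *dst, uLong pos, uLong len)
    {
        while (len > 0)
        {
            int     seq = pos / RAW_CHUNK;
            uLong   skip = pos % RAW_CHUNK;
            unsigned char rawbuf[RAW_CHUNK];
            uLongf  rawlen = chunks[seq].rawlen;
            uLong   n;

            uncompress(rawbuf, &rawlen, chunks[seq].data, chunks[seq].clen);
            n = rawlen - skip < len ? rawlen - skip : len;
            memcpy(dst, rawbuf + skip, n);
            dst += n; pos += n; len -= n;
        }
    }

    int main(void)
    {
        unsigned char raw[100000], out[16];
        size_t i;

        for (i = 0; i < sizeof raw; i++)
            raw[i] = "abcdefgh"[i % 8];
        store(raw, sizeof raw);
        fetch(out, 50000, sizeof out);         /* touches a single chunk */
        printf("%.16s\n", out);
        return 0;
    }

The compression ratio question is then how much we lose from only ever
compressing 8kB at a time.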

It would pretty much mean rewriting the toast mechanism for external compressed
data, though. Currently the storage and the compression are handled separately;
this would tie the two together in a separate code path.

Hm, it occurs to me we could almost use the existing code: just store the value
as a regular uncompressed external datum, but allow the toaster to operate on
the toast table's data column (which it's normally not allowed to do) so that
each chunk gets compressed without being stored externally itself.

> or if compression doesn't seem to be working for the first few blocks, the
> later blocks can be stored uncompressed? Or is that too complicated compared
> to what we have now? :-)

Actually we do that now; it was part of the same patch we're discussing.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
