Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, lar(at)quicklz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date: 2009-01-06 00:34:33
Message-ID: 28811.1231202073@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Hm, It occurs to me we could almost use the existing code. Just store it as a
> regular uncompressed external datum but allow the toaster to operate on the
> data column (which it's normally not allowed to) to compress it, but not store
> it externally.

Yeah, it would be very easy to do that, but the issue then would be that
instead of having a lot of toast-chunk rows that are all carefully made
to fit exactly 4 to a page, you have a lot of toast-chunk rows of
varying size, and you are certainly going to waste some disk space due
to not being able to pack pages full. In the worst case you'd end up
with zero benefit from compression anyway. As an example, if all of
your 2K chunks compress by just under 20%, you get no savings because
you can't quite fit 5 to a page. You'd need an average compression rate
of more than 20% to get any savings.
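
To put numbers on that, here's a back-of-the-envelope sketch. It uses
round figures (8192-byte pages, 2048-byte chunks) rather than the exact
TOAST_MAX_CHUNK_SIZE, and it ignores tuple and page header overhead, so
it only shows the shape of the problem:

/*
 * Rough illustration of chunk packing vs. compression ratio.
 * Round numbers only; not the real TOAST constants.
 */
#include <stdio.h>

int
main(void)
{
    const int   page_size = 8192;
    const int   chunk_size = 2048;  /* assumed uncompressed chunk */
    int         pct;

    for (pct = 0; pct <= 40; pct += 5)
    {
        int     compressed = chunk_size * (100 - pct) / 100;
        int     per_page = page_size / compressed;

        printf("%2d%% compression: %4d-byte chunks, %d per page\n",
               pct, compressed, per_page);
    }
    return 0;
}

Everything short of 20% still packs only 4 chunks per page, so the
compressed bytes you saved just turn into free space you can't use.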

We could improve that figure by making the chunk size smaller, but that
carries its own performance penalties (more seeks to fetch all of a
toasted value). Also, the smaller the chunks the worse the compression
will get.

It's an interesting idea, and it would be easy to try, so I hope someone
does test it out and see what happens. But I'm not expecting miracles.

I think a more realistic approach would be the one somebody suggested
upthread: split large values into, say, 1MB segments that are compressed
separately and then stored to TOAST separately. Substring fetches then
pay the overhead of decompressing 1MB segments that they might need only
part of, but at least they're not pulling out the whole gosh-darn value.
As long as the segment size isn't tiny, the added storage inefficiency
should be pretty minimal.
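
Just to illustrate the bookkeeping (the 1MB segment size and the notion
of fetching and decompressing one whole segment at a time are
assumptions here, not anything that exists in the backend today), a
substring fetch would only have to touch the segments that overlap the
requested byte range:

/*
 * Toy sketch: which compressed segments does a substring fetch need?
 * Segment size is a made-up constant for illustration.
 */
#include <stdio.h>

#define SEG_RAW_SIZE (1024 * 1024)  /* 1MB of uncompressed data per segment */

static void
substring_segments(long long offset, long long length)
{
    long long   first_seg = offset / SEG_RAW_SIZE;
    long long   last_seg = (offset + length - 1) / SEG_RAW_SIZE;

    printf("substr(off=%lld, len=%lld) touches segments %lld..%lld "
           "(%lld of them)\n",
           offset, length, first_seg, last_seg,
           last_seg - first_seg + 1);
}

int
main(void)
{
    /* a 10-byte substring deep inside a 100MB value: one segment */
    substring_segments(50LL * 1024 * 1024 + 123, 10);
    /* a substring spanning a segment boundary: two segments */
    substring_segments(SEG_RAW_SIZE - 5, 10);
    return 0;
}

You still decompress segments you may only partly use, but the work is
bounded by the size of the requested range, not the size of the value.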

(How we'd ever do upgrade-in-place to any new compression scheme is an
interesting question too...)

regards, tom lane
