Re: pg_lzcompress strategy parameters

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jan Wieck" <JanWieck(at)Yahoo(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: pg_lzcompress strategy parameters
Date: 2007-08-05 17:41:59
Message-ID: 87bqdlkheg.fsf@oxford.xeocode.com
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> This whole structure seems a bit broken, independently of whether the
> particular parameter values are good. If the compressor is given an
> input of 1000000 bytes and manages to compress it to 999999 bytes,
> we'll store it compressed, and pay for decompression cycles on every
> access, even though the I/O savings are nonexistent. That's not sane.

Especially given that uncompressed toasted data is quite a bit more flexible,
in that it can handle substr() efficiently.

Thinking about it, if the datum is stored inline then even a single byte saved
is at least theoretically helpful. If it's stored in a toast table, then saving
anything less than 2k has pretty slim odds of being helpful at all, even if the
percentage gain is big.
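
One way to picture the 2k figure: externally stored data is sliced into toast
chunks of roughly that size, so shaving off fewer bytes than a chunk usually
doesn't reduce the number of chunks fetched. A toy illustration (the chunk size
is a rounded stand-in for TOAST_MAX_CHUNK_SIZE; none of this is real PostgreSQL
code):

/* Illustration only: rounded chunk size, not the real toast code. */
#include <stdio.h>

#define APPROX_CHUNK_SIZE 2000      /* rounded stand-in for the toast chunk payload */

static int
toast_chunks(int datum_size)
{
    return (datum_size + APPROX_CHUNK_SIZE - 1) / APPROX_CHUNK_SIZE;
}

int
main(void)
{
    /* saving 1500 of 10000 bytes: still five chunks, so no I/O win */
    printf("%d -> %d chunks\n", 10000, toast_chunks(10000));  /* 5 */
    printf("%d -> %d chunks\n", 8500, toast_chunks(8500));    /* 5 */
    /* saving 2500 bytes finally drops a chunk */
    printf("%d -> %d chunks\n", 7500, toast_chunks(7500));    /* 4 */
    return 0;
}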

I don't know what the right answer is yet, but it looks to me like there need
to be two strategies, one for inline toasted tuples and one for externally
toasted tuples.

Unfortunately that's not the way the toaster is structured. First it goes
through and compresses all the fields, starting with the largest; then it
starts pushing fields out to external storage, again starting with the largest
remaining. It doesn't really know, while it's compressing, whether something is
going to be stored externally.
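
A toy sketch of that ordering, with made-up sizes and helpers rather than the
real tuptoaster.c code:

/* Illustration of the two-pass ordering only; not actual tuptoaster.c code. */
#include <stdio.h>
#include <stddef.h>

#define NATTRS       4
#define TOAST_TARGET 2000               /* stand-in for TOAST_TUPLE_TARGET */
#define EXT_PTR_SIZE 18                 /* rough size of an external toast pointer */

typedef struct { size_t size; int compressed; int external; } FakeAttr;

static size_t
tuple_size(const FakeAttr *att)
{
    size_t total = 0;
    for (int i = 0; i < NATTRS; i++)
        total += att[i].external ? EXT_PTR_SIZE : att[i].size;
    return total;
}

/* index of the largest still-inline attribute; optionally only uncompressed ones */
static int
largest_inline(const FakeAttr *att, int uncompressed_only)
{
    int best = -1;
    for (int i = 0; i < NATTRS; i++)
    {
        if (att[i].external || (uncompressed_only && att[i].compressed))
            continue;
        if (best < 0 || att[i].size > att[best].size)
            best = i;
    }
    return best;
}

int
main(void)
{
    FakeAttr att[NATTRS] = {{5000, 0, 0}, {1200, 0, 0}, {300, 0, 0}, {90, 0, 0}};
    int i;

    /* Pass 1: compress fields, largest first, until the tuple fits */
    while (tuple_size(att) > TOAST_TARGET && (i = largest_inline(att, 1)) >= 0)
    {
        att[i].size /= 2;               /* pretend pglz halved it */
        att[i].compressed = 1;
    }

    /*
     * Pass 2: push fields out to external storage, largest first.  By now
     * pass 1 has already compressed everything -- it never knew which fields
     * would end up external.
     */
    while (tuple_size(att) > TOAST_TARGET && (i = largest_inline(att, 0)) >= 0)
        att[i].external = 1;

    printf("final inline tuple size: %zu bytes\n", tuple_size(att));
    return 0;
}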

It seems to me that having a fairly high minimum savings percentage, say 25%,
would get pretty close to the intended behaviour. Small data which happens to
be highly compressible would only have to save 8-32 bytes to be stored
compressed, while data over 8k would have to save at least 2k.
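
In code, the check I'm imagining looks something like this (the names are made
up for illustration, not the actual pg_lzcompress/PGLZ_Strategy interface):

/* Made-up names; not the actual pg_lzcompress/PGLZ_Strategy interface. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* accept only if compression saves at least min_comp_rate percent of rawsize */
static bool
worth_storing_compressed(uint32_t rawsize, uint32_t compressed_size,
                         int min_comp_rate)
{
    uint32_t required_savings = (uint32_t) (((uint64_t) rawsize * min_comp_rate) / 100);

    return compressed_size + required_savings <= rawsize;
}

int
main(void)
{
    /* 32..128 byte datums only need to save 8..32 bytes at 25% */
    printf("%d\n", worth_storing_compressed(128, 96, 25));    /* 1 */
    printf("%d\n", worth_storing_compressed(32, 25, 25));     /* 0: saved 7 < 8 */
    /* an 8k datum has to shed at least 2k */
    printf("%d\n", worth_storing_compressed(8192, 6500, 25)); /* 0 */
    printf("%d\n", worth_storing_compressed(8192, 6000, 25)); /* 1 */
    return 0;
}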

(Incidentally, this means what I said earlier about uselessly trying to
compress objects below 256 bytes is even grosser than I realized. If you have a
single large object which will still be over the toast target even after
compression, it will force *every* varlena to be considered for compression,
even though they mostly can't be compressed. Considering a varlena smaller than
256 bytes for compression only costs a useless palloc, so it's not the end of
the world, but still. It does seem kind of strange that a tuple which otherwise
wouldn't be toasted at all suddenly gets all its fields compressed if you add
one more field which ends up being stored externally.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
