Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Douglas McNaught <doug(at)mcnaught(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, lar(at)quicklz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date: 2009-01-05 23:11:37
Message-ID: 20090105231137.GB1251@cuci.nl
Lists: pgsql-hackers

Tom Lane wrote:
>"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> The whole thing got started because Alex Hunsaker pointed out that his
>> database got a lot bigger because we disabled compression on
>> columns > 1MB. It seems like the obvious thing to do is turn it back
>> on again.

>After poking around in those threads a bit, I think that the current
>threshold of 1MB was something I just made up on the fly (I did note
>that it needed tuning...). Perhaps something like 10MB would be a
>better default. Another possibility is to have different minimum
>compression rates for "small" and "large" datums.

As far as I can imagine, the following use cases apply:
a. Column size <= 2048 bytes without substring access.
b. Column size <= 2048 bytes with substring access.
c. Column size > 2048 bytes, compressible, without substring access (text).
d. Column size > 2048 bytes, incompressible, with substring access (multimedia).

Can anyone think of another use case I missed here?

To cover those cases, the following solutions seem feasible:
Sa. Disable compression for this column (manually, by the DBA).
Sb. Check if the compression saves more than 20%, store uncompressed otherwise.
Sc. Check if the compression saves more than 20%, store uncompressed otherwise.
Sd. Check if the compression saves more than 20%, store uncompressed otherwise.

For Sb, Sc and Sd we should probably only check the first 256KB or so to
determine the expected savings.
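
To make the check in Sb, Sc and Sd concrete, here is a minimal sketch. It assumes zlib as a stand-in compressor (not pglz or QuickLZ), and the names estimated_savings, should_compress and store are hypothetical helpers, not anything in the backend. Only the 20% threshold and the 256KB sample size come from the text above.

```python
import zlib
from typing import Tuple

SAMPLE_BYTES = 256 * 1024  # "the first 256KB or so"
MIN_SAVINGS = 0.20         # "saves more than 20%"


def estimated_savings(datum: bytes, sample_bytes: int = SAMPLE_BYTES) -> float:
    """Estimate the fraction of space saved by compressing only a leading sample."""
    sample = datum[:sample_bytes]
    if not sample:
        return 0.0
    # zlib is an assumption here; any compressor with the same interface would do.
    compressed = zlib.compress(sample)
    return 1.0 - len(compressed) / len(sample)


def should_compress(datum: bytes, min_savings: float = MIN_SAVINGS) -> bool:
    """True when the sampled savings exceed the threshold."""
    return estimated_savings(datum) > min_savings


def store(datum: bytes) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload), storing uncompressed when savings are too small."""
    if should_compress(datum):
        return True, zlib.compress(datum)
    return False, datum
```

Because only the leading bytes are sampled, a datum whose start is not representative of the rest (say, a compressible header followed by incompressible payload) will be misjudged. That is the price of not compressing the whole value just to measure it.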
--
Sincerely,
Stephen R. van den Berg.

"Well, if we're going to make a party of it, let's nibble Nobby's nuts!"
