Re: Significantly larger toast tables on 8.4?

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, "Alex Hunsaker" <badalex(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Significantly larger toast tables on 8.4?
Date: 2009-01-02 17:44:38
Message-ID: 603c8f070901020944t7ff1c2ecg8da925bfe65401c@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>> What seems to be hurting the most is the 1MB upper limit. What is the
>> rationale behind that limit?
>
> The argument was that compressing/decompressing such large chunks would
> require a lot of CPU effort; also it would defeat attempts to fetch
> subsections of a large string. In the past we've required people to
> explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
> use of the substring-fetch optimization, but it was argued that this
> would make that more likely to work automatically.
>
> I'm not entirely convinced by Alex' analysis anyway; the only way
> those 39 large values explain the size difference is if they are
> *tremendously* compressible, like almost all zeroes. The toast
> compressor isn't so bright that it's likely to get 10X compression
> on typical data.

I've seen gzip approach 10X on what was basically a large
tab-separated values file, but I agree that some more experimentation
to determine the real cause of the problem would be useful.
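
One rough way to run that kind of experiment outside the database is
sketched below. It is only an illustration under stated assumptions:
Python's zlib (the same DEFLATE algorithm gzip uses) stands in for gzip,
and it is not the toast compressor, so the ratios will not match what
PostgreSQL achieves. The helper names and the synthetic data are
hypothetical, made up for this sketch.

```python
import random
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


def make_tsv(rows: int, seed: int = 0) -> bytes:
    """Build synthetic tab-separated data with a small, repetitive vocabulary.

    Hypothetical stand-in for a real TSV file; real data will behave differently.
    """
    rng = random.Random(seed)
    statuses = ["active", "inactive", "pending"]
    regions = ["north", "south", "east", "west"]
    lines = [
        f"{i}\t{rng.choice(statuses)}\t{rng.choice(regions)}\t{rng.randint(0, 99)}"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("ascii")


def make_random(size: int, seed: int = 0) -> bytes:
    """Build pseudo-random bytes, which should be essentially incompressible."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


if __name__ == "__main__":
    tsv = make_tsv(20_000)
    samples = [
        ("tab-separated", tsv),
        ("random bytes", make_random(len(tsv))),
        ("all zeroes", bytes(len(tsv))),
    ]
    for name, data in samples:
        print(f"{name}: {compression_ratio(data):.1f}x")
```

Pointing compression_ratio() at a dump of the actual large values would
say more than any synthetic input does.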

I am a little mystified by the apparent double standard regarding
compressibility. My suggestion that we disable compression for
pg_statistic columns was perfunctorily shot down even though I
provided detailed performance results demonstrating that it greatly
sped up query planning on toasted statistics and even though the space
savings from compression in that case are bound to be tiny.

Here, we have a case where the space savings are potentially much
larger, and the only argument against compressing these values is that
someone might be disappointed in the performance of substring
operations, if they happen to do any. What if they know they won't do
any and want compression? Even if the benefit is only 1.5X on
their data rather than 10X, that seems like a pretty sane and useful
thing to want to do. It's easy to shut off compression if you don't
want it; if the system makes an arbitrary decision to disable it, how
do you get it back?

...Robert
