On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Stephen R. van den Berg" <srb(at)cuci(dot)nl> writes:
>> What seems to be hurting the most is the 1MB upper limit. What is the
>> rationale behind that limit?
>
> The argument was that compressing/decompressing such large chunks would
> require a lot of CPU effort; also it would defeat attempts to fetch
> subsections of a large string. In the past we've required people to
> explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
> use of the substring-fetch optimization, but it was argued that this
> would make that more likely to work automatically.
>
> I'm not entirely convinced by Alex' analysis anyway; the only way
> those 39 large values explain the size difference is if they are
> *tremendously* compressible, like almost all zeroes. The toast
> compressor isn't so bright that it's likely to get 10X compression
> on typical data.

I've seen gzip approach 10X on what was basically a large
tab-separated values file, but I agree that some more experimentation
to determine the real cause of the problem would be useful.

I am a little mystified by the apparent double standard regarding
compressibility. My suggestion that we disable compression for
pg_statistic columns was perfunctorily shot down even though I
provided detailed performance results demonstrating that it greatly
sped up query planning on toasted statistics and even though the space
savings from compression in that case are bound to be tiny.

Here, we have a case where the space savings are potentially much
larger, and the only argument against it is that someone might be
disappointed in the performance of substring operations, if they
happen to do any. What if they know that they don't want to do any
and want to get compression? Even if the benefit is only 1.5X on
their data rather than 10X, that seems like a pretty sane and useful
thing to want to do. It's easy to shut off compression if you don't
want it; if the system makes an arbitrary decision to disable it, how
do you get it back?