Re: Significantly larger toast tables on 8.4?

From: "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, "Alex Hunsaker" <badalex(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Significantly larger toast tables on 8.4?
Date: 2009-01-07 14:44:51
Message-ID: e692861c0901070644y6f55f441gb39397ab4aca736b@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> So you compromise. You split the data into say 1MB blobs and compress
> each individually. Then if someone does a substring at offset 3MB you
> can find it quickly. In most cases this barely costs you anything in
> compression ratio.
>
> Implementation though, that's harder. The size of the blobs is tunable
> also. I imagine the optimal value will probably be around 100KB (12
> blocks uncompressed).

Or have the database do that internally. With the available fast
compression algorithms (zlib, LZO, LZF, etc.) the diminishing returns
from larger compression block sizes kick in rather quickly. Other
algorithms like LZMA or bzip2 gain more from bigger block sizes, but I
expect all of them are too slow to ever consider using in PostgreSQL.

So I expect that the compression loss from compressing in 64 KB chunks
would be minimal. The database could then store a list of offsets for
the 64 KB chunks at the beginning of the field, or something like
that. A short substring would then require decompressing just one or
two chunks, far less overhead than decompressing everything.
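
A minimal sketch of that layout, in Python with zlib standing in for PostgreSQL's own pglz compressor. The function names, the 64 KB constant, and returning the offset list separately (rather than serializing it as a header at the start of the field) are illustrative assumptions, not anything PostgreSQL implements:

```python
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size (64 KB), as proposed above


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data as independent fixed-size chunks.

    Returns (offsets, blob): compressed chunk i occupies
    blob[offsets[i]:offsets[i + 1]], and offsets[-1] == len(blob).
    A real on-disk format would store offsets as a header in the field.
    """
    offsets = [0]
    parts = []
    for start in range(0, len(data), chunk_size):
        comp = zlib.compress(data[start:start + chunk_size])
        parts.append(comp)
        offsets.append(offsets[-1] + len(comp))
    return offsets, b"".join(parts)


def substring(offsets, blob, start: int, length: int,
              chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[start:start + length], decompressing only the chunks it touches."""
    if length <= 0:
        return b""
    first = start // chunk_size
    # Clamp to the last existing chunk so reads past the end just come back short.
    last = min((start + length - 1) // chunk_size, len(offsets) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    skip = start - first * chunk_size  # position of `start` inside the first chunk
    return bytes(out[skip:skip + length])
```

A substring that falls inside one chunk decompresses 64 KB regardless of the field's total size; one that straddles a chunk boundary decompresses two chunks.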

It would probably be worthwhile to graph compression ratio vs. block
size for some reasonable input. I'd offer to do it, but I doubt I
have a reasonable test set for this.
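
Such a measurement could be sketched as follows, again with zlib as an assumed stand-in for pglz; `ratio_by_block_size` is a hypothetical helper, and the input file is whatever sample data you choose to pass on the command line:

```python
import sys
import zlib


def ratio_by_block_size(data: bytes, block_sizes, level: int = 6):
    """Compress data as independent blocks of each size and return
    {block_size: total_compressed_bytes / original_bytes}."""
    if not data:
        raise ValueError("need non-empty input")
    results = {}
    for bs in block_sizes:
        total = sum(len(zlib.compress(data[i:i + bs], level))
                    for i in range(0, len(data), bs))
        results[bs] = total / len(data)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        sample = f.read()
    sizes = [4 << 10, 16 << 10, 64 << 10, 256 << 10, 1 << 20]
    for bs, r in ratio_by_block_size(sample, sizes).items():
        print(f"{bs // 1024:>6} KB  ratio {r:.3f}")
```

Plotting the printed ratios against block size would show where the curve flattens for a given data set.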
