Re: Significantly larger toast tables on 8.4?

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: "Alex Hunsaker" <badalex(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Significantly larger toast tables on 8.4?
Date: 2009-01-02 21:43:40
Message-ID: 603c8f070901021343u509c3138v8434d516524445f1@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb(at)cuci(dot)nl> wrote:
> Robert Haas wrote:
>>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb(at)cuci(dot)nl> wrote:
>>> Three things:
>>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>> which is IO-bound because it decompresses faster than the disk can
>>> supply the data? (On common current hardware).
>>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>> and/or chosen to fit the IO-bound target as close as possible?
>>> c. Are there any well-known pitfalls/objections which would prevent me from
>>> changing the algorithm to something more efficient (read: IO-bound)?
>
>>Any compression algorithm is going to require you to decompress the
>>entire string before extracting a substring at a given offset. When
>>the data is uncompressed, you can jump directly to the offset you want
>>to read. Even if the compression algorithm requires no overhead at
>>all, it's going to make the location of the data nondeterministic, and
>>therefore force additional disk reads.
>
> That shouldn't be insurmountable:
> - I currently have difficulty imagining applications that actually do
> lots of substring extractions from large compressible fields.
> The most likely operation would be a table which contains tsearch
> indexed large textfields, but those are unlikely to participate in
> a lot of substring extractions.

I completely agree. If your large text field has interior structure
with certain data items at certain positions, you'd presumably break
it into multiple fixed-width fields. If it doesn't, what's the use
case?

> - Even if substring operations would be likely, I could envision a compressed
> format which compresses in independent chunks of, say, 64KB, which can then
> be addressed randomly.

I think this would require some sort of indexing so that you could
find the page that contains the first bit of any particular chunk you
want to find, so it might be a bit complex to implement, and I expect
it would reduce compression ratios as well. I'm sure it could be
done, but I doubt it's worth the bother. If you're more concerned
about the speed with which you can access your data than the size of
it, you can and should turn compression off altogether.
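
To make the indexing point concrete, here is a rough sketch of the chunked scheme in Python. It is not how TOAST stores anything; zlib stands in for PostgreSQL's own compressor, and the names compress_chunked, substring and CHUNK_SIZE are made up for illustration. Each 64KB chunk is compressed on its own, and a small offset table records where each compressed chunk starts, so a substring read only decompresses the chunks it overlaps.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # uncompressed bytes per chunk (the 64KB figure from the thread)


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress each chunk independently and return (blob, index).

    index[i] is the byte offset in blob where compressed chunk i starts;
    the final entry marks the end of the blob.
    """
    parts = []
    index = [0]
    for start in range(0, len(data), chunk_size):
        comp = zlib.compress(data[start:start + chunk_size])
        parts.append(comp)
        index.append(index[-1] + len(comp))
    return b"".join(parts), index


def substring(blob: bytes, index, offset: int, length: int,
              chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the chunks it touches."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    # Clamp to the last chunk that actually exists.
    last = min((offset + length - 1) // chunk_size, len(index) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[index[i]:index[i + 1]])
    skip = offset - first * chunk_size
    return bytes(out[skip:skip + length])


if __name__ == "__main__":
    data = b"abcdefghij" * 50000
    blob, index = compress_chunked(data)
    # A read that straddles the boundary between chunk 0 and chunk 1.
    assert substring(blob, index, 65530, 20) == data[65530:65550]
    print(len(index) - 1, "chunks;", len(blob), "bytes chunked vs",
          len(zlib.compress(data)), "bytes as one stream")
```

The offset table is the extra indexing mentioned above, and because every chunk starts with an empty history window, the chunked output will generally come out somewhat larger than compressing the whole value as one stream.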

...Robert
