On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Robert Haas wrote:
>>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>>> Three things:
>>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>> which is IO-bound because it decompresses faster than the disk can
>>> supply the data? (On common current hardware).
>>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>> and/or chosen to fit the IO-bound target as close as possible?
>>> c. Are there any well-known pitfalls/objections which would prevent me from
>>> changing the algorithm to something more efficient (read: IO-bound)?
>>Any compression algorithm is going to require you to decompress the
>>entire string before extracting a substring at a given offset. When
>>the data is uncompressed, you can jump directly to the offset you want
>>to read. Even if the compression algorithm requires no overhead at
>>all, it's going to make the location of the data nondeterministic, and
>>therefore force additional disk reads.
> That shouldn't be insurmountable:
> - I currently have difficulty imagining applications that actually do
> lots of substring extractions from large compressible fields.
> The most likely operation would be a table which contains tsearch
> indexed large textfields, but those are unlikely to participate in
> a lot of substring extractions.
I completely agree. If your large text field has interior structure
with certain data items at certain positions, you'd presumably break
it into multiple fixed-width fields. If it doesn't, what's the use
of extracting substrings at particular offsets?
> - Even if substring operations would be likely, I could envision a compressed
> format which compresses in compressed chunks of say 64KB which can then
> be addressed randomly independently.
I think this would require some sort of indexing so that you could
find the page that contains the first bit of any particular chunk you
want to find, so it might be a bit complex to implement, and I expect
it would reduce compression ratios as well. I'm sure it could be
done, but I doubt it's worth the bother. If you're more concerned
about the speed with which you can access your data than the size of
it, you can and should turn compression off altogether.
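For illustration, here is a minimal sketch of the chunked scheme Stephen describes: compress fixed-size chunks independently and keep an index of compressed offsets, so a substring read only decompresses the chunks it overlaps. This uses Python's zlib purely as a stand-in compressor; the 64KB chunk size comes from the thread, but the function names and index layout are my assumptions, not anything in PostgreSQL.

```python
import zlib

CHUNK = 64 * 1024  # 64 KB uncompressed chunk size, per the proposal above


def compress_chunked(data: bytes):
    """Compress each fixed-size chunk independently; return the
    concatenated compressed blob plus an index of chunk start offsets."""
    chunks = [zlib.compress(data[i:i + CHUNK])
              for i in range(0, len(data), CHUNK)]
    index, off = [], 0
    for c in chunks:
        index.append(off)
        off += len(c)
    return b"".join(chunks), index


def substring(blob: bytes, index, start: int, length: int) -> bytes:
    """Decompress only the chunks overlapping [start, start + length)."""
    first = start // CHUNK
    last = (start + length - 1) // CHUNK
    out = bytearray()
    for i in range(first, last + 1):
        end = index[i + 1] if i + 1 < len(index) else len(blob)
        out += zlib.decompress(blob[index[i]:end])
    # Trim to the requested window within the decompressed run.
    skip = start - first * CHUNK
    return bytes(out[skip:skip + length])
```

This also makes the trade-off Robert mentions concrete: the per-chunk index costs extra space, and each chunk is compressed without reference to its neighbors, so the overall compression ratio suffers compared to one continuous stream.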