On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Robert Haas wrote:
>>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>>> Three things:
>>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>> which is IO-bound because it decompresses faster than the disk can
>>> supply the data? (On common current hardware).
>>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>> and/or chosen to fit the IO-bound target as close as possible?
>>> c. Are there any well-known pitfalls/objections which would prevent me from
>>> changing the algorithm to something more efficient (read: IO-bound)?
>>Any compression algorithm is going to require you to decompress the
>>entire string before extracting a substring at a given offset. When
>>the data is uncompressed, you can jump directly to the offset you want
>>to read. Even if the compression algorithm requires no overhead at
>>all, it's going to make the location of the data nondeterministic, and
>>therefore force additional disk reads.
> That shouldn't be insurmountable:
> - I currently have difficulty imagining applications that actually do
> lots of substring extractions from large compressible fields.
> The most likely operation would be a table which contains tsearch
> indexed large textfields, but those are unlikely to participate in
> a lot of substring extractions.
I completely agree. If your large text field has interior structure
with certain data items at certain positions, you'd presumably break
it into multiple fixed-width fields. If it doesn't, what's the use
of extracting substrings at particular offsets?
> - Even if substring operations would be likely, I could envision a compressed
> format which compresses in compressed chunks of say 64KB which can then
> be addressed randomly independently.
I think this would require some sort of indexing so that you could
find the page that contains the first bit of any particular chunk you
want to find, so it might be a bit complex to implement, and I expect
it would reduce compression ratios as well. I'm sure it could be
done, but I doubt it's worth the bother. If you're more concerned
about the speed with which you can access your data than the size of
it, you can and should turn compression off altogether.
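For illustration, here is a minimal sketch of the chunked scheme Stephen describes: compress fixed-size chunks independently and keep an index of compressed offsets, so a substring read only decompresses the chunks it overlaps. This uses Python's zlib purely as a stand-in compressor; the 64KB chunk size comes from the thread, but the function names and index layout are my assumptions, not anything in PostgreSQL.

```python
import zlib

CHUNK = 64 * 1024  # 64 KB uncompressed chunk size, per the proposal above


def compress_chunked(data: bytes):
    """Compress each fixed-size chunk independently; return the
    concatenated compressed blob plus an index of chunk start offsets."""
    chunks = [zlib.compress(data[i:i + CHUNK])
              for i in range(0, len(data), CHUNK)]
    index, off = [], 0
    for c in chunks:
        index.append(off)
        off += len(c)
    return b"".join(chunks), index


def substring(blob: bytes, index, start: int, length: int) -> bytes:
    """Decompress only the chunks overlapping [start, start + length)."""
    first = start // CHUNK
    last = (start + length - 1) // CHUNK
    out = bytearray()
    for i in range(first, last + 1):
        end = index[i + 1] if i + 1 < len(index) else len(blob)
        out += zlib.decompress(blob[index[i]:end])
    # Trim to the requested window within the decompressed run.
    skip = start - first * CHUNK
    return bytes(out[skip:skip + length])
```

This also makes the trade-off Robert mentions concrete: the per-chunk index costs extra space, and each chunk is compressed without reference to its neighbors, so the overall compression ratio suffers compared to one continuous stream.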