Re: [PATCH] Compression and on-disk sorting

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCH] Compression and on-disk sorting
Date: 2006-05-18 11:42:01
Message-ID: 20060518114201.GB4359@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Thu, May 18, 2006 at 11:34:36AM +0100, Simon Riggs wrote:
> Just do a Z_FULL_FLUSH when you hit end of block. That way all blocks
> will be independent of each other and you can rewind as much as you
> like. We can choose the block size to be 32KB or even 64KB, there's no
> dependency there, just memory allocation. It should be pretty simple to
> make the block size variable at run time, so we can select it according
> to how many files and how much memory we have.

If you know you don't need to seek, there's no need to block the data
at all, one long stream is fine. So that case is easy.

For seeking, you need more work. I assume you're talking about 32KB
input block sizes (uncompressed). The output blocks will be of variable
size. These compressed blocks would be divided up into fixed 8K blocks
and written to disk.

To allow seeking, you'd have to do something like a header comtaining:

- length of previous compressed block
- length of this compressed block
- offset of block in uncompressed bytes (from beginning of tape)

This would allow you to scan backwards and forwards. If you want to be
able to jump to anywhere in the file, you may be better off storing the
file offsets (which would be implicit if the blocks are 32KB) in the
indirect blocks, using a search to find the right block, and then a
header in the block to find the offset.

Still, I'd like some evidence of benefits before writing up something
like that.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-05-18 13:45:54 Re: does wal archiving block the current client connection?
Previous Message Robert Treat 2006-05-18 11:26:16 Re: Google and the Beta Freeze

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2006-05-18 16:07:05 Re: Compression and on-disk sorting
Previous Message Simon Riggs 2006-05-18 10:34:51 Re: Compression and on-disk sorting