Re: [PATCH] Compression and on-disk sorting

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCH] Compression and on-disk sorting
Date: 2006-05-18 11:42:01
Message-ID: 20060518114201.GB4359@svana.org
Lists: pgsql-hackers, pgsql-patches

On Thu, May 18, 2006 at 11:34:36AM +0100, Simon Riggs wrote:
> Just do a Z_FULL_FLUSH when you hit end of block. That way all blocks
> will be independent of each other and you can rewind as much as you
> like. We can choose the block size to be 32KB or even 64KB, there's no
> dependency there, just memory allocation. It should be pretty simple to
> make the block size variable at run time, so we can select it according
> to how many files and how much memory we have.

If you know you don't need to seek, there's no need to block the data
at all; one long stream is fine. So that case is easy.

For seeking, you need more work. I assume you're talking about 32KB
input block sizes (uncompressed). The output blocks will be of variable
size. These compressed blocks would be divided up into fixed 8K blocks
and written to disk.
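
A minimal sketch of the blocking scheme discussed above, using Python's
standard zlib module rather than the C API a backend patch would use. The
names compress_blocks and decompress_block are hypothetical, and a raw
deflate stream (wbits=-15) is an assumption made here so that each chunk can
be decoded without the stream header. Each 32KB input block is followed by a
Z_FULL_FLUSH, so every compressed chunk is byte-aligned and carries no
back-references into earlier blocks.

```python
import zlib

BLOCK_SIZE = 32 * 1024  # uncompressed input block size (assumed, per the thread)


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress data in fixed-size input blocks, full-flushing after each one.

    Returns a list of compressed chunks of varying length. A raw deflate
    stream (wbits=-15) is used so no zlib header precedes the first chunk.
    """
    co = zlib.compressobj(6, zlib.DEFLATED, -15)
    chunks = []
    for start in range(0, len(data), block_size):
        chunk = co.compress(data[start:start + block_size])
        # Z_FULL_FLUSH emits all pending output and resets the dictionary,
        # so the next chunk cannot refer back to this one.
        chunk += co.flush(zlib.Z_FULL_FLUSH)
        chunks.append(chunk)
    return chunks


def decompress_block(chunk):
    """Decode one chunk on its own, without any of the preceding chunks."""
    return zlib.decompressobj(-15).decompress(chunk)
```

The chunks come out at different lengths depending on how compressible each
input block is, which is why they would then have to be packed into the fixed
8K disk blocks separately; that packing step is not shown here.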

To allow seeking, you'd have to do something like a header containing:

- length of previous compressed block
- length of this compressed block
- offset of block in uncompressed bytes (from beginning of tape)
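
The three header fields listed above could be laid out roughly as follows.
This is only a sketch: the field widths (two 32-bit lengths and a 64-bit
offset), the little-endian encoding, and the helper names build_tape,
next_pos and prev_pos are all assumptions for illustration, not anything
proposed in the thread.

```python
import struct

# Assumed layout: previous compressed length, this compressed length,
# uncompressed offset of this block from the beginning of the tape.
HEADER = struct.Struct("<IIQ")


def build_tape(chunks, block_size):
    """Concatenate compressed chunks, each preceded by its header."""
    tape = bytearray()
    prev_len = 0
    for i, chunk in enumerate(chunks):
        tape += HEADER.pack(prev_len, len(chunk), i * block_size)
        tape += chunk
        prev_len = len(chunk)
    return bytes(tape)


def next_pos(tape, pos):
    """Position of the header after the one at pos, or None at the end."""
    _, this_len, _ = HEADER.unpack_from(tape, pos)
    nxt = pos + HEADER.size + this_len
    return nxt if nxt < len(tape) else None


def prev_pos(tape, pos):
    """Position of the header before the one at pos, or None at the start."""
    if pos == 0:
        return None
    prev_len, _, _ = HEADER.unpack_from(tape, pos)
    return pos - prev_len - HEADER.size
```

Stepping forward needs only the current block's length and stepping backward
needs only the previous block's length, so neither direction requires
decompressing anything.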

This would allow you to scan backwards and forwards. If you want to be
able to jump to anywhere in the file, you may be better off storing the
file offsets (which would be implicit if the blocks are 32KB) in the
indirect blocks, using a search to find the right block, and then a
header in the block to find the offset.
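
One way to picture the random-access variant is a sorted index searched with
bisection. This is a sketch under assumptions: the two parallel lists stand
in for whatever the indirect blocks would actually store, and locate is a
hypothetical name.

```python
from bisect import bisect_right


def locate(uncompressed_starts, file_offsets, target):
    """Find the block holding uncompressed byte `target`.

    uncompressed_starts: sorted uncompressed start offset of each block.
    file_offsets: on-disk position of each block, in the same order.
    Returns (file offset of the block, byte offset within the block).
    """
    i = bisect_right(uncompressed_starts, target) - 1
    if i < 0:
        raise ValueError("target lies before the first block")
    return file_offsets[i], target - uncompressed_starts[i]
```

With fixed 32KB input blocks the search collapses to target // 32768, which
is the sense in which the uncompressed offsets are implicit; the bisection
only matters if the input block size is allowed to vary.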

Still, I'd like some evidence of benefits before writing up something
like that.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog(at)svana(dot)org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
