Re: [PATCH] Compression and on-disk sorting

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCH] Compression and on-disk sorting
Date: 2006-05-18 08:31:03
Message-ID: 20060518083103.GD32755@svana.org
Lists: pgsql-hackers pgsql-patches

On Wed, May 17, 2006 at 06:38:47PM +0100, Simon Riggs wrote:
> > - Each tape is compressed as one long compressed stream. Currently no
> > seeking is allowed, so only sorts, no joins! (As tom said, quick and
> > dirty numbers). This should show this possibility in its best light
> > but if we want to support seeking we're going to need to change that.
> > Maybe no compression on the last pass?
>
> We should be able to do this without significant loss of compression by
> redefining the lts block size to be 32k. That's the size of the
> look-back window anyhow, so compressing the whole stream doesn't get us
> much more.

The major problem is that looking back costs significantly more with
compression. If you need to look back into the previous compressed
block, you have to decompress that whole block again. The simple
solution would be to keep a buffer of the last 32KB. Another possibility
would be to limit each block to 32KB of uncompressed data and
just remember the whole previous block.
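
To make the "buffer of the last 32KB" idea concrete, here is a minimal
sketch of a ring buffer that remembers the most recently decompressed
bytes. None of this is in the patch: LookbackBuf and the lookback_*
functions are hypothetical names, and LOOKBACK_SIZE is assumed to match
the 32KB window mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOKBACK_SIZE (32 * 1024)   /* assumed: same as the 32KB window */

/* Hypothetical helper, not part of logtape.c. */
typedef struct LookbackBuf
{
    unsigned char data[LOOKBACK_SIZE];
    size_t        head;     /* next write position in data[] */
    size_t        used;     /* valid bytes, at most LOOKBACK_SIZE */
} LookbackBuf;

static void
lookback_init(LookbackBuf *lb)
{
    assert(lb != NULL);
    lb->head = 0;
    lb->used = 0;
}

/* Remember freshly decompressed bytes, overwriting the oldest ones. */
static void
lookback_append(LookbackBuf *lb, const unsigned char *src, size_t len)
{
    /* Only the last LOOKBACK_SIZE bytes can survive anyway. */
    if (len > LOOKBACK_SIZE)
    {
        src += len - LOOKBACK_SIZE;
        len = LOOKBACK_SIZE;
    }
    while (len > 0)
    {
        size_t      chunk = LOOKBACK_SIZE - lb->head;

        if (chunk > len)
            chunk = len;
        memcpy(lb->data + lb->head, src, chunk);
        lb->head = (lb->head + chunk) % LOOKBACK_SIZE;
        lb->used += chunk;
        if (lb->used > LOOKBACK_SIZE)
            lb->used = LOOKBACK_SIZE;
        src += chunk;
        len -= chunk;
    }
}

/*
 * Copy the len bytes that start `back` bytes before the current position.
 * Returns 1 on success, 0 if the request reaches beyond what is buffered
 * (the caller would then have to decompress the previous block again).
 */
static int
lookback_read(const LookbackBuf *lb, size_t back,
              unsigned char *dst, size_t len)
{
    size_t      start;

    if (back > lb->used || len > back)
        return 0;
    start = (lb->head + LOOKBACK_SIZE - back) % LOOKBACK_SIZE;
    while (len > 0)
    {
        size_t      chunk = LOOKBACK_SIZE - start;

        if (chunk > len)
            chunk = len;
        memcpy(dst, lb->data + start, chunk);
        start = (start + chunk) % LOOKBACK_SIZE;
        dst += chunk;
        len -= chunk;
    }
    return 1;
}
```

The reader would call lookback_append() on everything it hands out, and
try lookback_read() first when asked to go backwards. Only a miss would
force a decompression of the previous block.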

Seek/Tell is not the hard part; the backspace is. It would probably
be best to make backspace call Seek, rather than trying to be clever
about it.
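
Roughly what I mean by backspace calling Seek, as a toy sketch. It
assumes every block holds exactly TAPE_BLCKSZ bytes of uncompressed
data, so a position is just (block number, offset). ToyTape and the
toy_* functions are made up for illustration and are not the logtape.c
interface.

```c
#include <assert.h>
#include <stddef.h>

#define TAPE_BLCKSZ (32 * 1024)     /* assumed uncompressed bytes per block */

/* Hypothetical stand-in for a tape's read position. */
typedef struct ToyTape
{
    long        blocknum;   /* current block */
    int         offset;     /* offset within the uncompressed block */
} ToyTape;

static void
toy_tell(const ToyTape *t, long *blocknum, int *offset)
{
    *blocknum = t->blocknum;
    *offset = t->offset;
}

static int
toy_seek(ToyTape *t, long blocknum, int offset)
{
    if (blocknum < 0 || offset < 0 || offset >= TAPE_BLCKSZ)
        return 0;
    /* A real version would load and decompress block `blocknum` here. */
    t->blocknum = blocknum;
    t->offset = offset;
    return 1;
}

/* Back up by `size` bytes by computing the target and seeking to it. */
static int
toy_backspace(ToyTape *t, size_t size)
{
    long        blocknum;
    int         offset;
    unsigned long long pos;

    toy_tell(t, &blocknum, &offset);
    assert(blocknum >= 0 && offset >= 0);
    pos = (unsigned long long) blocknum * TAPE_BLCKSZ
        + (unsigned long long) offset;
    if ((unsigned long long) size > pos)
        return 0;               /* would go before the start of the tape */
    pos -= size;
    return toy_seek(t, (long) (pos / TAPE_BLCKSZ), (int) (pos % TAPE_BLCKSZ));
}
```

All the cost of crossing a block boundary then sits in one place, the
seek, instead of being duplicated in the backspace code.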

Another issue is that currently the compression code is completely
within logtape.c. To be able to seek backwards efficiently you might
have to change the abstraction so that it knows about the records from
tuplesort.c. That's much more work, which needs a lot more thinking.

Besides, we still haven't got any reports that this actually
provides a benefit on any machine less than five years old. Anyone out
there doing tests?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
