[PATCH] Compression and on-disk sorting

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: [PATCH] Compression and on-disk sorting
Date: 2006-05-17 16:17:30
Message-ID: 20060517161730.GI15180@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Persuant to the discussions currently on -hackers, here's a patch that
uses zlib to compress the tapes as they go to disk. I default to the
compression level 3 (think gzip -3).

Please speed test all you like, I *think* it's bug free, but you never
know.

Outstanding questions:

- I use zlib because the builtin pg_lzcompress can't do what zlib does.
Here we setup input and output buffers and zlib will process as much as
it can (input empty or output full). This means no marshalling is
required. We can compress the whole file without having it in memory.

- zlib allocates memory for compression and decompression, I don't know
how much. However, it allocates via the postgres mcxt system so it
shouldn't too hard to find out. Simon pointed out that we'll need to
track this because we might allow hundreds of tapes.

- Each tape is compressed as one long compressed stream. Currently no
seeking is allowed, so only sorts, no joins! (As tom said, quick and
dirty numbers). This should show this possibility in its best light
but if we want to support seeking we're going to need to change that.
Maybe no compression on the last pass?

- It's probable that the benefits are strongly correlated to the speed
of your disk subsystem. We need to measure this effect. I can't
accuratly measure this because my compiler doesn't inline any of the
functions in tuplesort.c.

In my test of a compression ratio around 100-to-1, on 160MB of data
with tiny work_mem on my 5 year old laptop, it speeds it up by 60% so
it's obviously not a complete waste of time. Ofcourse, YMMV :)

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment Content-Type Size
compress-sort.patch text/plain 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-05-17 16:21:39 Re: Compression and on-disk sorting
Previous Message Rod Taylor 2006-05-17 16:16:13 Re: Compression and on-disk sorting

Browse pgsql-patches by date

  From Date Subject
Next Message Greg Stark 2006-05-17 16:55:53 Re: Compression and on-disk sorting
Previous Message Rod Taylor 2006-05-17 16:16:13 Re: Compression and on-disk sorting