[PATCH] Compression and on-disk sorting

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: [PATCH] Compression and on-disk sorting
Date: 2006-05-17 16:17:30
Message-ID: 20060517161730.GI15180@svana.org
Lists: pgsql-hackers, pgsql-patches

Pursuant to the discussions currently on -hackers, here's a patch that
uses zlib to compress the tapes as they go to disk. I default to
compression level 3 (think gzip -3).

Please speed test all you like. I *think* it's bug-free, but you never
know.

Outstanding questions:

- I use zlib because the builtin pg_lzcompress can't do what zlib does.
Here we set up input and output buffers and zlib processes as much as
it can (until the input is empty or the output is full). This means no
marshalling is required. We can compress the whole file without having
it in memory.

- zlib allocates memory for compression and decompression; I don't know
how much. However, it allocates via the postgres mcxt system, so it
shouldn't be too hard to find out. Simon pointed out that we'll need to
track this because we might allow hundreds of tapes.

- Each tape is compressed as one long compressed stream. Currently no
seeking is allowed, so only sorts, no joins! (As Tom said, quick and
dirty numbers.) This should show the approach in its best light,
but if we want to support seeking we're going to need to change that.
Maybe no compression on the last pass?

- It's probable that the benefits are strongly correlated with the speed
of your disk subsystem. We need to measure this effect. I can't
accurately measure this because my compiler doesn't inline any of the
functions in tuplesort.c.
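
The buffer-driven interface described in the first item can be sketched
roughly as follows. This is only an illustration in Python, whose
standard zlib module wraps the same library; it is not the patch's C
code. The names compress_stream, decompress_stream and CHUNK are made up
for this example, and the 8 kB buffer size is an assumption.

```python
import io
import zlib

CHUNK = 8192  # hypothetical buffer size, not taken from the patch


def compress_stream(src, dst, level=3, chunk=CHUNK):
    """Compress src into dst one chunk at a time; return bytes written."""
    comp = zlib.compressobj(level)
    written = 0
    while True:
        block = src.read(chunk)
        if not block:
            break
        # compress() may return b"" while zlib is still buffering input
        written += dst.write(comp.compress(block))
    # flush() emits whatever zlib still holds and ends the stream
    written += dst.write(comp.flush())
    return written


def decompress_stream(src, dst, chunk=CHUNK):
    """Decompress src into dst sequentially; return bytes written."""
    decomp = zlib.decompressobj()
    written = 0
    while True:
        block = src.read(chunk)
        if not block:
            break
        while block:
            # cap each call's output at `chunk` bytes; input that was not
            # consumed yet is handed back in unconsumed_tail
            written += dst.write(decomp.decompress(block, chunk))
            block = decomp.unconsumed_tail
    written += dst.write(decomp.flush())
    return written


if __name__ == "__main__":
    # synthetic, highly repetitive data; not the data set from the test
    raw = b"sort key 0000042, payload\n" * 100_000
    packed = io.BytesIO()
    compress_stream(io.BytesIO(raw), packed)
    packed.seek(0)
    restored = io.BytesIO()
    decompress_stream(packed, restored)
    assert restored.getvalue() == raw
    print(len(raw), "->", len(packed.getvalue()), "bytes")
```

Neither side ever holds more than one chunk of input at a time, which is
the point of the first item. The reader also only ever moves forward
through the stream, which is the no-seeking restriction from the third
item.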

In my test, with a compression ratio of around 100-to-1 on 160MB of
data, with tiny work_mem on my 5 year old laptop, it speeds things up
by 60%, so it's obviously not a complete waste of time. Of course,
YMMV :)

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog(at)svana(dot)org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
