Re: Compression and on-disk sorting

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-19 03:02:44
Message-ID: 20060519030243.GW64371@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 18, 2006 at 04:55:17PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > Actually, I guess the amount of memory used for zlib's lookback buffer
> > (or whatever they call it) could be pretty substantial, and I'm not sure
> > if there would be a way to combine that across all tapes.
>
> But there's only one active write tape at a time. My recollection of
> zlib is that compression is memory-hungry but decompression not so much,
> so it seems like this shouldn't be a huge deal.

It seems more appropriate to discuss results here, rather than on
-patches...

http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
I've run into a slight problem in that even at a compression level of
-3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort
test with scale=150 that was producing a 2G on-disk sort is now
producing a 80M sort, which obviously fits in memory. And cuts sort
times by more than half.

So, if nothing else, it looks like compression is definately a win if it
means you can now fit the sort within the disk cache. While that doesn't
sound like something very worthwhile, it essentially extends work_mem
from a fraction of memory to up to ~25x memory.

I'm currently loading up a pgbench database with a scaling factor of
15000; hopefully I'll have results from that testing in the morning.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2006-05-19 05:28:02 Re: [HACKERS] Toward A Positive Marketing Approach.
Previous Message Tom Lane 2006-05-19 02:32:53 TupleDesc refcounting, again