Re: Compression and on-disk sorting

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Zeugswetter Andreas DCP SD" <ZeugswetterA(at)spardat(dot)at>, "Greg Stark" <gsstark(at)mit(dot)edu>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Rod Taylor" <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-19 11:59:14
Message-ID: 3E37B936B592014B978C4415F90D662D03489494@MI8NYCMAIL06.Mi8.com
Lists: pgsql-hackers

Jim,

> http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> I've run into a slight problem in that even at a compression
> level of -3, zlib is cutting the on-disk size of sorts by
> 25x. So my pgbench sort test with scale=150 that was
> producing a 2G on-disk sort is now producing a 80M sort,
> which obviously fits in memory. And cuts sort times by more than half.

When you're ready, we can test this on some other interesting cases and
on fast hardware.

BTW - external sorting is *still* 4x slower than a popular commercial DBMS
(PCDB) on real workloads when full rows are used in queries. The final
results we had after the last round of sort improvements were limited to
cases where only the sort column was used in the query. For that case the
improved external sort code was as fast as PCDB, provided plenty of
work_mem was available. But when the whole contents of the row are
consumed (as with TPC-H and in many real-world cases), performance is
still far slower.

So, compression of the tuples may be just what we're looking for.

- Luke
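
For anyone who wants a rough feel for how well zlib at level 3 compresses
full-row sort data, here is a minimal sketch. It is not the patch under
discussion and does not touch PostgreSQL: the row layout (three 4-byte
integers plus 84 bytes of blank filler, loosely modelled on a pgbench
accounts row), the helper names make_rows and compression_ratio, and the
row count are all assumptions made for illustration. The ratio it prints
depends heavily on that blank filler, so it says nothing about real
workloads such as TPC-H.

```python
import random
import struct
import zlib

FILLER_WIDTH = 84  # assumed width of the blank padding column


def make_rows(n, seed=0):
    """Build n fixed-width rows: three 4-byte ints plus blank filler."""
    rng = random.Random(seed)
    rows = []
    for aid in range(1, n + 1):
        bid = rng.randint(1, 150)
        abalance = rng.randint(-5000, 5000)
        rows.append(struct.pack("<iii", aid, bid, abalance) + b" " * FILLER_WIDTH)
    return rows


def compression_ratio(rows, level=3):
    """Return (raw_bytes, compressed_bytes, ratio) for one run of rows."""
    raw = b"".join(rows)
    packed = zlib.compress(raw, level)
    # Round-trip check: the compressed run must decode to the same bytes.
    assert zlib.decompress(packed) == raw
    return len(raw), len(packed), len(raw) / len(packed)


rows = make_rows(10_000)
# Sort on abalance (third int, byte offset 8), as a sort run would be ordered.
rows.sort(key=lambda r: struct.unpack_from("<i", r, 8)[0])
raw, packed, ratio = compression_ratio(rows)
print(f"raw={raw} compressed={packed} ratio={ratio:.1f}x")
```

Replacing the blank filler with random bytes is a quick way to see how much
of the ratio comes from padding rather than from the sort keys themselves.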
