Re: Compression and on-disk sorting

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-24 20:12:32
Message-ID: 20060524201230.GI59464@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Finally completed testing of a dataset that doesn't fit in memory with
compression enabled. Results are at
http://jim.nasby.net/misc/pgsqlcompression .

Summary:
work_mem compressed not compressed gain
in-memory 20000 400.1 797.7 49.8%
in-memory 2000 371.4 805.7 53.9%
not in-memory 20000 8537 17436 51.0%
not in-memory 2000 8152 17820 54.3%

I find it very interesting that the gains are identical even when the
tapes should fit in memory. My guess is that for some reason the OS is
flushing those to disk anyway. In fact, watching gstat during a run, I
do see write activity hitting the drives. So if there was some way to
tune that behavior, the in-memory case would probably be much, much
faster. Anyone know FreeBSD well enough to suggest how to change this?
Anyone want to test on linux and see if the results are the same? This
could indicate that it might be advantageous to attempt an in-memory
sort with compressed data before spilling that compressed data to
disk...

As for CPU utilization, it was ~33% with compression and ~13% without.
That tells me that CPU could become a factor if everything was truely in
memory (including the table we were reading from), but if that's the
case there's a good chance that we wouldn't even be switching to an
on-disk sort. If everything isn't in memory then you're likely to be IO
bound anyway...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message korry 2006-05-24 20:19:33 Re: file-locking and postmaster.pid
Previous Message Simon Riggs 2006-05-24 20:11:24 Re: error-free disabling of individual child partition