Re: Compression and on-disk sorting

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-24 21:20:43
Message-ID: 4474CE2B.3070506@commandprompt.com
Lists: pgsql-hackers

Jim C. Nasby wrote:
> Finally completed testing of a dataset that doesn't fit in memory with
> compression enabled. Results are at
> http://jim.nasby.net/misc/pgsqlcompression .
>
> Summary:
> dataset         work_mem  compressed  not compressed   gain
> in-memory          20000       400.1           797.7  49.8%
> in-memory           2000       371.4           805.7  53.9%
> not in-memory      20000        8537           17436  51.0%
> not in-memory       2000        8152           17820  54.3%
>
> I find it very interesting that the gains are nearly identical even when the
> tapes should fit in memory. My guess is that for some reason the OS is
> flushing those to disk anyway. In fact, watching gstat during a run, I
> do see write activity hitting the drives. So if there was some way to
> tune that behavior, the in-memory case would probably be much, much
> faster. Anyone know FreeBSD well enough to suggest how to change this?
> Anyone want to test on Linux and see if the results are the same? This
> could indicate that it might be advantageous to attempt an in-memory
> sort with compressed data before spilling that compressed data to
> disk...
>
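
For anyone checking the numbers: the gain column in the quoted summary appears to be the relative reduction in run time, (not compressed - compressed) / not compressed. The sketch below is only a sanity check under that assumption. The timings are copied from the table (their units are not stated there), and the names RESULTS, gain_percent and check_table are made up for illustration.

```python
# Timings copied from the quoted summary table; units are not stated there.
# Each row: (dataset, work_mem, compressed, not_compressed, reported_gain_pct)
RESULTS = [
    ("in-memory",     20000, 400.1,  797.7,   49.8),
    ("in-memory",      2000, 371.4,  805.7,   53.9),
    ("not in-memory", 20000, 8537.0, 17436.0, 51.0),
    ("not in-memory",  2000, 8152.0, 17820.0, 54.3),
]


def gain_percent(compressed, not_compressed):
    """Relative reduction in run time, as a percentage of the uncompressed time.

    This formula is an assumption about how the gain column was derived.
    """
    return (not_compressed - compressed) / not_compressed * 100.0


def check_table(rows=RESULTS):
    """Return (dataset, work_mem, computed_gain, reported_gain) for each row."""
    checked = []
    for dataset, work_mem, compressed, not_compressed, reported in rows:
        computed = round(gain_percent(compressed, not_compressed), 1)
        checked.append((dataset, work_mem, computed, reported))
    return checked


if __name__ == "__main__":
    for dataset, work_mem, computed, reported in check_table():
        print(f"{dataset:<14} work_mem={work_mem:<6} "
              f"computed={computed:.1f}% reported={reported:.1f}%")
```

Under that formula the recomputed gains round to the reported figures for all four rows, which is consistent with the table, though it does not prove how the column was actually produced.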

I can test it on Linux; just let me know what you need.

J

> As for CPU utilization, it was ~33% with compression and ~13% without.
> That tells me that CPU could become a factor if everything was truly in
> memory (including the table we were reading from), but if that's the
> case there's a good chance that we wouldn't even be switching to an
> on-disk sort. If everything isn't in memory then you're likely to be IO
> bound anyway...

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
