Re: Compression and on-disk sorting

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-24 21:40:18
Message-ID: 20060524214018.GP59464@pervasive.com
Lists: pgsql-hackers

On Wed, May 24, 2006 at 02:20:43PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> >Finally completed testing of a dataset that doesn't fit in memory with
> >compression enabled. Results are at
> >http://jim.nasby.net/misc/pgsqlcompression .
> >
> >Summary:
> >               work_mem  compressed  not compressed   gain
> >in-memory         20000       400.1           797.7  49.8%
> >in-memory          2000       371.4           805.7  53.9%
> >not in-memory     20000        8537           17436  51.0%
> >not in-memory      2000        8152           17820  54.3%
> >
> >I find it very interesting that the gains are identical even when the
> >tapes should fit in memory. My guess is that for some reason the OS is
> >flushing those to disk anyway. In fact, watching gstat during a run, I
> >do see write activity hitting the drives. So if there was some way to
> >tune that behavior, the in-memory case would probably be much, much
> >faster. Anyone know FreeBSD well enough to suggest how to change this?
> >Anyone want to test on Linux and see if the results are the same? This
> >could indicate that it might be advantageous to attempt an in-memory
> >sort with compressed data before spilling that compressed data to
> >disk...
> >
>
> I can test it on Linux; just let me know what you need.
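
The gain column in the quoted summary appears to be the relative reduction, 1 - compressed / not compressed. That definition is an assumption inferred from the numbers, not something stated in the message, and the units of the measurements are not given. A small sketch that rechecks the four rows under that assumption (the names ROWS and gain_percent are illustrative only):

```python
# Rows copied from the quoted summary:
# (case, work_mem, compressed, not compressed, reported gain in %)
ROWS = [
    ("in-memory", 20000, 400.1, 797.7, 49.8),
    ("in-memory", 2000, 371.4, 805.7, 53.9),
    ("not in-memory", 20000, 8537, 17436, 51.0),
    ("not in-memory", 2000, 8152, 17820, 54.3),
]


def gain_percent(compressed, uncompressed):
    """Assumed definition of 'gain': relative reduction, in percent."""
    return (1 - compressed / uncompressed) * 100


# Print the recomputed gain next to the reported one for each row.
for case, work_mem, comp, uncomp, reported in ROWS:
    computed = gain_percent(comp, uncomp)
    print(f"{case:14} {work_mem:6} computed={computed:.1f}% reported={reported:.1f}%")
```

Under this reading, all four reported gains match the recomputed values to one decimal place.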

Actually, Larry mentioned that it'd be worth checking whether we're
doing something like opening the files with O_DIRECT, which I haven't
had a chance to do yet. It might be worth looking at that before
running more tests.
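
One rough way to start the O_DIRECT check is to search a source checkout for the flag. The helper below is a hypothetical sketch, not part of the patch or of PostgreSQL: the name find_flag_uses, the default file extensions, and the idea of scanning the tree this way are all assumptions. A hit only shows that the flag appears somewhere in the source; it does not show that the sort temp files are opened with it.

```python
import os


def find_flag_uses(root, flag="O_DIRECT", exts=(".c", ".h")):
    """Return (path, line number, line text) for each source line containing flag.

    Hypothetical helper: walks every directory under root and scans
    files whose names end with one of exts.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on odd encodings.
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if flag in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling find_flag_uses with the path to a source checkout returns the matching lines, which can then be read by hand to see which files each open call applies to.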

Anyway, I've posted the patch now as well, and compress_sort.txt has the
commands I was running. Those are just against a plain pgbench database
that's been freshly initialized (i.e., no dead tuples). I created two
install directories from a checkout of HEAD via --prefix=, one with the
patch and one without. Both use the same $PGDATA. I've posted the
postgresql.conf as well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
