Re: Compression and on-disk sorting

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compression and on-disk sorting
Date: 2006-05-26 16:31:08
Message-ID: 20060526163107.GD59464@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've done some more testing with Tom's recently committed changes to
tuplesort.c, which remove the tupleheaders from the sort data. It does
about 10% better than compression alone does. What's interesting is that
the gains are about 10% regardless of compression, which means
compression isn't helping very much on all the redundant header data,
which I find very odd. And the header data is very redundant:

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid limit 1;
xmin | xmax | cmin | cmax | aid
--------+------+------+------+-----
280779 | 0 | 0 | 0 | 1
(1 row)

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid desc limit 1;
xmin | xmax | cmin | cmax | aid
--------+------+------+------+-----------
310778 | 0 | 0 | 0 | 300000000
(1 row)

Makes sense, since pgbench loads the database via a string of COPY commands,
each of which loads 10000 rows.

Something else worth mentioning is that sort performance is worse with
larger work_mem for all cases except the old HEAD, prior to the
tuplesort.c changes. It looks like whatever was done to fix that will
need to be adjusted/rethought pending the outcome of using compression.

In any case, compression certainly seems to be a clear win, at least in
this case. If there's interest, I can test this on some larger hardware,
or if someone wants to produce a patch for pgbench that will load some
kind of real data into accounts.filler, I can test that as well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-05-26 16:35:36 Re: Compression and on-disk sorting
Previous Message Gurjeet Singh 2006-05-26 16:15:25 Re: [HACKERS] BEGIN inside transaction should be an error