Re: Maintaining cluster order on insert

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Jaime Casanova <systemguards(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Maintaining cluster order on insert
Date: 2007-07-09 23:29:32
Message-ID: 457.1184023772@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

[ back to the cluster-order patch ]

Awhile back, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> The performance characteristics of this patch hasn't been thoroughly
> discussed yet. The reason why you want to cluster your tables is to
> speed up SELECTs that return a bunch of tuples with similar values, for
> example range queries. The reason for keeping them clustered on inserts
> is to reduce the need to run CLUSTER as often.

> It doesn't come without a cost, however. In the worst case, there never
> is room for new inserts on pages, and each insert needs to do one extra
> I/O to fetch the optimal heap page where the insert should go, see that
> there's no room, and then insert somewhere else. Using a non-zero
> fillfactor helps, but even when there is room on the page, it's often
> cheaper to just append to the end of the table and running CLUSTER at
> night for example, than do random access to insert to the "right" pages
> in the heap.

I looked through the thread and could not find any clear evidence that
anyone had done any performance testing of this patch at all, so I
hacked together a crude test that alternates between inserting/deleting
a random subset of rows and SELECTing a range of rows. The results
do not look very good: while there is detectable improvement in the
SELECT performance, it's not much, and it comes at a *very* sizable
penalty in INSERT performance.

Attached is a test program, which I ran against today's CVS HEAD with
and without the v8 version of the patch. The test program sets up
a clustered million-row table with a row width chosen to fit about
20 rows per page. It then iterates a loop of:

* delete a random 2% of the table
* vacuum to recover space
* insert a random 2% of the table
* select (about) 1000 consecutively-numbered rows
* select all the rows (this is just a cross check that the
number of rows isn't changing too much)

What you would hope to see as the benefit of the patch is that the time
for the range SELECT degrades more slowly as more of the table is
replaced. Ignoring the first SELECT as being a startup transient, it
looks like HEAD degrades from about 3 msec to 6 msec over 10 iterations
(20% replacement of the table), whereas with the patch it's about 3 msec
to about 4 and a half. However, the INSERT steps went from around 20
sec each to about twice that.

I used just the following nondefault parameter settings:

shared_buffers = 10000 # min 128kB or max_connections*16kB
maintenance_work_mem = 160MB # min 1MB
max_fsm_pages = 2048000 # min max_fsm_relations*16, 6 bytes each
wal_buffers = 640kB # min 32kB
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
enable_bitmapscan = off
enable_seqscan = off
autovacuum = off # enable autovacuum subprocess?

which for the most part are just 10x the factory defaults. The hardware
is just a Dell x86_64 workstation with crappy IDE disk, so maybe things
would look better elsewhere, but it's all I have to work with.

Considering that the patch is really severely ugly from a modularity and
layering standpoint, I'm now inclined to reject it. AFAICT this test
case is showing the patch at its best possible advantage; under
real-world conditions the benefit would be less. It doesn't look to me
like the gain is worth the loss of system understandability and
maintainability that the patch would impose.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 3.4 KB
unknown_filename text/plain 2.0 KB
unknown_filename text/plain 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian G. Pflug 2007-07-10 00:41:44 "Running readonly queried on PITR slaves" status update
Previous Message Florian G. Pflug 2007-07-09 22:34:15 Re: ReadRecord, EndRecPtr and XLOG_SWITCH

Browse pgsql-patches by date

  From Date Subject
Next Message Neil Conway 2007-07-10 05:57:22 Re: CREATE TABLE LIKE INCLUDING INDEXES support
Previous Message daveg 2007-07-09 21:04:03 Re: dblink connection security