Re: Resurrecting per-page cleaner for btree

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: Resurrecting per-page cleaner for btree
Date: 2006-07-25 19:37:58
Message-ID: 23887.1153856278@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
> [BTree vacuum before page splitting]
> http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-a-time now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code. Using
    pgbench -i -s 10 bench
    pgbench -c 10 -t 1000 bench    (repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-( Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

* the check for uniqueness marks one more tuple dead (the
next-to-last version of the tuple)
* newly added code removes one tuple and does a write
* now there's enough room to insert one tuple
* lather, rinse, repeat, never splitting the page.
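
To convince myself of that, I knocked together the toy simulation below.
It's purely illustrative -- the "page" is just a pair of counters with a
made-up capacity, nothing like the real nbtree structures -- but it mimics
the steady state above: the page starts full, each update marks one more
tuple dead, and the new cleanup code fires on every single insertion.

#include <stdio.h>

#define PAGE_CAPACITY 100    /* toy page holds 100 "tuples" */

int
main(void)
{
    int live = PAGE_CAPACITY;   /* page starts full of live tuples */
    int dead = 0;
    int cleanups = 0;
    int splits = 0;
    int i;

    for (i = 0; i < 1000; i++)
    {
        /* uniqueness check marks the next-to-last tuple version dead */
        live--;
        dead++;

        /* the incoming tuple doesn't fit on the full page */
        if (live + dead + 1 > PAGE_CAPACITY)
        {
            if (dead > 0)
            {
                /* new code: per-insert cleanup, i.e. a write + WAL record */
                dead = 0;
                cleanups++;
            }
            else
                splits++;    /* nothing dead to remove: split instead */
        }

        live++;    /* insert the new tuple version */
    }

    printf("cleanups = %d, splits = %d\n", cleanups, splits);
    return 0;
}

This prints "cleanups = 1000, splits = 0": every insertion pays for a
cleanup write, and the page split that used to amortize that cost over a
few hundred insertions never happens.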

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert. This is not good.

Had you done any performance testing on this patch, and if so what
tests did you use? I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good). Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).
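
Roughly, combining both of those ideas (this is just a standalone sketch
with made-up types, helpers and thresholds, not a proposed patch), the
insertion path would decide something like this when the incoming tuple
doesn't fit:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a btree page: just counts and sizes, nothing real. */
typedef struct ToyPage
{
    int     nlive;          /* live index tuples */
    int     ndead;          /* tuples already marked dead */
    size_t  free_space;     /* bytes currently free */
    size_t  item_size;      /* size of one index tuple (all equal here) */
} ToyPage;

#define CLEANUP_MIN_DEAD 10     /* "perhaps about 10 would be good" */

/*
 * Called when an incoming tuple doesn't fit: returns true if cleanup
 * freed comfortably more space than the tuple needs, false if we
 * should go ahead and split the page anyway.
 */
static bool
cleanup_avoids_split(ToyPage *page, size_t incoming_size)
{
    size_t  needed = incoming_size * 4;   /* "X": more than just barely enough */

    if (page->ndead < CLEANUP_MIN_DEAD)
        return false;       /* too few dead tuples to justify a WAL record */

    /* remove the dead tuples (stand-in for PageIndexMultiDelete + WAL) */
    page->free_space += page->ndead * page->item_size;
    page->ndead = 0;

    return page->free_space >= needed;
}

int
main(void)
{
    /* the pgbench steady state: nearly full page with a single dead tuple */
    ToyPage page = {300, 1, 20, 20};

    if (cleanup_avoids_split(&page, 20))
        printf("cleaned up, no split needed\n");
    else
        printf("split the page\n");
    return 0;
}

In the pgbench case above (one dead tuple) this falls straight through to
a split, which gets the old amortization back; a page would have to
accumulate ten-odd dead tuples before we'd spend a cleanup WAL record
on it.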

After all the thought we've put into this, it seems a shame to
just abandon it :-(. But it definitely needs more tweaking.

regards, tom lane
