Re: Resurrecting per-page cleaner for btree

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: Resurrecting per-page cleaner for btree
Date: 2006-07-25 19:37:58
Message-ID: 23887.1153856278@sss.pgh.pa.us
Lists: pgsql-hackers, pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> This is a revised patch originated by Junji TERAMOTO for HEAD.
>   [BTree vacuum before page splitting]
>   http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php
> I think we can resurrect his idea because we will scan btree pages
> at-a-time now; the missing-restarting-point problem went away.

I've applied this but I'm now having some second thoughts about it,
because I'm seeing an actual *decrease* in pgbench numbers from the
immediately prior CVS HEAD code.  Using
	pgbench -i -s 10 bench
	pgbench -c 10 -t 1000 bench	(repeat this half a dozen times)
with fsync off but all other settings factory-stock, what I'm seeing
is that the first run looks really good but subsequent runs tail off in
spectacular fashion :-(  Pre-patch there was only minor degradation in
successive runs.

What I think is happening is that because pgbench depends so heavily on
updating existing records, we get into a state where an index page is
about full and there's one dead tuple on it, and then for each insertion
we have

	* check for uniqueness marks one more tuple dead (the
	  next-to-last version of the tuple)
	* newly added code removes one tuple and does a write
	* now there's enough room to insert one tuple
	* lather, rinse, repeat, never splitting the page.

The problem is that we've traded splitting a page every few hundred
inserts for doing a PageIndexMultiDelete, and emitting an extra WAL
record, on *every* insert.  This is not good.
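
To make the trade-off concrete, here is a toy model of a single hot leaf page. It is only a sketch under stated assumptions, not PostgreSQL code: the function name `simulate`, the 200-tuple page capacity, fixed-size tuples, and the rule that every insert kills exactly one older tuple on the same page are all made up for illustration.

```python
def simulate(inserts, capacity=200, cleanup=True):
    """Toy model of one hot btree leaf page; returns (splits, cleanups).

    Assumptions (not from PostgreSQL): fixed-size tuples, a page that
    starts full, and one older tuple marked dead per insert.
    """
    used, dead = capacity, 0      # start with a full page, nothing dead yet
    splits = cleanups = 0
    for _ in range(inserts):
        # Uniqueness check marks the previous version of the row dead.
        if dead < used:
            dead += 1
        if used == capacity:
            if cleanup and dead > 0:
                # Per-page cleaner: drop dead tuples, costing one WAL record.
                used -= dead
                dead = 0
                cleanups += 1
            if used == capacity:
                # Still no room: split, moving about half the tuples away.
                used //= 2
                dead //= 2
                splits += 1
        used += 1                 # the new tuple goes in
    return splits, cleanups
```

With these made-up numbers, `simulate(1000, cleanup=False)` returns `(10, 0)` and `simulate(1000, cleanup=True)` returns `(0, 1000)`: ten splits are replaced by a cleanup, and hence an extra WAL record, on every one of the thousand inserts.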

Had you done any performance testing on this patch, and if so what
tests did you use?  I'm a bit hesitant to try to fix it on the basis
of pgbench results alone.

One possible fix that comes to mind is to only perform the cleanup
if we are able to remove more than one dead tuple (perhaps about 10
would be good).  Or do the deletion anyway, but then go ahead and
split the page unless X amount of space has been freed (where X is
more than just barely enough for the incoming tuple).
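
The two ideas above can be sketched as small decision functions. This is a hedged illustration only: the names `plan_min_dead` and `plan_min_freed`, the byte-based accounting, and the returned action lists are hypothetical, and the threshold of 10 is just the rough figure floated above, not a tested value.

```python
def plan_min_dead(dead_tuples, min_dead=10):
    """First idea: clean up only when at least min_dead dead tuples can go.

    Otherwise behave as before the patch and split the page.
    """
    return ["cleanup"] if dead_tuples >= min_dead else ["split"]


def plan_min_freed(freed_bytes, min_freed_bytes):
    """Second idea: always delete dead tuples, then split anyway unless
    the deletion freed at least min_freed_bytes (the "X" above, which
    should exceed the size of the incoming tuple by some margin).
    """
    plan = ["cleanup"] if freed_bytes > 0 else []
    if freed_bytes < min_freed_bytes:
        plan.append("split")
    return plan
```

For example, `plan_min_dead(1)` returns `["split"]`, which avoids the one-tuple-at-a-time cleanup cycle, while `plan_min_freed(40, 400)` returns `["cleanup", "split"]`: the dead tuple is still removed, but the page is split so the next insert does not trigger another cleanup immediately.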

After all the thought we've put into this, it seems a shame to
just abandon it :-(.  But it definitely needs more tweaking.

			regards, tom lane
