Re: Bugs in b-tree dead page removal

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Bugs in b-tree dead page removal
Date: 2010-02-11 14:31:55
Message-ID: 1265898715.7341.1762.camel@ebony
Lists: pgsql-hackers

On Sun, 2010-02-07 at 21:33 -0500, Tom Lane wrote:

> That last problem is easy to fix, but I'm not at all sure what to do
> about the scan interlock problem. Thoughts?

AFAICS the problem doesn't exist in normal running.
_bt_page_recyclable() tests against RecentXmin, which includes the xmins
of read-only transactions. So it doesn't matter if a read-only
transaction still exists that is earlier than the value of
opaque->btpo.xact when it is set. If that transaction is still there
later, then the page cannot be reused.
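
To make the argument concrete, here is a rough standalone sketch of the
test, not the actual nbtree code. The names xid_precedes() and
page_recyclable_sketch() are made up for illustration, special xids are
ignored, and the use of a strict "precedes" comparison is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a is older than b" for normal xids, using
 * modulo-2^32 arithmetic.  Special xids are not handled in this sketch.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * Hypothetical stand-in for the recyclability test on a deleted page:
 * the page may be reused only once the xid stamped on it at deletion
 * time is older than every xmin still held, read-only or not.
 */
static bool
page_recyclable_sketch(TransactionId btpo_xact, TransactionId recent_xmin)
{
    return xid_precedes(btpo_xact, recent_xmin);
}
```

With recent_xmin held back by an old read-only transaction, the test
keeps returning false, which is the behaviour described above.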

A basic interlock approach can be put in place for Hot Standby. We just
WAL log the reuse of a btree page in _bt_getbuf() just before we
_bt_pageinit(), using the transaction id that took that action. We can
then conflict on that xid.
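
Something like the following is what I have in mind, as a sketch only.
The record layout and the function names are hypothetical, and the
conflict rule (a standby snapshot conflicts if its xmin is not newer
than the logged xid) is my assumption about how "conflict on that xid"
would be evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;

/* Hypothetical WAL record payload describing the reuse of a btree page. */
typedef struct ReusePageRecordSketch
{
    BlockNumber   blkno;      /* page being reinitialised */
    TransactionId reuse_xid;  /* xid that took the reuse action */
} ReusePageRecordSketch;

/* Wraparound-aware "a is older than b" for normal xids. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Assumed conflict rule on the standby: a query conflicts with the
 * record unless its snapshot xmin is already past reuse_xid, because
 * only such a query could still be heading for the old page.
 */
static bool
standby_snapshot_conflicts(const ReusePageRecordSketch *rec,
                           TransactionId snapshot_xmin)
{
    return !xid_precedes(rec->reuse_xid, snapshot_xmin);
}
```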

- -

For the TODO, I'm wondering whether there's a way to allow the page to be
reused earlier and have it all just work. That would allow us to recycle
index blocks faster and avoid index bloat from occurring in the presence
of long-lived transactions. Otherwise fixing this for the normal case
will accentuate index bloat.

It seems possible that a page can be reused and end up at exactly the
same place in the index key space, so that the left link of the new page
matches the right link of the page the scan just left. Most likely it
would end up in a different place entirely, so ignoring the issue could
cause scans to stop earlier than they should and give an incomplete
answer to a query. So we can't just re-check links to validate the page.

The only thing we actually need to record about the old page is the
right link, so perhaps we can store the right link value in a central
place, together with visibility information. Make that info WAL-logged
so it is available on standby also. That would allow us to find out
whether we should read the page or use the right link info to move
right.

We then store a recycled-by transaction id on the new page we are
recycling. When we scan onto a new page we check to see whether the page
has been recycled by a transaction that we consider still in progress.
If so, we consult the page-visibility info to see what the right link of
the page was as far as our scan is concerned, then use that to continue
our scan.
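
As a sketch of that scan step, with everything here hypothetical: the
central store is modelled as a plain array, "still in progress" is
approximated as "not older than our snapshot xmin", and none of these
names exist in the backend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;

/* Hypothetical entry in the central (WAL-logged) store. */
typedef struct RecycledPageInfo
{
    BlockNumber   blkno;           /* page that was recycled */
    BlockNumber   old_right_link;  /* right link of the old page */
    TransactionId recycled_by;     /* xid that recycled it */
} RecycledPageInfo;

/* Hypothetical view of the page the scan has just stepped onto. */
typedef struct PageSketch
{
    BlockNumber   blkno;
    BlockNumber   right_link;
    TransactionId recycled_by;     /* 0 if never recycled */
} PageSketch;

typedef enum { SCAN_READ_PAGE, SCAN_SKIP_RIGHT } ScanAction;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

static const RecycledPageInfo *
lookup_recycled(const RecycledPageInfo *tab, size_t n, BlockNumber blkno)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].blkno == blkno)
            return &tab[i];
    return NULL;
}

/*
 * Decide what to do with a page: read it, or skip it and follow the
 * right link the old page had.  *next receives the block to visit next.
 */
static ScanAction
scan_step(const PageSketch *page, TransactionId snapshot_xmin,
          const RecycledPageInfo *tab, size_t n, BlockNumber *next)
{
    if (page->recycled_by != 0 &&
        !xid_precedes(page->recycled_by, snapshot_xmin))
    {
        const RecycledPageInfo *info = lookup_recycled(tab, n, page->blkno);

        if (info != NULL)
        {
            *next = info->old_right_link;
            return SCAN_SKIP_RIGHT;
        }
        /* A real design would have to guarantee the entry exists. */
    }
    *next = page->right_link;
    return SCAN_READ_PAGE;
}
```

A scan whose snapshot predates the recycling never looks at the new
contents of the page; it just continues from the remembered right link.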

--
Simon Riggs www.2ndQuadrant.com
