Re: B-tree page deletion boundary cases

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: B-tree page deletion boundary cases
Date: 2012-04-24 08:08:36
Message-ID: CA+U5nM+Ra=3X269jC6x4fr4dVYj=rBAOS=y9FwO_kmhBuh5eCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 21, 2012 at 5:52 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> As I mentioned[1] peripherally back in November, that algorithm has been
> insufficient since the introduction of non-XID-bearing transactions in
> PostgreSQL 8.3.  Such transactions do not restrain RecentXmin.  If no running
> transaction has an XID, RecentXmin == ReadNewTransactionId() and the page
> incorrectly becomes available for immediate reuse.

Good observation.

> The fix is to compare the stored XID to RecentGlobalXmin, not RecentXmin.  We
> already use RecentGlobalXmin when wal_level = hot_standby.  If no running
> transaction has an XID and all running transactions began since the last
> transaction that did bear an XID, RecentGlobalXmin == ReadNewTransactionId().
> Therefore, the correct test is btpo.xact < RecentGlobalXmin, not btpo.xact <=
> RecentGlobalXmin as we have today.  This also cleanly removes the need for the
> bit of code in _bt_getbuf() that decrements btpo.xact before sending it down
> for ResolveRecoveryConflictWithSnapshot().  I suggested[2] that decrement on
> an unprincipled basis; it was just masking the off-by-one of using "<=
> RecentGlobalXmin" instead of "< RecentGlobalXmin" in _bt_page_recyclable().

Looks like the right fix. I'll apply this to 9.0/9.1/HEAD.

> This change makes empty B-tree pages wait through two generations of running
> transactions before reuse, so some additional bloat will arise.

Could arise, in some circumstances. But that assumes VACUUMs are
fairly frequent and that they would be delayed/rendered less effective
by this, which I don't think will be the case.

I note that we don't take any account of the number of pages that may
be reused when we VACUUM, so when HOT avoids a VACUUM we may
accumulate pages for a longer period. Looks like there is more work to
be done yet in cleaning indexes.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2012-04-24 08:17:14 Re: ECPG FETCH readahead
Previous Message Boszormenyi Zoltan 2012-04-24 08:04:41 Re: PL/PGSQL bug in handling composite types