Re: B-tree page deletion boundary cases

From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: B-tree page deletion boundary cases
Date: 2012-04-24 13:00:52
Message-ID: 20120424130052.GC20512@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 24, 2012 at 09:08:36AM +0100, Simon Riggs wrote:
> On Sat, Apr 21, 2012 at 5:52 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > The fix is to compare the stored XID to RecentGlobalXmin, not RecentXmin. ?We
> > already use RecentGlobalXmin when wal_level = hot_standby. ?If no running
> > transaction has an XID and all running transactions began since the last
> > transaction that did bear an XID, RecentGlobalXmin == ReadNewTransactionId().
> > Therefore, the correct test is btpo.xact < RecentGlobalXmin, not btpo.xact <=
> > RecentGlobalXmin as we have today. ?This also cleanly removes the need for the
> > bit of code in _bt_getbuf() that decrements btpo.xact before sending it down
> > for ResolveRecoveryConflictWithSnapshot(). ?I suggested[2] that decrement on
> > an unprincipled basis; it was just masking the off-by-one of using "<=
> > RecentGlobalXmin" instead of "< RecentGlobalXmin" in _bt_page_recyclable().
>
> Looks like the right fix. I'll apply this to 9.0/9.1/HEAD.

Thanks. 8.3 and 8.4 need it, too, albeit simplified due to some of the
affected code not existing yet.

> > This change makes empty B-tree pages wait through two generations of running
> > transactions before reuse, so some additional bloat will arise.
>
> Could arise, in some circumstances. But that assumes VACUUMs are
> fairly frequent and that they would be delayed/rendered less effective
> by this, which I don't think will be the case.
>
> I note that we don't take any account of the number of pages that may
> be reused when we VACUUM, so when HOT avoids a VACUUM we may
> accumulate pages for a longer period. Looks like there is more work to
> be done yet in cleaning indexes.

Agreed. Taking one VACUUM visit to mark the page BTP_DELETED and another to
add it to the FSM is fairly pessimistic; perhaps we can do better.

nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-04-24 13:21:38 Re: DOMAINs and CASTs
Previous Message Albe Laurenz 2012-04-24 09:40:55 Re: Foreign table scan estimates