Re: Avoiding pin scan during btree vacuum

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding pin scan during btree vacuum
Date: 2016-01-03 15:34:53
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On 21 December 2015 at 21:28, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>

> Simon Riggs wrote:
> > During VACUUM of btrees, we need to pin all pages, even those where
> tuples
> > are not removed, which I am calling here the "pin scan". This is
> especially
> > a problem during redo, where removing one tuple from a 100GB btree can
> take
> > a minute or more. That causes replication lags. Bad Thing.
> Agreed on there being a problem here.
> As a reminder, this "pin scan" is implemented in the
> HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is
> in charge of replaying an action recorded by _bt_delitems_vacuum. That
> replay works by acquiring cleanup lock on all btree blocks from
> lastBlockVacuumed to "this one" (i.e. the one in which elements are
> being removed). In essence, this means pinning *all* the blocks in the
> index, which is bad.
> The new code sets lastBlockVacuumed to Invalid; a new test in the replay
> code makes that value not pin anything. This is supposed to optimize
> the case in question.

Nice summary, thanks.

> One thing that this patch should update is the comment above struct
> xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one
> of the options to apply to each block, but that routine doesn't exist.


> One possible naive optimization is to avoid locking pages that aren't
> leaf pages, but that would still require fetching the block from disk,
> so it wouldn't actually optimize anything other than the cleanup lock
> acquisition. (And probably not too useful, since leaf pages are said to
> be 95% - 99% of index pages anyway.)

Agreed, not worth it.

> Also of interest is the comment above the call to _bt_delitems_vacuum in
> btvacuumpage(); that needs an update too.


> It appears to me that your patch changes the call in btvacuumscan() but
> not the one in btvacuumpage(). I assume you should be changing both.

Yes, that was correct. Updated.

> I think the new comment that talks about Toast Index should explain
> *why* we can skip the pinning in all cases except that one, instead of
> just saying we can do it.

Greatly expanded comments.

Thanks for the review.

New patch attached.

Simon Riggs
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
avoid_standby_pin_scan.v2.patch application/octet-stream 5.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2016-01-03 15:40:01 Re: Avoiding pin scan during btree vacuum
Previous Message Robert Haas 2016-01-03 14:57:05 Re: dynloader.h missing in prebuilt package for Windows?