|From:||Simon Riggs <simon(at)2ndQuadrant(dot)com>|
|To:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>|
|Cc:||Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Avoiding pin scan during btree vacuum|
|Views:||Raw Message | Whole Thread | Download mbox|
On 21 December 2015 at 21:28, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> Simon Riggs wrote:
> > During VACUUM of btrees, we need to pin all pages, even those where
> > are not removed, which I am calling here the "pin scan". This is
> > a problem during redo, where removing one tuple from a 100GB btree can
> > a minute or more. That causes replication lags. Bad Thing.
> Agreed on there being a problem here.
> As a reminder, this "pin scan" is implemented in the
> HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is
> in charge of replaying an action recorded by _bt_delitems_vacuum. That
> replay works by acquiring cleanup lock on all btree blocks from
> lastBlockVacuumed to "this one" (i.e. the one in which elements are
> being removed). In essence, this means pinning *all* the blocks in the
> index, which is bad.
> The new code sets lastBlockVacuumed to Invalid; a new test in the replay
> code makes that value not pin anything. This is supposed to optimize
> the case in question.
Nice summary, thanks.
> One thing that this patch should update is the comment above struct
> xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one
> of the options to apply to each block, but that routine doesn't exist.
> One possible naive optimization is to avoid locking pages that aren't
> leaf pages, but that would still require fetching the block from disk,
> so it wouldn't actually optimize anything other than the cleanup lock
> acquisition. (And probably not too useful, since leaf pages are said to
> be 95% - 99% of index pages anyway.)
Agreed, not worth it.
> Also of interest is the comment above the call to _bt_delitems_vacuum in
> btvacuumpage(); that needs an update too.
> It appears to me that your patch changes the call in btvacuumscan() but
> not the one in btvacuumpage(). I assume you should be changing both.
Yes, that was correct. Updated.
> I think the new comment that talks about Toast Index should explain
> *why* we can skip the pinning in all cases except that one, instead of
> just saying we can do it.
Greatly expanded comments.
Thanks for the review.
New patch attached.
|Next Message||Simon Riggs||2016-01-03 15:40:01||Re: Avoiding pin scan during btree vacuum|
|Previous Message||Robert Haas||2016-01-03 14:57:05||Re: dynloader.h missing in prebuilt package for Windows?|