Re: GUC for cleanup indexes threshold.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: GUC for cleanup indexes threshold.
Date: 2017-03-04 00:29:36
Message-ID: CAH2-Wz=1=t5fcGGfarQGcAWBqaCh+dLMjpYCYHpEyzK8Qg6OrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 3, 2017 at 2:41 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> In other words, the number of B-Tree pages that the last VACUUM
> deleted, and thus made eligible to recycle by the next VACUUM has no
> relationship with the number of pages the next VACUUM will itself end
> up deleting, in general, or how long it will be before that next
> VACUUM comes, if it comes at all, or anything else that seems at all
> relevant.

This raises another question, though: Why have this GUC at all? Why
use *any* threshold that is to be compared against the number of heap
pages that were processed by VACUUM this time?

B-Tree page deletion isn't really part of the ordinary life cycle of a
B-Tree index. In order for that to be the case, somebody would have to
delete large ranges of indexed values (typically hundreds of logically
contiguous values -- no gaps), without anyone else ever inserting new
tuples that are in the same range before the next VACUUM. It's very
unlikely that this would happen again and again in the real world. So,
even if we never freeze, the number of B-Tree pages that we delete
when we VACUUM today is generally a useless predictor of how many will
be deleted by a VACUUM that occurs tomorrow. This is true despite the
fact that the number of dead heap tuples is probably almost identical
for each VACUUM (or the number of heap pages that end up being
processed by VACUUM, if you prefer).

Barring any concerns about crash safety, we can be completely certain
that any "recycling-orientated B-Tree VACUUM" (a btvacuumcleanup()
call to btvacuumscan(), which happens because there are no tuples in
the index to kill) will end up recycling however many pages the last
VACUUM managed to delete, which is a precisely knowable number (or
could be made knowable if we stashed that number somewhere, like the
meta-page). It will typically only take seconds or minutes after the
VACUUM finishes for its RecentGlobalXmin interlock to stop being a
problem (that is, for _bt_page_recyclable() to return "true" for any
pages that that VACUUM deleted). From that point on, those deleted
pages are "money in the bank" for the FSM. The only reason why we'd
want to tie "the FSM withdrawing that money" to VACUUM is because that
might be needed to clean up regular bloat anyway.

The test performed by this patch within lazy_scan_heap(), to determine
whether we should avoid calling lazy_cleanup_index() would therefore
look like this, ideally: Do I want to go to the trouble of scanning
this index (a cost that is proportionate to the size of the index) in
order to recycle this number of known-deleted pages (a benefit that is
proportionate to that number)? (I still think that the important thing
is that we don't let the number of unclaimed-by-FSM recyclable pages
grow forever, though.)

(Thinks about it some more...)

Unfortunately, I just saw a whole new problem with this patch:
_bt_page_recyclable() is the one place in the B-Tree AM where we stash
an XID. We don't need to set this to FrozenTransactionId at any point,
because this is stashed for deleted pages only, pages that are likely
to be recycled very soon. It might be that the VACUUM that ends up
deleting any such page is an anti-wraparound VACUUM, especially in the
case that this patch really wants to improve. However, with this
patch, that recycling won't happen, of course. As a result,
_bt_page_recyclable() will falsely report that the page is not
recyclable if it is ever asked again.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2017-03-04 00:47:47 Re: Enabling replication connections by default in pg_hba.conf
Previous Message Andres Freund 2017-03-04 00:26:00 Re: Performance degradation in TPC-H Q18