Re: decoupling table and index vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2022-02-09 21:40:59
Message-ID: CA+TgmoZJ3xXW7VNtT5fWbMK0i8uU3grEMbg4_fGkZCh8GJs2Qg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 9, 2022 at 2:27 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> We should probably dispense with the idea that we'll be making these
> decisions about what to do with an index like this (bloated in a way
> that bottom-up index deletion just won't help with) in an environment
> that is similar to how the current "skip index scan when # heap pages
> with one or more LP_DEAD items < 2% of rel_pages" thing works. That
> mechanism has to be very conservative because we just don't know when
> the next opportunity to vacuum indexes will be -- we almost have to
> assume that the decision will be static, and made exactly once, so we
> better be defensive. But why should that continue to be true with the
> conveyor belt stuff in place, and with periodic mini-vacuums that
> coordinate over time? I don't think it has to be like that. We can
> make it much more dynamic.

I'm not sure that we can. I mean, there's still only going to be ~3
autovacuum workers, and there could be arbitrarily many tables. Even
if the vacuum load is within the bounds of what the system can
sustain, individual tables can't be assured of being visited
frequently (or so it seems to me), and it could be that there are
actually not enough resources to vacuum and we have to try to cope as
best we can. Less unnecessary vacuuming of large indexes can help, of
course, but I'm not sure it fundamentally changes the calculus.
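
To make the quoted "skip index scan" rule concrete, here is a minimal
sketch of the check as it is described above: index vacuuming is skipped
when the number of heap pages with one or more LP_DEAD items is below 2%
of rel_pages. The function and parameter names are hypothetical, chosen
only for illustration, and the real code applies additional conditions
that are not modeled here.

```python
# Sketch of the "skip index scan" rule as described in the quoted text.
# The 2% figure comes from the quote; all names here are hypothetical.

BYPASS_THRESHOLD_PAGES = 0.02  # fraction of rel_pages


def should_skip_index_scan(rel_pages: int, lpdead_item_pages: int) -> bool:
    """Return True when few enough heap pages carry LP_DEAD items.

    rel_pages: total number of heap pages in the table.
    lpdead_item_pages: heap pages with one or more LP_DEAD items.
    """
    return lpdead_item_pages < rel_pages * BYPASS_THRESHOLD_PAGES
```

Note that the inputs are properties of the heap only, so the same answer
applies to every index on the table.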

> We will need something like that. I think that LP_DEAD items (or
> would-be LP_DEAD items -- tuples with storage that would get pruned
> into LP_DEAD items if we were to prune) in the table are much more
> interesting than dead heap-only tuples, and also more interesting than
> dead index tuples. Especially the distribution of such LP_DEAD items
> in the table, and their concentration. That does seem much more likely
> to be robust as a quantitative driver of index vacuuming.

Hmm... why would the answer have to do with dead items in the heap? I
was thinking along the lines of trying to figure out either a more
reliable count of dead tuples in the index, subtracting out whatever
we save by kill_prior_tuple and bottom-up index deletion; or else maybe a
count of the subset of dead tuples that are likely not to get
opportunistically pruned in one way or another, if there's some way to
guess that. Or maybe something where when we see an index page filling
up we try to figure out (or guess) that it's close to really needing a
split - i.e. that it's not full of tuples that we could just junk to
make space - and notice how often that's happening. I realize I'm
hand-waving, but if the property is a property of the heap rather than
the index, how will different indexes get different treatment?
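
As a rough illustration of the first idea above (a count of dead tuples
in the index, minus whatever opportunistic cleanup already removed), a
per-index signal could look something like the sketch below. Everything
in it is an assumption: no such counters are claimed to exist, and the
field names and the threshold are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexStats:
    """Hypothetical per-index counters; not an existing PostgreSQL structure."""
    name: str
    dead_tuples_est: int            # assumed estimate of dead index tuples
    removed_opportunistically: int  # assumed count removed via kill_prior_tuple
                                    # or bottom-up index deletion

    def residual_dead(self) -> int:
        # Dead tuples that only an index vacuum would be expected to remove.
        return max(0, self.dead_tuples_est - self.removed_opportunistically)


def indexes_needing_vacuum(stats: List[IndexStats], threshold: int) -> List[str]:
    """Names of indexes whose residual dead-tuple count reaches the threshold."""
    return [s.name for s in stats if s.residual_dead() >= threshold]


# Two indexes on the same table, with the same number of dead tuples
# arriving, but very different amounts of opportunistic cleanup.
stats = [
    IndexStats("idx_a", dead_tuples_est=100_000, removed_opportunistically=95_000),
    IndexStats("idx_b", dead_tuples_est=100_000, removed_opportunistically=10_000),
]
needing = indexes_needing_vacuum(stats, threshold=50_000)
```

The point of the example is only that a signal computed per index can
select idx_b and leave idx_a alone, which a single heap-level number
cannot do by itself.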

--
Robert Haas
EDB: http://www.enterprisedb.com
