Re: decoupling table and index vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2022-02-08 17:12:19
Message-ID: CAH2-Wzk3Z3KhyMtrw6RHB1f+gVVKcaVWgPE0sBFEmjnL0ur5Fg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > One thing we could try doing in order to make that easier would be:
> > tweak things so that when autovacuum vacuums the table, it only
> > vacuums the indexes if they meet some threshold for bloat. I'm not
> > sure exactly what happens with the heap vacuuming then - do we do
> > phases 1 and 2 always, or a combined heap pass, or what? But if we
> > pick some criteria that vacuums indexes sometimes and not other times,
> > we can probably start doing some meaningful measurement of whether
> > this patch is making bloat better or worse, and whether it's using
> > fewer or more resources to do it.
>
> I think we can always trigger phase 1 and 2 and phase 2 will only
> vacuum conditionally based on if all the indexes are vacuumed for some
> conveyor belt pages so we don't have risk of scanning without marking
> anything unused.

Not sure what you mean about a risk of scanning without marking any
LP_DEAD items as LP_UNUSED. If VACUUM always does some amount of this,
then it follows that the new mechanism added by the patch just can't
safely avoid any work at all, making it all pointless. We have to
expect heap vacuuming to take place much less often with the patch,
simply because that's what the invariant described in comments above
lazy_scan_heap() requires.

Note that this is not the same thing as saying that we do less
*absolute* heap vacuuming with the conveyor belt -- my statement about
less heap vacuuming taking place is *only* true relative to the amount
of other work that happens in any individual "shortened" VACUUM
operation. We could do exactly the same total amount of heap vacuuming
as before (in a version of Postgres without the conveyor belt but with
the same settings), but much *more* index vacuuming (at least for one
or two problematic indexes).
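
To make the relative-versus-absolute distinction concrete, here is a toy
model of the idea. It is only a sketch and not code from the patch: the
ConveyorBelt class, its method names, and the index names are all
hypothetical. The one assumption it encodes is the invariant mentioned
above, namely that an LP_DEAD item may only become LP_UNUSED once every
index has been vacuumed for that TID.

```python
class ConveyorBelt:
    """Toy append-only log of dead TIDs (hypothetical, not the patch's API)."""

    def __init__(self, index_names):
        self.dead_tids = []
        # Per index: every TID before this offset is already gone from that index.
        self.index_done = {name: 0 for name in index_names}
        # Every TID before this offset has already been set LP_UNUSED in the heap.
        self.heap_done = 0

    def add_dead(self, tids):
        """Heap scan/pruning found new LP_DEAD items; append them to the belt."""
        self.dead_tids.extend(tids)

    def vacuum_index(self, name):
        """Vacuum a single index for everything currently on the belt."""
        self.index_done[name] = len(self.dead_tids)

    def vacuum_heap(self):
        """Reclaim only the TIDs that *every* index has already been vacuumed for."""
        safe_upto = min(self.index_done.values(), default=len(self.dead_tids))
        reclaimed = self.dead_tids[self.heap_done:safe_upto]
        self.heap_done = max(self.heap_done, safe_upto)
        return reclaimed
```

In this model, calling vacuum_index("hot_idx") many times never lets
vacuum_heap() reclaim anything until the slowest index ("pkey", say) has
also caught up. The total number of TIDs reclaimed from the heap ends up
the same; it just happens in fewer, later passes, while the one
problematic index gets vacuumed far more often.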

> And we can try to measure with other approaches as
> well where we completely avoid phase 2 and it will be done only along
> with phase 1 whenever applicable.

I believe that the main benefit of the dead TID conveyor belt (outside
of global index use cases) will be to enable us to do more (much more)
index vacuuming for one index in particular. So it's not really about
doing less index vacuuming or less heap vacuuming -- it's about doing
a *greater* amount of *useful* index vacuuming, in less time. There is
often some way in which failing to vacuum one index for a long time
does lasting damage to the index structure.

--
Peter Geoghegan
