Re: decoupling table and index vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-24 20:17:22
Message-ID: CAH2-WznNkXXzYPyF5LovSkjevsQ-+a+2uhs2AEeeHG=tTWQtBw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 24, 2021 at 12:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Did anybody actually argue for using #live entries directly? I think
> *dead* entries is more relevant, particularly because various forms of
> local cleanup can be taken into account. Live tuples might come in to
> put the number of dead tuples into perspective, but otherwise not that
> much?

I was unclear. I can't imagine how you'd do anything like this without
using both together. Or if you didn't use live tuples you'd use heap
blocks instead. Something like that.

> > There are many cases where this will do completely the wrong thing,
> > even if we have perfectly accurate information.
>
> Imo the question isn't really whether criteria will ever do something
> wrong, but how often and how consequential such mistakes will
> be. E.g. unnecessarily vacuuming an index isn't fun, but it's better
> than ending up never cleaning up dead index pointers despite repeat
> accesses (think bitmap scans).

I strongly agree. The risk with what I propose is that we'd somehow
overlook a relevant extreme cost. But I think that that's an
acceptable risk. Plus I see no workable alternative -- your "indexes
that insert on one end, delete from the other" example works much
better as an argument against what you propose than an argument
against my own alternative proposal. Which reminds me: how would your
framework for index bloat/skipping indexes in VACUUM cope with
this same scenario?

Though I don't think that it's useful to use quantitative thinking as
a starting point here, that doesn't mean there is exactly zero role
for it. Not sure about how far I'd go here. But I would probably not
argue that we shouldn't vacuum an index that is known to (say) be more
than 60% dead tuples. I guess I'd always prefer to have a better
metric, but speaking hypothetically: Why take a chance? This is not
because it's definitely worth it -- it really isn't! It's just because
the benefit of being right is low compared to the cost of being wrong
-- as you point out, that is really important.
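
To make the shape of that backstop concrete, here is a minimal sketch. It
is purely illustrative: the function name, the constant, and the idea of
having exact per-index dead/live counts are all assumptions, not anything
that exists in PostgreSQL. The 60% figure is just the "(say)" number from
above. Note that a False result only means "let the primary criteria
decide", not "skip this index".

```python
# Hypothetical sketch; none of these names exist in PostgreSQL.
DEAD_FRACTION_BACKSTOP = 0.60  # the "(say) 60%" figure from the text above


def must_vacuum_index(dead_entries: int, live_entries: int) -> bool:
    """Return True when the dead fraction alone forces an index vacuum.

    This is only a backstop. A False result means "fall through to the
    primary (non-quantitative) criteria", not "skip the index".
    """
    total = dead_entries + live_entries
    if total == 0:
        return False  # empty index: nothing to clean up
    return dead_entries / total > DEAD_FRACTION_BACKSTOP
```

The check is deliberately one-sided: it can only force a vacuum, never
suppress one, which matches the point that being wrong in the "vacuumed
unnecessarily" direction is the cheaper mistake.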

--
Peter Geoghegan
