Re: decoupling table and index vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-22 19:10:54
Message-ID: CAH2-Wz=8E5QecDmzVcEWhwCyVhc2wsGRzviDZq0CyCwiv=zgLw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 22, 2021 at 11:44 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I'm honestly getting a bit annoyed about this stuff.

You're easily annoyed.

> Yes it's a cool
> improvement, but no, it doesn't mean that there aren't still relevant
> issues in important cases. It doesn't help that you repeatedly imply
> that people that don't see it your way need to have their view "cleared
> up".

I don't think that anything that I've said about it contradicts
anything that you or Robert said. What I said is that you're missing a
couple of important subtleties (or that you seem to be). It's not
really about the optimization in particular -- it's about the
subtleties that it exploits. I think that they're generalizable. Even
if there was only a 1% chance of that being true, it would still be
worth exploring in depth.

I think that everybody's beliefs about VACUUM tend to be correct. It
almost doesn't matter if scenario A is the problem in 90% of cases
versus 10% of cases for scenario B (or vice-versa). What actually
matters is that we have good handling for both. (It's probably some
weird combination of scenario A and scenario B in any case.)

> "Bottom up index deletion" is practically *irrelevant* for a significant
> set of workloads.

You're missing the broader point. Which is that we don't know how much
it helps in each case, just as we don't know how much some other
complementary optimization helps. It's important to develop
complementary techniques precisely because (say) bottom-up index
deletion only solves one class of problem. And because it's so hard to
predict.

I actually went on at length about the cases that the optimization
*doesn't* help. Because that'll be a disproportionate source of
problems now. And you really need to avoid all of the big sources of
trouble to get a really good outcome. Avoiding each and every source
of trouble might be much, much more useful than avoiding all but one.

> > You both seem to be assuming that everything would be fine if you
> > could somehow inexpensively know the total number of undeleted dead
> > tuples in each index at all times.
>
> I don't think we'd need an exact number. Just a reasonable approximation
> so we know whether it's worth spending time vacuuming some index.

I agree.

> You also have to assume that you have roughly evenly distributed index
> insertions and deletions. But workloads that insert into some parts of a
> value range and delete from another range are common.
>
> I even would say that *precisely* because "Bottom up index deletion" can
> be very efficient in some workloads it is useful to have per-index stats
> determining whether an index should be vacuumed or not.

Exactly!

> Except that heap bloat not index bloat might be the more pressing
> concern. Or that there will be no meaningful amount of bottom-up
> deletions. Or ...

Exactly!

--
Peter Geoghegan
