Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-14 19:36:35
Message-ID: CAH2-WzmBo8+Ccv3Sc+AdaoZ9uS0YzD=AoC9xJzhxQ3WOhSNPeQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 13, 2021 at 7:23 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> In other words, I am not worried about debt, exactly. Debt is normal
> in moderation. Healthy, even. I am worried about bankruptcy, perhaps
> following a rare and extreme event. It's okay to be imprecise, but all
> of the problems must be survivable. The important thing to me for a
> maintenance_work_mem threshold is that there is *some* limit. At the
> same time, it may totally be worth accepting 2 or 3 index scans during
> some eventual VACUUM operation if there are many more VACUUM
> operations that don't even touch the index -- that's a good deal!
> Also, it may actually be inherently necessary to accept a small risk
> of having a future VACUUM operation that does multiple scans of each
> index -- that is probably a necessary part of skipping index vacuuming
> each time.
>
> Think about the cost of index vacuuming (the amount of I/O and the
> duration of index vacuuming) as less and less memory is available for
> TIDs. It's non-linear. The cost explodes once we're past a certain
> point. The truly important thing is to "never get killed by the
> explosion".

I just remembered this blog post, which gives a nice high level
summary of my mental model for things like this:

https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/

This patch should eliminate inefficient index vacuuming involving very
small "batch sizes" (i.e. a small number of TIDs/index tuples to
delete from indexes). At the same time, it should not allow the batch
size to get too large because that's also inefficient. Perhaps larger
batch sizes are not exactly inefficient -- maybe they're better
described as risky. Then again, risky is more or less the same thing
as inefficient, at least to my mind.

So IMV what we want to do here is to recognize cases where "batch
size" is so small that index vacuuming couldn't possibly be efficient.
We don't need to truly understand how that might change over time in
each case -- recognizing the clearly-too-small cases is relatively easy.

There is some margin for error here, even with this reduced-scope
version that just does the SKIP_VACUUM_PAGES_RATIO thing. The patch
can afford to make suboptimal decisions about the scheduling of index
vacuuming over time (relative to the current approach), provided the
additional cost is at least *tolerable* -- that way we are still very
likely to win in the aggregate, over time. However, the patch cannot
be allowed to create a new risk of significantly worse performance for
any one VACUUM operation.

--
Peter Geoghegan
