Re: New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-16 11:04:35
Message-ID: CAD21AoD8XGzk0=TXvpHiAAFkE7cFcPVD6tGZYQVdaD9AyHRREg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 14, 2021 at 12:23 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Mar 12, 2021 at 9:34 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > I agreed that when we're close to overrunning the
> > maintenance_work_mem space, the situation changes. If we skip it even
> > in that case, the next vacuum will likely use up
> > maintenance_work_mem, leading to a second index scan. Which is
> > bad.
> >
> > If this threshold is aimed to avoid a second index scan due to
> > overrunning the maintenance_work_mem, using a ratio of
> > maintenance_work_mem would be a good idea. On the other hand, if it's
> > to avoid accumulating debt affecting the cost of index vacuuming,
> > using a ratio of the total heap tuples seems better.
>
> It's both, together. These are two *independent*
> considerations/thresholds. At least in the code that decides whether
> or not we skip. Either threshold can force a full index scan (index
> vacuuming).
>
> What I'm really worried about is falling behind (in terms of the
> amount of memory available for TIDs to delete in indexes) without any
> natural limit. Suppose we just have the SKIP_VACUUM_PAGES_RATIO
> threshold (i.e. no maintenance_work_mem threshold thing). With just
> SKIP_VACUUM_PAGES_RATIO there will be lots of tables where index
> vacuuming is almost always avoided, which is good. But
> SKIP_VACUUM_PAGES_RATIO might be a bit *too* effective. If we have to
> do 2 or even 3 scans of the index when we finally get to index
> vacuuming then that's not great, it's inefficient -- but it's at least
> *survivable*. But what if there are 10, 100, even 1000 bulk delete
> calls for each index when it finally happens? That's completely
> intolerable.
>
> In other words, I am not worried about debt, exactly. Debt is normal
> in moderation. Healthy, even. I am worried about bankruptcy, perhaps
> following a rare and extreme event. It's okay to be imprecise, but all
> of the problems must be survivable. The important thing to me for a
> maintenance_work_mem threshold is that there is *some* limit. At the
> same time, it may totally be worth accepting 2 or 3 index scans during
> some eventual VACUUM operation if there are many more VACUUM
> operations that don't even touch the index -- that's a good deal!
> Also, it may actually be inherently necessary to accept a small risk
> of having a future VACUUM operation that does multiple scans of each
> index -- that is probably a necessary part of skipping index vacuuming
> each time.
>
> Think about the cost of index vacuuming (the amount of I/O and the
> duration of index vacuuming) as less and less memory is available for
> TIDs. It's non-linear. The cost explodes once we're past a certain
> point. The truly important thing is to "never get killed by the
> explosion".

Agreed.

Maybe it's a good idea to add the ratio of dead tuples to total heap
tuples as a threshold? I think that there are two risks when we
collect many dead tuples: maintenance_work_mem overrun and LP_DEAD
accumulation, even if those are concentrated in less than 1% of heap
pages. The former risk is dealt with by the maintenance_work_mem
threshold as we discussed, but that threshold might not be enough to
deal with the latter risk. For example, a very large table could have
many dead tuples concentrated in less than 1% of its heap pages while
maintenance_work_mem is set to a high value. In that case, it might be
okay in terms of index vacuuming but not okay in terms of the heap, so
I think we should not skip index vacuuming. It's an extreme case, but
we should also note that the absolute number of tuples corresponding
to 70% of maintenance_work_mem tends to increase if we improve the
memory efficiency of storing TIDs. So I think adding a "dead tuples
must be less than 50% of total heap tuples" threshold for skipping
index vacuuming would be a good safeguard against such an extreme
case.

This threshold would be applied only at the last
lazy_vacuum_table_and_indexes() call, since we know the total number
of heap tuples at that point. If we run out of maintenance_work_mem in
the middle of the heap scan, I think we should do index vacuuming
regardless of the number of dead tuples and the number of pages having
at least one LP_DEAD.
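
To be concrete, here is a rough standalone sketch of the decision I
have in mind. The helper name, the parameter names, and the 50%
constant are hypothetical; the other two thresholds follow the
discussion above:

#include <stdbool.h>

#define SKIP_VACUUM_PAGES_RATIO   0.01  /* LP_DEAD pages vs. all heap pages */
#define SKIP_VACUUM_MEM_RATIO     0.70  /* used slots vs. dead-TID capacity */
#define SKIP_VACUUM_DEADTUP_RATIO 0.50  /* dead tuples vs. total heap tuples */

static bool
should_skip_index_vacuuming(long npages_deadlp, long rel_nblocks,
                            long ndeadtuples, long maxdeadtuples,
                            double total_heap_tuples)
{
    /* Too many heap pages have LP_DEAD line pointers: don't skip. */
    if (npages_deadlp > rel_nblocks * SKIP_VACUUM_PAGES_RATIO)
        return false;

    /* Close to overrunning the maintenance_work_mem TID space: don't skip. */
    if (ndeadtuples > maxdeadtuples * SKIP_VACUUM_MEM_RATIO)
        return false;

    /* Safeguard: too many dead tuples relative to the whole heap. */
    if (ndeadtuples > total_heap_tuples * SKIP_VACUUM_DEADTUP_RATIO)
        return false;

    return true;
}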

>
> > The situation where we need to deal with here is a very large table
> > that has a lot of dead tuples but those fit in fewer heap pages (less
> > than 1% of all heap blocks). In this case, it's likely that the number
> > of dead tuples also is relatively small compared to the total heap
> > tuples, as you mentioned. If dead tuples fit in few pages but
> > accounted for most of all heap tuples, it would be a more serious
> > situation; there would definitely already be other problems.
> > So considering those conditions, I agreed to use a ratio of
> > maintenance_work_mem as a threshold. Maybe we can increase the
> > constant to 70, 80, or so.
>
> You mean 70% of maintenance_work_mem? That seems fine to me.

Yes.
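
For a rough sense of scale (assuming the usual 6-byte ItemPointerData
entries in the dead-TID array): with maintenance_work_mem = 1GB we can
store roughly 178 million TIDs, so the 70% threshold would kick in at
around 125 million collected dead tuples.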

> See my
> "Why does lazy_vacuum_table_and_indexes() not make one decision for
> the entire VACUUM on the first call, and then stick to its decision?"
> remarks at the end of this email, though -- maybe it should not be an
> explicit threshold at all.
>
> High level philosophical point: In general I think that the algorithm
> for deciding whether or not to perform index vacuuming should *not* be
> clever. It should also not focus on getting the benefit of skipping
> index vacuuming. I think that a truly robust design will be one that
> always starts with the assumption that index vacuuming will be
> skipped, and then "works backwards" by considering thresholds/reasons
> to *not* skip. For example, the SKIP_VACUUM_PAGES_RATIO thing. The
> risk of "explosions" or "bankruptcy" can be thought of as a cost here,
> too.
>
> We should simply focus on the costs directly, without even trying to
> understand the relationship between each of the costs, and without
> really trying to understand the benefit to the user from skipping
> index vacuuming.

Agreed.

>
> Question about your patch: lazy_vacuum_table_and_indexes() can be
> called multiple times (when low on maintenance_work_mem). Each time it
> is called we decide what to do for that call and that batch of TIDs.
> But...why should it work that way? The whole idea of a
> SKIP_VACUUM_PAGES_RATIO style threshold doesn't make sense to me if
> the code in lazy_vacuum_table_and_indexes() resets npages_deadlp (sets
> it to 0) on each call. I think that npages_deadlp should never be
> reset during a single VACUUM operation.
>
> npages_deadlp is supposed to be something that we track for the entire
> table. The patch actually compares it to the size of the whole table *
> SKIP_VACUUM_PAGES_RATIO inside lazy_vacuum_table_and_indexes():
>
> > + if (*npages_deadlp > RelationGetNumberOfBlocks(onerel) * SKIP_VACUUM_PAGES_RATIO)
> > + {
>
> > + }
>
> The code that I have quoted here is actually how I expect
> SKIP_VACUUM_PAGES_RATIO to work, but I notice an inconsistency:
> lazy_vacuum_table_and_indexes() resets npages_deadlp later on, which
> makes either the quoted code or the reset code wrong (at least when
> VACUUM needs multiple calls to the lazy_vacuum_table_and_indexes()
> function). With multiple calls to lazy_vacuum_table_and_indexes() (due
> to low memory), we'll be comparing npages_deadlp to the wrong thing --
> because npages_deadlp cannot be treated as a proportion of the blocks
> in the *whole table*. Maybe the resetting of npages_deadlp would be
> okay if you also used the number of heap blocks that were considered
> since the last npages_deadlp reset, and then multiply that by
> SKIP_VACUUM_PAGES_RATIO (instead of
> RelationGetNumberOfBlocks(onerel)). But I suspect that the real
> solution is to not reset npages_deadlp at all (without changing the
> quoted code, which seems basically okay).

Agreed. That was my bad.

IIUC we should track npages_deadlp throughout the entire vacuum
operation, whereas the maintenance_work_mem threshold applies to each
index/table vacuum cycle.
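
Something like the following bookkeeping, sketched here with made-up
struct and field names rather than the patch's actual code:

#include <stdbool.h>

typedef struct LVSkipState
{
    long    npages_deadlp;  /* pages seen with LP_DEAD items; accumulated
                             * over the entire VACUUM operation and never
                             * reset until index vacuuming actually runs */
    long    ndeadtuples;    /* dead TIDs collected in the current cycle;
                             * reset whenever the TID array is emptied */
} LVSkipState;

static void
end_vacuum_cycle(LVSkipState *state, bool did_index_vacuuming)
{
    /* The per-cycle memory accounting always starts over. */
    state->ndeadtuples = 0;

    /*
     * npages_deadlp is compared against a fraction of the whole table,
     * so it must survive low-memory cycles and is cleared only after
     * the indexes have really been vacuumed.
     */
    if (did_index_vacuuming)
        state->npages_deadlp = 0;
}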

>
> With tables/workloads that the patch helps a lot, we expect that the
> SKIP_VACUUM_PAGES_RATIO threshold will *eventually* be crossed by one
> of these VACUUM operations, which *finally* triggers index vacuuming.
> So not only do we expect npages_deadlp to be tracked at the level of
> the entire VACUUM operation -- we might even imagine it growing slowly
> over multiple VACUUM operations, perhaps over many months. At least
> conceptually -- it should only grow across VACUUM operations, until
> index vacuuming finally takes place. That's my mental model for
> npages_deadlp, at least. It tracks an easy to understand cost, which,
> as I said, is what the
> threshold/algorithm/lazy_vacuum_table_and_indexes() should focus on.
>
> Why does lazy_vacuum_table_and_indexes() not make one decision for the
> entire VACUUM on the first call, and then stick to its decision? That
> seems much cleaner. Another advantage of that approach is that it
> might be enough to handle low maintenance_work_mem risks -- perhaps
> those can be covered by simply waiting until the first VACUUM
> operation that runs out of memory and so requires multiple
> lazy_vacuum_table_and_indexes() calls. If at that point we decide to
> do index vacuuming throughout the entire vacuum operation, then we
> will not allow the table to accumulate many more TIDs than we can
> expect to fit in an entire maintenance_work_mem space.

I might be missing something, but even if
lazy_vacuum_table_and_indexes() decided to do index vacuuming
throughout the entire vacuum operation, it could still skip it if it
collects only a few dead tuples by the next index vacuuming cycle. Is
that right? Otherwise, it will end up doing index vacuuming against
only a few dead tuples.

I think we can do index vacuuming anyway if maintenance_work_mem runs
out. That alone is reason enough to do index vacuuming, even if the
collected dead tuples are concentrated in less than 1% of total heap
pages. Assuming maintenance_work_mem won't be changed by the next
vacuum operation, it doesn't make sense to skip it at that point. So
maybe we can apply this threshold only to the index vacuuming called
at the end of lazy vacuum?
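
In other words, something like this control flow. The helper names
here are hypothetical stubs, not the real routines:

#include <stdbool.h>

/* Hypothetical stubs standing in for the real lazy vacuum machinery. */
static bool scan_next_heap_page(void) { return false; }    /* false at end */
static bool dead_tuple_space_is_full(void) { return false; }
static bool skip_thresholds_allow_skipping(void) { return true; }
static void vacuum_indexes_and_heap(void) { }

static void
lazy_scan_heap_sketch(void)
{
    while (scan_next_heap_page())
    {
        /*
         * Out of maintenance_work_mem in the middle of the scan: vacuum
         * the indexes unconditionally, regardless of npages_deadlp.
         */
        if (dead_tuple_space_is_full())
            vacuum_indexes_and_heap();
    }

    /*
     * Final call, at the end of the heap scan: only here do we consult
     * the skip thresholds, because only now do we know the totals for
     * the whole table.
     */
    if (skip_thresholds_allow_skipping())
        return;                 /* skip index vacuuming this time */

    vacuum_indexes_and_heap();
}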

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
