Re: xid wraparound danger due to INDEX_CLEANUP false

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-11-20 11:16:56
Message-ID: CAD21AoCMrOmkNdqut21rmwpHVj9fXoUeudeM_c-riKR7Bun3mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 20, 2020 at 12:56 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Nov 19, 2020 at 5:58 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Several months passed from the discussion. We decided not to do
> > anything on back branches but IIUC the fundamental issue is not fixed
> > yet. The issue pointed out by Andres that we should leave the index AM
> > to decide whether doing vacuum cleanup or not when INDEX_CLEANUP is
> > specified is still valid. Is that right?
>
> I don't remember if Andres actually said that publicly, but I
> definitely did. I do remember discussing this with him privately, at
> which point he agreed with what I said. Which you just summarized
> well.
>
> > For HEAD, there was a discussion that we change lazy vacuum and
> > bulkdelete and vacuumcleanup APIs so that it calls these APIs even
> > when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
> > specified, it collects dead tuple TIDs into maintenance_work_mem space
> > and passes the flag indicating INDEX_CLEANUP is specified or not to
> > index AMs.
>
> Right.
>
> > Index AM decides whether doing bulkdelete/vacuumcleanup. A
> > downside of this idea would be that we will end up using
> > maintenance_work_mem even if all index AMs of the table don't do
> > bulkdelete/vacuumcleanup at all.
>
> That is a downside, but I don't think that it's a serious downside.
> But it may not matter, because there are lots of reasons to move in
> this direction.
>
> > The second idea I came up with is to add an index AM API (say,
> > amcanskipindexcleanup = true/false) telling index cleanup is skippable
> > or not. Lazy vacuum checks this flag for each index on the table
> > before starting. If index cleanup is skippable in all indexes, it can
> > choose one-pass vacuum, meaning no need to collect dead tuple TIDs in
> > maintenance_work_mem. All in-core index AM will set to true. Perhaps
> > it’s true (skippable) by default for backward compatibility.
>
> (The terminology here is very confusing, because the goal of the
> INDEX_CLEANUP feature in v12 is not really to skip a call to
> btvacuumcleanup(). The goal is really to skip a call to
> btbulkdelete(). I will try to be careful.)
>
> I think that the ideal design here would be a new hybrid of two
> existing features:
>
> 1.) Your INDEX_CLEANUP feature from Postgres 12.
>
> and:
>
> 2.) Your vacuum_cleanup_index_scale_factor feature from Postgres 11.
>
> The INDEX_CLEANUP feature is very useful, because skipping indexes
> entirely can be very helpful for many reasons (e.g. faster manual
> VACUUM in the event of wraparound related emergencies). But
> INDEX_CLEANUP has 2 problems today:
>
> A. It doesn't interact well with vacuum_cleanup_index_scale_factor.
> This is the problem that has been discussed on this thread.
>
> and:
>
> B. It is an "all or nothing" thing. Unlike the
> vacuum_cleanup_index_scale_factor feature, it does not care about what
> the index AM/individual index wants. But it should.

Agreed.

>
> (**Thinks some more***)
>
> Actually, on second thought, maybe INDEX_CLEANUP only has one problem.
> Problem A is actually just a special case of problem B. There are many
> interesting opportunities created by solving problem B
> comprehensively.
>
> So, what useful enhancements to VACUUM are possible once we have
> something like INDEX_CLEANUP, that is sensitive to the needs of
> indexes? Well, you already identified one yourself, so obviously
> you're thinking about this in a similar way already:
>
> > The in-core AMs including btree indexes will work same as before. This
> > fix is to make it more desirable behavior and possibly to help other
> > AMs that require to call vacuumcleanup in all cases. Once we fix it I
> > wonder if we can disable index cleanup when autovacuum’s
> > anti-wraparound vacuum.
>
> Obviously this is a good idea. The fact that anti-wraparound vacuum
> isn't really special compared to regular autovacuum is *bad*.
> Obviously anti-wraparound is in some sense more important than regular
> vacuum. Making it as similar as possible to vacuum simply isn't
> helpful. Maybe it is slightly more elegant in theory, but in the real
> world it is a poor design. (See also: every single PostgreSQL post
> mortem that has ever been written.)
>
> But why stop with this? There are other big advantages to allowing
> individual indexes/index AMs influence of the INDEX_CLEANUP behavior.
> Especially if they're sensitive to the needs of particular indexes on
> a table (not just all of the indexes on the table taken together).
>
> As you may know, my bottom-up index deletion patch can more or less
> eliminate index bloat in indexes that don't get "logically changed" by
> many non-HOT updates. It's *very* effective with non-HOT updates and
> lots of indexes. See this benchmark result for a recent example:
>
> https://postgr.es/m/CAGnEbohYF_K6b0v=2uc289=v67qNhc3n01Ftic8X94zP7kKqtw@mail.gmail.com
>
> The feature is effective enough to make it almost unnecessary to
> VACUUM certain indexes -- though not necessarily other indexes on the
> same table. Of course, in the long term it will eventually be
> necessary to really vacuum these indexes. Not because the indexes
> themselves care, though -- they really don't (if they don't receive
> logical changes from non-HOT updates, and so benefit very well from
> the proposed bottom-up index deletion mechanism, they really have no
> selfish reason to care if they ever get vacuumed by autovacuum).
>
> The reason we eventually need to call ambulkdelete() with these
> indexes (with all indexes, actually) even with these enhancements is
> related to the heap. We eventually want to make LP_DEAD line pointers
> in the heap LP_UNUSED. But we should be lazy about it, and wait until
> it becomes a real problem. Maybe we can only do a traditional VACUUM
> (with a second pass of the heap for heap LP_UNUSED stuff) much much
> less frequently than today. At the same time, we can set the FSM for
> index-only scans much more frequently.
>
> It's also important that we really make index vacuuming a per-index
> thing. You can see this in the example benchmark I linked to, which
> was posted by Victor: no page splits in one never-logically-modified
> index, and some page splits in other indexes that were actually
> changed by UPDATEs again and again. Clearly you can have several
> different indexes on the same table that have very different needs.
>
> With some indexes we want to be extra lazy (these are indexes that
> make good use of bottom-up deletion). But with other indexes on the
> same table we want to be eager. Maybe even very eager. If we can make
> per-index decisions, then every individual part of the system works
> well. Currently, the problem with autovacuum scheduling is that it
> probably makes sense for the heap with the defaults (or something like
> them), and probably doesn't make any sense for indexes (though it also
> varies among indexes). So today we have maybe 7 different things
> (maybe 6 indexes + 1 table), and we pretend that they are only one
> thing. It's just a fantasy. The reality is that we have 7 things that
> have only a very loose and complicated relationship to each other. We
> need to stop believing in this fantasy, and start paying attention to
> the more complicated reality. The only way to do that is ask each
> index directly, while being prepared to get very different answers
> from each index on the same table.
>
> Here is what I mean by that: it would also probably be very useful to
> do something like a ambulkdelete() call for only a subset of indexes
> that really need it. So you aggressively vacuum the one index that
> really does get logically modified by an UPDATE, and not the other 6
> that don't. (Of course it's still true that we cannot have a second
> heap pass to make LP_DEAD line pointers in the heap LP_UNUSED --
> obviously that's unsafe unless we're 100% sure that nothing in any
> index points to the now-LP_UNUSED line pointer. But many big
> improvements are possible without violating this basic invariant.)

I had missed your bottom-up index deletion patch but it's a promising
improvement. With that patch, the number of dead tuples in individual
indexes may differ. So it's important that we make index vacuuming a
per-index thing.

Given that patch, it seems to me that it would be better to ask
individual index AM before calling to bulkdelete about the needs of
bulkdelete. That is, passing VACUUM options and the number of
collected dead tuples etc. to index AM, we ask index AM via a new
index AM API whether it wants to do bulkdelete or not. We call
bulkdelete for only indexes that answered 'yes'. If we got 'no' from
any one of the indexes, we cannot have a second heap pass.
INDEX_CLEANUP is not enforceable. When INDEX_CLEANUP is set to false,
we expect index AMs to return 'no' unless they have a special reason
for the needs of bulkdelete.

One possible benefit of this idea even without bottom-up index
deleteion patch would be something like
vacuum_index_cleanup_scale_factor for bulkdelete. For example, in the
case where the amount of dead tuple is slightly larger than
maitenance_work_mem the second time calling to bulkdelete will be
called with a small amount of dead tuples, which is less efficient. If
an index AM is able to determine not to do bulkdelete by comparing the
number of dead tuples to a threshold, it can avoid such bulkdelete
calling.

Also, as a future extension, once we have retail index deletion
feature, we might be able to make that API return a ternary value:
'no', 'do_bulkdelete', ‘do_indexscandelete, so that index AM can
choose the appropriate method of index deletion based on the
statistics.

But for making index vacuuming per-index thing, we need to deal with
the problem that we cannot know which indexes of the table still has
index tuples pointing to the collected dead tuple. For example, if an
index always says 'no' (not need bulkdelete therefore we need to keep
dead line pointers), the collected dead tuples might already be marked
as LP_DEAD and there might already not be index tuples pointing to
them in other index AMs. In that case we don't want to call to
bulkdelete for other indexes. Probably we can have additional
statistics like the number of dead tuples in individual indexes so
that they can determine the needs of bulkdelete. But it’s not a
comprehensive solution.

>
> If you are able to pursue this project, in whole or in part, I would
> definitely be supportive of that. I may be able to commit it. I think
> that this project has many benefits, not just one or two. It seems
> strategic.

Thanks, that’s really helpful. I’m going to work on that. Since things
became complicated by these two features that I proposed I’ll do my
best to sort out the problem and improve it in PG14.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2020-11-20 11:23:57 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Kyotaro Horiguchi 2020-11-20 11:16:42 Re: Asynchronous Append on postgres_fdw nodes.