Re: xid wraparound danger due to INDEX_CLEANUP false

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-12-16 02:43:24
Message-ID: CAD21AoAJk7G+UKF30b+OxGtozrBc9Ye8qB=yWDB68WpJOmTGxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 21, 2020 at 8:03 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Nov 20, 2020 at 2:17 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > Does that make sense?
> >
> > I *think* so. For me the point is that the index never has a right to
> > insist on being vacuumed, but it can offer an opinion on how helpful
> > it would be.
>
> Right, that might be the single most important point. It's a somewhat
> more bottom-up direction for VACUUM, that is still fundamentally
> top-down. Because that's still necessary.
>
> Opportunistic heap pruning is usually very effective, so today we
> realistically have these 4 byte line pointers accumulating in heap
> pages. The corresponding "bloatum" in index pages is an index tuple +
> line pointer (at least 16 bytes + 4 bytes). Meaning that we accumulate
> that *at least* 20 bytes for each 4 bytes in the table. And, indexes
> care about *where* items go, making the problem even worse. So in the
> absence of index tuple LP_DEAD setting/deletion (or bottom-up index
> deletion in Postgres 14), the problem in indexes is probably at least
> 5x worse.
>
> The precise extent to which this is true will vary. It's a mistake to
> try to reason about it at a high level, because there is just too much
> variation for that approach to work. We should just give index access
> methods *some* say. Sometimes this allows index vacuuming to be very
> lazy, other times it allows index vacuuming to be very eager. Often
> this variation exists among indexes on the same table.
>
> Of course, vacuumlazy.c is still responsible for not letting the
> accumulation of LP_DEAD heap line pointers get out of hand (without
> allowing index TIDs to point to the wrong thing due to dangerous TID
> recycling issues/bugs). The accumulation of LP_DEAD heap line pointers
> will often take a very long time to get out of hand. But when it does
> finally get out of hand, index access methods don't get to veto being
> vacuumed. Because this isn't actually about their needs anymore.
>
> Actually, the index access methods never truly veto anything. They
> merely give some abstract signal about how urgent it is to them (like
> the 0.0 - 1.0 thing). This difference actually matters. One index
> among many on a table saying "meh, I guess I could benefit from some
> index vacuuming if it's no real trouble to you vacuumlazy.c" rather
> than saying "it's absolutely unnecessary, don't waste CPU cycles
> vacuumlazy.c" may actually shift how vacuumlazy.c processes the heap
> (at least occasionally). Maybe the high level VACUUM operation decides
> that it is worth taking care of everything all at once -- if all the
> indexes together either say "meh" or "now would be a good time", and
> vacuumlazy.c then notices that the accumulation of LP_DEAD line
> pointers is *also* becoming a problem (it's also a "meh" situation),
> then it can be *more* ambitious. It can do a traditional VACUUM early.
> Which might still make sense.
>
> This also means that vacuumlazy.c would ideally think about this as an
> optimization problem. It may be lazy or eager for the whole table,
> just as it may be lazy or eager for individual indexes. (Though the
> eagerness/laziness dynamic is probably much more noticeable with
> indexes in practice.)
>

I discussed this topic off-list with Peter Geoghegan. And we think
that we can lead this fix to future improvement. Let me summarize the
proposals.

The first proposal is the fix of this inappropriate behavior discussed
on this thread. We pass a new flag in calling bulkdelete(), indicating
whether or not the index can safely skip this bulkdelete() call. This
is equivalent to whether or not lazy vacuum will do the heap clean
(making LP_DEAD LP_UNUSED in lazy_vacuum_heap()). If it's true
(meaning to do heap clean), since dead tuples referenced by index
tuples will be physically removed, index AM would have to delete the
index tuples. If it's false, we call to bulkdelete() with this flag so
that index AM can safely skip bulkdelete(). Of course index AM also
can dare not to skip it because of its personal reason. Index AM
including BRIN that doesn't store heap TID can decide whether or not
to do regardless of this flag.

The next proposal upon the above proposal is to add a new index AM
API, say ambulkdeletestrategy(), which is called before bulkdelete()
for each index and asks the index bulk-deletion strategy. In this API,
lazy vacuum asks, "Hey index X, I collected garbage heap tuples during
heap scanning, how urgent is vacuuming for you?", and the index
answers either "it's urgent" when it wants to do bulk-deletion or
"it's not urgent, I can skip it". The point of this proposal is to
isolate heap vacuum and index vacuum for each index so that we can
employ different strategies for each index. Lazy vacuum can decide
whether or not to do heap clean based on the answers from the indexes.
Lazy vacuum can set the flag I proposed above according to the
decision. If all indexes answer 'yes' (meaning it will do
bulkdelete()), lazy vacuum can do heap clean. On the other hand, if
even one index answers 'no' (meaning it will not do bulkdelete()),
lazy vacuum cannot do the heap clean. On the other hand, lazy vacuum
would also be able to require indexes to do bulkdelete() for its
personal reason. It’s something like saying "Hey index X, you answered
not to do bulkdelete() but since heap clean is necessary for me please
don't skip bulkdelete()".

In connection with this change, we would need to rethink the meaning
of the INDEX_CLEANUP option. As of now, if it's not set (i.g.
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and will do
heap clean. But I think we can make it something like a neutral state
by default. This neutral state could be "on" and "off" depending on
several factors including the answers of ambulkdeletestrategy(), the
table status, and user's request. In this context, specifying
INDEX_CLEANUP would mean making the neutral state "on" or "off" by
user's request. The table status that could influence the decision
could concretely be, for instance:

* Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
* Making pages all-visible for index-only scan.

We would not benefit much from the bulkdeletestrategy() idea for now.
But there are potential enhancements using this API:

* If bottom-up index deletion feature[1] is introduced, individual
indexes could be a different situation in terms of dead tuple
accumulation; some indexes on the table can delete its garbage index
tuples without bulkdelete(). A problem will appear that doing
bulkdelete() for such indexes would not be efficient. This problem is
solved by this proposal because we can do bulkdelete() for a subset of
indexes on the table.

* If retail index deletion feature[2] is introduced, we can make the
return value of bulkdeletestrategy() a ternary value: "do_bulkdete",
"do_indexscandelete", and "no".

* We probably can introduce a threshold of the number of dead tuples
to control whether or not to do index tuple bulk-deletion (like
bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
case where the amount of dead tuples is slightly larger than
maitenance_work_mem the second time calling to bulkdelete will be
called with a small number of dead tuples, which is inefficient. This
problem is also solved by this proposal by allowing a subset of
indexes to skip bulkdelete() if the number of dead tuple doesn't
exceed the threshold.

Any thoughts?

I'm writing a PoC patch so will share it.

Regards,

[1] https://www.postgresql.org/message-id/CAH2-Wzm%2BmaE3apHB8NOtmM%3Dp-DO65j2V5GzAWCOEEuy3JZgb2g%40mail.gmail.com
[2] https://www.postgresql.org/message-id/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-12-16 02:55:29 Re: Add Information during standby recovery conflicts
Previous Message Amit Kapila 2020-12-16 02:31:46 Re: Misleading comment in prologue of ReorderBufferQueueMessage