Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-12-18 07:46:27
Message-ID: CAH2-WzmRgxJ4QkYsAZ=9X-RNrxB8YSvTFeJ-b1N=C4DkJLKJ3w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 15, 2020 at 6:44 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> In connection with this change, we would need to rethink the meaning
> of the INDEX_CLEANUP option. As of now, if it's not set (i.e.
> VACOPT_TERNARY_DEFAULT in the code), it's treated as true and will do
> heap clean. But I think we can make it something like a neutral state
> by default. This neutral state could resolve to "on" or "off" depending
> on several factors, including the answers from ambulkdeletestrategy(),
> the table status, and the user's request. In this context, specifying
> INDEX_CLEANUP would mean forcing the neutral state to "on" or "off" at
> the user's request.

I think that a new value, such as "smart", should be introduced, and it
could become the default.

> The table status that could influence the decision
> could concretely be, for instance:
>
> * Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
> * Making pages all-visible for index-only scan.
>
> We would not benefit much from the bulkdeletestrategy() idea for now.
> But there are potential enhancements using this API:
>
> * If the bottom-up index deletion feature[1] is introduced, individual
> indexes could be in different situations in terms of dead tuple
> accumulation; some indexes on the table can delete their garbage index
> tuples without bulkdelete(). The problem is that doing bulkdelete()
> for such indexes would be inefficient. This proposal solves that
> problem because we can do bulkdelete() for a subset of the indexes on
> the table.

The chances of bottom-up index deletion being committed for
PostgreSQL 14 are very high. While it hasn't received too much review,
there seems to be very little downside, and lots of upside.

> * If the retail index deletion feature[2] is introduced, we can make the
> return value of bulkdeletestrategy() a ternary value: "do_bulkdelete",
> "do_indexscandelete", and "no".

Makes sense.
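A ternary ambulkdeletestrategy() result could drive per-index planning roughly as follows. This is a sketch under assumptions: the enum values, IndexVacuumPlan, and plan_index_vacuum() are hypothetical names. A real implementation would also have to keep heap line pointers LP_DEAD for as long as any index still holds entries pointing at them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical return values for the proposed ambulkdeletestrategy()
 * callback; none of these names exist in PostgreSQL.
 */
typedef enum
{
    BULKDEL_STRATEGY_NO,                /* leave this index alone this time */
    BULKDEL_STRATEGY_BULKDELETE,        /* full scan of the index via ambulkdelete() */
    BULKDEL_STRATEGY_INDEXSCANDELETE    /* retail deletion: look up each dead TID */
} BulkDeleteStrategy;

typedef struct
{
    size_t      nbulkdelete;
    size_t      nindexscandelete;
    size_t      nskipped;
} IndexVacuumPlan;

/*
 * Group the table's indexes by the strategy each one returned.  Only the
 * first two groups are visited, so VACUUM works on a subset of the indexes.
 */
static IndexVacuumPlan
plan_index_vacuum(const BulkDeleteStrategy *strategies, size_t nindexes)
{
    IndexVacuumPlan plan = {0, 0, 0};

    assert(strategies != NULL || nindexes == 0);
    for (size_t i = 0; i < nindexes; i++)
    {
        switch (strategies[i])
        {
            case BULKDEL_STRATEGY_BULKDELETE:
                plan.nbulkdelete++;
                break;
            case BULKDEL_STRATEGY_INDEXSCANDELETE:
                plan.nindexscandelete++;
                break;
            case BULKDEL_STRATEGY_NO:
                plan.nskipped++;
                break;
        }
    }
    return plan;
}
```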

> * We could probably introduce a threshold on the number of dead tuples
> to control whether or not to do index tuple bulk-deletion (like a
> bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
> case where the dead tuples take up slightly more than
> maintenance_work_mem, the second call to bulkdelete() will be made
> with only a small number of dead tuples, which is inefficient. This
> proposal also solves that problem by allowing a subset of indexes to
> skip bulkdelete() if the number of dead tuples doesn't exceed the
> threshold.

Good idea. Maybe this won't be possible for PostgreSQL 14, but this is
the kind of possibility that we should try to unlock. I actually had an
idea that is similar to Masahiko's, yet different: use the LSN to
determine (unreliably) whether a B-Tree leaf page is likely to have
garbage tuples within VACUUM.
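The arithmetic behind the small second pass that Masahiko describes can be sketched as below. It assumes 6 bytes per dead-tuple TID (sizeof(ItemPointerData)) and ignores the other caps VACUUM applies to the dead-tuple array; should_bulkdelete() and its threshold fraction are hypothetical, standing in for a bulkdelete() analogue of vacuum_cleanup_index_scale_factor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each dead-tuple TID costs 6 bytes (sizeof(ItemPointerData)). */
#define TID_BYTES 6

/* How many TIDs fit in the dead-tuple array for a given memory budget. */
static uint64_t
tids_per_pass(uint64_t mem_bytes)
{
    return mem_bytes / TID_BYTES;
}

/* Number of dead TIDs handled by the final (possibly tiny) index pass. */
static uint64_t
last_pass_tids(uint64_t dead_tuples, uint64_t mem_bytes)
{
    uint64_t    cap = tids_per_pass(mem_bytes);

    assert(cap > 0);
    if (dead_tuples == 0)
        return 0;
    return (dead_tuples % cap == 0) ? cap : dead_tuples % cap;
}

/*
 * Hypothetical per-index rule: skip bulkdelete() for a pass whose batch is
 * smaller than threshold_frac of what one full pass can hold.
 */
static bool
should_bulkdelete(uint64_t batch_tids, uint64_t mem_bytes, double threshold_frac)
{
    return (double) batch_tids >= threshold_frac * (double) tids_per_pass(mem_bytes);
}
```

With a 64MB budget, one pass holds 11,184,810 TIDs, so 11,500,000 dead tuples leave 315,190 for the second pass. A 5% threshold would skip bulkdelete() for that pass, and the skipped TIDs would have to stay LP_DEAD in the heap until a later VACUUM.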

This other idea probably also won't happen for PostgreSQL 14. That's not
important. The truly important thing is that we come up with the right
*general* design, that can support either technique in the future. I'm
not sure which precise design will work best, but I am confident that
*some* combination of these two ideas (or other ideas) will work very
well. Right now we don't have the appropriate general framework.
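One possible shape for such a general framework is a per-index policy hook that either technique can plug into. The sketch below is an assumption, not a design from this thread: IndexVacuumInput, the two policy functions, their thresholds, and the rule of vacuuming an index when any policy asks for it are all hypothetical. The LSN policy treats leaf pages modified since the last VACUUM as the likely home of garbage from non-HOT updates, which is unreliable by nature; a plain DELETE, for example, creates garbage index tuples without touching any index page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stands in for XLogRecPtr, a 64-bit integer */

/* Hypothetical per-index facts that VACUUM could hand to a policy. */
typedef struct
{
    uint64_t    dead_tids;          /* dead TIDs collected in this pass */
    uint64_t    pass_capacity;      /* TIDs that fit in one pass */
    const Lsn  *leaf_lsns;          /* page LSN of each leaf page */
    size_t      nleaves;
    Lsn         last_vacuum_lsn;    /* WAL position when the index was last vacuumed */
} IndexVacuumInput;

/* A policy answers one question: should this index get bulkdelete() now? */
typedef bool (*IndexVacuumPolicy) (const IndexVacuumInput *in);

/* Policy 1: dead-tuple threshold, here an arbitrary 5% of one pass. */
static bool
policy_dead_tuple_threshold(const IndexVacuumInput *in)
{
    return in->dead_tids * 20 >= in->pass_capacity;
}

/* Policy 2: LSN hint, here an arbitrary 10% of leaf pages changed since last VACUUM. */
static bool
policy_leaf_lsn_hint(const IndexVacuumInput *in)
{
    size_t      changed = 0;

    for (size_t i = 0; i < in->nleaves; i++)
        if (in->leaf_lsns[i] > in->last_vacuum_lsn)
            changed++;
    return changed > 0 && changed * 10 >= in->nleaves;
}

/* Vacuum the index if any configured policy asks for it. */
static bool
index_wants_bulkdelete(const IndexVacuumPolicy *policies, size_t npolicies,
                       const IndexVacuumInput *in)
{
    assert(in != NULL);
    for (size_t i = 0; i < npolicies; i++)
        if (policies[i] (in))
            return true;
    return false;
}
```

OR-ing the policies is only one option; requiring agreement, or weighting them, would fit the same hook.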

> Any thoughts?

Nothing to add to what you said, really. I agree that it makes sense
to think of all of these things at the same time.

It'll be easier to see how far these different ideas can be pushed
once a prototype is available.

> I'm writing a PoC patch so will share it.

Great! I suggest starting a new thread for that.

--
Peter Geoghegan
