Re: xid wraparound danger due to INDEX_CLEANUP false

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-11-20 01:57:49
Message-ID: CAD21AoBGQA76_5PA3cN_7SGmNSkwQRyHiVMPohREYe6VuG03Yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 29, 2020 at 9:51 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Sun, 28 Jun 2020 at 02:44, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Fri, Jun 26, 2020 at 10:15 PM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > Regarding to the extent of the impact, this bug will affect the user
> > > who turned vacuum_index_cleanup off or executed manually vacuum with
> > > INDEX_CLEANUP off for a long time, after some vacuums. On the other
> > > hand, the user who uses INDEX_CLEANUP off on the spot or turns
> > > vacuum_index_cleanup off of the table from the start would not be
> > > affected or less affected.
> >
> > I don't think that it's likely to cause too much trouble. It's already
> > possible to leak deleted pages, if only because the FSM isn't crash
> > safe. Actually, the nbtree README says this, and has since 2003:
> >
> > """
> > (Note: if we find a deleted page with an extremely old transaction
> > number, it'd be worthwhile to re-mark it with FrozenTransactionId so that
> > a later xid wraparound can't cause us to think the page is unreclaimable.
> > But in more normal situations this would be a waste of a disk write.)
> > """
> >
> > But, uh, isn't the btvacuumcleanup() call supposed to avoid
> > wraparound? Who knows?!
> >
> > It doesn't seem like the recycling aspect of page deletion was
> > rigorously designed, possibly because it's harder to test than page
> > deletion itself. This is a problem that we should fix.
>
> Agreed.
>
> >
> > > I apologize for writing this patch without enough consideration. I
> > > should have been more careful as I learned the nbtree page recycle
> > > strategy when discussing vacuum_cleanup_index_scale_factor patch.
> >
> > While it's unfortunate that this was missed, let's not lose
> > perspective. Anybody using the INDEX_CLEANUP feature (whether it's
> > through a direct VACUUM, or by using the reloption) is already asking
> > for an extreme behavior: skipping regular index vacuuming. I imagine
> > that the vast majority of users that are in that position just don't
> > care about the possibility of leaking deleted pages. They care about
> > avoiding a real disaster from XID wraparound.
>
> For back branches, I'm considering how we let users know about this.
> For safety, we can let users know that we recommend avoiding
> INDEX_CLEANUP false unless it's necessary to avoid running out of XIDs
> on the documentation and/or the release note. But on the other hand,
> since there is the fact that leaving recyclable pages is already
> possible to happen as you mentioned I'm concerned it gets the user
> into confusion and might needlessly incite unrest of users. I'm
> thinking what we can do for users, in addition to leaving the summary
> of this discussion as a source code comment. What do you think?
>

Several months passed from the discussion. We decided not to do
anything on back branches but IIUC the fundamental issue is not fixed
yet. The issue pointed out by Andres that we should leave the index AM
to decide whether doing vacuum cleanup or not when INDEX_CLEANUP is
specified is still valid. Is that right?

For HEAD, there was a discussion that we change lazy vacuum and
bulkdelete and vacuumcleanup APIs so that it calls these APIs even
when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
specified, it collects dead tuple TIDs into maintenance_work_mem space
and passes the flag indicating INDEX_CLEANUP is specified or not to
index AMs. Index AM decides whether doing bulkdelete/vacuumcleanup. A
downside of this idea would be that we will end up using
maintenance_work_mem even if all index AMs of the table don't do
bulkdelete/vacuumcleanup at all.

The second idea I came up with is to add an index AM API (say,
amcanskipindexcleanup = true/false) telling index cleanup is skippable
or not. Lazy vacuum checks this flag for each index on the table
before starting. If index cleanup is skippable in all indexes, it can
choose one-pass vacuum, meaning no need to collect dead tuple TIDs in
maintenance_work_mem. All in-core index AM will set to true. Perhaps
it’s true (skippable) by default for backward compatibility.

The in-core AMs including btree indexes will work same as before. This
fix is to make it more desirable behavior and possibly to help other
AMs that require to call vacuumcleanup in all cases. Once we fix it I
wonder if we can disable index cleanup when autovacuum’s
anti-wraparound vacuum.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message kuroda.hayato@fujitsu.com 2020-11-20 02:05:09 RE: Terminate the idle sessions
Previous Message James Hilliard 2020-11-20 01:55:18 Re: [PATCH 1/1] Fix compilation on mac with Xcode >= 11.4.