Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-16 02:39:18
Message-ID: 20211116023918.5g6d6izevronx76m@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2021-11-10 19:08:21 -0800, Andres Freund wrote:
> On 2021-11-09 19:07:02 -0800, Peter Geoghegan wrote:
> > @@ -700,11 +720,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
> > }
> >
> > /*
> > - * Remember the last DEAD tuple seen. We will advance past
> > - * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
> > - * but we can't advance past anything else. We have to make sure that
> > - * we don't miss any DEAD tuples, since DEAD tuples that still have
> > - * tuple storage after pruning will confuse VACUUM.
> > + * Remember the last DEAD tuple seen from this HOT chain.
> > + *
> > + * We expect to sometimes find a RECENTLY_DEAD tuple after a DEAD
> > + * tuple. When this happens, the RECENTLY_DEAD tuple will be treated
> > + * as if it was DEAD all along. Manage that now.
>
> I now actually wonder why this is correct. There's another comment about it:
>
> > We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
> > * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
>
> But that doesn't justify very much.
>
> What prevents the scenario that some other backend e.g. has a snapshot with
> xmin=xmax=RECENTLY_DEAD-row. If the RECENTLY_DEAD row has an xid that is later
> than the DEAD row, this afaict would make it perfectly legal to prune the DEAD
> row, but *not* the RECENTLY_DEAD one.

I think the code is actually correct if awfully commented. I.e. the above
scenario cannot happen in a harmful situation. The relevant piece is that the
set of "valid" snapshots is limited (for a specific cluster history).

A row can only be RECENTLY_DEAD if the deleting / updating transaction
committed.

An intermediary RECENTLY_DEAD row followed by a DEAD row would only be visible
to a snapshot that either had

1) row.xmax >= Snapshot.xmax
2) row.xmax in Snapshot.xip.

There cannot be a snapshot that sees the RECENTLY_DEAD row while also seeing
the subsequent row as DEAD, because the existence of one of the two conditions
would have held back the xid horizon:

1) is not possible, because Snapshot.xmin is <= Snapshot.xmax, and
Snapshot.xmin >= MyProc->xmin. The subsequent row could not be DEAD.

2) is not possible, because .xip only contains xids >= Snapshot.xmin.

IOW, there *can* be a snapshot with xmin=xmax=RECENTLY_DEAD-row, but it'd not
see the RECENTLY_DEAD row anyway.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2021-11-16 03:42:35 Re: Logical Replication not working for few Tables
Previous Message Tom Lane 2021-11-15 22:07:45 Re: pg_restore depending on user functions