Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-08 18:32:39
Message-ID: CAH2-Wzka8u1Y+GNP41TQiHjLMP7mYG9h3i+5jq-FzSKgWXBk+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Nov 8, 2021 at 9:28 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> Interesting, I don't think I've observed those errors. In fact after the
> recent changes (I've compiled here from 39a31056) around assertion logic
> and index_delete_check_htid now I'm getting another type of crashes
> using your scripts. This time heap_page_prune_execute stumbles upon a
> non heap-only tuple trying to update unused line pointers:

It looks like the new heap_page_prune_execute() assertions catch the
same problem earlier. It's hard to not suspect the code in pruneheap.c
itself. Whatever else may have happened, the code in pruneheap.c ought
to not even try to set a non-heap-only tuple item to LP_UNUSED. ISTM
that it should explicitly look out for and avoid doing that.

Offhand, I wonder if we just need to have an additional check in
heap_prune_chain(), which is like the existing "If we find a redirect
somewhere else, stop --- it must not be same chain" handling used for
LP_REDIRECT items that aren't at the start of a HOT chain:

diff --git a/src/backend/access/heap/pruneheap.c
b/src/backend/access/heap/pruneheap.c
index c7331d810..3c72cdf67 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -646,6 +646,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber
rootoffnum, PruneState *prstate)
!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
break;

+ if (nchain > 0 && !HeapTupleHeaderIsHeapOnly(htup))
+ break; /* not at start of chain */
+
/*
* OK, this tuple is indeed a member of the chain.
*/

I would expect the "Check the tuple XMIN against prior XMAX" test to
do what we need in the vast vast majority of cases, but why should it
be 100% effective?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2021-11-08 18:36:46 Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
Previous Message Andres Freund 2021-11-08 18:08:33 Re: BUG #17268: Possible corruption in toast index after reindex index concurrently