Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-09 15:02:16
Message-ID: 20211109150216.fgfybn35mwnkeef3@localhost
Lists: pgsql-bugs

> On Mon, Nov 08, 2021 at 10:32:39AM -0800, Peter Geoghegan wrote:
> On Mon, Nov 8, 2021 at 9:28 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > Interesting, I don't think I've observed those errors. In fact after the
> > recent changes (I've compiled here from 39a31056) around assertion logic
> > and index_delete_check_htid now I'm getting another type of crashes
> > using your scripts. This time heap_page_prune_execute stumbles upon a
> > non heap-only tuple trying to update unused line pointers:
>
> It looks like the new heap_page_prune_execute() assertions catch the
> same problem earlier. It's hard to not suspect the code in pruneheap.c
> itself. Whatever else may have happened, the code in pruneheap.c ought
> to not even try to set a non-heap-only tuple item to LP_UNUSED. ISTM
> that it should explicitly look out for and avoid doing that.
>
> Offhand, I wonder if we just need to have an additional check in
> heap_prune_chain(), which is like the existing "If we find a redirect
> somewhere else, stop --- it must not be same chain" handling used for
> LP_REDIRECT items that aren't at the start of a HOT chain:

Yes, adding such a condition works in this case: no non-heap-only tuples
were recorded as unused in heap_prune_chain, and nothing else popped up
afterwards. But now, after a couple of runs, I could also reproduce (at
least partially) what Alexander was talking about:

ERROR: could not open relation with OID 1056321

Not sure yet where it is coming from.
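
For readers following the thread, the kind of check discussed above can be
sketched with a toy model of a chain walk. This is not PostgreSQL code and
not the patch that was tested: ToyItem, ToyLpState and toy_collect_chain are
hypothetical names, and the page is reduced to an array of items with a
"next" link. It only illustrates the idea that a walk starting at a chain
root should stop when it reaches a redirect or a non-heap-only tuple
anywhere past the root, so that such an item is never queued to be marked
unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for line pointer states; not the real ItemIdData flags. */
typedef enum
{
    TOY_LP_UNUSED,
    TOY_LP_NORMAL,
    TOY_LP_REDIRECT,
    TOY_LP_DEAD
} ToyLpState;

/* Hypothetical, simplified view of one item on a heap page. */
typedef struct
{
    ToyLpState state;
    bool       heap_only;  /* models the heap-only tuple flag */
    int        next;       /* index of next chain member, or -1 at the end */
} ToyItem;

/*
 * Walk a chain starting at "root" and store the visited indexes in chain[].
 * Returns the number of members collected. The walk stops, without
 * recording the item, when it finds past the root either a redirect or a
 * tuple that is not heap-only: neither can belong to the same chain.
 */
size_t
toy_collect_chain(const ToyItem *items, size_t nitems, int root,
                  int *chain, size_t maxchain)
{
    size_t n = 0;
    int    off = root;

    assert(items != NULL && chain != NULL);

    /* maxchain also bounds the loop if the links form a cycle */
    while (off >= 0 && (size_t) off < nitems && n < maxchain)
    {
        const ToyItem *it = &items[off];

        /* unused or dead items end the chain */
        if (it->state == TOY_LP_UNUSED || it->state == TOY_LP_DEAD)
            break;

        if (off != root)
        {
            /* a redirect somewhere else must not be the same chain */
            if (it->state == TOY_LP_REDIRECT)
                break;

            /* the additional check: only heap-only tuples may follow */
            if (!it->heap_only)
                break;
        }

        chain[n++] = off;
        off = it->next;
    }

    return n;
}
```

With a root at index 0, a heap-only tuple at index 1, and a stale link from
index 1 to a slot that now holds a non-heap-only tuple, the walk collects
only the first two items and leaves the third alone.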
