Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2022-02-04 15:15:21
Message-ID: 391181.1643987721@sss.pgh.pa.us
Lists: pgsql-bugs

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2022-02-03 15:54:28 -0500, Tom Lane wrote:
>> I'm writing release notes and wondering what I can tell users about
>> how to detect or recover from this bug. Is a REINDEX sufficient,
>> or is the presence of the bogus redirect item going to cause
>> persistent problems?

> Good questions.

> It's hard to answer whether there's any danger after a REINDEX. Afaics the
> build scan would just pick the "lower offset" version of the root
> pointer. Which should be fine.

> It's possible there could be trouble down the line, e.g. heap pruning doing
> something weird once starting in a corrupted state, that then leads REINDEX to
> do something bogus. The simple cases look OK, because a second visit/action by
> heap_prune_chain for one tid from two different root pointers would see
> ->marked[offnum] as true. It gets more complicated once multiple intermediary
> row versions are involved, because the intermediary row versions won't be in
> ->marked if an entire chain is pruned. But afaict that should still end up
> looking like a hot chain ending in an aborted tuple or such.

OK, I'll just recommend REINDEX.
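
[Editor's illustration, not part of the original message.] The "simple case" quoted above, where a second visit to the same tid from a different root pointer stops because ->marked[offnum] is already true, can be sketched with a toy model. This is only a sketch under stated assumptions: ToyPruneState, toy_prune_init, toy_prune_chain, and the next[] array are hypothetical names invented here, every chain member is treated as prunable, and none of this is the actual heap_prune_chain code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_OFFSET 8		/* toy page size, not MaxHeapTuplesPerPage */

/* Hypothetical, heavily simplified stand-in for the pruning state. */
typedef struct ToyPruneState
{
	bool		marked[TOY_MAX_OFFSET + 1]; /* offsets already acted on */
	int			nactions;		/* total number of items acted on */
} ToyPruneState;

/* Reset the state before scanning a (toy) page. */
void
toy_prune_init(ToyPruneState *st)
{
	memset(st, 0, sizeof(*st));
}

/*
 * Walk the chain starting at the redirect item 'root'.
 *
 * next[off] is the offset that item 'off' points to; 0 ends the chain.
 * Assumption for this sketch: every chain member is dead and prunable.
 * Returns how many items this visit acted on.
 */
int
toy_prune_chain(ToyPruneState *st, const int *next, int root)
{
	int			acted = 0;
	int			off = next[root];	/* follow the redirect to the first tuple */

	while (off != 0)
	{
		assert(off >= 1 && off <= TOY_MAX_OFFSET);

		/* Already handled via another root pointer: stop, do nothing more. */
		if (st->marked[off])
			break;

		st->marked[off] = true;
		acted++;
		off = next[off];
	}

	st->nactions += acted;
	return acted;
}
```

With two redirect items at offsets 1 and 2 both pointing at offset 3, and 3 chaining to 4, the first call acts on both tuples and the second call acts on none. The more complicated case from the quoted text, where intermediary row versions never end up in ->marked, is deliberately not modelled here.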

> Except that it's not trivial to get right, I could see it being worthwhile to
> add verification of hot chains to amcheck, and backpatch that to 14.

I'd have thought that'd be a fundamental component of a heap check
module, so +1 for adding it. Dunno about the back-patch part though.
It seems like a new feature.

regards, tom lane
