Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue
Date: 2023-07-12 11:52:16
Message-ID: ZK6T8FpI4aeeqQO3@telsasoft.com
Lists: pgsql-hackers

On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> An instance compiled locally, without assertions, failed like this:
>
...
>
> => REINDEX was running, with parallel workers, but deadlocked with
> ANALYZE, and then crashed.
>
> It looks like parallel workers are needed to hit this issue.
> I'm not sure if the issue is specific to extended stats - probably not.
>
> I reproduced the crash with manual REINDEX+ANALYZE, and with assertions (which
> were not hit), and on a more recent commit (1124cb2cf). The crash is hit about
> 30% of the time when running a loop around REINDEX and then also running
> ANALYZE.
>
> I hope someone has a hunch where to look; so far, I wasn't able to create a
> minimal reproducer.

I was able to reproduce this in isolation by reloading data into a test
instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
over 3MB in size), and then REINDEXing the stats_ext index in a loop
while ANALYZEing a table with extended stats.
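
For illustration only, the loop described above might be driven by something like the following Python sketch. This is not the script used here: the index name `pg_statistic_ext_data_stxoid_inh_index` is an assumption about which "stats_ext index" is meant, `my_table_with_extended_stats` is a hypothetical table name, and the `execute` callables are placeholders for whatever runs one statement on its own database connection.

```python
import threading
from typing import Callable, Dict

# Assumption: the catalog index being rebuilt. The message only says
# "the stats_ext index", so this name is a guess for illustration.
STATS_EXT_INDEX = "pg_catalog.pg_statistic_ext_data_stxoid_inh_index"
# Hypothetical table that has extended statistics defined on it.
ANALYZED_TABLE = "my_table_with_extended_stats"


def reindex_sql(index: str = STATS_EXT_INDEX) -> str:
    """Build the REINDEX statement for one index."""
    return f"REINDEX INDEX {index};"


def analyze_sql(table: str = ANALYZED_TABLE) -> str:
    """Build the ANALYZE statement for one table."""
    return f"ANALYZE {table};"


def run_loop(execute: Callable[[str], None], sql: str, iterations: int) -> int:
    """Run one statement repeatedly and return how many attempts raised.

    Errors such as "deadlock detected" are counted rather than fatal,
    so the loop keeps going.
    """
    failures = 0
    for _ in range(iterations):
        try:
            execute(sql)
        except Exception:
            failures += 1
    return failures


def race(
    execute_reindex: Callable[[str], None],
    execute_analyze: Callable[[str], None],
    iterations: int = 100,
) -> Dict[str, int]:
    """Run the REINDEX loop and the ANALYZE loop concurrently.

    Each callable should use its own connection. Returns the failure
    count of each loop, keyed by "reindex" and "analyze".
    """
    results: Dict[str, int] = {}

    def worker(name: str, execute: Callable[[str], None], sql: str) -> None:
        results[name] = run_loop(execute, sql, iterations)

    threads = [
        threading.Thread(target=worker, args=("reindex", execute_reindex, reindex_sql())),
        threading.Thread(target=worker, args=("analyze", execute_analyze, analyze_sql())),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each `execute` callable could, for example, shell out to `psql -c` so that every statement runs in a separate session. A backend crash would then show up as failed statements on both sides plus a server restart in the log, rather than as a Python exception of a particular type.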

I still don't have a minimal reproducer, but on a hunch I found that
this fails at 5764f611e but not its parent.

commit 5764f611e10b126e09e37fdffbe884c44643a6ce
Author: Andres Freund <andres(at)anarazel(dot)de>
Date:   Wed Jan 18 11:41:14 2023 -0800

    Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks

I tried compiling with -DILIST_DEBUG, but it reported nothing before the
segfault, which suggests that the lists themselves are fine.

--
Justin
