Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue
Date: 2023-07-24 01:50:13
Message-ID: CAD21AoCu9gWTmOAtzSf6Hx4z=Jq_oDpqhb7_MTFYrvA9ijaUDA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 12, 2023 at 8:52 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> > An instance compiled locally, without assertions, failed like this:
> >
> ...
> >
> > => REINDEX was running, with parallel workers, but deadlocked with
> > ANALYZE, and then crashed.
> >
> > It looks like parallel workers are needed to hit this issue.
> > I'm not sure if the issue is specific to extended stats - probably not.
> >
> > I reproduced the crash with manual REINDEX+ANALYZE, and with assertions (which
> > were not hit), and on a more recent commit (1124cb2cf). The crash is hit about
> > 30% of the time when running a loop around REINDEX and then also running
> > ANALYZE.
> >
> > I hope someone has a hunch where to look; so far, I wasn't able to create a
> > minimal reproducer.
>
> I was able to reproduce this in isolation by reloading data into a test
> instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
> over 3MB in size), and then REINDEXing the stats_ext index in a loop
> while ANALYZEing a table with extended stats.
>
> I still don't have a minimal reproducer, but on a hunch I found that
> this fails at 5764f611e but not its parent.
>
> commit 5764f611e10b126e09e37fdffbe884c44643a6ce
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: Wed Jan 18 11:41:14 2023 -0800
>
> Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks
>

Good catch. I hadn't noticed this email, but while investigating the
same issue, which was reported recently[1], I arrived at the same
commit. I've posted my analysis and a patch to fix the issue in that
thread. Andres, since this issue seems to be related to your commit
5764f611e, could you please take a look at it and at my patch?

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDs7vzK7NErse7xTruqY-FXmM%2B3K00SdBtMcQhiRNkoeQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
