Re: LogwrtResult contended spinlock

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Subject: Re: LogwrtResult contended spinlock
Date: 2024-04-06 06:38:45
Message-ID: CALj2ACUiysS-Rv0aHnUm2vhmkN=sbyRuAzYRBEiG=jVzZy2pcQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 6, 2024 at 9:21 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sat, Apr 6, 2024 at 6:55 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Pushed 0001.
>
> Could that be related to the 3 failures on parula that look like this?
>
> TRAP: failed Assert("node->next == 0 && node->prev == 0"), File:
> "../../../../src/include/storage/proclist.h", Line: 63, PID: 29119
> 2024-04-05 16:16:26.812 UTC [29114:15] pg_regress/drop_operator LOG:
> statement: DROP OPERATOR <|(bigint, bigint);
> postgres: postgres regression [local] CREATE
> ROLE(ExceptionalCondition+0x4c)[0x9b3fdc]
> postgres: postgres regression [local] CREATE ROLE[0x8529e4]
> postgres: postgres regression [local] CREATE
> ROLE(LWLockWaitForVar+0xec)[0x8538fc]
> postgres: postgres regression [local] CREATE ROLE[0x54c7d4]
> postgres: postgres regression [local] CREATE ROLE(XLogFlush+0xf0)[0x552600]
> postgres: postgres regression [local] CREATE ROLE[0x54a9b0]
> postgres: postgres regression [local] CREATE ROLE[0x54bbdc]
>
> Hmm, the comments for LWLockWaitForVar say:
>
> * Be aware that LWLockConflictsWithVar() does not include a memory barrier,
> * hence the caller of this function may want to rely on an explicit barrier or
> * an implied barrier via spinlock or LWLock to avoid memory ordering issues.
>
> But that seems to be more likely to make LWLockWaitForVar suffer data
> races (i.e., hang), not break assertions about LWLock sanity, so I don't
> know what's going on there. I happened to have a shell on a Graviton
> box, but I couldn't reproduce it after a while...

Thanks for reporting. I'll try to spin up an instance similar to
parula and reproduce this. Meanwhile, I'm wondering whether it is
related to what's discussed in "Why is parula failing?"
https://www.postgresql.org/message-id/4009739.1710878318%40sss.pgh.pa.us.
It seems that parula is behaving unexpectedly because of the compiler,
among other things.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
