From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Sandeep Thakkar <sandeep(dot)thakkar(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, buildfarm-admins(at)lists(dot)postgresql(dot)org, CM Team <cm(at)enterprisedb(dot)com> |
Subject: | Re: "ERROR: latch already owned" on gharial |
Date: | 2024-02-08 12:57:47 |
Message-ID: | 30a17cd9-9ae1-4aaf-b08c-f8d1a8e4287d@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 08/02/2024 04:08, Soumyadeep Chakraborty wrote:
> A possible ordering of events:
>
> (1) DisownLatch() is called by pid Y during ProcKill() and the write for
> latch->owner_pid = 0 is NOT yet flushed to shmem.
>
> (2) The PGPROC object for pid Y is returned to the free list.
>
> (3) Pid X sees the same PGPROC object on the free list and grabs it.
>
> (4) Pid X does sanity check inside OwnLatch during InitProcess and
> still sees the
> old value of latch->owner_pid = Y (and not = 0), and trips the ERROR.
>
> The above sequence of operations should apply to PG HEAD as well.
>
> Suggestion:
>
> Should we do a pg_memory_barrier() at the end of DisownLatch(), like in
> ResetLatch(), like the one introduced in [3]? This would ensure that the write
> latch->owner_pid = 0; is flushed to shmem. The attached patch does this.
Hmm, there is a pair of SpinLockAcquire() and SpinLockRelease() in
ProcKill(), before step 3 can happen. Comment in spin.h about
SpinLockAcquire/Release:
> * Load and store operations in calling code are guaranteed not to be
> * reordered with respect to these operations, because they include a
> * compiler barrier. (Before PostgreSQL 9.5, callers needed to use a
> * volatile qualifier to access data protected by spinlocks.)
That talks about a compiler barrier, though, not a memory barrier. But
looking at the implementations in s_lock.h, I believe they do act as
memory barrier, too.
So you might indeed have that problem on 9.4, but AFAICS not on later
versions.
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Nazir Bilal Yavuz | 2024-02-08 13:07:36 | Re: Simplify documentation related to Windows builds |
Previous Message | jian he | 2024-02-08 12:34:56 | Re: 2024-02-08 release announcement draft |