Re: IPC/MultixactCreation on the Standby server

From: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Dmitry <dsy(dot)075(at)yandex(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: IPC/MultixactCreation on the Standby server
Date: 2025-07-18 10:30:48
Message-ID: 202507181030.k5pakywfa3xk@alvherre.pgsql
Lists: pgsql-hackers

On 2025-Jul-17, Andrey Borodin wrote:

> Thinking more about the problem I see 3 ways to deal with this deadlock:
> 1. We check for recovery conflict even in presence of
> InterruptHoldoffCount. That's what patch v4 does.
> 2. Teach page_collect_tuples() to do HeapTupleSatisfiesVisibility()
> without holding buffer lock.
> 3. Why do we even HOLD_INTERRUPTS() when acquiring a shared lock??

Hmm, as you say, doing (3) is a very invasive system-wide change, but
can we do it in a more localized way? I mean, what if we do RESUME_INTERRUPTS()
just before going to sleep on the CV, and restore with HOLD_INTERRUPTS()
once the sleep is done? That would only affect this one place rather
than the whole system, and should also (AFAICS) solve the issue.
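
To make the idea concrete, here is a toy model of it, not PostgreSQL
code. It assumes that a pending interrupt is only serviced when the
holdoff count is zero, which is how InterruptHoldoffCount behaves in the
backend. Every name below (holdoff_count, toy_cv_sleep, wait_on_cv) is a
hypothetical stand-in invented for illustration, and the sketch further
assumes that the holdoff count is exactly one when the sleep begins.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for backend interrupt state; not PostgreSQL's definitions. */
static int  holdoff_count = 0;          /* models InterruptHoldoffCount */
static bool interrupt_pending = false;  /* models a pending recovery conflict */
static int  interrupts_serviced = 0;    /* how many interrupts got processed */

static void hold_interrupts(void)
{
    holdoff_count++;
}

static void resume_interrupts(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}

/* Models an interrupt check: a no-op while the holdoff count is nonzero. */
static void check_for_interrupts(void)
{
    if (interrupt_pending && holdoff_count == 0)
    {
        interrupt_pending = false;
        interrupts_serviced++;
    }
}

/* Hypothetical CV sleep: an interrupt arrives during the wait and is checked. */
static void toy_cv_sleep(void)
{
    interrupt_pending = true;
    check_for_interrupts();
}

/*
 * Simulate a wait entered with interrupts already held, as they are after
 * an LWLock has been acquired.  Returns true if the interrupt that arrived
 * during the sleep was serviced while sleeping.
 */
static bool wait_on_cv(bool resume_around_sleep)
{
    int before = interrupts_serviced;

    hold_interrupts();              /* stands in for the lock acquisition */

    if (resume_around_sleep)
        resume_interrupts();        /* proposed: allow interrupts while asleep */
    toy_cv_sleep();
    if (resume_around_sleep)
        hold_interrupts();          /* restore the state the caller expects */

    resume_interrupts();            /* stands in for the lock release */
    interrupt_pending = false;      /* reset toy state between runs */

    return interrupts_serviced > before;
}
```

In this model wait_on_cv(false) never services the interrupt, which is
the stuck case, while wait_on_cv(true) does. If more than one lock were
held at the time of the sleep, a single RESUME_INTERRUPTS() would not
bring the count to zero, so whether this is enough in the real code path
would need checking.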

> Yet, I see 3 as a correct solution. Can't we just abstain from
> HOLD_INTERRUPTS() if taken LWLock is not exclusive?

Hmm, the code in LWLockAcquire says

    /*
     * Lock out cancel/die interrupts until we exit the code section protected
     * by the LWLock. This ensures that interrupts will not interfere with
     * manipulations of data structures in shared memory.
     */
    HOLD_INTERRUPTS();

which means if we want to change this, we would have to inspect every
single use of LWLocks in shared mode in order to be certain that such a
change isn't problematic. This is a discussion I'm not prepared for.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Si quieres ser creativo, aprende el arte de perder el tiempo"
