Re: PoC: Add condition variable support to WaitEventSetWait()

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PoC: Add condition variable support to WaitEventSetWait()
Date: 2026-04-23 09:52:10
Message-ID: 256e0146-eeb4-443d-b2e0-7ee834ef330e@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

23.04.2026 11:15, Chao Li пишет:
>
>
>> On Apr 22, 2026, at 21:13, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>>
>> 22.04.2026 05:58, Xuneng Zhou пишет:
>>> Hi Yura,
>>>
>>> On Wed, Apr 22, 2026 at 1:06 AM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>>>>
>>>> 08.04.2026 11:22, Chao Li пишет:
>>>>>> On Apr 8, 2026, at 11:50, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>>> On Apr 2, 2026, at 15:38, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>>>> On Mar 31, 2026, at 16:59, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>>>>> On Mar 31, 2026, at 15:28, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> There is an XXX comment in WalSndWait():
>>>>>>>>> ```
>>>>>>>>> * XXX: A desirable future improvement would be to add support for CVs
>>>>>>>>> * into WaitEventSetWait().
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> I have been exploring a possible approach for that. This patch is a PoC that adds ConditionVariable support to WaitEventSet. This v1 is mainly intended to gather feedback on the design, so I have only done some basic testing so far, such as a normal logical replication workflow.
>>>>>>>>>
>>>>>>>>> I’d like to highlight a few key points about the design:
>>>>>>>>>
>>>>>>>>> 1. In the current WalSndWait(), although it prepares to sleep on a ConditionVariable, it does not actually check whether the CV has been signaled. In this PoC, I kept that same behavior. However, I tried to make the WaitEventSet support for CVs generic, so that if we want to add actual signal checking in the future, that would be possible.
>>>>>>>>>
>>>>>>>>> 2. To keep the design generic, this patch introduces a new wait event type, WL_CONDITION_VARIABLE. A WL_CONDITION_VARIABLE event occupies a position in the event array, similar to latch and socket events. When a CV is signaled, the corresponding WL_CONDITION_VARIABLE event is returned in occurred_events.
>>>>>>>>>
>>>>>>>>> 3. The WaitEventSet APIs AddWaitEventToSet() and ModifyWaitEvent() are extended to support CVs by adding one more parameter “cv" to both APIs. The downside of this approach is that all call sites of these two APIs need to be updated. I also considered adding separate APIs for CVs, such as AddWaitEventToSetForCV() and ModifyWaitEventForCV(), since CVs do not rely on the kernel and it might therefore make sense to decouple them from socket and latch handling. But for v1, I chose the more generic approach. I’d be interested in hearing comments on this part of the design.
>>>>>>>>>
>>>>>>>>> 4. One important point is that this patch extends the WaitEventSet abstraction, not the underlying kernel wait primitives. A ConditionVariable is still a userspace/shared-memory concept, but with this design it can participate in the same waiting framework as sockets and latches. I think that is useful because it allows mixed waits to be handled through one interface.
>>>>>>>>>
>>>>>>>>> Here is the v1 patch.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I just noticed that I missed checking in my last edit when switching to the other branch, so attaching an updated v1.
>>>>>>>
>>>>>>> PFA v2 - fixed a CI failure from contrib/postgres_fdw.
>>>>>>
>>>>>> Fixed a CI test failure and rebased.
>>>>>
>>>>> PFA v4 - try to fix a test failure on windows.
>>>>
>>>> Good day, Chao Li.
>>>>
>>>> ConditionalVariable works through the MyLatch.
>>>> And almost always there is single latch for one backend:
>>>> MyLatch = PGPROC->procLatch;
>>>>
>>>> (Single exception I found is XLogRecoveryCtl->recoveryWakeupLatch.)
>>>>
>>>> And that is where ConditionVariablePrepareToSleep were used in WalSndWait
>>>> but without check: FeBeWaitSet always waits for MyLatch.
>>>>
>>>> Therefore, I don't clear get, how you suggest to simultaneously wait
>>>> WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE?
>>>> How you will distinguish which one was fired?
>>>>
>>>> It looks to my, WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE had to be
>>>> mutual exclusive. At least unless ConditionVariable internal are changed.
>>>>
>>>> And check after waiting for conditional variable set had to be placed into
>>>> WaitEventSetWaitBlock where all other such checks are.
>>>>
>>>> All written above is just my humble opinion.
>>>> Excuse me if i'm too strict.
>>>>
>>>
>>> I think you're correct that ConditionVariableSignal() wakes the target
>>> process by calling SetLatch(&proc->procLatch). So every CV signal also
>>> sets MyLatch. The patch introduces ConditionVariableSignaled() to
>>> distinguish the two by checking CV wait-list membership (not the latch
>>> state itself), so in principle the two sources are distinguishable.
>>
>> Nope.
>> Patch may say if ConditionVariable was signalled.
>> But it cann't say if MyLatch were set as well without touching
>> ConditionVariable.
>> It is not better than current state in term of code clearness and flawlessness.
>
> Hi Yura,
>
> Thank you for the review. This patch is marked as “PoC”, so nothing has been finalized yet, and I am open to any comments and suggestions.
>
> In theory, ConditionVariable relies on MyLatch, and in most cases a process has only one MyLatch, so your point that WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE should ideally be mutually exclusive makes sense.
>
> However, the current code does not really follow that principle. Looking at the current code in WalSndWait():
> ```
> if (wait_event == WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION)
> ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv);
> else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
> ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv);
> else if (MyWalSnd->kind == REPLICATION_KIND_LOGICAL)
> ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv);
>
> if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
> (event.events & WL_POSTMASTER_DEATH))
> {
> ConditionVariableCancelSleep();
> proc_exit(1);
> }
>
> ConditionVariableCancelSleep();
> ```
>
> It calls ConditionVariablePrepareToSleep() on a specific CV, so ideally only ConditionVariableSignal() on that specific CV should wake the wait.
>
> However, the current code actually relies on WaitEventSetWait(FeBeWaitSet) waiting on MyLatch, where FeBeWaitSet includes WL_LATCH_SET. So if someone calls SetLatch(the_proc_latch), the wait will still wake up. The code does not check at all which CV was actually signaled.

And looks like it doesn't need to. It seems to me, ConditionVariable is
used just as convenient way to wake up many processes. Just as list of latches.

> With this patch, we would be able to verify whether the CV was signaled. The current PoC does not do that yet, because I simply followed the existing behavior, but with the new WaitEventSetWait() support, it would be easy to add such a check.

This check is just your function ConditionVariableSignaled . This function
is only valuable addition of your patch.

But still: ConditionVariable has no value by itself. It is just a way to
notify that some condition MAY BE changed. One had to recheck condition it
waits any way, because ConditionVariable could be awaken spontanously. If
you read comments in ConditionVariableBroadcast, you will see, it is
totally legal to signal "one more backend" if several broadcasts are
performed simultaneously.

> From that perspective, I think the PoC is already better than the current code.

I disagree. It is my personal opinion.

> As for your question, “How will you distinguish which one was fired?”: when FeBeWaitSet includes both WL_LATCH_SET and WL_CONDITION_VARIABLE, and MyLatch is used for WL_LATCH_SET, then if the CV is signaled, WL_LATCH_SET may also fire as a side effect. I think that would be okay. Do you have a use case where that would lead to a problem?

If you detect ConditionVariable was fired, how you will distinguish, was
MyLatch set separately from ConditionVariable or not?
In current state of your patch there is no way.
So, WL_LATCH_SET | WL_CONDITIONAL_VARIABLE become meaningless.

> Again, this PoC version is still far from the final version. Any discussion is very welcome.

So we discuss.

I repeat: ConditionVariable by itself is meaningless. It exists to signal
about probably changed other condition.
Therefore, WL_LATCH_SET + ConditionVariableSignaled() is more than enough,
imho.
I still don't see need in WL_CONDITION_VARIABLE.
And the place you did patch for is single and not representative.

If you find more places where it could be useful, then it will be clearer
which way API should look like and which semantic it should implement.

imho. I could be wrong.

--
regards
Yura Sokolov aka funny-falcon

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2026-04-23 10:04:35 Re: Parallel Apply
Previous Message Álvaro Herrera 2026-04-23 09:38:27 Re: Get rid of translation strings that only contain punctuation