| From: | Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> |
|---|---|
| To: | Joel Jacobson <joel(at)compiler(dot)org> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PATCH] Fix LISTEN startup race with direct advancement |
| Date: | 2026-05-20 11:01:52 |
| Message-ID: | CAE7r3ML+6-pokm+GHsfr+pJyZ_RUi=VY8Bp=qKx=3tcYunborA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, May 19, 2026 at 11:39 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> Hi hackers,
>
> I had another pass over the async.c rework committed in 282b1cd,
> and found a race that can cause a notification committed after
> the listener registered its queue position to be missed entirely.
>
> This can happen in the small time window between PreCommit_Notify(),
> where the first LISTEN registers the backend and records its queue
> position, and AtCommit_Notify(), where the staged listen action is
> made active in the shared channel map by setting listening = true.
>
> If a concurrent NOTIFY commits in that window, SignalBackends() can
> see the staged listener entry with listening = false and conclude that
> the backend is not interested in the channel. With direct advancement,
> that can move the backend's queue pointer past the notification instead
> of waking it.
>
> This is distinct from the documented LISTEN startup race in
> listen.sgml. The documented race can produce false positives: after
> LISTEN returns, an application may receive a notification for work
> already observed by its initial database scan. That is harmless. This
> race is a false negative: a notification can be missed entirely.
>
> The fix is just to treat staged LISTEN entries as possible listeners
> when deciding whom to wake:
>
> ```diff
> - if (!listeners[j].listening)
> - continue; /* ignore not-yet-committed listeners */
> ```
>
> The attached patches split the report into tests and fix:
>
> 0001 Test missed LISTEN startup notification
> 0002 Test LISTEN startup notification for already-seen work
> 0003 Fix LISTEN startup race with direct advancement
>
> /Joel
Hi,
Thank you for the fix! I tried the test and managed to reproduce it. I
think what is definitely wrong here is that we can do direct
advancement over messages that the listener is interested in, when it
has set the position already. So +1 for the fix. I think the fix is
correct and it's IIUC really harmless as the worst thing that can
happen is an unnecessary signal.
One point - looks like the 0003 contains the same test as 0001.
Also while I was trying to understand the issue, another bad schedule
came to my mind: it's basically the same scenario as you reproduced,
but now we have a message in between the listener's position and
notification that the listener is interested in. In this case the
notifier can't do direct advancement, so the message is not lost, but
the notifier still doesn't signal about a new message, so the listener
doesn't know that there is something to read until something triggers
it (like another message). PFA the reproducer (there is a schedule
diagram in the head of the file). It depends on 0001. Your patch fixes
it too.
Thank you!
Best regards,
Arseniy Mukhin
| Attachment | Content-Type | Size |
|---|---|---|
| untriggered-notification.patch.nocfbot | application/octet-stream | 6.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Joel Jacobson | 2026-05-20 11:20:05 | Re: [PATCH] Fix LISTEN startup race with direct advancement |
| Previous Message | Shlok Kyal | 2026-05-20 10:50:34 | Re: Proposal: Conflict log history table for Logical Replication |