Re: Optimize LISTEN/NOTIFY

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize LISTEN/NOTIFY
Date: 2025-10-23 10:02:49
Message-ID: CAE7r3MK-3AOdh1mpZ8hw9h6F_i0D5RMoAy7CttnfCJRpB8GJDA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Oct 23, 2025 at 11:17 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
> > On Oct 21, 2025, at 00:43, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> wrote:
> >
> >
> > I managed to reproduce the race with v20-alt3. I tried to write a TAP
> > test reproducing the issue, so it was easier to validate changes.
> > Please find the attached TAP test. I added it to some random package
> > for simplicity.
> >
>
> With alt3, since we now acquire the notification lock after reading every message to update the pos, I think we can optimize a little further:
>
> The notifier, in SignalBackend():
> * Currently we check whether a listener’s pos equals beforeWritePos, and if so we do a “direct advancement”.
> * We could instead do the advancement whenever a listener’s pos is between beforeWritePos and afterWritePos.
>
> The listener, in asyncQueueReadAllNotifications():
> * With alt3, we only lock and update pos.
> * We could do more: if the current pos in shared memory is ahead of the local pos, some notifier has done an advancement, so the listener can stop reading.
>

I think this would be a reasonable optimization if it weren't for the
race condition mentioned above. The problem is that if the local pos
lags behind the shared memory pos, it could point to a truncated queue
segment, so we shouldn't allow that.
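
To make the two checks under discussion concrete, here is a minimal
sketch. It is not the patch code: QueuePos, pos_before(),
can_advance_directly() and pos_is_truncated() are hypothetical names,
positions are simplified to a (page, offset) pair, page wraparound is
ignored, and treating the range as half-open
[beforeWritePos, afterWritePos) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified queue position: page number plus byte offset. */
typedef struct QueuePos
{
    int64_t page;
    int     offset;
} QueuePos;

/* True if a is strictly before b (wraparound ignored in this sketch). */
bool
pos_before(QueuePos a, QueuePos b)
{
    return a.page < b.page || (a.page == b.page && a.offset < b.offset);
}

/*
 * Proposed relaxed notifier check: allow direct advancement when the
 * listener's pos lies in [before_write, after_write), rather than only
 * when it equals before_write.
 */
bool
can_advance_directly(QueuePos listener, QueuePos before_write,
                     QueuePos after_write)
{
    /* The notifier's write range must be ordered. */
    assert(!pos_before(after_write, before_write));

    return !pos_before(listener, before_write) &&
           pos_before(listener, after_write);
}

/*
 * The hazard: a pos whose page is older than the oldest retained page
 * (the queue tail) points into a segment that may already be truncated.
 */
bool
pos_is_truncated(QueuePos pos, int64_t oldest_page)
{
    return pos.page < oldest_page;
}
```

For example, with the queue tail at page 9, a local pos of (8, 100) is
reported as truncated while a shared pos of (10, 256) is not. That is
the situation a lagging local pos can end up in.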

> I tried to run your TAP test on my MacBook, but failed:
>
> ```
> t/008_listen-pos-race.pl .. Dubious, test returned 32 (wstat 8192, 0x2000)
> No subtests run
>
> Test Summary Report
> -------------------
> t/008_listen-pos-race.pl (Wstat: 8192 (exited 32) Tests: 0 Failed: 0)
> Non-zero exit status: 32
> Parse errors: No plan found in TAP output
> Files=1, Tests=0, 3 wallclock secs ( 0.01 usr 0.01 sys + 0.10 cusr 0.29 csys = 0.41 CPU)
> Result: FAIL
> ```
>
> I didn’t spend time debugging the problem. If you can figure out what is wrong, maybe I can run the test on my side.
>

Thank you for trying the test. I think it works as expected for you:
it is supposed to fail with an error, and I get the same exit status.
Sorry, I didn't realize this could be confusing. It would probably
have been better to fail on an assert instead, but I thought an error
was enough for a temporary reproducer. Please see
008_listen-pos-race_test.log for details.

Best regards,
Arseniy Mukhin
