| From: | Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Optimize LISTEN/NOTIFY |
| Date: | 2025-10-23 10:02:49 |
| Message-ID: | CAE7r3MK-3AOdh1mpZ8hw9h6F_i0D5RMoAy7CttnfCJRpB8GJDA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Thu, Oct 23, 2025 at 11:17 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
> > On Oct 21, 2025, at 00:43, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> wrote:
> >
> >
> > I managed to reproduce the race with v20-alt3. I tried to write a TAP
> > test reproducing the issue, to make it easier to validate changes.
> > Please find the attached TAP test. I added it to an arbitrary package
> > for simplicity.
> >
>
> With alt3, since we acquire the notification lock after reading each message to update the pos, I think we can optimize a bit further:
>
> The notifier, in SignalBackend():
> * Currently we check whether a listener's pos equals beforeWritePos, and if so we advance it directly.
> * We could instead advance it whenever the listener's pos lies between beforeWritePos and afterWritePos.
>
> The listener, in asyncQueueReadAllNotifications():
> * With alt3, we only take the lock and update pos.
> * We can do more: if the current pos in shared memory is ahead of the local pos, some notifier has already advanced it, so the listener can stop reading.
>
I think this would be a reasonable optimization if it weren't for the
race condition mentioned above. The problem is that if the local pos
lags behind the shared memory pos, it could point to a truncated queue
segment, so we shouldn't allow that.

> I tried to run your TAP test on my MacBook, but it failed:
>
> ```
> t/008_listen-pos-race.pl .. Dubious, test returned 32 (wstat 8192, 0x2000)
> No subtests run
>
> Test Summary Report
> -------------------
> t/008_listen-pos-race.pl (Wstat: 8192 (exited 32) Tests: 0 Failed: 0)
> Non-zero exit status: 32
> Parse errors: No plan found in TAP output
> Files=1, Tests=0, 3 wallclock secs ( 0.01 usr 0.01 sys + 0.10 cusr 0.29 csys = 0.41 CPU)
> Result: FAIL
> ```
>
> I didn't spend time debugging the problem. If you can figure out the problem, maybe I can run the test again on my side.
>
Thank you for trying the test. I think it works for you as expected: it
is supposed to fail with an error, and I get the same error status.
Sorry, I didn't realize this could be confusing. It would probably have
been better to fail on an assertion instead, but I thought an error was
enough for a temporary reproducer. Please see 008_listen-pos-race_test.log
for details.
Best regards,
Arseniy Mukhin