From: | Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize LISTEN/NOTIFY |
Date: | 2025-10-20 16:43:27 |
Message-ID: | CAE7r3MLivh1sHWF06hrVXkiQbw-KChPcQsh+9CheXprm5vRVMQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Oct 20, 2025 at 1:07 AM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> > Minute of brainstorming
> >
> > I also thought about a workload that probably frequently can be met.
> > Let's say we have sequence of notifications:
> >
> > F F F T F F F T F F F T
> >
> > Here F - notification from the channel we don't care about and T - the opposite.
> > It seems that after the first 'T' notification it will be more
> > difficult for notifying backends to do 'direct advancement' as there
> > will be some lag before the listener reads the notification and
> > advances its position. Not sure if it's a problem, probably it depends
> > on the intensity of notifications.
>
> Hmm, I realize both the advisoryPos and donePos ideas share a problem;
> they both require listening backends to wakeup eventually anyway,
> just to advance the 'pos'.
>
> The holy grail would be to avoid this context switching cost entirely,
> and only need to wakeup listening backends when they are actually
> interested in the queued notifications. I think the third idea,
> alt3, is most promising in achieving this goal.
>
Yeah, it would be great.
> > But maybe we can use a bit more
> > sophisticated data structure here? Something like a list of skip
> > ranges. Every entry in the list is the range (pos1, pos2) that the
> > listener can skip during the reading. So instead of advancing
> > advisoryPos every notifying backend should add skip range to the list.
> > Notifying backends can merge neighbour ranges (pos1, pos2) & (pos2,
> > pos3) -> (pos1, pos3). We also can limit the number of entries to 5
> > for example. Listeners on their side should clear the list before
> > reading and skip all ranges from it. What do you think? Is it
> > overkill?
>
> Hmm, maybe, but I'm a bit wary about too much complication.
> Hopefully there is a simpler solution that avoids the need for this,
> but sure, if we can't find one, then I'm positive to try this skip ranges idea.
>
Yes, and it's probably worth doing a benchmarking to see if it's a
problem at all before implementing anything.
> >> > A different line of thought could be to get rid of
> >> > asyncQueueReadAllNotifications's optimization of moving the
> >> > queue pos only once, per
> >> >
> >> > * (We could alternatively retake NotifyQueueLock and move the position
> >> > * before handling each individual message, but that seems like too much
> >> > * lock traffic.)
> >> >
> >> > Since we only need shared lock to advance our own queue pos,
> >> > maybe that wouldn't be too awful. Not sure.
> >>
> >> Above idea is implemented in 0002-optimize_listen_notify-v19-alt3.txt
> >>
> >
> > Hmm, it seems we still have the race when in the beginning of
> > asyncQueueReadAllNotifications we read pos into the local variable and
> > release the lock. IIUC to avoid the race without introducing another
> > field here, the listener needs to hold the lock until it updates its
> > position so that the notifying backend cannot change it concurrently.
>
> *** 0002-optimize_listen_notify-v20-alt3.txt:
>
> * Fixed; the shared 'pos' is now only updated if the new position is ahead.
>
I managed to reproduce the race with v20-alt3. I tried to write a TAP
test reproducing the issue, so it was easier to validate changes.
Please find the attached TAP test. I added it to some random package
for simplicity.
Best regards,
Arseniy Mukhin
Attachment | Content-Type | Size |
---|---|---|
0001-TAP-test-with-listener-pos-race.patch.nocfbot | application/octet-stream | 5.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-10-20 16:46:27 | Re: abi-compliance-check failure due to recent changes to pg_{clear,restore}_{attribute,relation}_stats() |
Previous Message | Tom Lane | 2025-10-20 16:27:58 | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master |