Re: Optimize LISTEN/NOTIFY

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize LISTEN/NOTIFY
Date: 2025-10-20 16:43:27
Message-ID: CAE7r3MLivh1sHWF06hrVXkiQbw-KChPcQsh+9CheXprm5vRVMQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 20, 2025 at 1:07 AM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> > Minute of brainstorming
> >
> > I also thought about a workload that probably frequently can be met.
> > Let's say we have sequence of notifications:
> >
> > F F F T F F F T F F F T
> >
> > Here F - notification from the channel we don't care about and T - the opposite.
> > It seems that after the first 'T' notification it will be more
> > difficult for notifying backends to do 'direct advancement' as there
> > will be some lag before the listener reads the notification and
> > advances its position. Not sure if it's a problem, probably it depends
> > on the intensity of notifications.
>
> Hmm, I realize both the advisoryPos and donePos ideas share a problem;
> they both require listening backends to wakeup eventually anyway,
> just to advance the 'pos'.
>
> The holy grail would be to avoid this context switching cost entirely,
> and only need to wakeup listening backends when they are actually
> interested in the queued notifications. I think the third idea,
> alt3, is most promising in achieving this goal.
>

Yeah, it would be great.

> > But maybe we can use a bit more
> > sophisticated data structure here? Something like a list of skip
> > ranges. Every entry in the list is the range (pos1, pos2) that the
> > listener can skip during the reading. So instead of advancing
> > advisoryPos every notifying backend should add skip range to the list.
> > Notifying backends can merge neighbour ranges (pos1, pos2) & (pos2,
> > pos3) -> (pos1, pos3). We also can limit the number of entries to 5
> > for example. Listeners on their side should clear the list before
> > reading and skip all ranges from it. What do you think? Is it
> > overkill?
>
> Hmm, maybe, but I'm a bit wary about too much complication.
> Hopefully there is a simpler solution that avoids the need for this,
> but sure, if we can't find one, then I'm positive to try this skip ranges idea.
>

Yes, and it's probably worth doing a benchmarking to see if it's a
problem at all before implementing anything.

> >> > A different line of thought could be to get rid of
> >> > asyncQueueReadAllNotifications's optimization of moving the
> >> > queue pos only once, per
> >> >
> >> > * (We could alternatively retake NotifyQueueLock and move the position
> >> > * before handling each individual message, but that seems like too much
> >> > * lock traffic.)
> >> >
> >> > Since we only need shared lock to advance our own queue pos,
> >> > maybe that wouldn't be too awful. Not sure.
> >>
> >> Above idea is implemented in 0002-optimize_listen_notify-v19-alt3.txt
> >>
> >
> > Hmm, it seems we still have the race when in the beginning of
> > asyncQueueReadAllNotifications we read pos into the local variable and
> > release the lock. IIUC to avoid the race without introducing another
> > field here, the listener needs to hold the lock until it updates its
> > position so that the notifying backend cannot change it concurrently.
>
> *** 0002-optimize_listen_notify-v20-alt3.txt:
>
> * Fixed; the shared 'pos' is now only updated if the new position is ahead.
>

I managed to reproduce the race with v20-alt3. I tried to write a TAP
test reproducing the issue, so it was easier to validate changes.
Please find the attached TAP test. I added it to some random package
for simplicity.

Best regards,
Arseniy Mukhin

Attachment Content-Type Size
0001-TAP-test-with-listener-pos-race.patch.nocfbot application/octet-stream 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-10-20 16:46:27 Re: abi-compliance-check failure due to recent changes to pg_{clear,restore}_{attribute,relation}_stats()
Previous Message Tom Lane 2025-10-20 16:27:58 Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master