Re: Optimize LISTEN/NOTIFY

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize LISTEN/NOTIFY
Date: 2025-10-18 16:41:46
Message-ID: CAE7r3M+TKWcvBjHZnj-_TVyUyD7G2v-WQDxQGPmu_dMcinWR0Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 16, 2025 at 12:39 PM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Wed, Oct 15, 2025, at 16:16, Tom Lane wrote:
> > Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> writes:
> >> I think "Direct advancement" is a good idea. But the way it's
> >> implemented now has a concurrency bug. Listeners store its current
> >> position in the local variable 'pos' during the reading in
> >> asyncQueueReadAllNotifications() and don't hold NotifyQueueLock. It
> >> means that some notifier can directly advance the listener's position
> >> while the listener has an old value in the local variable. The same
> >> time we use listener positions to find out the limit we can truncate
> >> the queue in asyncQueueAdvanceTail().
> >
> > Good catch!
>
> I've implemented the three ideas presented below, attached as .txt files
> that are diffs on top of v19, which has these changes since v17:
>

Thank you for the new version and all the implementations!

> 0002-optimize_listen_notify-v19.patch:
> * Improve wording of top comment per request from Chao Li.
> * Add initChannelHash call to top of SignalBackends,
> to fix bug reported by Arseniy Mukhin.
>
> > I think we can perhaps salvage the idea if we invent a separate
> > "advisory" queue position field, which tells its backend "hey,
> > you could skip as far as here if you want", but is not used for
> > purposes of SLRU truncation.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt1.txt

pos = QUEUE_BACKEND_POS(i);

/* Direct advancement for idle backends at the old head */
if (pendingNotifies != NULL &&
    QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite))
{
    QUEUE_BACKEND_ADVISORY_POS(i) = queueHeadAfterWrite;

If we have several notifying backends, it looks like only the first
one will be able to do direct advancement here. The next notifying
backend will fail the QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite)
check, because we don't wake up the listener, so pos will still be
what it was when the first notifying backend ran. It seems that to
accumulate direct advancement from several notifying backends, we
need to compare queueHeadBeforeWrite with advisoryPos here. We also
need to advance advisoryPos to the listener's position after reading,
in case advisoryPos falls behind.
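
Roughly what I mean (just a sketch on top of alt1; QUEUE_POS_MAX is
the existing macro from async.c, the rest are the patch's names):

pos = QUEUE_BACKEND_POS(i);
/* never let the advisory position lag behind the real one */
advisoryPos = QUEUE_POS_MAX(QUEUE_BACKEND_ADVISORY_POS(i), pos);

/* now advancement accumulates across several notifying backends */
if (pendingNotifies != NULL &&
    QUEUE_POS_EQUAL(advisoryPos, queueHeadBeforeWrite))
{
    QUEUE_BACKEND_ADVISORY_POS(i) = queueHeadAfterWrite;
}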

A minute of brainstorming:

I also thought about a workload that is probably quite common.
Let's say we have a sequence of notifications:

F F F T F F F T F F F T

Here F is a notification from a channel we don't care about and T is
the opposite. It seems that after the first 'T' notification it
becomes harder for notifying backends to do direct advancement,
because there is some lag before the listener reads the notification
and advances its position. Not sure if it's a problem; it probably
depends on the notification rate. But maybe we could use a slightly
more sophisticated data structure here? Something like a list of skip
ranges: every entry in the list is a range (pos1, pos2) that the
listener can skip during reading. So instead of advancing advisoryPos,
every notifying backend would add a skip range to the list. Notifying
backends can merge neighbouring ranges: (pos1, pos2) & (pos2, pos3)
-> (pos1, pos3). We could also limit the number of entries, to 5 for
example. Listeners, on their side, would clear the list before
reading and then skip all the ranges from it. What do you think? Is
it overkill?
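
To make the idea a bit more concrete, a rough sketch (all names here
are hypothetical; QueuePosition and QUEUE_POS_EQUAL are the existing
ones from async.c):

#define MAX_SKIP_RANGES 5

typedef struct QueueSkipRange
{
    QueuePosition start;    /* listener may skip from here... */
    QueuePosition end;      /* ...and resume reading here */
} QueueSkipRange;

/* per-listener list, living in shared memory next to the queue pos */
typedef struct QueueSkipList
{
    int             nranges;
    QueueSkipRange  ranges[MAX_SKIP_RANGES];
} QueueSkipList;

/* called by a notifying backend while holding NotifyQueueLock */
static void
AddSkipRange(QueueSkipList *list, QueuePosition from, QueuePosition to)
{
    /* merge with the last range if they are adjacent */
    if (list->nranges > 0 &&
        QUEUE_POS_EQUAL(list->ranges[list->nranges - 1].end, from))
    {
        list->ranges[list->nranges - 1].end = to;
        return;
    }

    /* if the list is full, the listener just reads through the gap */
    if (list->nranges < MAX_SKIP_RANGES)
    {
        list->ranges[list->nranges].start = from;
        list->ranges[list->nranges].end = to;
        list->nranges++;
    }
}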

>
> > Alternatively, split the queue pos
> > into "this is where to read next" and "this is as much as I'm
> > definitively done with", where the second field gets advanced at
> > the end of asyncQueueReadAllNotifications. Not sure which
> > view would be less confusing (in the end I guess they're nearly
> > the same thing, differently explained).
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt2.txt
>

IMHO it's a little bit more confusing than the first option. Two
points I noticed:

1) We have a fast path in asyncQueueReadAllNotifications()

if (QUEUE_POS_EQUAL(pos, head))
{
    /* Nothing to do, we have read all notifications already. */
    return;
}

Should we update donePos here as well? Otherwise it looks like
donePos may never be updated at all.
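
Something like this, perhaps (a sketch; QUEUE_BACKEND_DONE_POS is
just my guess at an accessor for the field alt2 adds):

if (QUEUE_POS_EQUAL(pos, head))
{
    /* nothing to read, but we are definitively done up to here */
    LWLockAcquire(NotifyQueueLock, LW_SHARED);
    QUEUE_BACKEND_DONE_POS(MyProcNumber) = pos;
    LWLockRelease(NotifyQueueLock);
    return;
}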

2) In SignalBackends()

/* Signal backends that have fallen too far behind */
lag = asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD),
                         QUEUE_POS_PAGE(pos));

if (lag >= QUEUE_CLEANUP_DELAY)
{
    pid = QUEUE_BACKEND_PID(i);
    Assert(pid != InvalidPid);

    QUEUE_BACKEND_WAKEUP_PENDING(i) = true;
    pids[count] = pid;
    procnos[count] = i;
    count++;
}

Should we use donePos here, since it is now what drives queue truncation?
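
I.e. (same guessed accessor name as above):

/* measure lag against what the backend is definitively done with */
lag = asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD),
                         QUEUE_POS_PAGE(QUEUE_BACKEND_DONE_POS(i)));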

> > A different line of thought could be to get rid of
> > asyncQueueReadAllNotifications's optimization of moving the
> > queue pos only once, per
> >
> > * (We could alternatively retake NotifyQueueLock and move the position
> > * before handling each individual message, but that seems like too much
> > * lock traffic.)
> >
> > Since we only need shared lock to advance our own queue pos,
> > maybe that wouldn't be too awful. Not sure.
>
> Above idea is implemented in 0002-optimize_listen_notify-v19-alt3.txt
>

Hmm, it seems we still have the race: at the beginning of
asyncQueueReadAllNotifications we read pos into a local variable and
release the lock. IIUC, to avoid the race without introducing another
field here, the listener needs to hold the lock from reading its
position until it updates it, so that a notifying backend cannot
change it concurrently.
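
Per entry, that could look something like this (a sketch against
alt3; asyncQueueAdvance and QUEUE_POS_MAX are the existing helpers,
and qe is the entry we just processed):

/*
 * Advance our own position in one read-modify-write under the shared
 * lock; a notifier doing direct advancement takes the lock
 * exclusively, so it cannot interleave with us here.
 */
LWLockAcquire(NotifyQueueLock, LW_SHARED);
asyncQueueAdvance(&pos, qe->length);    /* step past the entry just read */
/* keep whichever is further: ours, or a notifier's direct advancement */
pos = QUEUE_POS_MAX(pos, QUEUE_BACKEND_POS(MyProcNumber));
QUEUE_BACKEND_POS(MyProcNumber) = pos;
LWLockRelease(NotifyQueueLock);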

Best regards,
Arseniy Mukhin
