Re: Optimize LISTEN/NOTIFY

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize LISTEN/NOTIFY
Date: 2025-09-30 18:56:11
Message-ID: cad78ff4-aae9-4c1c-8e16-ff0ccd4957f1@app.fastmail.com
Lists: pgsql-hackers

On Mon, Sep 29, 2025, at 04:33, Chao Li wrote:
> I never had a concern about using the timeout mechanism. My comment was
> about enabling timeout duplicately.

Thanks for reviewing. However, as I said in my previous email, I'm
sorry, but I no longer believe in my suggested throughput/latency
approach. Unfortunately, I got sidetracked from the (IMO) more promising
approaches I worked on initially.

What I couldn't find a solution to back then was the risk of ending up
in a situation where some lagging backends would never catch up.

In this new patch, I've simply introduced a new bgworker with the
specific task of kicking lagging backends. Of course, I wish we could do
without the bgworker, but I don't see how that would be possible.

---

optimize_listen_notify-v5.patch:

Fix LISTEN/NOTIFY so it scales with idle listening backends

Currently, idle listening backends cause a dramatic slowdown due to
context switching when they are signaled and wake up. This is wasteful
when they are not listening to the channel being notified.

Just 10 extra idle listening connections cause a slowdown from 8700 TPS
to 6100 TPS, 100 extra cause it to drop to 2000 TPS, and at 1000 extra
it falls to 250 TPS.

To improve scalability with the number of idle listening backends, this
patch introduces a shared hash table to keep track of channels per
listening backend. This hash table is partitioned to reduce contention
on concurrent LISTEN/UNLISTEN operations.
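To illustrate the partitioning idea, here is a minimal sketch of how a
channel name could be mapped to a partition, so that LISTEN/UNLISTEN on
channels in different partitions take different locks. The partition
count NUM_CHANNEL_PARTITIONS, the FNV-1a hash and the function names are
hypothetical choices for this sketch, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical partition count; each partition would have its own lock. */
#define NUM_CHANNEL_PARTITIONS 16

/* 32-bit FNV-1a hash of a NUL-terminated channel name (illustrative choice). */
static uint32_t channel_hash(const char *channel)
{
    uint32_t h = 2166136261u;

    for (const unsigned char *p = (const unsigned char *) channel; *p; p++)
    {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/*
 * Map a channel to its partition. A backend would take only that
 * partition's lock for LISTEN/UNLISTEN, so operations on channels that
 * land in different partitions do not contend with each other.
 */
static int channel_partition(const char *channel)
{
    return (int) (channel_hash(channel) % NUM_CHANNEL_PARTITIONS);
}
```

Because the partition is a pure function of the channel name, every
backend agrees on which partition (and lock) guards a given channel.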

We keep track of up to NOTIFY_MULTICAST_THRESHOLD (16) listeners per
channel. Benchmarks indicated diminishing gains above this level.
Setting it lower seemed unnecessary, so a constant seemed adequate; a
GUC did not seem justified.
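A sketch of a per-channel entry that tracks at most
NOTIFY_MULTICAST_THRESHOLD listeners. What the patch does once a channel
exceeds the threshold is not described here; this sketch assumes it
falls back to signaling every listening backend. The ChannelEntry type
and the helper functions are hypothetical.

```c
#include <stdbool.h>

#define NOTIFY_MULTICAST_THRESHOLD 16   /* value given in the patch description */

/* Hypothetical per-channel hash table entry. */
typedef struct ChannelEntry
{
    int  nlisteners;                            /* listeners tracked so far */
    bool overflowed;                            /* too many to track individually */
    int  listeners[NOTIFY_MULTICAST_THRESHOLD]; /* backend numbers */
} ChannelEntry;

/* Track a backend; returns false if it could not be tracked (channel full). */
static bool channel_add_listener(ChannelEntry *entry, int backend)
{
    for (int i = 0; i < entry->nlisteners; i++)
    {
        if (entry->listeners[i] == backend)
            return true;                        /* already tracked */
    }
    if (entry->nlisteners >= NOTIFY_MULTICAST_THRESHOLD)
    {
        entry->overflowed = true;               /* give up per-backend tracking */
        return false;
    }
    entry->listeners[entry->nlisteners++] = backend;
    return true;
}

/*
 * Copy the backends to signal into out[] and return their count, or
 * return -1 meaning "signal every listening backend" (assumed behavior).
 */
static int channel_targets(const ChannelEntry *entry, int *out)
{
    if (entry->overflowed)
        return -1;
    for (int i = 0; i < entry->nlisteners; i++)
        out[i] = entry->listeners[i];
    return entry->nlisteners;
}
```

With a fixed-size array, a NOTIFY on a lightly used channel signals only
the tracked backends, while a crowded channel degrades to the old
broadcast behavior instead of growing shared memory without bound.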

This patch also adds a wakeup_pending flag to each backend's queue
status, to avoid signaling a backend again while a previous wakeup is
still pending. The flag is set when a backend is signaled and cleared
before the backend processes the queue. This order is important to
ensure correctness.
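The ordering can be seen in a minimal sketch using C11 atomics (default
sequentially consistent ordering). If the listener cleared the flag only
after draining the queue, a notification queued after its last read but
before the clear would find the flag still set, skip the signal, and be
left unread while the listener goes idle. Clearing first means any
notifier that still sees the flag set queued its entry before the
listener's read begins. The struct and function names are hypothetical,
and a counter stands in for actually signaling the backend.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slice of a backend's queue status. */
typedef struct QueueBackendStatus
{
    atomic_bool wakeup_pending;   /* signaled, but wakeup not yet handled */
    atomic_int  signals_sent;     /* stands in for actually signaling */
} QueueBackendStatus;

static void status_init(QueueBackendStatus *st)
{
    atomic_init(&st->wakeup_pending, false);
    atomic_init(&st->signals_sent, 0);
}

/* Notifier side: called after the notification has been queued. */
static void maybe_signal_backend(QueueBackendStatus *st)
{
    /* Only the caller that flips the flag from false to true signals. */
    if (!atomic_exchange(&st->wakeup_pending, true))
        atomic_fetch_add(&st->signals_sent, 1);
}

/* Listener side: clear the flag first, then drain the queue. */
static void process_wakeup(QueueBackendStatus *st)
{
    atomic_store(&st->wakeup_pending, false);
    /* ... read all queued notifications here ... */
}
```

Two back-to-back NOTIFYs produce a single signal; once the listener has
cleared the flag, the next NOTIFY signals it again.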

It was also necessary to add a new bgworker, notify_bgworker, whose sole
responsibility is to wake up lagging listening backends, ensuring they
are kicked before they fall too far behind. This bgworker is always
started at postmaster startup, but is activated only when a NOTIFY
signals it (unless it is already active). The notify_bgworker staggers
the signaling of lagging listening backends by sleeping 100 ms between
signals, to prevent the thundering-herd problem we would otherwise get
if all listening backends woke up at the same time. It loops until there
are no more lagging listening backends, and then becomes inactive.
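The worker's loop can be sketched as follows, under loudly stated
assumptions: "lagging" is modeled as a backend whose queue position is
more than a hypothetical max_lag behind the queue head, the most-lagging
backend is kicked first, and signaling is simulated by moving that
backend's position to the head. The Listener type, kick_lagging_backends
and its parameters are hypothetical; the 100 ms stagger from the
description becomes the stagger_ms argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical view of one listening backend. */
typedef struct Listener
{
    long pos;   /* queue position this backend has read up to */
} Listener;

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/*
 * One activation of the worker: repeatedly pick the most-lagging backend
 * whose lag exceeds max_lag and "signal" it, sleeping stagger_ms between
 * signals, until no backend lags; then return (become inactive).
 */
static int kick_lagging_backends(Listener *ls, int n, long head,
                                 long max_lag, long stagger_ms)
{
    int kicked = 0;

    for (;;)
    {
        int worst = -1;

        for (int i = 0; i < n; i++)
        {
            if (head - ls[i].pos > max_lag &&
                (worst < 0 || ls[i].pos < ls[worst].pos))
                worst = i;
        }
        if (worst < 0)
            break;                  /* nobody lags: go inactive */
        if (kicked > 0)
            sleep_ms(stagger_ms);   /* stagger wakeups: no thundering herd */
        ls[worst].pos = head;       /* stand-in for signaling the backend */
        kicked++;
    }
    return kicked;
}
```

Calling kick_lagging_backends(ls, n, head, max_lag, 100) would match the
described 100 ms spacing. In the real system a signaled backend catches
up asynchronously, so each pass would have to re-read shared state
rather than assume the kick took effect immediately.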

/Joel

Attachment: optimize_listen_notify-v5.patch (application/octet-stream, 47.8 KB)

