From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com> |
Subject: | Re: Optimize LISTEN/NOTIFY |
Date: | 2025-07-23 01:39:30 |
Message-ID: | 30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com |
Lists: pgsql-hackers
On Thu, Jul 17, 2025, at 09:43, Joel Jacobson wrote:
> On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
>> If we are doing this optimization, why not maintain a list of backends
>> for each channel, and only wake up those channels?
>
> Thanks for contributing a great idea, it actually turned out to work
> really well in practice!
>
> The attached new v4 of the patch implements your multicast idea:
Hi hackers,
While my previous attempts at $subject have only focused on optimizing
the multi-channel scenario, I thought it would be really nice if
LISTEN/NOTIFY could be optimized in the general case, benefiting all
users, including those who just listen on a single channel.
To my surprise, this was not only possible, but actually quite simple.
The main idea in this patch is to introduce an atomic state machine
with three states, IDLE, SIGNALLED, and PROCESSING, so that we don't
interrupt backends that are already in the process of catching up.
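To make that concrete, here is a rough sketch of the shape of it (the
constant names and struct layout are illustrative and may not match the
attached patch exactly):

#include "port/atomics.h"

/* Per-backend notification wakeup state (names illustrative) */
#define NOTIFY_STATE_IDLE        0   /* caught up, no signal pending */
#define NOTIFY_STATE_SIGNALLED   1   /* signal sent, work pending */
#define NOTIFY_STATE_PROCESSING  2   /* currently reading the queue */

typedef struct QueueBackendStatus
{
    /* ... existing fields (pid, db, queue position, ...) ... */
    pg_atomic_uint32 state;          /* one of the NOTIFY_STATE_* values */
} QueueBackendStatus;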
Thanks to Thomas Munro for making me aware of his, Heikki Linnakangas's,
and others' work in the "Interrupts vs signals" [1] thread.
Maybe my patch is redundant given their patch set; I'm not really sure.
Their patch seems to refactor the underlying wakeup mechanism. It
replaces the old, complex chain of events (SIGUSR1 signal -> handler ->
flag -> latch) with a single, direct function call: SendInterrupt(). For
async.c, this seems to be a low-level plumbing change that simplifies
how a notification wakeup is delivered.
My patch optimizes the high-level notification protocol. It introduces a
state machine (IDLE, SIGNALLED, PROCESSING) to only signal backends when
needed.
In their patch, in async.c's SignalBackends(), they do
SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of
SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't
seem to check if the backend is already signalled or not, but maybe
SendInterrupt() has signal coalescing built-in so it would be a noop
with almost no cost?
I'm happy to rebase my LISTEN/NOTIFY work on top of [1], but I could
also see benefits of doing the opposite.
I'm also happy to help with benchmarking of your work in [1].
Note that this patch doesn't contain the hash table to keep track of
listeners per backend, as proposed in earlier patches. I will propose
such a patch again later, but first we need to figure out if I should
rebase onto [1] or master (HEAD).
--- PATCH ---
Optimize NOTIFY signaling to avoid redundant backend signals
Previously, a NOTIFY would send SIGUSR1 to all listening backends, which
could lead to a "thundering herd" of redundant signals under high
traffic. To address this inefficiency, this patch replaces the simple
volatile notifyInterruptPending flag with a per-backend atomic state
machine, stored in asyncQueueControl->backend[i].state. This state
variable can be in one of three states: IDLE (awaiting signal),
SIGNALLED (signal received, work pending), or PROCESSING (actively
reading the queue).
From the notifier's perspective, SignalBackends now uses an atomic
compare-and-swap (CAS) to transition a listener from IDLE to SIGNALLED.
Only on a successful transition is a signal sent. If the listener is
already SIGNALLED or another notifier wins the race, no redundant signal
is sent. If the listener is in the PROCESSING state, the notifier will
also transition it to SIGNALLED to ensure the listener re-scans the
queue after its current work is done.
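In code, the notifier side would look roughly like this (just a sketch;
the state pointer and loop structure are assumptions, while pid and
procnos[i] stand for the values SignalBackends() already has at hand):

/* Sketch of the notifier side, per listening backend i */
pg_atomic_uint32 *state = &asyncQueueControl->backend[i].state;

for (;;)
{
    uint32      expected = NOTIFY_STATE_IDLE;

    if (pg_atomic_compare_exchange_u32(state, &expected,
                                       NOTIFY_STATE_SIGNALLED))
    {
        /* Won the IDLE -> SIGNALLED race: this backend needs a wakeup. */
        SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]);
        break;
    }
    if (expected == NOTIFY_STATE_SIGNALLED)
        break;                  /* someone else already signalled it */

    /*
     * expected == NOTIFY_STATE_PROCESSING: flip it to SIGNALLED so the
     * listener re-scans the queue when done; no extra signal is needed.
     * If this CAS fails because the state changed under us, retry.
     */
    if (pg_atomic_compare_exchange_u32(state, &expected,
                                       NOTIFY_STATE_SIGNALLED))
        break;
}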
On the listener side, ProcessIncomingNotify first transitions its state
from SIGNALLED to PROCESSING. After reading notifications, it attempts
to transition from PROCESSING back to IDLE. If this CAS fails, it means
a new notification arrived during processing and a notifier has already
set the state back to SIGNALLED. The listener then simply re-latches
itself to process the new notifications, avoiding a tight loop.
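And the listener side, roughly (again only a sketch; the exact index
expression and the surrounding ProcessIncomingNotify() machinery are
assumptions or elided):

/* Sketch of the listener side in ProcessIncomingNotify() */
pg_atomic_uint32 *state = &asyncQueueControl->backend[MyProcNumber].state;
uint32      expected;

/* SIGNALLED -> PROCESSING: we are now catching up */
expected = NOTIFY_STATE_SIGNALLED;
pg_atomic_compare_exchange_u32(state, &expected, NOTIFY_STATE_PROCESSING);

asyncQueueReadAllNotifications();

/* PROCESSING -> IDLE, unless a notifier set us back to SIGNALLED */
expected = NOTIFY_STATE_PROCESSING;
if (!pg_atomic_compare_exchange_u32(state, &expected, NOTIFY_STATE_IDLE))
{
    /* New notifications arrived while we were reading: re-latch. */
    SetLatch(MyLatch);
}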
The primary benefit is a significant reduction in syscall overhead and
unnecessary kernel wakeups in high-traffic scenarios. This dramatically
improves performance for workloads with many concurrent notifiers.
Benchmarks show a substantial increase in NOTIFY-only transaction
throughput, with gains exceeding 200% at higher
concurrency levels.
src/backend/commands/async.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
src/backend/tcop/postgres.c | 4 ++--
src/include/commands/async.h | 4 +++-
3 files changed, 185 insertions(+), 32 deletions(-)
--- BENCHMARK ---
The attached benchmark script does LISTEN on one connection,
and then uses pgbench to send NOTIFY on a varying number of
connections and jobs, to cause a high procsignal load.
I've run the benchmark on my MacBook Pro M3 Max,
10 seconds per run, 3 runs.
(I reused the same benchmark script as in the other thread, "Optimize ProcSignal to avoid redundant SIGUSR1 signals")
Connections=Jobs | TPS (master) | TPS (patch) | Relative Diff (%) | StdDev (master) | StdDev (patch)
------------------+--------------+-------------+-------------------+-----------------+----------------
1 | 118833 | 151510 | 27.50% | 484 | 923
2 | 156005 | 239051 | 53.23% | 3145 | 1596
4 | 177351 | 250910 | 41.48% | 4305 | 4891
8 | 116597 | 171944 | 47.47% | 1549 | 2752
16 | 40835 | 165482 | 305.25% | 2695 | 2825
32 | 37940 | 145150 | 282.58% | 2533 | 1566
64 | 35495 | 131836 | 271.42% | 1837 | 573
128 | 40193 | 121333 | 201.88% | 2254 | 874
(8 rows)
/Joel
Attachment: 0001-Optimize-NOTIFY-signaling-to-avoid-redundant-backend.patch (application/octet-stream, 14.4 KB)