Re: Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "nik(at)postgres(dot)ai" <nik(at)postgres(dot)ai>
Subject: Re: Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput
Date: 2025-07-19 15:35:25
Message-ID: ab1b986a-8ae2-469e-a680-11c1ce8fd4e8@app.fastmail.com
Lists: pgsql-hackers

On Fri, Jul 18, 2025, at 19:06, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
>> My patch improves NOTIFY TPS when many backends are listening on multiple
>> channels by eliminating unnecessary syscall wake‑ups, but it doesn't increase
>> the internal parallelism of the NOTIFY queue itself.
>
> After thinking about this for awhile, I have a rough idea of
> something we could do to improve parallelism of NOTIFY.
> As a bonus, this'd allow processes on hot standby servers to
> receive NOTIFYs from processes on the primary, which is a
> feature many have asked for.

LISTEN/NOTIFY on standbys sounds interesting on its own.

However, I don't think reducing the time we hold the exclusive lock
would do anything at all to help the users who have been reporting
problems with LISTEN/NOTIFY. I'll explain why I think so.

I assume that Rishu, when he wrote "renewed attention" in his original
post, was referring to the post "Postgres LISTEN/NOTIFY does not scale" [1]
that was on the front page of Hacker News with 319 comments [2].

I think the "still waiting for AccessExclusiveLock" messages
they saw in the logs are probably just a *symptom*, not the *cause*,
of their problems.

Unfortunately, the author of [1] jumped to a conclusion and assumed
the global lock was the problem. I think it probably is not,
because:

We know for sure that current users do LISTEN and NOTIFY
in the same database. And there is no point in doing NOTIFY
unless you also do LISTEN.

Their plots show a y-axis with a few hundred "active sessions".
If we assume at least ~100 of them were listening backends,
that would explain their problems, due to the thundering herd
of syscall wake-ups that each NOTIFY currently causes.
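
To make the scaling argument concrete, here is a toy model in Python.
It is not PostgreSQL code and it is not a benchmark. It only counts
wake-ups under two assumed policies: waking every listening backend on
each NOTIFY, versus waking only the backends that listen on the notified
channel. The function name, the channel names, and the workload of 100
listeners and 1000 notifications are all made up for illustration.

```python
# Toy model (NOT PostgreSQL code): count backend wake-ups caused by NOTIFY.
from collections import defaultdict


def count_wakeups(listeners, notifies, targeted):
    """Return the total number of backend wake-ups.

    listeners: dict mapping a backend id to the set of channels it LISTENs on.
    notifies:  list of channel names, one entry per NOTIFY.
    targeted:  False -> every NOTIFY wakes every listening backend
                        (the thundering-herd case, simplified);
               True  -> a NOTIFY wakes only backends listening on its channel
                        (a hypothetical selective policy).
    """
    # Index backends by channel for the selective policy.
    by_channel = defaultdict(set)
    for backend, channels in listeners.items():
        for channel in channels:
            by_channel[channel].add(backend)

    # Backends with at least one LISTEN registered.
    listening = sum(1 for channels in listeners.values() if channels)

    total = 0
    for channel in notifies:
        total += len(by_channel[channel]) if targeted else listening
    return total


# Hypothetical workload: 100 backends, each on its own channel,
# and 1000 notifications spread evenly across those channels.
listeners = {pid: {f"chan_{pid}"} for pid in range(100)}
notifies = [f"chan_{i % 100}" for i in range(1000)]

herd = count_wakeups(listeners, notifies, targeted=False)
selective = count_wakeups(listeners, notifies, targeted=True)
print(herd, selective)
```

In this model the wake-up count under the first policy grows with
(number of NOTIFYs) x (number of listening backends), regardless of how
few backends care about any given channel.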

So instead of saying
"Postgres LISTEN/NOTIFY does not scale",
like in the article [1], I think it would be fairer and more meaningful to say
"Postgres LISTEN/NOTIFY does not scale with the number of listening backends".

All my benchmarks support this hypothesis. I've already posted a lot of them,
but can of course provide more specific additional benchmarks if desired.

/Joel

[1] https://www.recall.ai/blog/postgres-listen-notify-does-not-scale
[2] https://news.ycombinator.com/item?id=44490510
