Re: Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joel Jacobson <joel(at)compiler(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "nik(at)postgres(dot)ai" <nik(at)postgres(dot)ai>
Subject: Re: Proposal: Out-of-Order NOTIFY via GUC to Improve LISTEN/NOTIFY Throughput
Date: 2025-09-10 23:30:56
Message-ID: CAD21AoDgzr-jQcn=rHFdTQccweaszyK6ur=w4k4qR1bidN8=Ew@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 4, 2025 at 3:53 PM Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com> wrote:
>
> On Fri, Jul 18, 2025 at 10:06 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > After thinking about this for awhile, I have a rough idea of
> > something we could do to improve parallelism of NOTIFY.
> > As a bonus, this'd allow processes on hot standby servers to
> > receive NOTIFYs from processes on the primary, which is a
> > feature many have asked for.
> >
> > The core thought here was to steal some implementation ideas
> > from two-phase commit. I initially thought maybe we could
> > remove the SLRU queue entirely, and maybe we can still find
> > a way to do that, but in this sketch it's still there with
> > substantially reduced traffic.
> >
> > The idea basically is to use the WAL log rather than SLRU
> > as transport for notify messages.
> >
> > 1. In PreCommit_Notify(), gather up all the notifications this
> > transaction has emitted, and write them into a WAL log message.
> > Remember the LSN of this message. (I think this part should be
> > parallelizable, because of work that's previously been done to
> > allow parallel writes to WAL.)
> >
> > 2. When writing the transaction's commit WAL log entry, include
> > the LSN of the previous notify-data entry.
> >
> > 3. Concurrently with writing the commit entry, send a message
> > to the notify SLRU queue. This would be a small fixed-size
> > message with the transaction's XID, database ID, and the LSN
> > of the notify-data WAL entry. (The DBID is there to let
> > listeners quickly ignore traffic from senders in other DBs.)
> >
> > 4. Signal listening backends to check the queue, as we do now.
> >
> > 5. Listeners read the SLRU queue and then, if in same database,
> > pull the notify data out of the WAL log. (I believe we already
> > have enough infrastructure to make that cheap, because 2-phase
> > commit does it too.)
> >
> > In the simplest implementation of this idea, step 3 would still
> > require a global lock, to ensure that SLRU entries are made in
> > commit order. However, that lock only needs to be held for the
> > duration of step 3, which is much shorter than what happens now.
>
> Attached is an initial patch that implements this idea.
>
> There is still some
> work to be done around how to handle truncation / vacuum for the new
> approach, and testing replication of notifications onto a reader instance.
>
> That being said, I ran some basic benchmarking to stress concurrent
> notifications.
>
> With the following sql script, I ran
> pgbench -T 100 -c 100 -j 8 -f pgbench_transaction_notify.sql -d postgres
>
> BEGIN;
> INSERT INTO test VALUES(1);
> NOTIFY benchmark_channel, 'transaction_completed';
> COMMIT;
>
> With the patch 3 runs showed the following TPS:
>
> tps = 66372.705917
> tps = 63445.909465
> tps = 64412.544339
>
> Without the patch, we got the following TPS:
>
> tps = 30212.390982
> tps = 30908.865812
> tps = 29191.388601
>
> So, there is about a 2x increase in TPS at 100 connections, which establishes
> some promise in the approach.

This looks like a promising improvement.
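
For reference, here is a quick back-of-the-envelope check of the quoted
figures. It is only a small Python sketch: the variable names are mine, and
the inputs are just the six TPS values from the message above, averaged
over three runs each.

```python
from statistics import mean

# TPS values copied from the benchmark results quoted above.
tps_with_patch = [66372.705917, 63445.909465, 64412.544339]
tps_without_patch = [30212.390982, 30908.865812, 29191.388601]

mean_with = mean(tps_with_patch)
mean_without = mean(tps_without_patch)
speedup = mean_with / mean_without

print(f"mean with patch:    {mean_with:.1f} tps")
print(f"mean without patch: {mean_without:.1f} tps")
print(f"speedup:            {speedup:.2f}x")  # prints "speedup:            2.15x"
```

So the mean of the three runs is a bit over 2x, consistent with the
"about 2x" summary. Three runs at a single client count is of course a
small sample.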

>
> Additionally, this would help solve the issue being discussed in a
> separate thread [1],
> where listeners currently rely on the transaction log to verify if a
> transaction that it reads
> has indeed committed, but it is possible that the portion of the
> transaction log has
> been truncated by vacuum.

With your patch, since backends get notifications by reading WAL
records, do we need to prevent WAL records that might contain
unconsumed notifications from being removed by the checkpointer? Or
could we save the unconsumed notifications from those WAL records
somewhere during the checkpoint, as we do for 2PC transactions?
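
To make the first option concrete, here is a rough sketch of the bookkeeping
I have in mind. It is Python pseudo-modeling, not code from the patch: the
names QueueEntry, oldest_needed_notify_lsn and can_remove_segment are
hypothetical, and LSNs are modeled as plain integers. It assumes each queue
entry carries the LSN of its notify-data WAL record (as in step 3 of Tom's
sketch) and that we know the queue position of the slowest listener.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QueueEntry:
    """Hypothetical fixed-size queue entry: XID, database ID, notify-data LSN."""
    xid: int
    dbid: int
    notify_lsn: int


def oldest_needed_notify_lsn(entries: Sequence[QueueEntry],
                             slowest_read_pos: int) -> Optional[int]:
    """Return the smallest notify-data LSN that some listener has not read yet.

    Entries are in commit order, but the notify-data records may have been
    written in a different order, so take the minimum rather than the LSN of
    the first unread entry. Returns None when nothing is pending.
    """
    unread = entries[slowest_read_pos:]
    return min((e.notify_lsn for e in unread), default=None)


def can_remove_segment(segment_end_lsn: int, horizon: Optional[int]) -> bool:
    """A WAL segment is removable only if it ends at or before the horizon."""
    return horizon is None or segment_end_lsn <= horizon


entries = [
    QueueEntry(xid=100, dbid=5, notify_lsn=0x1000),
    QueueEntry(xid=101, dbid=5, notify_lsn=0x3000),
    QueueEntry(xid=102, dbid=5, notify_lsn=0x2000),
]
# The slowest listener has consumed only the first entry.
horizon = oldest_needed_notify_lsn(entries, slowest_read_pos=1)
```

In this toy example the horizon is 0x2000, so a segment ending at 0x2000
could be recycled while one ending at 0x2800 could not. The obvious downside
is that one slow listener would hold back WAL removal, which is why the
second option (copying the pending data elsewhere at checkpoint time) might
be preferable.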

Also, could you add this patch to the next commitfest [1] if you have not done so yet?

Regards,

[1] https://commitfest.postgresql.org/56/

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
