Re: Parallel Apply

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>
Subject:	Re: Parallel Apply
Date:	2025-08-13 15:27:02
Message-ID:	aJyuxlqx0-OSuGqC@momjian.us
Lists:	pgsql-hackers

On Wed, Aug 13, 2025 at 09:50:27AM +0530, Amit Kapila wrote:
> On Tue, Aug 12, 2025 at 10:40 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > Currently, PostgreSQL supports parallel apply only for large streaming
> > > transactions (streaming=parallel). This proposal aims to extend
> > > parallelism to non-streaming transactions, thereby improving
> > > replication performance in workloads dominated by smaller, frequent
> > > transactions.
> >
> > I thought the approach for improving WAL apply speed, for both binary
> > and logical, was pipelining:
> >
> > https://en.wikipedia.org/wiki/Instruction_pipelining
> >
> > rather than trying to do all the steps in parallel.
> >
>
> It is not clear to me how the speed for a mix of dependent and
> independent transactions can be improved using the technique you
> shared as we still need to follow the commit order for dependent
> transactions. Can you please elaborate more on the high-level idea of
> how this technique can be used to improve speed for applying logical
> WAL records?

I think this blog post from February has some good ideas for binary
replication pipelining:

https://www.cybertec-postgresql.com/en/end-of-the-road-for-postgresql-streaming-replication/

	Surprisingly, what could be considered the actual replay work
	seems to be a minority of the total workload. The largest parts
	involve reading WAL and decoding page references from it, followed
	by looking up those pages in the cache, and pinning them so they
	are not evicted while in use. All of this work could be performed
	concurrently with the replay loop. For example, a separate
	read-ahead process could handle these tasks, ensuring that the
	replay process receives a queue of transaction log records with
	associated cache references already pinned, ready for application.

The beauty of the approach is that there is no need for dependency
tracking. I have CC'ed the author, Ants Aasma.
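
To make the pipelining idea concrete, here is a minimal toy sketch of the
two stages described in the quoted passage: a read-ahead stage that
resolves and pins pages, feeding a bounded queue that a single replay
stage drains in WAL order. This is not PostgreSQL code; the names
BufferCache, read_ahead, replay, and run_pipeline, the (lsn, page_id,
delta) record shape, and the queue depth are all assumptions made for
illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the (toy) WAL stream


class BufferCache:
    """Hypothetical page cache: page_id -> integer value, plus pin counts."""

    def __init__(self):
        self.pages = {}
        self.pins = {}
        self.lock = threading.Lock()

    def pin(self, page_id):
        with self.lock:
            # Stand-in for reading the page into the cache if it is absent.
            self.pages.setdefault(page_id, 0)
            self.pins[page_id] = self.pins.get(page_id, 0) + 1

    def unpin(self, page_id):
        with self.lock:
            self.pins[page_id] -= 1


def read_ahead(wal, cache, out_q):
    """Stage 1: decode each record, pin its page, hand it to the replayer."""
    for lsn, page_id, delta in wal:
        cache.pin(page_id)
        out_q.put((lsn, page_id, delta))  # blocks when the queue is full
    out_q.put(SENTINEL)


def replay(cache, in_q, applied):
    """Stage 2: apply records strictly in the order they were queued."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        lsn, page_id, delta = item
        with cache.lock:
            cache.pages[page_id] += delta  # page is already cached and pinned
        applied.append(lsn)
        cache.unpin(page_id)


def run_pipeline(wal, depth=4):
    cache = BufferCache()
    q = queue.Queue(maxsize=depth)  # bounds how far read-ahead can run ahead
    applied = []
    stages = [
        threading.Thread(target=read_ahead, args=(wal, cache, q)),
        threading.Thread(target=replay, args=(cache, q, applied)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return cache, applied


if __name__ == "__main__":
    cache, applied = run_pipeline([(1, "A", 5), (2, "B", 3), (3, "A", -2)])
    print(applied, cache.pages)
```

Because only one replayer applies records, and it does so in queue
order, the two records touching page "A" never need to be compared
against each other; the ordering alone keeps them correct.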

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

In response to

Re: Parallel Apply at 2025-08-13 04:20:27 from Amit Kapila