Re: [WIP] Pipelined Recovery

From: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Pipelined Recovery
Date: 2026-02-03 07:25:39
Message-ID: CA+UBfakvVoCK+8Jz2qGL=LqLD=ogAccbAgjgyNoNURX-jO982w@mail.gmail.com
Lists: pgsql-hackers

Hi

I just found this discussion where Bruce Momjian mentioned
replication pipelining.

[1]: https://www.postgresql.org/message-id/aJyuxlqx0-OSuGqC%40momjian.us

Thanks
Imran Zaheer

On Fri, Jan 30, 2026 at 11:28 AM Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> wrote:
>
> Hi,
>
> Based on a suggestion by my colleague Ants Aasma, I worked on this
> idea of adding parallelism to the WAL recovery process.
>
> The crux of the idea is to decode the WAL in parallel workers so that
> the replay process can fetch already-decoded records directly from a
> shared memory queue. This offloads some CPU work from the recovery process.
>
> Implementing this idea yielded an improvement of around 20% in
> recovery times, though results vary by workload. I have attached
> some benchmarks for different workloads.
>
> Following are some recovery tests with the default configs. In the
> table below, p0 is with pipelining disabled and p1 is with pipelining
> enabled; `db size` is the size of the backup database on which
> recovery runs. You can see more detail related to the benchmarks in
> the attached file `recoveries-benchmark-v01`.
>
> workload         elapsed (p0)  elapsed (p1)  % perf   db size
> ---------------  ------------  ------------  -------  -------
> inserts.sql      272s 10ms     197s 570ms    27.37%   480 MB
> updates.sql      177s 420ms    117s 80ms     34.01%   480 MB
> hot-updates.sql  36s 940ms     29s 240ms     20.84%   480 MB
> nonhot.sql       36s 570ms     28s 980ms     20.75%   480 MB
> simple-update    20s 160ms     11s 580ms     42.56%   4913 MB
> tpcb-like        20s 590ms     13s 640ms     33.75%   4913 MB
>
> A similar approach was suggested earlier by Matthias van de Meent in a
> separate thread [1]. Right now I am using one background worker (bgw)
> for decoding and filling up the shared message queue, and the redo
> apply loop simply receives decoded records from the queue. After redo
> is finished, the consumer (startup process) can request a shutdown
> from the producer (pipeline bgw) before exiting recovery.
>
> This idea can be coupled with another one: pinning buffers in parallel
> before the recovery process needs them, which would parallelize most
> of the work done in `XLogReadBufferForRedoExtended`. Redo could then
> simply receive already-pinned buffers from a queue. Implementing this
> still needs some R&D, as IPC and pinning/unpinning buffers across two
> processes can be tricky.
>
> If someone wants to reproduce the benchmark, they can do so using
> these scripts [2].
>
> Looking forward to your reviews, comments, etc.
>
> [1]: https://www.postgresql.org/message-id/CAEze2Wh6C_QfxLii%2B%2BeZue5%3DKvbVXKkHyZW8PLmtLgyjmFzwCQ%40mail.gmail.com
> [2]: https://github.com/imranzaheer612/pg-recovery-testing
>
> --
> Regards,
> Imran Zaheer
> CYBERTEC PostgreSQL International GmbH
