From: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [WIP] Pipelined Recovery
Date: 2026-01-30 06:28:47
Message-ID: CA+UBfa=vDV8wbmAV0pgrx-FuJh+x8YOW23vJ90Jzr=14rV+9jA@mail.gmail.com
Lists: pgsql-hackers

Hi,

Based on a suggestion by my colleague Ants Aasma, I worked on this
idea of adding parallelism to the WAL recovery process.

The crux of the idea is to decode the WAL in parallel workers. The
replay loop can then fetch already-decoded records directly from a
shared memory queue, which takes some of the CPU load off the recovery
process.

Implementing this idea yielded an improvement of around 20% in
recovery time, though results may differ depending on the workload. I
have attached benchmarks for several workloads.

Following are some recovery tests with the default configs. Here p0 is
recovery with pipelining disabled and p1 with it enabled; "db size" is
the size of the backup database on which the recovery runs. More
detail on the benchmarks is in the attached file
`recoveries-benchmark-v01`.

workload          elapsed (p0)   elapsed (p1)   % perf    db size

inserts.sql       272s 10ms      197s 570ms     27.37%    480 MB
updates.sql       177s 420ms     117s 80ms      34.01%    480 MB
hot-updates.sql   36s 940ms      29s 240ms      20.84%    480 MB
nonhot.sql        36s 570ms      28s 980ms      20.75%    480 MB
simple-update     20s 160ms      11s 580ms      42.56%    4913 MB
tpcb-like         20s 590ms      13s 640ms      33.75%    4913 MB

A similar approach was suggested earlier by Matthias van de Meent in a
separate thread [1]. Right now I am using one bgw to decode records
and fill the shared message queue, while the redo apply loop simply
receives decoded records from the queue. Once redo is finished, the
consumer (startup process) requests a shutdown from the producer
(the pipeline bgw) before exiting recovery.

This idea could be coupled with another one: pinning buffers in
parallel before the recovery process needs them, which would
parallelize most of the work done in
`XLogReadBufferForRedoExtended`. Redo would then simply receive
already-pinned buffers from a queue. Implementing this still needs
some R&D, as the IPC and pinning/unpinning of buffers across two
processes can be tricky.

If someone wants to reproduce the benchmark, they can do so using
these scripts [2].

Looking forward to your reviews, comments, etc.

[1]:
https://www.postgresql.org/message-id/CAEze2Wh6C_QfxLii%2B%2BeZue5%3DKvbVXKkHyZW8PLmtLgyjmFzwCQ%40mail.gmail.com
[2]: https://github.com/imranzaheer612/pg-recovery-testing

--
Regards,
Imran Zaheer
CYBERTEC PostgreSQL International GmbH

Attachment Content-Type Size
v1-0001-Pipelined-Recoveries.patch application/octet-stream 46.7 KB
recoveries-becnhmark-v01.pdf application/pdf 48.6 KB
recoveries-benchmarks-v01.zip application/zip 2.7 MB
