Re: [WIP] Pipelined Recovery

From: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Pipelined Recovery
Date: 2026-04-08 11:14:02
Message-ID: CA+UBfakz7G5FH8PjxWyFLmF+sWdqMVcvQRRM0vURmznafqOjQQ@mail.gmail.com
Lists: pgsql-hackers

Hi

I am uploading a new version with the following changes:

* Rebased.
* Skip serialization of decoded records. As Henson pointed out,
there was no need to serialize the records again for the shm_mq.
We can simply pass the contiguous bytes to the shm_mq with minor
pointer fixups.
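
To illustrate what "minor pointer fixing" can mean, here is a minimal
sketch, not code from the patch. It assumes a record whose payload sits in
the same contiguous allocation as its header. DemoRecord and all demo_*
functions are hypothetical names made up for this example; the struct is
not PostgreSQL's DecodedXLogRecord, and memcpy stands in for the queue
transfer. The producer turns the internal pointer into an offset, the bytes
are copied in one piece, and the consumer turns the offset back into a
pointer that is valid at the new address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a decoded record; not PostgreSQL's DecodedXLogRecord. */
typedef struct DemoRecord
{
    size_t      total_size;     /* header plus trailing payload, in bytes */
    uint32_t    main_data_len;  /* payload length */
    char       *main_data;      /* points into this same allocation */
} DemoRecord;

/* Build a record whose payload sits directly behind the header. */
static DemoRecord *
demo_record_build(const char *data, uint32_t len)
{
    size_t      total = sizeof(DemoRecord) + len;
    DemoRecord *rec = malloc(total);

    if (rec == NULL)
        return NULL;
    rec->total_size = total;
    rec->main_data_len = len;
    rec->main_data = (char *) rec + sizeof(DemoRecord);
    memcpy(rec->main_data, data, len);
    return rec;
}

/* Producer side: replace the absolute pointer with an offset from the record start. */
static void
demo_pointers_to_offsets(DemoRecord *rec)
{
    assert(rec->main_data >= (char *) rec);
    rec->main_data = (char *) (uintptr_t) (rec->main_data - (char *) rec);
}

/* Consumer side: turn the offset back into a pointer valid at the new address. */
static void
demo_offsets_to_pointers(DemoRecord *rec)
{
    rec->main_data = (char *) rec + (uintptr_t) rec->main_data;
}

/* Round trip; memcpy stands in for the queue transfer. Returns 1 on success. */
static int
demo_roundtrip(const char *data, uint32_t len)
{
    DemoRecord *sent = demo_record_build(data, len);
    DemoRecord *recv;
    int         ok;

    if (sent == NULL)
        return 0;
    recv = malloc(sent->total_size);
    if (recv == NULL)
    {
        free(sent);
        return 0;
    }
    demo_pointers_to_offsets(sent);
    memcpy(recv, sent, sent->total_size);   /* one contiguous copy, no per-field serialization */
    free(sent);                             /* the sender's addresses are now invalid */
    demo_offsets_to_pointers(recv);

    ok = recv->main_data == (char *) recv + sizeof(DemoRecord) &&
        memcmp(recv->main_data, data, len) == 0;
    free(recv);
    return ok;
}
```

A real decoded record has several such internal pointers (one per block
reference, for example), so the fixup would loop over them, but the
per-record cost stays at a few pointer adjustments plus one copy.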

This time I am uploading the benchmark results to Google Drive and
attaching the link here. Otherwise my mail will be held for
moderation (my guess is that the total attachment size exceeds 1MB).

I am still not sure whether my testing approach is good enough,
because sometimes I cannot get the same performance improvement
with the pgbench builtin scripts as I got with the custom SQL scripts.
Maybe pgbench is not generating enough WAL to test on,
or maybe I am missing something.

Benchmarks: https://drive.google.com/file/d/1Y4SYVnrFEQRE5T2r87rrTr7SWC9m19Si/view?usp=sharing

Thanks & Regards
Imran Zaheer


On Wed, Apr 8, 2026 at 1:46 PM Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> wrote:
>
> >
> > Hi Xuneng, Imran, and everyone,
> >
>
> Hi Henson and Xuneng.
>
> Thanks for explaining the approaches to Xuneng.
>
> >
> > The two approaches target different bottlenecks. The current patch
> > parallelizes WAL decoding, which keeps the redo path single-threaded
> > and avoids the Hot Standby visibility problem entirely.
> >
>
> You are right, both approaches target different bottlenecks. The
> pipeline patch aims to improve overall CPU throughput and to save
> CPU time by offloading the steps we can safely do in parallel
> without causing synchronization problems.
>
> > One thing I am curious about in the current patch: WAL records are
> > already in a serialized format on disk. The producer decodes them and
> > then re-serializes into a different custom format for shm_mq. What is
> > the advantage of this second serialization format over simply passing
> > the raw WAL bytes after CRC validation and letting the consumer decode
> > directly? Offloading CRC to a separate core could still improve
> > throughput at the cost of higher total CPU usage, without needing the
> > custom format.
> >
>
> Thanks. You are right, there was no need to serialize the decoded record again.
> I was not aware that we already have contiguous bytes in memory. In my
> next patch I will remove this extra serialization step.
>
> > Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> > a larger cost — Jakub's flamegraphs show BufferAlloc ->
> > GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 — but at
> > the expense of much harder concurrency problems.
> >
> > Whether the decode pipelining ceiling is high enough, or whether the
> > redo parallelization complexity is tractable, seems like the central
> > design question for this area.
>
> I still have to investigate the problem related to `GetVictimBuffer` that
> Jakub mentioned. But I was trying to work out how we can safely offload
> the work done by `XLogReadBufferForRedoExtended` to a separate
> pipeline worker, or maybe we can try prefetching the buffer header so
> the main redo loop doesn't have to spend time getting the buffer.
>
> Thanks for the feedback. That was helpful.
>
>
> Regards,
> Imran Zaheer

Attachment Content-Type Size
v3-0002-Pipelined-Recovery-Consumer.patch application/octet-stream 47.7 KB
v3-0001-Pipelined-Recovery-Producer.patch application/octet-stream 37.7 KB
recoveries-becnhmark-v03-pdf.pdf application/pdf 48.6 KB
