Re: [WIP] Pipelined Recovery

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, assam258(at)gmail(dot)com
Cc: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Pipelined Recovery
Date: 2026-04-22 09:43:56
Message-ID: CABPTF7XABSSwUPbnS+UE9OyeH-z3ihmdp9tOt3UJ4XcWZkE1DA@mail.gmail.com
Lists: pgsql-hackers

Hi Henson, Imran,

On Wed, Apr 8, 2026 at 7:14 PM Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> wrote:
>
> Hi
>
> I am uploading the new version with the following fixes
>
> * Rebased version.
> * Skip serialization of decoded records. As pointed out by Henson,
> there was no need to serialize the records again for the shm_mq.
> We can simply pass the contiguous bytes to the shm_mq with minor
> pointer fixing.
>
> This time I am uploading the benchmarking results to Drive and
> attaching the link here. Otherwise my mail will get held for
> moderation (my guess is that the overall attachment size is greater than 1MB).
>
> I am still not sure whether my testing approach is good enough,
> because sometimes I am not able to get the same performance
> improvement with the pgbench built-in scripts as I got with the
> custom SQL scripts. Maybe pgbench is not generating enough WAL to
> test on, or maybe I am missing something.
>
> Benchmarks: https://drive.google.com/file/d/1Y4SYVnrFEQRE5T2r87rrTr7SWC9m19Si/view?usp=sharing
>
> Thanks & Regards
> Imran Zaheer
>
> On Wed, Apr 8, 2026 at 1:46 PM Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> wrote:
> >
> > >
> > > Hi Xuneng, Imran, and everyone,
> > >
> >
> > Hi Henson and Xuneng.
> >
> > Thanks for explaining the approaches to Xuneng.
> >
> > >
> > > The two approaches target different bottlenecks. The current patch
> > > parallelizes WAL decoding, which keeps the redo path single-threaded
> > > and avoids the Hot Standby visibility problem entirely.
> > >
> >
> > You are right, the two approaches target different bottlenecks. The
> > pipeline patch aims to improve overall CPU throughput and to save
> > CPU time by offloading the steps we can safely run in parallel
> > without causing synchronization problems.
> >
> > > One thing I am curious about in the current patch: WAL records are
> > > already in a serialized format on disk. The producer decodes them and
> > > then re-serializes into a different custom format for shm_mq. What is
> > > the advantage of this second serialization format over simply passing
> > > the raw WAL bytes after CRC validation and letting the consumer decode
> > > directly? Offloading CRC to a separate core could still improve
> > > throughput at the cost of higher total CPU usage, without needing the
> > > custom format.
> > >
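The idea in the paragraph above (validate the CRC in the producer, then hand the untouched bytes to the consumer) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: `DemoRecordHeader`, `demo_record_build` and `demo_record_is_valid` are hypothetical names, and the framing is deliberately simplified. PostgreSQL's real `XLogRecord` layout and CRC coverage differ; the only thing borrowed from it is the choice of CRC-32C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t    crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical framing for this sketch; NOT the XLogRecord layout. */
typedef struct DemoRecordHeader
{
    uint32_t    total_len;  /* header plus payload, in bytes */
    uint32_t    crc;        /* CRC-32C of the payload only */
} DemoRecordHeader;

/* Write header + payload into out; returns the number of bytes written. */
static size_t
demo_record_build(uint8_t *out, size_t cap, const void *payload, size_t len)
{
    DemoRecordHeader hdr;

    assert(cap >= sizeof(hdr) + len);
    hdr.total_len = (uint32_t) (sizeof(hdr) + len);
    hdr.crc = crc32c(payload, len);
    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), payload, len);
    return sizeof(hdr) + len;
}

/*
 * Producer-side check: returns 1 if the raw record at buf is intact and can
 * be forwarded byte-for-byte to the consumer, 0 otherwise.
 */
static int
demo_record_is_valid(const uint8_t *buf, size_t avail)
{
    DemoRecordHeader hdr;

    if (avail < sizeof(hdr))
        return 0;
    memcpy(&hdr, buf, sizeof(hdr));     /* avoids unaligned access */
    if (hdr.total_len < sizeof(hdr) || hdr.total_len > avail)
        return 0;
    return crc32c(buf + sizeof(hdr), hdr.total_len - sizeof(hdr)) == hdr.crc;
}
```

In this arrangement the producer pays for the checksum and the consumer still does the full decode, which matches the trade-off described above: higher total CPU usage, but less work on the redo core.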
> >
> > Thanks. You are right, there was no need to serialize the decoded record again.
> > I was not aware that we already have contiguous bytes in memory. In my
> > next patch I will remove this extra serialization step.
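For readers wondering what "pointer fixing" on a contiguous record might involve, here is a minimal sketch. It assumes the decoded record and the data it points to live in one allocation, so the whole thing can be copied as a single byte range once internal pointers are swapped for offsets. `DemoDecoded` and the helper functions are hypothetical; PostgreSQL's actual `DecodedXLogRecord` has more fields (including per-block pointers), and this is not the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded record: header and its data share one allocation. */
typedef struct DemoDecoded
{
    size_t      size;           /* total bytes, header included */
    size_t      main_data_len;
    union
    {
        char   *ptr;            /* valid inside one process's copy */
        size_t  off;            /* offset from record start, while in transit */
    }           main_data;
} DemoDecoded;

/* Lay out a record in buf, with main_data stored right after the header. */
static DemoDecoded *
demo_build(void *buf, size_t cap, const char *data, size_t len)
{
    DemoDecoded *rec = buf;

    assert(cap >= sizeof(DemoDecoded) + len);
    rec->size = sizeof(DemoDecoded) + len;
    rec->main_data_len = len;
    rec->main_data.ptr = (char *) buf + sizeof(DemoDecoded);
    memcpy(rec->main_data.ptr, data, len);
    return rec;
}

/* Sender: replace the internal pointer with an offset before copying. */
static void
demo_pointers_to_offsets(DemoDecoded *rec)
{
    size_t      off = (size_t) (rec->main_data.ptr - (char *) rec);

    rec->main_data.off = off;
}

/* Receiver: turn the offset back into a pointer into the new copy. */
static void
demo_offsets_to_pointers(DemoDecoded *rec)
{
    size_t      off = rec->main_data.off;

    assert(off + rec->main_data_len <= rec->size);
    rec->main_data.ptr = (char *) rec + off;
}
```

A sender would call `demo_pointers_to_offsets()` and then push `rec->size` bytes into the queue; the receiver calls `demo_offsets_to_pointers()` on its copy. No per-field serialization is needed, only this fix-up pass.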
> >
> > > Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> > > a larger cost — Jakub's flamegraphs show BufferAlloc ->
> > > GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 — but at
> > > the expense of much harder concurrency problems.
> > >
> > > Whether the decode pipelining ceiling is high enough, or whether the
> > > redo parallelization complexity is tractable, seems like the central
> > > design question for this area.
> >
> > I still have to investigate the `GetVictimBuffer` problem that
> > Jakub mentioned. Meanwhile, I have been looking at how we can safely
> > offload the work done by `XLogReadBufferForRedoExtended` to a separate
> > pipeline worker, or perhaps prefetch the buffer header so that the
> > main redo loop doesn't have to spend time getting the buffer.

Thanks for your clarification! I'll try to review this patch later.

--
Best,
Xuneng
