Re: [WIP] Pipelined Recovery

From: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Pipelined Recovery
Date: 2026-04-08 08:46:04
Message-ID: CA+UBfa=qDfWB90w5AsmX4f3PbeeM++GbaoVagd9ff-DKQDLvWA@mail.gmail.com
Lists: pgsql-hackers

>
> Hi Xuneng, Imran, and everyone,
>

Hi Henson and Xuneng.

Thanks for explaining the approaches to Xuneng.

>
> The two approaches target different bottlenecks. The current patch
> parallelizes WAL decoding, which keeps the redo path single-threaded
> and avoids the Hot Standby visibility problem entirely.
>

You are right that both approaches target different bottlenecks. The
pipeline patch aims to improve overall throughput and save CPU time on
the redo path by offloading the steps we can safely do in parallel
without causing synchronization problems.

> One thing I am curious about in the current patch: WAL records are
> already in a serialized format on disk. The producer decodes them and
> then re-serializes into a different custom format for shm_mq. What is
> the advantage of this second serialization format over simply passing
> the raw WAL bytes after CRC validation and letting the consumer decode
> directly? Offloading CRC to a separate core could still improve
> throughput at the cost of higher total CPU usage, without needing the
> custom format.
>

Thanks. You are right that there was no need to serialize the decoded
record again. I was not aware that we already have the record as
contiguous bytes in memory. In my next patch I will remove this extra
serialization step.

> Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> a larger cost — Jakub's flamegraphs show BufferAlloc ->
> GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 — but at
> the expense of much harder concurrency problems.
>
> Whether the decode pipelining ceiling is high enough, or whether the
> redo parallelization complexity is tractable, seems like the central
> design question for this area.

I still have to investigate the `GetVictimBuffer` problem that Jakub
mentioned. But I was looking into how we can safely offload the work
done by `XLogReadBufferForRedoExtended` to a separate pipeline worker,
or maybe prefetch the buffer so the main redo loop doesn't have to
spend time getting it.

Thanks for the feedback. That was helpful.

Regards,
Imran Zaheer
