Re: [WIP] Pipelined Recovery

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>
Cc: assam258(at)gmail(dot)com, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Pipelined Recovery
Date: 2026-06-25 07:47:20
Message-ID: CABPTF7Vz+p5dxUbKPxExaAVLujkbJjrpzXsOZbCqKMp_NvL3YA@mail.gmail.com
Lists: pgsql-hackers

Hi Imran,

On Tue, Jun 23, 2026 at 9:27 PM Imran Zaheer <imran(dot)zhir(at)gmail(dot)com> wrote:
>
> Hi
>
> I am attaching the new series of patches.
>
> What has changed?
>
> * Rebased
>
> * The patch set is now split into two new patches. This will make the
> code easier to understand and review.
>
> * The v4-0003 patch contains code mostly related to keeping the
> recovery states synced between the startup process and the pipeline
> process. Most of these changes were required to make the streaming
> replication work.
>
> * The v4-0002 patch now only contains the consumer code that handles
> receiving the decoded records from the shmem queue and moving the redo
> loop forward.
>
> * The v4-0004 contains some basic tests to see if the pipeline worker
> is functioning as expected. More testing was done by passing
> PG_TEST_INITDB_EXTRA_OPTS="-c wal_pipeline=on" before running the
> recovery test suite.

+1 for splitting the patch set into smaller components to make the
review process smoother.

> * Other than that, the cpu overhead during deserialization is
> optimized by skipping multiple copies of the decoded record and
> directly passing the pointer to the shmem queue. There is still some
> overhead visible during serialization that could be improved at the
> producer end.
>
> * Signal handling for the pipeline worker is improved so that
> promotion signals are sent to both the startup process and the
> producer worker by the postmaster.
>
>
> You will also find the new benchmarks attached [1] and the pdf report
> overview. A simple cpu profiling on the pipelined startup process
> shows that the cpu overhead during reading records has now been
> removed and offloaded to the producer worker.
>
> Before pipelining:
>
> Around 50% of the CPU time is spent fetching WAL records. Note that the
> pipeline is off in this run, so the new function ReceiveRecord() is just
> a wrapper around ReadRecord().
>
> Children   Self  Command   Shared O  Symbol
> - 98.85%  0.21%  postgres  postgres  [.] PerformWalRecovery
>    - 98.64% PerformWalRecovery
>       - 51.00% ReceiveRecord
>          - 50.78% ReadRecord
>             - 50.52% XLogPrefetcherReadRecord
>                - 49.61% XLogPrefetcherNextBlock
>                   + 25.33% XLogReadAhead
>                   + 22.32% PrefetchSharedBuffer
>                   + 0.76% smgropen
>       - 46.68% ApplyWalRecord
>          + 29.23% heap_redo
>          + 9.51% heap2_redo
>          + 4.74% btree_redo
>          + 1.11% xlog_redo
>          + 0.80% xact_redo
>
>
> After Pipelining:
>
> Here the only work the CPU has to do to obtain a record is to fetch the
> decoded record from the queue (7.80%). Most of the CPU time (89.13%) is
> spent applying WAL records.
>
> Children   Self  Command   Shared O  Symbol
> - 98.74%  0.37%  postgres  postgres  [.] PerformWalRecovery
>    - 98.37% PerformWalRecovery
>       - 89.13% ApplyWalRecord
>          + 56.89% heap_redo
>          + 18.28% heap2_redo
>          + 8.01% btree_redo
>          + 2.02% xlog_redo
>          + 1.15% xact_redo
>       - 7.80% ReceiveRecord
>          + 7.63% WalPipeline_ReceiveRecord
>
> This CPU optimization is only measurable when the recovery process is
> not I/O bound. Running pgbench on a workload that is fully in memory
> shows a performance gain of around 30%. More benchmarking details are
> in the attached drive link [1].

The perf result looks promising!
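
For anyone following the thread, the division of work that the two
profiles reflect can be sketched roughly as below. This is only a toy
model in Python using threads and an in-process queue, not code from the
patch set: decode(), producer(), consumer(), run_pipeline(), the queue
depth and the END_OF_WAL marker are all made-up names for illustration,
and the real patches use a separate worker process and a shmem queue.

```python
import queue
import threading

END_OF_WAL = None  # hypothetical marker telling the consumer to stop


def decode(raw):
    # Stand-in for reading and decoding one WAL record (the work that
    # shows up under ReadRecord in the non-pipelined profile).
    return {"lsn": raw, "payload": f"record-{raw}"}


def producer(raw_records, q):
    # Runs in its own thread: decode ahead of the consumer.
    for raw in raw_records:
        q.put(decode(raw))  # blocks while the bounded queue is full
    q.put(END_OF_WAL)


def consumer(q, applied):
    # Plays the role of the redo loop: fetch a decoded record, apply it.
    while True:
        rec = q.get()
        if rec is END_OF_WAL:
            break
        applied.append(rec["lsn"])  # stand-in for applying the record


def run_pipeline(raw_records, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded, so the producer cannot run far ahead
    applied = []
    worker = threading.Thread(target=producer, args=(raw_records, q))
    worker.start()
    consumer(q, applied)
    worker.join()
    return applied


if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

With a single producer and a single consumer on a FIFO queue, records
are applied in the order they were decoded, which is the property the
redo loop depends on.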

> A note on the attached PDF and benchmarks: they show that the pipeline
> gives the largest performance advantage when most of the workload runs
> in memory, i.e. when we have enough shared buffers configured.
>
> If you want to do some experiments, please be my guest; I would be
> happy to see more testing. You can share what performance advantage
> you are getting from this. You can also refer to the benchmarking
> script that I have been using [2].
>
>
> Looking forward to your review, comments, etc.

I haven't had a chance for a meaningful review yet, but expect to do so soon.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
