Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
Date: 2025-10-15 08:43:29
Message-ID: CABPTF7XNtg5vh8hJkcv5tnBRtbVzZLP9MQVyUbpK=zAxhczUjw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, Oct 15, 2025 at 8:31 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Sat, Oct 11, 2025 at 11:02 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > The following is the split patch set. There are certain limitations to
> > this simplification effort, particularly in patch 2. The
> > read_local_xlog_page_guts callback demands more functionality from the
> > facility than the WAIT FOR patch — specifically, it must wait for WAL
> > flush events, though it does not require timeout handling. In some
> > sense, parts of patch 3 can be viewed as a superset of the WAIT FOR
> > patch, since it installs wake-up hooks in more locations. Unlike the
> > WAIT FOR patch, which only needs wake-ups triggered by replay,
> > read_local_xlog_page_guts must also handle wake-ups triggered by WAL
> > flushes.
> >
> > Workload characteristics play a key role here. A sorted dlist performs
> > well when insertions and removals occur in order, achieving O(1)
> > complexity in the best case. In synchronous replication, insertion
> > patterns seem generally monotonic with commit LSNs, though not
> > strictly ordered due to timing variations and contention. When most
> > insertions remain ordered, a dlist can be efficient. However, as the
> > number of elements grows and out-of-order insertions become more
> > frequent, the insertion cost can degrade to O(n) more often.
> >
> > By contrast, a pairing heap maintains stable O(1) insertion for both
> > ordered and disordered inputs, with amortized O(log n) removals. Since
> > LSNs in the WAIT FOR command are likely to arrive in a non-sequential
> > fashion, the pairing heap introduced in v6 provides more predictable
> > performance under such workloads.
> >
> > At this stage (v7), no consolidation between syncrep and xlogwait has
> > been implemented. This is mainly because the dlist and pairing heap
> > each work well under different workloads; neither is likely to be
> > universally optimal. Introducing the facility with a pairing heap
> > first seems reasonable, as it offers flexibility for future
> > refactoring: we could later replace dlist with a heap or adopt a
> > modular design depending on observed workload characteristics.
> >
>
> v8-0002 removed the early fast check before addLSNWaiter in WaitForLSNReplay:
> a server state change at that point is unlikely, so the check does not
> justify its extra branch and the added code complexity.
>

v9 makes minor changes to the #include of xlogwait.h in patch 2 to quiet the CF bot.
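
For readers following the dlist versus pairing heap discussion quoted above, here is a minimal standalone sketch of the trade-off. It is not code from the patches: the Waiter struct and the functions heap_insert, heap_remove_min, wake_waiters and sorted_insert are hypothetical names made up for illustration, and a plain array stands in for the sorted dlist. The heap insert is a single merge regardless of arrival order, while the sorted insert has to step over every larger element when an LSN arrives out of order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter record; not the struct used in the patches. */
typedef struct Waiter {
    uint64_t lsn;              /* LSN this waiter is waiting for */
    struct Waiter *child;      /* first child in the pairing heap */
    struct Waiter *sibling;    /* next entry in the parent's child list */
} Waiter;

/* Merge two heaps: the larger root becomes the first child of the smaller. */
static Waiter *heap_merge(Waiter *a, Waiter *b)
{
    if (a == NULL) return b;
    if (b == NULL) return a;
    if (b->lsn < a->lsn) { Waiter *t = a; a = b; b = t; }
    b->sibling = a->child;
    a->child = b;
    return a;
}

/* Insert is one merge, so it is O(1) whatever the arrival order. */
static Waiter *heap_insert(Waiter *root, Waiter *w, uint64_t lsn)
{
    w->lsn = lsn;
    w->child = w->sibling = NULL;
    return heap_merge(root, w);
}

/* Remove the minimum with the usual two-pass pairing, amortized O(log n). */
static Waiter *heap_remove_min(Waiter *root)
{
    assert(root != NULL);
    Waiter *list = root->child;
    Waiter *pairs = NULL;      /* merged pairs, stacked in reverse order */
    Waiter *result = NULL;

    /* Pass 1: merge the children pairwise, left to right. */
    while (list != NULL) {
        Waiter *a = list;
        Waiter *b = a->sibling;
        list = b ? b->sibling : NULL;
        a->sibling = NULL;
        if (b) b->sibling = NULL;
        a = heap_merge(a, b);
        a->sibling = pairs;
        pairs = a;
    }
    /* Pass 2: fold the merged pairs into one heap, right to left. */
    while (pairs != NULL) {
        Waiter *next = pairs->sibling;
        pairs->sibling = NULL;
        result = heap_merge(result, pairs);
        pairs = next;
    }
    root->child = NULL;
    return result;
}

/* Pop every waiter whose target LSN has been reached; returns the count. */
static int wake_waiters(Waiter **root, uint64_t current_lsn)
{
    int woken = 0;
    while (*root != NULL && (*root)->lsn <= current_lsn) {
        *root = heap_remove_min(*root);
        woken++;               /* a real implementation would wake the waiter here */
    }
    return woken;
}

/* Contrast: insert into an ascending array by scanning from the tail, as a
 * sorted list would; returns how many elements had to be stepped over. */
static int sorted_insert(uint64_t *arr, int n, uint64_t lsn)
{
    int i = n;
    while (i > 0 && arr[i - 1] > lsn) {
        arr[i] = arr[i - 1];
        i--;
    }
    arr[i] = lsn;
    return n - i;
}
```

With waiters inserted for LSNs 300, 100, 400 and 200 in that order, wake_waiters(&root, 250) releases the two waiters at 100 and 200 and leaves 300 at the root. In the array version, appending an in-order LSN steps over nothing, whereas inserting an LSN smaller than all n existing entries steps over all n of them.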

Best,
Xuneng

Attachment Content-Type Size
v9-0003-Improve-read_local_xlog_page_guts-by-replacing-po.patch application/octet-stream 9.0 KB
v9-0001-Add-pairingheap_initialize-for-shared-memory-usag.patch application/octet-stream 3.0 KB
v9-0002-Add-infrastructure-for-efficient-LSN-waiting.patch application/octet-stream 24.7 KB