Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
Date: 2025-09-28 13:47:06
Message-ID: CABPTF7X7XmnkMBPD5EHXLy7kCB7pNq92wfciuXumG5DqjQnb-g@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Aug 28, 2025 at 4:22 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> Some changes in v3:
> 1) Update the note of xlogwait.c to reflect the extending of its use
> for flush waiting and internal use for both flush and replay waiting.
> 2) Update the comment above logical_read_xlog_page which describes the
> prior-change behavior of read_local_xlog_page.

In an off-list discussion, Alexander pointed out potential issues with
the current single-heap design for replay and flush when promotion
occurs concurrently with WAIT FOR. The following is a simple example
illustrating the problem:

During promotion, there's a window where we can have mixed waiter
types in the same heap:

T1: Process A calls read_local_xlog_page_guts on standby
T2: RecoveryInProgress() = TRUE, adds to heap as replay waiter
T3: Promotion begins
T4: EndRecovery() calls WaitLSNWakeup(InvalidXLogRecPtr)
T5: SharedRecoveryState = RECOVERY_STATE_DONE
T6: Process B calls read_local_xlog_page_guts
T7: RecoveryInProgress() = FALSE, adds to SAME heap as flush waiter

The problem is that replay LSNs and flush LSNs represent different
positions in the WAL stream. Having both types in the same heap can
lead to:
- Incorrect wakeup logic (comparing incomparable LSNs)
- Processes waiting forever
- Wrong waiters being woken up
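
To make the failure mode concrete, here is a toy sketch of a single
shared queue that wakes waiters by comparing target LSNs only. It is
not the patch's code: the Waiter struct, wake_single_queue() and
count_premature() are hypothetical names, and a plain array stands in
for the real heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr (assumption) */

typedef enum { WAIT_REPLAY, WAIT_FLUSH } WaitKind;

typedef struct
{
    WaitKind kind;              /* which WAL position this waiter tracks */
    Lsn      target;            /* LSN that position must reach */
    bool     woken;
} Waiter;

/*
 * Naive wakeup over one shared queue: wake every waiter whose target is
 * <= the LSN just reached, without looking at the waiter's kind.
 * Returns the number of waiters woken by this call.
 */
static int
wake_single_queue(Waiter *waiters, size_t n, Lsn reached)
{
    int woken = 0;

    assert(waiters != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (!waiters[i].woken && waiters[i].target <= reached)
        {
            waiters[i].woken = true;
            woken++;
        }
    }
    return woken;
}

/*
 * Count waiters that were woken although the position they actually
 * care about has not reached their target yet.
 */
static int
count_premature(const Waiter *waiters, size_t n, Lsn replay_pos, Lsn flush_pos)
{
    int premature = 0;

    assert(waiters != NULL);
    for (size_t i = 0; i < n; i++)
    {
        Lsn pos = (waiters[i].kind == WAIT_REPLAY) ? replay_pos : flush_pos;

        if (waiters[i].woken && waiters[i].target > pos)
            premature++;
    }
    return premature;
}
```

With one replay waiter at target 100 and one flush waiter at target
300, a flush notification at LSN 350 wakes both, even if replay has
only reached 50. The replay waiter is woken against the wrong
position.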

To avoid this problem, patch v4 now uses two separate heaps, one for
flush waiters and one for replay waiters, as Alexander suggested
earlier. It also introduces a separate minimum-LSN tracking field for
flush waiters.
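
The idea can be sketched roughly as follows. This is a simplified
illustration under my own assumptions, not an excerpt from the patch:
WaitQueue, WaitState, add_waiter() and wake_waiters() are hypothetical
names, and fixed-size arrays stand in for the real heaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr (assumption) */
#define LSN_MAX     UINT64_MAX  /* "no waiters" marker for min_target */
#define MAX_WAITERS 8

typedef enum { WAIT_REPLAY = 0, WAIT_FLUSH = 1, WAIT_KINDS = 2 } WaitKind;

typedef struct
{
    Lsn    targets[MAX_WAITERS];    /* pending target LSNs of one kind */
    size_t count;
    Lsn    min_target;              /* smallest pending target, or LSN_MAX */
} WaitQueue;

typedef struct
{
    WaitQueue queues[WAIT_KINDS];   /* one queue per waiter kind */
} WaitState;

static void
wait_state_init(WaitState *s)
{
    for (int k = 0; k < WAIT_KINDS; k++)
    {
        s->queues[k].count = 0;
        s->queues[k].min_target = LSN_MAX;
    }
}

/* Register a waiter in the queue that matches its kind only. */
static void
add_waiter(WaitState *s, WaitKind kind, Lsn target)
{
    WaitQueue *q = &s->queues[kind];

    assert(q->count < MAX_WAITERS);
    q->targets[q->count++] = target;
    if (target < q->min_target)
        q->min_target = target;
}

/*
 * Wake waiters of one kind whose target is <= reached. Waiters of the
 * other kind are never examined. Returns the number of waiters woken.
 */
static size_t
wake_waiters(WaitState *s, WaitKind kind, Lsn reached)
{
    WaitQueue *q = &s->queues[kind];
    size_t     kept = 0;
    size_t     woken = 0;
    Lsn        new_min = LSN_MAX;

    /* Fast path: nothing of this kind can be satisfied yet. */
    if (reached < q->min_target)
        return 0;

    for (size_t i = 0; i < q->count; i++)
    {
        if (q->targets[i] <= reached)
            woken++;
        else
        {
            if (q->targets[i] < new_min)
                new_min = q->targets[i];
            q->targets[kept++] = q->targets[i];
        }
    }
    q->count = kept;
    q->min_target = new_min;
    return woken;
}
```

Because each kind has its own queue and its own min_target, a flush
notification only ever compares flush targets against the flush
position, and a replay waiter registered before promotion stays
untouched until a replay-side wakeup arrives.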

Best,
Xuneng

Attachments:

Attachment                                                       Content-Type              Size
v4-0000-cover-letter.patch                                       application/octet-stream  960 bytes
v11-0001-Implement-WAIT-FOR-command.patch                        application/octet-stream  63.5 KB
v4-0002-Improve-read_local_xlog_page_guts-by-replacing-po.patch  application/octet-stream  27.1 KB
