From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting |
Date: | 2025-10-03 13:50:18 |
Message-ID: | CABPTF7WV_FbRHpXWqJVQaW5uRtJfwmLDda9u4i1y_2P6q-D5wA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Michael,
Thanks for your review.
On Fri, Oct 3, 2025 at 2:24 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Oct 02, 2025 at 11:06:14PM +0800, Xuneng Zhou wrote:
> > v5-0002 separates the waitlsn_cmp() comparator function into two distinct
> > functions (waitlsn_replay_cmp and waitlsn_flush_cmp) for the replay
> > and flush heaps, respectively.
>
> The primary goal that you want to achieve here is a replacement of the
> wait/sleep logic of read_local_xlog_page_guts() with a condition
> variable, and design a new facility to make the callback more
> responsive on polls. That's a fine idea in itself. However I would
> suggest to implement something that does not depend entirely on WAIT
> FOR, which is how your patch is presented. Instead of having your
> patch depend on an entirely different feature, it seems to me that you
> should try to extract from this other feature the basics that you are
> looking for, and make them shared between the WAIT FOR patch and what
> you are trying to achieve here. You should not need something as
> complex as what the other feature needs for a page read callback in
> the backend.
>
> At the end, I suspect that you will reuse a slight portion of it (or
> perhaps nothing at all, actually, but I did not look at the full scope
> of it). You should try to present your patch so as it is in a
> reviewable state, so as others would be able to read it and understand
> it. WAIT FOR is much more complex than what you want to do here
> because it covers larger areas of the code base and needs to worry
> about more cases. So, you should implement things so as the basic
> pieces you want to build on top of are simpler, not more complicated.
> Simpler means easier to review and easier to catch problems, designed
> in a way that depends on how you want to fix your problem, not
> designed in a way that depends on how a completely different feature
> deals with its own problems.
The core infrastructure shared by both this patch and the WAIT FOR
command patch is primarily in xlogwait.c, which provides the mechanism
for waiting until a given LSN is reached. Other parts of the code in
the WAIT FOR patch—covering the SQL command implementation,
documentation, and tests—is not relevant for the current patch. What
we need is only the infrastructure in xlogwait.c, on which we can
implement the optimization for read_local_xlog_page_guts.
Regarding complexity: the initial optimization idea was to introduce
condition-variable based waiting, as Heikki suggested in his comment:
/*
* Loop waiting for xlog to be available if necessary
*
* TODO: The walsender has its own version of this function, which uses a
* condition variable to wake up whenever WAL is flushed. We could use the
* same infrastructure here, instead of the check/sleep/repeat style of
* loop.
*/
I reviewed the relevant code in WalSndWakeup and WalSndWait. While
these mechanisms reduce polling overhead, they don’t prevent false
wakeups. Addressing that would likely require a request queue that
maps waiters to their target LSNs and issues targeted wakeups—a much
more complex design. Given that read_local_xlog_page_guts is not as
performance-sensitive as its equivalents, this added complexity may
not be justified. So I implemented the initial version of the
optimization like WalSndWakeup and WalSndWait.
After this, I came across the WAIT FOR patch in the mailing list and
noticed that the infrastructure in xlogwait.c aligns well with our
needs. Based on that, I built the current patch using this shared
infra.
If we prioritise simplicity and can tolerate occasional false wakeups,
then waiting in read_local_xlog_page_guts can be implemented in a much
simpler way than the current version. At the same time, the WAIT FOR
command seems to be a valuable feature in its own right, and both
patches can naturally share the same infrastructure. We could also
extract the infra and implement the current patch on it, then Wait for
could utilize it as well. Personally, I don’t have a strong preference
between the two approaches.
Best,
Xuneng
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-10-03 14:04:32 | Re: Fixing a few minor misusages of bms_union() |
Previous Message | Bertrand Drouvot | 2025-10-03 13:47:34 | Re: Report bytes and transactions actually sent downtream |