Re: Checkpointer write combining

From: Soumya S Murali <soumyamurali(dot)work(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: "li(dot)evan(dot)chao(at)gmail(dot)com" <li(dot)evan(dot)chao(at)gmail(dot)com>, "byavuz81(at)gmail(dot)com" <byavuz81(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Checkpointer write combining
Date: 2025-12-17 05:53:08
Message-ID: CAMtXxw9KmFjjerx==hQLFOHf1f56+tcY-XoneOQCOwni0QPUag@mail.gmail.com
Lists: pgsql-hackers

Hi all,

On Tue, Dec 16, 2025 at 3:18 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Mon, Dec 15, 2025 at 4:36 AM Soumya S Murali
> <soumyamurali(dot)work(at)gmail(dot)com> wrote:
> >
> > With reference to the last patches (v11) I received [1] and while reviewing Melanie’s latest feedback, I understood that PageSetBatchChecksumInplace() is currently WIP and depends on upcoming changes to hint-bit locking. It will be contrary to the flow if I propose new functional changes to checksum batching at this time. So for now I will focus on preparatory or documentation improvements until I get the updates on dependencies.
> > Regarding my patch attached, the patch introduces write-combining during checkpoints by batching contiguous buffers and allowing them to be written using vectorized I/O. My patch includes write-combining for checkpoint buffer flushes, contiguous buffer batching, Preserved WAL ordering, locking, and buffer state invariants. The change is currently limited to the checkpointer path (BufferSync()). So far I tested my implementation and found that all the regression (233 tests) and isolation tests (121 tests) got passed, the manual pgbench validation completed successfully and also verified pg_stat_bgwriter counters before and after checkpoints. So far the implementation is stable in my system.
>
> Can you explain how your implementation differs from what was posted
> in v11 0006 [1]? That implements checkpointer write combining. I'm
> open to ideas for improving the code, but I don't understand how your
> patch is supposed to fit into the ongoing work on this thread.
>
> - Melanie
>
> [1] https://www.postgresql.org/message-id/CAAKRu_ZiEpE_EHww3S3-E3iznybdnX8mXSO7Wsuru7%3DP9Y%3DczQ%40mail.gmail.com

Thank you for the question.
My patch is not intended to replace or redesign v11-0006. I am fully
aligned with that patch and treated it as the baseline for my work.
The work I sent is intentionally incremental and does not introduce
any new batching logic.
v11-0006 already implements the core checkpointer write-combining
logic (batch formation, contiguity checks, WAL ordering, pin limits,
and IO issuance). I did not change that structure.
My changes focus on correctness in two existing functions. In
CleanVictimBuffer(), they ensure that content locks are always released
on early exit paths and make the shared/exclusive lock transitions
explicit. This directly addresses the lock-handling issue you pointed
out earlier in the thread. In BufferNeedsWALFlush(), they clarify the
semantics so that an LSN is returned only when the buffer is logged
(BM_PERMANENT); otherwise the LSN is explicitly set to
InvalidXLogRecPtr. This matches the direction you mentioned about
avoiding confusing or unsafe LSN propagation.
I intentionally did not modify PageSetBatchChecksumInplace(), since it
is clearly marked WIP and depends on the hint-bit locking work, as you
mentioned.
I validated the v11-0006 design on a fresh tree, made a few small
correctness cleanups that do not alter behavior, and ran make check,
the isolation tests, and a manual checkpoint validation to confirm.
I hope you find this useful.
Thank you for the guidance and patience. Looking forward to more feedback.

Regards,
Soumya
