Re: Checkpointer write combining

From: Soumya S Murali <soumyamurali(dot)work(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Checkpointer write combining
Date: 2026-01-23 12:17:38
Message-ID: CAMtXxw9cqxgNH6=8NDAA2o11GoF=4P4JO=7-FCkhr=vJCmQiJA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

> Thank you all for the patches.
> I am keeping this as a single patch because the refactoring, batching
> behavior and instrumentation are tightly coupled and all serve one
> purpose: reducing checkpoint writeback overhead while making the
> effect observable. Due to version and context differences, the patches
> did not apply cleanly in my development environment. Instead, I
> studied the patches, went through the logic in detail, and then
> implemented the same ideas directly in my current tree, adapting them
> where needed. The implementation was then validated with
> instrumentation and measurements.
>
> Before batching:
> 2026-01-22 17:27:26.969 IST [148738] LOG: checkpoint complete: wrote
> 15419 buffers (94.1%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.325 s, sync=0.284 s, total=0.754 s; sync
> files=30, longest=0.227 s, average=0.010 s; distance=407573 kB,
> estimate=407573 kB; lsn=0/1A5B8E30, redo lsn=0/1A5B8DD8
>
> After batching:
> 2026-01-22 17:31:36.165 IST [148738] LOG: checkpoint complete: wrote
> 13537 buffers (82.6%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.260 s, sync=0.211 s, total=0.625 s; sync
> files=3, longest=0.205 s, average=0.070 s; distance=404310 kB,
> estimate=407247 kB; lsn=0/3308E738, redo lsn=0/3308E6E0
>
> Debug instrumentation (batch size = 16) confirms the batching
> behavior itself:
> buffers_written = 6196
> writeback_calls = 389
> i.e. on average 15.9 (approx. 16) buffers per writeback call.
> This shows that writebacks are issued per batch rather than per
> buffer, while WAL ordering and durability semantics remain unchanged.
> The change remains localized to BufferSync() and is intended to be a
> conservative and measurable improvement to checkpoint I/O behavior.
> The patches are attached for review. I am happy to adjust the
> approach if there are concerns or suggestions. Looking forward to
> more feedback.
>

With reference to my previous patch for the batching behavior, I
evaluated batch sizes of 8, 16, and 32 under identical workloads, and I
am attaching the logs for all three. All conclusions are based on
actual checkpoint logs and DEBUG BufferSync statistics:

Batch size = 8
LOG: checkpoint complete: wrote 12622 buffers (77.0%); write=0.113 s,
sync=0.195 s, total=0.485 s; sync files=37
DEBUG: checkpoint BufferSync stats: buffers_written=9923, writeback_calls=1242
Avg: 7.989, i.e. approx. 8 buffers per writeback.

Batch size = 16
LOG: checkpoint complete: wrote 13537 buffers (82.6%); write=0.260 s,
sync=0.211 s, total=0.625 s; sync files=3
DEBUG: checkpoint BufferSync stats: buffers_written=6196, writeback_calls=389
Avg: 15.9, i.e. approx. 16 buffers per writeback.

Batch size = 32
LOG: checkpoint complete: wrote 12914 buffers (78.8%); write=0.116 s,
sync=0.136 s, total=0.442 s; sync files=5
DEBUG: checkpoint BufferSync stats: buffers_written=12914, writeback_calls=1616
Avg: 7.99, i.e. approx. 8 buffers per writeback.

Batch size 16 gives the lowest sync fan-out (3 sync files per
checkpoint, versus 37 with batch size 8), but at the cost of longer
individual sync operations and the highest total checkpoint time
(≈0.625 s). Batch size 32 provides a better balance: sync fragmentation
stays low (5 files) without long sync stalls, and it yields the lowest
overall checkpoint time (≈0.442 s). For now, I am attaching the patch
with the batch size fixed at 32 for further review.
Please let me know if further workloads or instrumentation would be useful.

Regards
Soumya

Attachment Content-Type Size
0001-Checkpointer-batch-data-writeback-during-BufferSync.patch text/x-patch 868 bytes
