| From: | Soumya S Murali <soumyamurali(dot)work(at)gmail(dot)com> |
|---|---|
| To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
| Subject: | Re: Checkpointer write combining |
| Date: | 2026-01-23 12:17:38 |
| Message-ID: | CAMtXxw9cqxgNH6=8NDAA2o11GoF=4P4JO=7-FCkhr=vJCmQiJA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi all,
> Thank you all for the patches.
> I am keeping this as a single patch because the refactoring, batching
> behavior, and instrumentation are tightly coupled and all serve one
> purpose: to reduce checkpoint writeback overhead while making the
> effect observable. Due to version and context differences, the patches
> did not apply cleanly in my development environment. Instead, I
> studied the patches, went through the logic in detail, and then
> implemented the same ideas directly in my current tree, adapting them
> wherever needed. The implementation was then validated with
> instrumentation and measurements.
>
> Before batching:
> 2026-01-22 17:27:26.969 IST [148738] LOG: checkpoint complete: wrote
> 15419 buffers (94.1%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.325 s, sync=0.284 s, total=0.754 s; sync
> files=30, longest=0.227 s, average=0.010 s; distance=407573 kB,
> estimate=407573 kB; lsn=0/1A5B8E30, redo lsn=0/1A5B8DD8
>
> After batching:
> 2026-01-22 17:31:36.165 IST [148738] LOG: checkpoint complete: wrote
> 13537 buffers (82.6%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
> removed, 25 recycled; write=0.260 s, sync=0.211 s, total=0.625 s; sync
> files=3, longest=0.205 s, average=0.070 s; distance=404310 kB,
> estimate=407247 kB; lsn=0/3308E738, redo lsn=0/3308E6E0
>
> Debug instrumentation (batch size = 16) confirms the batching
> behavior itself:
> buffers_written = 6196
> writeback_calls = 389
> On average, that is 6196 / 389 = 15.9, i.e. approximately 16 buffers
> per writeback.
> This shows that writebacks are issued per batch rather than per
> buffer, while WAL ordering and durability semantics remain unchanged.
> The change remains localized to BufferSync() and is intended to be a
> conservative and measurable improvement to checkpoint I/O behavior. I
> am attaching the patches herewith for review.
> I am happy to adjust the approach if there are concerns or
> suggestions. Looking forward to more feedback.
>
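To illustrate the per-batch writeback behavior described in the quoted message, here is a minimal simulation. It is a sketch in Python, not the patch's C code: the function name `count_writebacks` is hypothetical, and the assumption that a partial final batch is flushed exactly once at the end is mine, not something the patch is confirmed to do.

```python
def count_writebacks(buffers_written: int, batch_size: int) -> int:
    """Count writeback requests when one is issued per full batch.

    Assumption (illustrative only): any partial batch left over at the
    end of the scan is flushed with one extra writeback request.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    pending = 0  # buffers written since the last writeback request
    calls = 0    # writeback requests issued so far

    for _ in range(buffers_written):
        pending += 1
        if pending == batch_size:
            calls += 1   # full batch: issue one writeback request
            pending = 0

    if pending > 0:
        calls += 1       # flush the partial final batch

    return calls
```

Under these assumptions, 6196 buffers with a batch size of 16 give 388 requests, close to the 389 reported above; the real code may flush partial batches at other points as well.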

Following up on my previous patch for the batching behavior, I
evaluated batch sizes 8, 16, and 32 under identical workloads. I am
attaching the logs for all three runs. All conclusions are based on
actual checkpoint logs and DEBUG BufferSync statistics:

Batch size = 8
LOG: checkpoint complete: wrote 12622 buffers (77.0%); write=0.113 s,
sync=0.195 s, total=0.485 s; sync files=37
DEBUG: checkpoint BufferSync stats: buffers_written=9923, writeback_calls=1242
Avg: 9923 / 1242 = 7.99, i.e. approx. 8 buffers per writeback.

Batch size = 16
LOG: checkpoint complete: wrote 13537 buffers (82.6%); write=0.260 s,
sync=0.211 s, total=0.625 s; sync files=3
DEBUG: checkpoint BufferSync stats: buffers_written=6196, writeback_calls=389
Avg: 6196 / 389 = 15.9, i.e. approx. 16 buffers per writeback.

Batch size = 32
LOG: checkpoint complete: wrote 12914 buffers (78.8%); write=0.116 s,
sync=0.136 s, total=0.442 s; sync files=5
DEBUG: checkpoint BufferSync stats: buffers_written=12914, writeback_calls=1616
Avg: 12914 / 1616 = 7.99, i.e. approx. 8 buffers per writeback.

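The averages above are just buffers_written divided by writeback_calls from each DEBUG line. A small Python sketch that reproduces them from the reported numbers (the dictionary layout and the helper name `avg_buffers_per_writeback` are illustrative, not part of the patch):

```python
# (buffers_written, writeback_calls) as reported in the DEBUG lines above,
# keyed by the configured batch size.
stats = {
    8: (9923, 1242),
    16: (6196, 389),
    32: (12914, 1616),
}


def avg_buffers_per_writeback(buffers_written: int, writeback_calls: int) -> float:
    """Average number of buffers covered by one writeback request."""
    return buffers_written / writeback_calls


for batch_size, (written, calls) in stats.items():
    avg = avg_buffers_per_writeback(written, calls)
    print(f"batch size {batch_size}: {avg:.2f} buffers per writeback")
```
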
Batch size 16 reduces sync fan-out the most (3 sync files in this
checkpoint, versus 37 with batch size 8), but this comes at the cost
of longer individual sync operations, resulting in the highest total
checkpoint time (≈0.625 s). Batch size 32 provides a better balance:
it keeps sync fragmentation low (5 sync files) while avoiding long
sync stalls (sync=0.136 s), and yields the lowest overall checkpoint
time (≈0.442 s). I am attaching the patch with the batch size fixed at
32 for now for further review.

Please let me know if further workloads or instrumentation would be useful.

Regards,
Soumya
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Checkpointer-batch-data-writeback-during-BufferSync.patch | text/x-patch | 868 bytes |