| From: | Andy Pogrebnoi <andrew(dot)pogrebnoi(at)percona(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Subject: | Re: Lowering the default wal_blocksize to 4K |
| Date: | 2026-02-16 08:04:37 |
| Message-ID: | E25A9AD2-EAD3-4372-AFD2-2627E4D5E3C5@percona.com |
| Lists: | pgsql-hackers |
Hello,
> On Oct 10, 2023, at 02:08, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> I've mentioned this to a few people before, but forgot to start an actual
> thread. So here we go:
>
> I think we should lower the default wal_blocksize / XLOG_BLCKSZ to 4096, from
> the current 8192.
I prepared a patch in case we want to move forward with a default XLOG_BLCKSZ
of 4kB. Regarding reducing the page headers' size: in my opinion, the benefits
of 4kB WAL blocks outweigh the disadvantage of the proportionally bigger header.
Since we recycle WAL segments, the added size won't increase disk usage but
will rather cause slightly more frequent segment switches. And maybe this is
also worth looking at regarding XLOG_BLCKSZ. I wanted to look into WAL segment
preallocation after an off-list conversation with Andres anyway. But the
added overhead is not that significant.
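As a rough check of the header overhead, here is a minimal sketch in Python. It
assumes a 24-byte short page header and 16MB WAL segments (neither number comes
from the patch), and it ignores the larger header on the first page of each
segment. The function names are made up for illustration.

```python
# Rough WAL page-header overhead for different XLOG_BLCKSZ values.
# Assumptions (not taken from the patch): 24-byte short page header,
# 16MB WAL segments; the larger first-page header is ignored.
SHORT_PAGE_HEADER = 24
SEGMENT_SIZE = 16 * 1024 * 1024


def header_bytes_per_segment(block_size):
    """Bytes of one WAL segment spent on page headers."""
    return (SEGMENT_SIZE // block_size) * SHORT_PAGE_HEADER


def header_overhead(block_size):
    """Fraction of each WAL page taken by the page header."""
    return SHORT_PAGE_HEADER / block_size


for blcksz in (8192, 4096):
    print(f"{blcksz}: {header_bytes_per_segment(blcksz)} header bytes/segment, "
          f"{header_overhead(blcksz):.2%} of each page")
```

Under these assumptions the header share goes from about 0.3% to about 0.6% of
each page.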
> One thing I noticed is that our auto-configuration of wal_buffers leads to
> different wal_buffers settings for different XLOG_BLCKSZ, which doesn't seem
> great.
I don't think it's an issue, as wal_buffers is measured in blocks, not bytes.
Even though the auto-tuned number may change, the total number of bytes remains
the same with different XLOG_BLCKSZ.
> For some example numbers, I ran a very simple insert workload with a varying
> number of clients with both a wal_blocksize=4096 and wal_blocksize=8192
> cluster, and measured the amount of bytes written before/after.
I've also run some simple tests on my local machine (Ubuntu in Vagrant on an M1
Mac). I ran a sysbench write-only load for 20s with different numbers of threads
(with the number of tables equal to the number of threads) and measured disk
writes with iostat. I recreated the tables and did a checkpoint before each run.
These are my results:
8kB XLOG_BLCKSZ
====
Threads      tps    kB_wrtn
      1   535.34     207288
      5  1457.24     591708
     10  1441.85     574700
     15   823.98     388732
4kB XLOG_BLCKSZ
====
Threads      tps    kB_wrtn
      1   542.02     153544
      5  1556.83     393444
     10  1288.00     339648
     15   975.32     255708
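To make the comparison easier to read, the relative reduction in kB_wrtn can be
computed from the two tables above. A minimal sketch in Python, using only the
numbers reported here (the variable and function names are just for
illustration):

```python
# kB_wrtn per thread count, copied from the two tables above.
kb_written_8k = {1: 207288, 5: 591708, 10: 574700, 15: 388732}
kb_written_4k = {1: 153544, 5: 393444, 10: 339648, 15: 255708}


def reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return 1 - after / before


reductions = {
    threads: reduction(kb_written_8k[threads], kb_written_4k[threads])
    for threads in kb_written_8k
}

for threads, r in reductions.items():
    print(f"{threads:>2} threads: {r:.1%} fewer kB written with 4kB blocks")
```

On these runs the 4kB build wrote roughly 26% to 41% less, depending on the
thread count. These are single 20s runs on a VM, so treat the numbers as
indicative only.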
I will run more benchmarks on proper hardware. For example, it would be
interesting to see what happens to performance with writes larger than 4K. But
what else do you think has to be done to move this patch forward?
---
Cheers,
Andy