Re: pg_stat_io_histogram

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_io_histogram
Date: 2026-01-29 16:27:30
Message-ID: rcn6dkcetfy2esyon3bppdolwyuvlmtnlhiqfme4maxd66rvdi@c4kvuj4oj333
Lists: pgsql-hackers

Hi,

On 2026-01-28 12:12:10 +0100, Jakub Wartak wrote:
> On Tue, Jan 27, 2026 at 1:06 PM Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > Not yet; I first wanted to hear whether I'm sailing in some plainly stupid
> > direction with this idea or implementation (e.g. whether
> > INSTR_TIME_GET_MICROSEC() was a really stupid omission on my side).
> >
> > I'll try to perform this overhead measurement, hopefully with v3, once we
> > settle on how to do the bit shifting/clz().
> >
>
> [..]
> Here's the answer: on a properly isolated perf test run (my old & legacy &
> predictable 4s32c64t NUMA box, s_b=8GB, DB size 16GB, hugepages, no
> turboboost, proper warmup, no THP, cpupower D0, no physical I/O, ~22k
> pread64() calls/sec combined to VFS cache), pinned to a single NUMA node
> using: numactl --membind=0 --cpunodebind=0
> measured using: pgbench -M prepared -c 4 -j 4 postgres -T 20 -P 1 -S
>
> master+track_io_timing=on, 60s warmup, then 3 runs
> tps = 44615.603668
> tps = 44556.191492
> tps = 44813.793981
> avg = 44662
>
> master+track_io_timing=on+patch, 60s warmup, then 3 runs
> tps = 44441.879384
> tps = 44403.101737
> tps = 45036.747418
> avg = 44627
>
> so that's 99.92% of master (so literally no overhead) and yields a picture like:

I don't think that's a particularly useful assurance, unfortunately:

1) Using pgbench with an in-memory read-only workload is typically limited by
context-switch and per-statement overhead. After a short while you have at
most one IO per statement (the heap page; the index pages stay cached), which
obviously isn't going to be measurably affected by a small per-IO overhead.

2) The per-core memory bandwidth on that old machine, if it's the quite old
EDB machine I think it is, is so low that you'd be bottlenecked by memory
bandwidth well before you'd be bottlenecked by actual CPU work (which is what
the bucket computation is).
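
(For context, a clz-based log2 bucketing is only a handful of instructions.
A minimal sketch, assuming a GCC/Clang-style __builtin_clzll and a latency
already converted to microseconds; the patch's exact formula may differ:

    #include <stdint.h>

    /*
     * Sketch, not the patch's code: map a latency in microseconds to a
     * power-of-two histogram bucket index, i.e. floor(log2(latency)).
     */
    static inline int
    latency_to_bucket(uint64_t latency_us)
    {
        if (latency_us == 0)
            return 0;
        /* 63 - clz(x) == floor(log2(x)) for x > 0 */
        return 63 - __builtin_clzll(latency_us);
    }

A handful of instructions like that only shows up when the core isn't
already stalled on memory, hence the suggestion below.)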

I think you'd have to test something like pg_prewarm(), with
io_combine_limit=1, on a modern *client* CPU (client CPUs typically have much
higher per-core memory bandwidth than the more scalable server CPUs).
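
Something along these lines, as a sketch (the relation name is just an
example, pg_prewarm has to be installed, and the relation shouldn't already
be in shared_buffers, e.g. test right after a restart):

    -- hypothetical reproduction of the suggested test
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SET io_combine_limit = 1;    -- one block per IO, maximizing per-IO cost
    SET track_io_timing = on;    -- exercise the timing/bucketing path
    SELECT pg_prewarm('pgbench_accounts');  -- example relation

That way every block incurs the per-IO bookkeeping, instead of one IO per
statement as in the pgbench run above.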

Greetings,

Andres Freund
