Re: pg_stat_io_histogram

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_io_histogram
Date: 2026-01-29 16:27:30
Message-ID: rcn6dkcetfy2esyon3bppdolwyuvlmtnlhiqfme4maxd66rvdi@c4kvuj4oj333
Lists: pgsql-hackers

Hi,

On 2026-01-28 12:12:10 +0100, Jakub Wartak wrote:
> On Tue, Jan 27, 2026 at 1:06 PM Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > Not yet; I first wanted to hear whether I'm sailing in some plainly stupid
> > direction with this idea or implementation (e.g. whether
> > INSTR_TIME_GET_MICROSEC() was a really stupid omission on my side).
> >
> > I'll try to perform this overhead measurement, hopefully with v3, once we
> > settle on how to do the bit shifting/clz().
> >
>
> [..]
> Here's the answer: on a properly isolated perf test run (my old & legacy &
> predictable 4s32c64t NUMA box, s_b=8GB, DB size 16GB, hugepages, no
> turboboost, proper warmup, no THP, cpupower D0, no physical I/O, ~22k
> pread64() calls/sec combined to VFS cache), pinned to a single NUMA node
> using: numactl --membind=0 --cpunodebind=0
> measured using: pgbench -M prepared -c 4 -j 4 postgres -T 20 -P 1 -S
>
> master+track_io_timing=on, 60s warmup, then 3 runs
> tps = 44615.603668
> tps = 44556.191492
> tps = 44813.793981
> avg = 44662
>
> master+track_io_timing=on+patch, 60s warmup, then 3 runs
> tps = 44441.879384
> tps = 44403.101737
> tps = 45036.747418
> avg = 44627
>
> so that's 99.92% of master (so literally no overhead) and yields a picture like:

I don't think that's a particularly useful assurance, unfortunately:

1) Using pgbench with an in-memory read-only workload is typically limited by
context-switch and per-statement overhead. After a short while you have at
most one IO per statement (the heap page; the index pages stay cached), which
obviously isn't going to be measurably affected by a small per-IO overhead.

2) The per-core memory bandwidth on that old machine, if it's the quite old
EDB machine I think it is, is so low that you'd be bottlenecked by memory
bandwidth well before you'd be bottlenecked by actual CPU work (which is what
the bucket computation is).
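
(For context, a clz-based log2 bucketing is only a handful of instructions.
A minimal sketch, assuming a GCC/Clang-style __builtin_clzll and a latency
already converted to microseconds; the patch's exact formula may differ:

    #include <stdint.h>

    /*
     * Sketch, not the patch's code: map a latency in microseconds to a
     * power-of-two histogram bucket index, i.e. floor(log2(latency)).
     */
    static inline int
    latency_to_bucket(uint64_t latency_us)
    {
        if (latency_us == 0)
            return 0;
        /* 63 - clz(x) == floor(log2(x)) for x > 0 */
        return 63 - __builtin_clzll(latency_us);
    }

A handful of instructions like that only shows up when the core isn't
already stalled on memory, hence the suggestion below.)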

I think you'd have to test something like pg_prewarm(), with
io_combine_limit=1, on a modern *client* CPU (client CPUs typically have much
higher per-core memory bandwidth than the more scalable server CPUs).
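
Something along these lines, as a sketch (the relation name is just an
example, pg_prewarm has to be installed, and the relation shouldn't already
be in shared_buffers, e.g. test right after a restart):

    -- hypothetical reproduction of the suggested test
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SET io_combine_limit = 1;    -- one block per IO, maximizing per-IO cost
    SET track_io_timing = on;    -- exercise the timing/bucketing path
    SELECT pg_prewarm('pgbench_accounts');  -- example relation

That way every block incurs the per-IO bookkeeping, instead of one IO per
statement as in the pgbench run above.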

Greetings,

Andres Freund
