Re: pg_stat_io_histogram

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_io_histogram
Date: 2026-05-08 09:57:25
Message-ID: CAKZiRmx3Z+fxunEc4CDGrpZVFwD4MAEDfGVM_v-CW-=xTLeF_Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 19, 2026 at 11:16 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Wed, Mar 18, 2026 at 2:29 PM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> >
> > On Tue, Mar 17, 2026 at 3:17 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2026-03-17 13:13:59 +0100, Jakub Wartak wrote:
> > > > 1. Concerns about memory use. With v7 I had couple of ideas, and with those
> > > > the memory use is really minimized as long as the code is still simple
> > > > (so nothing fancy, just some ideas to trim stuff and dynamically allocate
> > > > memory). I hope those reduce memory footprint to acceptable levels, see my
> > > > earlier description for v7.
> > >
> > > Personally I unfortunately continue to think that storing lots of values that
> > > are never anything but zero isn't a good idea once you have more than a
> > > handful of kB. Storing pointless data is something different than increasing
> > > memory usage with actual information.
> > >
> > > I still think you should just count the number of histograms needed, have an
> > > array [object][context][op] with the associated histogram "offset" and then
> > > increment the associated offset. It'll add an indirection at count time, but
> > > no additional branches.
> >
> > Great idea, thanks, I haven't thought about that! Attached v9 attempts to do
> > that for pending backend I/O struct, which minimizes the (backend) memory
> > footprint for client backends to just about ~5kB.
> >
> > I have been pulling my hair trying to achieve the same for shared-memory, but I
> > have failed to do that w/o sinking into complexity [..]
>
> OK, I've made it done too with indirect offset on shared memory, it wasn't easy
> at least for me, but now we have two approaches/patchsets:
>
[..]
> v9b: with more code and build complexity but that should address concern of not
> used memory
>
> 'Shared Memory Stats' allocated size:
> master - uses ~308kB for shm
> v9a-000[12]: 578kB shm
> v9a-000[123]: 507kB shm
> v9a-000[1234]: 471kB shm (+~163kB more)
>
> v9b-000[123]: 361kB shm
>
> v9a-000[12] are identical to v9b-00[12], but included just for
> patchset completeness.
>
> In v9b meson/autoconf (for adding pgstat_io_genstats) build most of
> the time what
> they need, but probably that needs some cleanups and better dependency
> tracking. I'm
> not sure about correctnes of those changes as especially
> autoconf/Makefile is a lot
> like brainf**k to me and that area would need some help...
>
> I think now we could even increase max resolution of buckets to cover
> max those maximum
> of 32s+ (at the cost of one extra 64-byte cacheline for pending IO
> stats, so go with
> PGSTAT_IO_HIST_BUCKETS from 16 to 24)

Good morning all,

Ok here comes v10, which is bit like earlier v9b (so has reduced shared memory
footprint using Yours idea about indirect offsets idea), but now with shm memory
sized and allocated on startup by postmaster. There are 3 patches:
- 0001, one to introduce view and bucketting, no changes since quite some time
- 0002, saves some private (backend) memory
- 0003, main meat, saving shared memory (main problem raised earlier),
now switched
to simply dynamically size shared memory based on those pgstat_track_io*()
logic

The problem with the 0003 earlier was that I wanted to absolutley avoid further
complexiy/alterations in struct PgStat_IO related to dynamic shared memory
allocation for hist_time_buckets_slots[PGSTAT_IO_HIST_BUCKET_SLOTS]
[PGSTAT_IO_HIST_BUCKETS] (I was afraid to touch that shm code, it
looks complex),
so I had to come out with something that would tell us how many slots
(PGSTAT_IO_HIST_BUCKET_SLOTS) we need, I wish we had C++'s `constexpr` that
would do all of that. I've tried three aproaches (like in v9b but that hit
some serious cross-compiling obstacles, also had perl doing that, but that
had lots of code duplication), so in the end I had to alter the pgstat_io
shm allocation which is now in 0003.

Summary of changes in 0003 since v9b / earlier post:
- Fixed potential race condition (touch via memset/memcpy() only histogram
slots under LWLock)
- Fixed/removed the PGSTAT_IO_HIST_BUCKET_SLOTS macro
- Removed pgstat_io_genslots.c (first idea, above) and abandonded attempt to
fixup some cross compilation woes on MSVC/mingw
- Bumped PGSTAT_FILE_FORMAT_ID
- Move/optimize pending_off in pgstat_io_flush_cb out of hot loop
- Document that hist_time_buckets_offsets should be the last member of
PgStat_BktypeIO
- Be defensive - added some asserts()
- Adjust _bucket_offsets from uint64 to just int to save memory (offsets are low
numbers)
- and finally moved to dynamic shm allocation of PgStat_IO stuff during
startup

At the end of the day, I'll squeze 000[123] into just one, but wanted
to ease the
review first a bit. Of course this is material for PG20.

-J.

Attachment Content-Type Size
v10f-0002-Lower-pg_stat_io_histogram-private-backend-memo.patch text/x-patch 8.6 KB
v10f-0001-Add-pg_stat_io_histogram-view-to-provide-more-d.patch text/x-patch 39.7 KB
v10f-0003-Lower-pg_stat_io_histogram-shared-memory-use-by.patch text/x-patch 19.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2026-05-08 10:04:15 Re: Broken build on macOS (Universal / Intel): cpuid instruction not available
Previous Message Jim Jones 2026-05-08 09:20:20 Re: Fix wrong error message from pg_get_tablespace_ddl()