Re: pg_stat_io_histogram

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_io_histogram
Date: 2026-03-06 14:48:38
Message-ID: CAKZiRmz-KH2a_qcwGeFwsoajCsQEd8jXyFLx0B0yKgkK=LhBLA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 2, 2026 at 4:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

Hi Andres,

> On 2026-03-02 09:01:05 +0100, Jakub Wartak wrote:
> > On Thu, Feb 26, 2026 at 5:13 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > > > but I think having it in PgStat_BktypeIO is not great. This makes
> > > > > > PgStat_IO 30k*BACKEND_NUM_TYPES bigger, or ~ 0.5MB. Having a stats snapshot
> > > > > > be half a megabyte bigger for no reason seems too wasteful.
> > > > >
> > > > > Yea, that's not awesome.
> > > >
> > > > Guys, a question: could you please explain what the drawbacks are of having
> > > > this semi-big (internal-only) stats snapshot of 0.5MB? I'm struggling to
> > > > understand two things:
> > > > a) 0.5MB is not a lot these days (OK, my 286 had 1MB back in the day ;))
> > >
> > > I don't really agree with that, I guess. And even if I did, it's one thing to
> > > use 0.5MB when you actually use it, it's quite another when most of that
> > > memory is never used.
> > >
> > >
> > > With the patch, *every* backend ends up with a substantially larger
> > > pgStatLocal. Before:
> > >
> > > nm -t d --size-sort -r -S src/backend/postgres|head -n20|less
> > > (the second column is the decimal size, third the type of the symbol)
> > >
> > > 0000000004131808 0000000000297456 r yy_transition
> > > ...
> > > 0000000003916352 0000000000054744 r UnicodeDecompMain
> > > 0000000021004896 0000000000052824 B pgStatLocal
> > > 0000000003850592 0000000000040416 r unicode_categories
> > > ...
> > >
> > > after:
> > > 0000000023220512 0000000000329304 B pgStatLocal
> > > 0000000018531648 0000000000297456 r yy_transition
> > > ...
> > >
> > > And because pgStatLocal is zero initialized data, it'll be on-demand-allocated
> > > in every single backend (whereas e.g. yy_transition is read-only shared). So
> > > you're not talking about a one-time increase, you're multiplying it by the
> > > number of active connections.
> > >
> > > Now, it's true that most backends won't ever touch pgStatLocal. However, most
> > > backends will touch Pending[Backend]IOStats, which also increased noticeably:
> > >
> > > before:
> > > 0000000021060960 0000000000002880 b PendingIOStats
> > > 0000000021057792 0000000000002880 b PendingBackendStats
> > >
> > > after:
> > > 0000000023568416 0000000000018240 b PendingIOStats
> > > 0000000023549888 0000000000018240 b PendingBackendStats
> > >
> > >
> > > Again, I think some increase here doesn't have to be fatal, but increasing it
> > > with mainly impossible-to-use memory seems just too much waste to me.
> > >
> > >
> > > This also increases the shared-memory usage of pgstats: Before it used ~300kB
> > > on a small system. That nearly doubles with this patch. But that's perhaps
> > > less concerning, given it's per-system, rather than per-backend memory usage.
> > >
> > >
> > >
> > > > b) how does it affect anything, given that testing shows it doesn't?
> > >
> > > Which of your testing would conceivably show the effect? The concern here
> > > isn't really performance, it's that it increases our memory usage, which you'd
> > > only see having an effect if you are tight on memory or have a workload that
> > > is cache sensitive.
> > >
> >
> > Oh OK, now I understand the problem with pgStatLocal properly, thanks for
> > the detailed explanation! (Though I'm still a little lost in the woods of
> > the pgstat infrastructure.) Anyway, I agree that PgStat_IO has become way
> > too big, especially when the pg_stat_io(_histogram) views are never
> > actually accessed.
> >
> > How about the attached v6-0002? It now dynamically allocates the
> > PgStat_IO memory to avoid the memory cost (it is only allocated if
> > pgstat_io_snapshot_cb() is used). Is that the right path? And if so,
> > perhaps it should allocate from the pgStatLocal.snapshot.context memory
> > context instead?
>
> I think even the per-backend pending IO stats are too big. And for both
> pending stats, stored stats and snapshots, I still don't think I am OK with
> storing so many histograms that are not possible to use. I think that needs
> to be fixed first.

v7-0001: no changes for quite some time.

The memory-reduction patches follow (I didn't want to squash them yet, so
for now they are separate):

v7-0002:
As PendingBackendStats (the per-backend IO stats) was not collecting
latency buckets at all (but was sharing the same struct/typedef), I cloned
the struct without those latency buckets. This reduces the struct from
18240 back to 2880 bytes per backend (BSS), as on master.

v7-0003:
Sadly, I couldn't easily make the backend-local recording in PendingIOStats
allocate dynamically from within pgstat_count_io_op_time() on first use of
a specific IO traffic type (i.e., per
[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES] combination),
because that function can be called inside critical sections, where any
MemoryContextAlloc() blows up.

Instead it's just +15kB per backend, allocated only when we really want to
use it (track_io_timing/track_wal_io_timing on) -- so nm(1) reports just
2888 bytes (just a +8-byte pointer). The drawback is that setting those
GUCs locally won't enable histogram collection immediately, only for newly
spawned backends. That also meant I had to switch to a TAP test so it can
be exercised. I don't have a strong opinion on whether saving that +15kB is
worth it for users not running with track_[wal_]io_timing.

v7-0004:
(This was already sent with the previous message.) With the original v5,
every backend had a big pgStatLocal (0000000000329304 B pgStatLocal) that
was there but not used at all if the pg_stat_io(_histogram) views were
never accessed. Now it is (0000000000000984 B pgStatLocal) and the
PgStat_Snapshot.PgStat_IO part is allocated only when querying those views.

So with all three of the above combined we are back to:
0000000011573376 0000000000002888 B PendingIOStats
0000000011570304 0000000000002880 b PendingBackendStats
0000000011569184 0000000000000984 B pgStatLocal

That's an actual saving over master itself:
0000000011577344 0000000000052824 B pgStatLocal
0000000011633408 0000000000002880 b PendingIOStats
0000000011630304 0000000000002880 b PendingBackendStats

> This also increases the shared-memory usage of pgstats: Before it used ~300kB
> on a small system. That nearly doubles with this patch. But that's perhaps
> less concerning, given it's per-system, rather than per-backend memory usage.

v7-0005:
Skipping 4 backend types out of 17 ignores ~23% of backend types, and with
a simple array I can get the _total_ memory allocated for the 'Shared
Memory Stats' shm down from ~592384 to ~519424 bytes (this one was sent
earlier).

v7-0006:
We could reduce the total pgstats shm further, down to ~482944 bytes, if we
eliminated tracking of two more IMHO useless types: autovacuum_launcher and
standalone_backend. Master is at 315904 bytes (so that's just ~163kB more,
according to pg_shm_allocations).

The patches probably still need some squashing, pgindent, etc.

-J.

Attachment Content-Type Size
v7-0001-Add-pg_stat_io_histogram-view-to-provide-more-det.patch application/x-patch 34.8 KB
v7-0002-PendingBackendStats-save-memory.patch application/x-patch 2.4 KB
v7-0004-Convert-PgStat_IO-to-pointer-to-avoid-huge-static.patch application/x-patch 3.5 KB
v7-0003-PendingIOStats-save-memory.patch application/x-patch 9.0 KB
v7-0005-Condense-PgStat_IO.stats-BACKEND_NUM_TYPES-array-.patch application/x-patch 7.7 KB
v7-0006-Further-condense-and-reduce-memory-used-by-pgstat.patch application/x-patch 5.2 KB
