Re: Stack-based tracking of per-node WAL/buffer usage

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Stack-based tracking of per-node WAL/buffer usage
Date: 2026-03-14 20:49:23
Message-ID: CAP53PkxFP7i7-wy98ZmEJ11edYq-RrPvJoa4kzGhBBjERA4Nyw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 10, 2026 at 1:13 AM Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> wrote:
> > ResourceOwnerForgetInstrumentation directly follows the call to
> > ExecFinalizeNodeInstrumentation in standard_ExecutorFinish, so I'm not
> > sure which error case you're thinking of?
>
> There are a few pallocs between them, so OOM is possible, even if
> unlikely. I mainly mentioned this because even if unlikely it can
> happen in theory, and the fix seems simple to me.

Ah, yeah, I didn't consider the pallocs in InstrFinalizeNode causing
an OOM that would cause an abort - good thinking!

I've adjusted this to use a dlist instead of an slist and
InstrFinalizeNode now deletes the node from the list.

> > I don't think that's a permanent leak, since it would be in the memory
> > context of the caller, i.e. the per-query memory context
>
> Yes, it's definitely not permanent, but could be bad with many tuples.
>
> > and so
> > doing the ResOwner dance for each tuple is probably not ideal.
>
> These approaches are interesting, but also add complexity, so I'm
> unsure which is better for this, the pfree calls add one line and
> solve the main issue with the current code.

Yeah, fair point on complexity - I've just added the pfree for now.

> > Basically it should be this instead, I think,
> > so we correctly call the table AM's table_index_fetch_tuple again if
> > call_again gets set:
>
> Right, this code will be better.

Implemented this fix in IndexNext, and also expanded the tracking of
table access to IndexNextWithReorder.

Regarding Index-Only Scans, I did not add instrumentation for table
access yet - I might add that in a follow-up revision or we could also
do it in a follow-on patch.

> > I don't know if the extra allocations
> > really matter, but I can see your point.
>
> Yeah, probably doesn't matter that much, but the code also wasn't that
> nice in that form. I didn't try to actually modify it, but by just
> looking at it the grouped option seemed cleaner to me, and the output
> should also be self-explanatory and logical to users.

Yep, fair point - I've now added two groups "Table Buffers" and "Table
I/O Timings" that get used in structured output.

---

See attached v8 rebased on latest master, that also fixes the issues
Zsolt pointed out in 0005 and 0007.

Additionally, two other minor changes in 0005 (the commit that adds
stack-based instrumentation):

1) Change the allocation for query/node instrumentation so that we
only use top-level memory context when WAL/buffer usage is requested
(i.e. instrumentation stack is needed) - easy enough to do, and makes
the timing/rows only case a bit cheaper.

2) Fix a missing addition to the remaining pgWalUsage global in the
parallel query case, when instrumentation is used. For context, to
avoid double counting we don't have the parallel workers call
ExecFinalizeNodeInstrumentation (instead the per-node numbers get
reported back to the leader, and that one bubbles them up), but that
also means that InstrAccumParallelQuery doesn't see the per-node
activity. This now gets added in ExecParallelRetrieveInstrumentation.

All other patches as before.

Thanks,
Lukas

--
Lukas Fittl

Attachment Content-Type Size
v8-0001-instrumentation-Separate-trigger-logic-from-other.patch application/octet-stream 9.7 KB
v8-0004-instrumentation-Add-additional-regression-tests-c.patch application/octet-stream 23.5 KB
v8-0002-instrumentation-Separate-per-node-logic-from-othe.patch application/octet-stream 26.3 KB
v8-0003-instrumentation-Replace-direct-changes-of-pgBuffe.patch application/octet-stream 9.9 KB
v8-0005-Optimize-measuring-WAL-buffer-usage-through-stack.patch application/octet-stream 68.2 KB
v8-0008-Add-pg_session_buffer_usage-contrib-module.patch application/octet-stream 25.5 KB
v8-0007-Index-scans-Show-table-buffer-accesses-separately.patch application/octet-stream 17.6 KB
v8-0006-instrumentation-Optimize-ExecProcNodeInstr-instru.patch application/octet-stream 11.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2026-03-14 21:13:19 Re: Add starelid, attnum to pg_stats and leverage this in pg_dump
Previous Message Corey Huinker 2026-03-14 20:05:41 Re: Import Statistics in postgres_fdw before resorting to sampling.