Re: Stack-based tracking of per-node WAL/buffer usage

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Lukas Fittl <lukas(at)fittl(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Stack-based tracking of per-node WAL/buffer usage
Date: 2026-03-25 10:47:01
Message-ID: a1edb578-8a54-4f7a-ad74-11ce9cef291a@iki.fi
Lists: pgsql-hackers

On 24/03/2026 08:03, Lukas Fittl wrote:
> Instead I've tried introducing a memory context for instrumentation
> managed as a resource owner, and I am now (for now) convinced that
> this is the right trade-off for the problem at hand.

Yes, that seems better.

This patch could use an overview README file; I'm struggling to
understand how this all works. Here's my understanding so far,
please correct me if I'm wrong:

There are *two* data structures tracking the Instrumentation nodes. The
patch only talks about a stack, but I think there's also implicitly a
tree in there.

Tree
----

All Instrumentation nodes are part of a tree. For example, if you have
two portals open, the tree might look like this:

Session - Query A - NestLoop - Seq Scan A
                             - Seq Scan B

        - Query B - Seq Scan C

When a node is "finalized", its counters are added to its parent.

This tree is somewhat implicit in the patch. Each QueryInstrumentation
has a list of child nodes, but only unfinalized ones. Don't we need that
at the session level too? When a Query is released on abort, its
counters need to be added to the parent too. If I understand correctly,
the patch tries to use the stack for that, but it's confusing.

I think it would make the patch clearer to talk explicitly about the
tree, and represent it explicitly in the Instrumentation nodes. I.e. add
a "parent" pointer, or a "children" list, or both to the Instrumentation
struct.
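
To make the suggestion concrete, here is a minimal sketch of what an
explicit "parent" pointer and the roll-up on finalization could look
like. The names InstrNode, UsageCounters, instr_node_init and
instr_node_finalize are made up for illustration, and the three counters
are placeholders; none of this is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counters; the real struct tracks more fields. */
typedef struct UsageCounters
{
	int64_t		shared_blks_hit;
	int64_t		shared_blks_read;
	int64_t		wal_records;
} UsageCounters;

/* Hypothetical node layout with an explicit link to the parent. */
typedef struct InstrNode
{
	const char *name;
	struct InstrNode *parent;	/* NULL for the session-level root */
	UsageCounters usage;
	int			finalized;
} InstrNode;

static void
instr_node_init(InstrNode *node, const char *name, InstrNode *parent)
{
	node->name = name;
	node->parent = parent;
	node->usage = (UsageCounters) {0, 0, 0};
	node->finalized = 0;
}

/* Finalize a node: add its counters to its parent, exactly once. */
static void
instr_node_finalize(InstrNode *node)
{
	assert(!node->finalized);

	if (node->parent != NULL)
	{
		node->parent->usage.shared_blks_hit += node->usage.shared_blks_hit;
		node->parent->usage.shared_blks_read += node->usage.shared_blks_read;
		node->parent->usage.wal_records += node->usage.wal_records;
	}
	node->finalized = 1;
}
```

With this shape, finalizing Seq Scan A, Seq Scan B, NestLoop and Query A
in that order leaves the session node holding the sum of everything
below it, without consulting the stack at all.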

Stack
-----

At all times, there's a stack that tracks which Instrumentation in
the tree is *currently* executing. For example, while executing
Seq Scan B, the stack would look like this:

0: Session
1: Query A
2: NestLoop
3: Seq Scan B

And when the code is sending a result row back to the client, while the
query is being executed, the stack would be just:

0: Session

In the patch, the stack is represented by an array. It could also be
implemented with a CurrentInstrumentation global variable, similar to
CurrentMemoryContext and CurrentResourceOwner.
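
As a rough sketch of the global-variable alternative, assuming a
save/restore convention like the one used with MemoryContextSwitchTo:
the caller remembers the previous value and puts it back when done, so
the C call stack holds the "stack" implicitly. InstrNode,
instr_switch_to and instr_count_buffer_hit are hypothetical names, and
the single counter is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a single placeholder counter. */
typedef struct InstrNode
{
	const char *name;
	long		buffer_hits;
} InstrNode;

/* The node that is currently charged for any usage. */
static InstrNode *CurrentInstrumentation = NULL;

/*
 * Make "node" current and return the previous one, so that the caller
 * can restore it afterwards.
 */
static InstrNode *
instr_switch_to(InstrNode *node)
{
	InstrNode  *old = CurrentInstrumentation;

	CurrentInstrumentation = node;
	return old;
}

/* Low-level code charges usage to whatever node is current. */
static void
instr_count_buffer_hit(void)
{
	if (CurrentInstrumentation != NULL)
		CurrentInstrumentation->buffer_hits++;
}
```

A caller would write "old = instr_switch_to(node); ... ;
instr_switch_to(old);" around the work it wants attributed to node.
Sending a row to the client with only the session on the stack then
corresponds to switching back to the session node first.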

Abort handling
--------------

On abort, two things need to happen:

1. Reset the stack to the appropriate level. This ensures that we
don't later try to update the counters on an Instrumentation node that
is going away with the abort. In the above example, the stack would be
reset to the 0: Session level.

2. Finalize all the Instrumentation nodes as part of the ResourceOwner
cleanup. All Instrumentation nodes that are released roll up their
counters to their parents.
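
The two steps could be sketched roughly as follows. This is only an
illustration of my understanding, not the patch's code: InstrNode,
instr_push, instr_stack_reset and instr_release are hypothetical names,
the stack is a plain fixed-size array, and the roll-up is only correct
if children are released before their parents, which is an assumption
of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: explicit parent link plus one placeholder counter. */
typedef struct InstrNode
{
	struct InstrNode *parent;	/* NULL for the session-level root */
	long		wal_records;
	int			finalized;
} InstrNode;

#define INSTR_STACK_MAX 64

static InstrNode *instr_stack[INSTR_STACK_MAX];
static int	instr_stack_depth = 0;

static void
instr_push(InstrNode *node)
{
	assert(instr_stack_depth < INSTR_STACK_MAX);
	instr_stack[instr_stack_depth++] = node;
}

/*
 * Step 1: truncate the stack to the depth that was recorded before the
 * aborted work started, so nothing above it is updated again.
 */
static void
instr_stack_reset(int depth)
{
	assert(depth >= 0 && depth <= instr_stack_depth);
	instr_stack_depth = depth;
}

/*
 * Step 2: called once per node during resource cleanup.  Rolls the
 * node's counter up to its parent.  Assumes the parent has not been
 * finalized yet; otherwise the counts added here would be lost.
 */
static void
instr_release(InstrNode *node)
{
	if (node->finalized)
		return;
	if (node->parent != NULL)
		node->parent->wal_records += node->wal_records;
	node->finalized = 1;
}
```

In the running example, the depth saved before Query A started is 1, so
the reset leaves only the session on the stack, and releasing Seq Scan
B, NestLoop and Query A in that order moves their counts to the session.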

Questions:

Is the stack always a path from the root of the tree, down to some node?
Or could you have e.g. recursion like A -> B -> C -> A? (I don't know if
it makes a difference, just wondering)

What happens if you release e.g. the NestLoop before its children? All
the Instrumentation nodes belonging to a query would usually be part of
the same ResourceOwner and there's no guarantee on what order the
resources are released.

- Heikki
