From: | Lukas Fittl <lukas(at)fittl(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Stack-based tracking of per-node WAL/buffer usage |
Date: | 2025-09-09 19:35:43 |
Message-ID: | CAP53Pkw1RpyAVsvJwUOMgGnVoQUibMk+EB+w-8WqY8kzsC4WjA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Andres,
Thanks for the review!
Attached an updated patch set that addresses the feedback, and also adds
the complete removal of the global pgBufferUsage variable in later patches
(0005-0007), to avoid counting both the stack and the variable.
FWIW, pgWalUsage would also be nice to remove, but it has some interesting
interactions with the cumulative statistics system
and heap_page_prune_and_freeze, and seems less performance critical.
On Thu, Sep 4, 2025 at 1:23 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 0001: Separate node instrumentation from other use of Instrumentation
> struct
> ...
> It's mildly odd that the two types of instrumentation have a different
> definition of 'total' (one double, one instr_time).
>
Yeah, agreed. I added in a new 0001 patch that changes this to instr_time
consistently. I don't see a good reason to keep the transformation to
seconds in the Instrumentation logic, since all in-tree callers convert it
to milliseconds anyway.
> > 0003: Introduce stack for tracking per-node WAL/buffer usage
>
> Why are we doing Instr{Start,Stop}Query when not using instrumentation?
> Resowner stuff ain't free, so I'd skip them when not necessary.
>
Makes sense, I've adjusted that to be conditional (in the now renamed 0004
patch).
In the updated patch I've also decided to piggyback on QueryDesc totaltime
as the "owning" Instrumentation here for the query's lifetime. It seems
simpler that way and avoids having special purpose methods. To go along
with that I've changed the general purpose Instrumentation struct to use
stack-based instrumentation at the same time.
> diff --git a/src/backend/executor/execProcnode.c
> b/src/backend/executor/execProcnode.c
> > index d286471254b..7436f307994 100644
> > --- a/src/backend/executor/execProcnode.c
> > +++ b/src/backend/executor/execProcnode.c
> > @@ -823,8 +823,17 @@ ExecShutdownNode_walker(PlanState *node, void
> *context)
> ...
> > + InstrNodeAddToCurrent(&node->instrument->stack);
> ...
>
> Can we rely on this being reached? Note that ExecutePlan() calls
> ExecShutdownNode() conditionally:
You are of course correct, I didn't consider cursors correctly here.
It seems there isn't an existing executor node walk that could be
repurposed, so I added a new one in ExecutorFinish
("ExecAccumNodeInstrumentation"). From my read of the code there are no use
cases where we need aggregated instrumentation data before ExecutorFinish.
> +static void
> > +ResOwnerReleaseInstrStack(Datum res)
> > +{
> > + /*
> > + * XXX: Registered resources are *not* called in reverse order,
> i.e. we'll
> > + * get what was first registered first at shutdown. To avoid
> handling
> > + * that, we are resetting the stack here on abort (instead of
> recovering
> > + * to previous).
> > + */
> > + pgInstrStack = NULL;
> > +}
>
> Hm, doesn't that mean we loose track of instrumentation if you e.g. do an
> EXPLAIN ANALYZE of a query that executes a function, which internally
> triggers
> an error and catches it?
>
> I wonder if the solution could be to walk the stack and search for the
> to-be-released element.
>
Yes, good point, I did not consider that case. I've addressed this by only
updating the current stack if its not already a parent of the element being
released. We are also always adding the element's statistics to the
(updated) active stack element at abort.
Thanks,
Lukas
--
Lukas Fittl
Attachment | Content-Type | Size |
---|---|---|
v2-0006-Introduce-alternate-Instrumentation-stack-mechani.patch | application/octet-stream | 4.8 KB |
v2-0007-Convert-remaining-users-of-pgBufferUsage-to-use-I.patch | application/octet-stream | 17.7 KB |
v2-0001-Instrumentation-Keep-time-fields-as-instrtime-req.patch | application/octet-stream | 7.3 KB |
v2-0005-Use-Instrumentation-stack-for-parallel-query-aggr.patch | application/octet-stream | 9.5 KB |
v2-0004-Introduce-stack-for-tracking-per-node-WAL-buffer-.patch | application/octet-stream | 20.5 KB |
v2-0003-Replace-direct-changes-of-pgBufferUsage-pgWalUsag.patch | application/octet-stream | 9.0 KB |
v2-0002-Separate-node-instrumentation-from-other-use-of-I.patch | application/octet-stream | 21.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-09-09 19:41:53 | Re: shmem_startup_hook called twice on Windows |
Previous Message | Robert Haas | 2025-09-09 19:26:08 | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |