Quick Links

Re: RFC: Logging plan of the running query

From:	torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com
Subject:	Re: RFC: Logging plan of the running query
Date:	2025-09-16 13:30:31
Message-ID:	6143f00af4bfdba95a85cf866f4acb41@oss.nttdata.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2025-09-10 02:59, Robert Haas wrote:
> I haven't had time to review this in a while -- sorry about that --
> and I have only limited time now, but let me try to give you some
> comments in the time that I do have.

Thank you very much for taking the time to review this!

> I bet some users would really like a feature that allows them to get
> the plans for their own running queries. They will be sad with the
> limitation of this feature, that it only writes to the logs. We should
> probably just live with that limitation, because doing anything else
> seems a lot more complicated. Still, it's not great.

I agree.
Regarding output in a form other than the logs, I think the idea
proposed in [1] — making it possible to inspect memory contexts of other
backends via a view — would be a useful reference.

[1]
https://www.postgresql.org/message-id/flat/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw(at)mail(dot)gmail(dot)com

> The first of these messages ends with a period, which is contrary to
> style guidelines.

Removed the period.

> All of them do errhidestmt() and errhidecontext(),
> which is an unusual choice. I don't see why it's appropriate here.
> Wanting to see both the query and the query plan seems pretty
> reasonable to me.

Removed errhidestmt() and errhidecontext().

> The first two messages seem fairly unhelpful to me:
> the user isn't going to understand the distinction between those two
> states and it's unclear why we should give them that information. I'm
> not sure if we should log a generic message in these kinds of cases or
> log nothing at all, but I feel like this is too much technical
> information.

I'm not sure if this is the best approach, but I changed them to log
nothing in these cases.
In addition to the reason you mentioned, I found that when
pg_log_query_plan() is executed repeatedly at very short intervals
(e.g., every 0.01 seconds), ereport() could lead to
ProcessLogQueryPlanInterrupt() being invoked again inside errfinish via
CFI(), which resulted in a stack overflow.
While this could be processed with check_stack_depth(), it doesn’t seem
worthwhile to use it just to emit a message of limited usefulness.
Additionally, there are other cases where the plan cannot be logged
(e.g., when the query is in a state which ExecProcNode will not be
invoked any further).
So instead, I added the following note to the documentation:

Note that depending on the execution state of the query,
it may not be possible to log the plan.

> Also, I think that we don't normally put the PID of the
> process that is performing an action in the primary error message,
> because the user can include %p in log_line_prefix if they so desire.

That makes sense, but
I had used the following message from pg_log_backend_memory_contexts()
as a reference
and put the PID in the message:

errmsg("logging memory contexts of PID %d", MyProcPid)));

Since both pg_log_query_plan() and pg_log_backend_memory_contexts()
output information about another other backend, it feels slightly odd
that their log messages would be inconsistent.
Perhaps should we consider changing pg_log_backend_memory_contexts() for
consistency as well?

> But there's another, subtler point here, which is that I feel like "is
> not running a query or a subtransaction is aborted" is pointing to a
> design defect in the patch. When we enter an inner query, we switch
> activeQueryDesc to point to the inner QueryDesc. When we abort a
> subtransaction, instead of resetting it to the outer query's
> QueryDesc, we reset it to NULL. I don't really think that's
> acceptable. Consider this:
>
> SELECT some_slow_function_that_uses_a_subtransaction_which_aborts(g)
> FROM generate_series(1,1000000) g;
>
> What's going to happen here is that for 99.9999% of the execution time
> of this function, you can't print the query plan. And that won't be
> because of any fundamental thing is preventing you from so doing -- it
> will just be because the patch doesn't have the right bookkeeping. If
> you added a QueryDesc * pointer to TransactionStateData, then
> ExecutorRun could (1) save the current value of that pointer from the
> topmost element of the transaction state stack, (2) update the pointer
> value to the new QueryDesc, and (3) put the old pointer back at the
> end.

Thank you for pointing out the issue and suggesting a fix.
Attached patch added a QueryDesc pointer to TransactionStateData and
updated it accordingly.
With this change, while running the query below, pg_log_query_plan() is
now able to output the plan for SELECT gen_subtxn(g) FROM
generate_series(1,1000000) g as below:

(session1)=# CREATE OR REPLACE FUNCTION gen_subtxn(i int)
RETURNS int LANGUAGE plpgsql AS $$
DECLARE
result int;
BEGIN
BEGIN
PERFORM 1 / 0;
EXCEPTION
WHEN division_by_zero THEN
result := -i;
END;
RETURN result;
END;
$$;

=# SELECT gen_subtxn(g) from generate_series(1,1000000) g;

(session2)=# SELECT pg_log_query_plan(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend';
(session2)=# \watch 0.1

(in the log)
LOG: 00000: query and its plan running on the backend are:
Query Text: SELECT gen_subtxn(g) from generate_series(1,1000000) g;
Function Scan on pg_catalog.generate_series g (cost=0.00..260000.00
rows=1000000 width=4)
Output: gen_subtxn(g)
Function Call: generate_series(1, 1000000)

> Then, you wouldn't need any cleanup in AbortSubTransaction, and
> this value would always be right, even after an outer query continues
> after an abort, because the subtransaction abort would pop the
> transaction state stack, leaving the right thing on top.

It seems that simply making the above change is not sufficient.
In particular, when raising an error inside a subtransaction, I
sometimes observed segfaults:

BEGIN;
savepoint x;
CREATE TABLE koju (a INT UNIQUE);
INSERT INTO koju VALUES (1);
INSERT INTO koju VALUES (1); -- key duplicate error

Inspecting the crash showed that the argument *ps to
ExecSetExecProcNodeRecurse() was 0x7f7f7f....

(lldb) f 0
frame #0: 0x000000010102ec40
postgres`ExecSetExecProcNodeRecurse(ps=0x7f7f7f7f7f7f7f7f) at
execProcnode.c:601:30
600 {
-> 601 ExecSetExecProcNode(ps, ps->ExecProcNodeReal);

This happens when the plan-logging function is called after an ERROR is
raised but before the transaction state stack is popped.
So in the attached patch, as in the previous version, I reset QueryDesc
to NULL during that interval.
I have also confirmed that even with this change, the above test (SELECT
gen_subtxn(g) from generate_series(1,1000000) g;) can still have its
plan logged successfully.

> The changes to miscadmin.h delete a blank line. They probably
> shouldn't.
Fixed.

> The only change to pquery.h is to delete a blank line. That hunk needs
> reverting.
Fixed.

> +-- Note that we're using a slightly more complex query here because
> +-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
> +-- before it reaches the code path that actually outputs the plan.
> +SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid()))
> result;
> + result
> +--------
> + (t)
> +(1 row)
>
> I seriously doubt that this will be stable across the entire
> buildfarm. You're going to need a different approach here.

Changed it to the following query:

WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
SELECT * FROM t;

Also since when the target query is using CTE, pg_log_query_plan() had
few chance to log the plan, attached patch added T_CteScan handling.

> Is there any real reason to rename regress_log_memory to
> regress_log_function, vs. just introducing a separate regress_log_plan
> role?

I don’t have any particular reason to insist on unifying them.
In the attached patch, I created a regress_log_plan role and used that
to run the same test.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

Attachment	Content-Type	Size
v47-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch	text/x-diff	34.5 KB

In response to

Re: RFC: Logging plan of the running query at 2025-09-09 17:59:46 from Robert Haas

Responses

Re: RFC: Logging plan of the running query at 2025-09-16 15:05:57 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thomas Munro	2025-09-16 13:34:45	Re: --with-llvm on 32-bit platforms?
Previous Message	Mircea Cadariu	2025-09-16 13:05:45	Re: vacuumdb --analyze-only does not need to issue VACUUM (ONLY_DATABASE_STATS) ?