From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com |
Subject: | Re: RFC: Logging plan of the running query |
Date: | 2025-09-16 13:30:31 |
Message-ID: | 6143f00af4bfdba95a85cf866f4acb41@oss.nttdata.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2025-09-10 02:59, Robert Haas wrote:
> I haven't had time to review this in a while -- sorry about that --
> and I have only limited time now, but let me try to give you some
> comments in the time that I do have.
Thank you very much for taking the time to review this!
> I bet some users would really like a feature that allows them to get
> the plans for their own running queries. They will be sad with the
> limitation of this feature, that it only writes to the logs. We should
> probably just live with that limitation, because doing anything else
> seems a lot more complicated. Still, it's not great.
I agree.
Regarding output in a form other than the logs, I think the idea
proposed in [1] — making it possible to inspect memory contexts of other
backends via a view — would be a useful reference.
> The first of these messages ends with a period, which is contrary to
> style guidelines.
Removed the period.
> All of them do errhidestmt() and errhidecontext(),
> which is an unusual choice. I don't see why it's appropriate here.
> Wanting to see both the query and the query plan seems pretty
> reasonable to me.
Removed errhidestmt() and errhidecontext().
> The first two messages seem fairly unhelpful to me:
> the user isn't going to understand the distinction between those two
> states and it's unclear why we should give them that information. I'm
> not sure if we should log a generic message in these kinds of cases or
> log nothing at all, but I feel like this is too much technical
> information.
I'm not sure if this is the best approach, but I changed them to log
nothing in these cases.
In addition to the reason you mentioned, I found that when
pg_log_query_plan() is executed repeatedly at very short intervals
(e.g., every 0.01 seconds), ereport() could lead to
ProcessLogQueryPlanInterrupt() being invoked again inside errfinish via
CFI(), which resulted in a stack overflow.
While this could be processed with check_stack_depth(), it doesn’t seem
worthwhile to use it just to emit a message of limited usefulness.
Additionally, there are other cases where the plan cannot be logged
(e.g., when the query is in a state which ExecProcNode will not be
invoked any further).
So instead, I added the following note to the documentation:
Note that depending on the execution state of the query,
it may not be possible to log the plan.
> Also, I think that we don't normally put the PID of the
> process that is performing an action in the primary error message,
> because the user can include %p in log_line_prefix if they so desire.
That makes sense, but
I had used the following message from pg_log_backend_memory_contexts()
as a reference
and put the PID in the message:
errmsg("logging memory contexts of PID %d", MyProcPid)));
Since both pg_log_query_plan() and pg_log_backend_memory_contexts()
output information about another other backend, it feels slightly odd
that their log messages would be inconsistent.
Perhaps should we consider changing pg_log_backend_memory_contexts() for
consistency as well?
> But there's another, subtler point here, which is that I feel like "is
> not running a query or a subtransaction is aborted" is pointing to a
> design defect in the patch. When we enter an inner query, we switch
> activeQueryDesc to point to the inner QueryDesc. When we abort a
> subtransaction, instead of resetting it to the outer query's
> QueryDesc, we reset it to NULL. I don't really think that's
> acceptable. Consider this:
>
> SELECT some_slow_function_that_uses_a_subtransaction_which_aborts(g)
> FROM generate_series(1,1000000) g;
>
> What's going to happen here is that for 99.9999% of the execution time
> of this function, you can't print the query plan. And that won't be
> because of any fundamental thing is preventing you from so doing -- it
> will just be because the patch doesn't have the right bookkeeping. If
> you added a QueryDesc * pointer to TransactionStateData, then
> ExecutorRun could (1) save the current value of that pointer from the
> topmost element of the transaction state stack, (2) update the pointer
> value to the new QueryDesc, and (3) put the old pointer back at the
> end.
Thank you for pointing out the issue and suggesting a fix.
Attached patch added a QueryDesc pointer to TransactionStateData and
updated it accordingly.
With this change, while running the query below, pg_log_query_plan() is
now able to output the plan for SELECT gen_subtxn(g) FROM
generate_series(1,1000000) g as below:
(session1)=# CREATE OR REPLACE FUNCTION gen_subtxn(i int)
RETURNS int LANGUAGE plpgsql AS $$
DECLARE
result int;
BEGIN
BEGIN
PERFORM 1 / 0;
EXCEPTION
WHEN division_by_zero THEN
result := -i;
END;
RETURN result;
END;
$$;
=# SELECT gen_subtxn(g) from generate_series(1,1000000) g;
(session2)=# SELECT pg_log_query_plan(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend';
(session2)=# \watch 0.1
(in the log)
LOG: 00000: query and its plan running on the backend are:
Query Text: SELECT gen_subtxn(g) from generate_series(1,1000000) g;
Function Scan on pg_catalog.generate_series g (cost=0.00..260000.00
rows=1000000 width=4)
Output: gen_subtxn(g)
Function Call: generate_series(1, 1000000)
> Then, you wouldn't need any cleanup in AbortSubTransaction, and
> this value would always be right, even after an outer query continues
> after an abort, because the subtransaction abort would pop the
> transaction state stack, leaving the right thing on top.
It seems that simply making the above change is not sufficient.
In particular, when raising an error inside a subtransaction, I
sometimes observed segfaults:
BEGIN;
savepoint x;
CREATE TABLE koju (a INT UNIQUE);
INSERT INTO koju VALUES (1);
INSERT INTO koju VALUES (1); -- key duplicate error
Inspecting the crash showed that the argument *ps to
ExecSetExecProcNodeRecurse() was 0x7f7f7f....
(lldb) f 0
frame #0: 0x000000010102ec40
postgres`ExecSetExecProcNodeRecurse(ps=0x7f7f7f7f7f7f7f7f) at
execProcnode.c:601:30
600 {
-> 601 ExecSetExecProcNode(ps, ps->ExecProcNodeReal);
This happens when the plan-logging function is called after an ERROR is
raised but before the transaction state stack is popped.
So in the attached patch, as in the previous version, I reset QueryDesc
to NULL during that interval.
I have also confirmed that even with this change, the above test (SELECT
gen_subtxn(g) from generate_series(1,1000000) g;) can still have its
plan logged successfully.
> The changes to miscadmin.h delete a blank line. They probably
> shouldn't.
Fixed.
> The only change to pquery.h is to delete a blank line. That hunk needs
> reverting.
Fixed.
> +-- Note that we're using a slightly more complex query here because
> +-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
> +-- before it reaches the code path that actually outputs the plan.
> +SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid()))
> result;
> + result
> +--------
> + (t)
> +(1 row)
>
> I seriously doubt that this will be stable across the entire
> buildfarm. You're going to need a different approach here.
Changed it to the following query:
WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
SELECT * FROM t;
Also since when the target query is using CTE, pg_log_query_plan() had
few chance to log the plan, attached patch added T_CteScan handling.
> Is there any real reason to rename regress_log_memory to
> regress_log_function, vs. just introducing a separate regress_log_plan
> role?
I don’t have any particular reason to insist on unifying them.
In the attached patch, I created a regress_log_plan role and used that
to run the same test.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Attachment | Content-Type | Size |
---|---|---|
v47-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch | text/x-diff | 34.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2025-09-16 13:34:45 | Re: --with-llvm on 32-bit platforms? |
Previous Message | Mircea Cadariu | 2025-09-16 13:05:45 | Re: vacuumdb --analyze-only does not need to issue VACUUM (ONLY_DATABASE_STATS) ? |