| From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
|---|---|
| To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
| Cc: | Lukas Fittl <lukas(at)fittl(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com |
| Subject: | Re: RFC: Logging plan of the running query |
| Date: | 2026-07-01 10:42:53 |
| Message-ID: | f9bce9c3-d920-4b2e-965a-a7a019c61011@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 29/06/2026 06:23, torikoshia wrote:
> I think this behavior is specific to pg_log_query_plan(), so I am
> starting to wonder whether something should be mentioned in the
> documentation.
I took a closer look at the solution, and it looks quite good.
1. EXPLAIN needs a more or less consistent execution plan state. And this patch
decides to cut in the executor right before a node should start producing the
next tuple.
I’ve never seen any races or other issues with this approach in Postgres forks,
except that users sometimes felt irritated because a HashJoin gets stuck
rebalancing hash batches and does not respond for a long time, or when an OLAP
query causes a situation close to OOM, and the user can’t get an explain because
of a massive memory operation.
The bad example of such a feature can be found in the AQO project [1]. In this
first attempt, we used the built-in RegisterTimeout() to stop execution and take
an EXPLAIN. And from time to time, it fails due to the executor's inconsistent
state. So, the current way looks right.
2. Adding another event processing type to the ProcessInterrupts() routine is a
brilliant idea. This way, overhead is only added when someone requests an
injection into the execution process.
3. But the idea of the ExecSetExecProcNodeRecurse() seems fragile to me. It
depends on the PlanState tree structure and a choice on the current QueryDesc.
When adding a new node, we will need to remember to update this code, too. I’d
propose introducing a tiny hook (register callback?) into ExecProcNode() and
just setting it in ProcessLogQueryPlanInterrupt. Next call of the ExecProcNode()
will cause the EXPLAIN machinery whenever it happens. It might also simplify
this function's code, since races won't be a danger anymore. Just don't forget
to restore the hook state after the EXPLAIN preparation or top-level query end.
In addition, such ExecProcNode_hook may serve the longstanding desire of
extension developers to look inside the execution of a query: large queries run
for seconds, minutes, or even longer. And it is quite a frequent dilemma: wait a
little more or interrupt and replan the query? Even with this feature being
committed, we still might want to get an EXPLAIN of the top-level query, not a
nested one, that the GetCurrentQueryDesc provides us with. Exactly this way was
implemented in the replan feature [2] - unfortunately, it is still closed source
code.
4. Also, I’m not sure it is safe to store the QueryDesc pointer at all at the
ProcessLogQueryPlanInterrupt. Maybe the next call to ExecProcNode() will be made
from an external query (or from a deeper one). The current implementation will
not produce any EXPLAIN in this case, but query still executes.
I think, combination of a global hook and queryDesc pointer, saved somewhere in
the EState may solve the problem. It is available at each executor's node, and
the EXPLAIN machinery might use an exact descriptor related to the currently
executing query.
5. The phrase ‘only the plan of the most deeply nested query is logged’ is
missed in the docs. So the current decision has not yet been documented.
[1]
https://github.com/postgrespro/aqo/blob/4a7fcd7291a8c4209c9102d1736f9f754d311cf2/postprocessing.c#L621
[2] https://postgrespro.com/docs/enterprise/16/realtime-query-replanning
--
regards, Andrei Lepikhov,
pgEdge
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ranier Vilela | 2026-07-01 10:44:59 | Re: Refactor function pg_get_multixact_stats (src/backend/utils/adt/multixactfuncs.c) |
| Previous Message | Michael Paquier | 2026-07-01 10:31:39 | Re: Fix jsonpath .decimal() to honor silent mode |