Re: RFC: Logging plan of the running query

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Lukas Fittl <lukas(at)fittl(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com
Subject: Re: RFC: Logging plan of the running query
Date: 2026-06-29 04:23:57
Message-ID: 82f73973f6fbdb1d2ae31e7c40e4a4f1@oss.nttdata.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 24, 2026 at 11:35 PM Andrei Lepikhov <lepihov(at)gmail(dot)com>
wrote:
Thanks for your review!

> 2. GetCurrentQueryDesc, SetCurrentQueryDesc - need 'extern' in the .h
> file
> 3. ExplainPrintPlan, ExplainPrintTriggers - have exact duplicates in
> the explain.h
> 4. ExplainStringAssemble() has '0' input for a boolean parameter.
> 5. ExecSetExecProcNodeRecurse: I don't see PlanState::initPlan. Is this
> intentional?
> 6. INJECTION_POINT("log-query-interrupt", ...) - it is called inside a
> signal-safe zone. In tests, the caller might do a lot of things in the
> attached
> routine, causing hidden problems later.

I agree that these points should be addressed.
I'll post an updated patch.

I've also received some off-list feedback, and I'm planning to
incorporate
those comments into the updated patch as well.

On 2026-06-26 21:50, Robert Haas wrote:
> On Thu, Jun 25, 2026 at 6:33 AM Andrei Lepikhov <lepihov(at)gmail(dot)com>
> wrote:
>> Ok, so just mention this behaviour in the documentation - let people
>> know that
>> they should potentially wait for a minute or two more to see the
>> EXPLAIN.
>> Real-life with huge queries and big machines provide us with examples
>> where a
>> backend might delay a response to a signal for quite a substantial
>> time.
>>
>> Sometimes nothing happens after such an async operation, and we need
>> to identify
>> the problem: has the backend stalled, is the ‘logging plan’ algorithm
>> ineffective, or is something else happening?
>
> I don't think we document this in other, similar cases. Many things
> can be delayed if the machine is overloaded, but unless I am missing
> something, having to wait over a minute for this to work would be
> *extremely* unusual and only happen in case of a machine that is
> absolutely being crushed by the load. And in that case everything else
> will be slow, too.

I think a delay caused by signal handling can also occur with the
existing
functions, such as pg_cancel_backend() and
pg_log_backend_memory_contexts().

Since the documentation does not specifically mention such delays for
those
functions, I thought it may not be necessary to document this point for
pg_log_query_plan() either.

On the other hand, since pg_log_query_plan() emits the log output at the
next time ExecProcNodeFirst() is executed, if there is a delay before
ExecProcNodeFirst() is called, there would also be a delay before the
plan is logged.

For example, this can happen with a query such as SELECT pg_sleep(100),
or when executing a time-consuming sort operation like the following:

BEGIN;
DECLARE sort_probe NO SCROLL CURSOR FOR
SELECT g, md5(g::text) AS k
FROM generate_series(1, 100000000) AS g
ORDER BY k;

FETCH 1 FROM sort_probe;

I think this behavior is specific to pg_log_query_plan(), so I am
starting to wonder whether something should be mentioned in the
documentation.

Thanks,

--
Atsushi Torikoshi
Seconded from NTT DATA CORPORATION to SRA OSS K.K.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-06-29 04:27:54 Re: Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server
Previous Message Peter Smith 2026-06-29 04:20:41 Re: Make the logical replication conflict messages more like each other