| From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Lukas Fittl <lukas(at)fittl(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com |
| Subject: | Re: RFC: Logging plan of the running query |
| Date: | 2026-06-29 04:23:57 |
| Message-ID: | 82f73973f6fbdb1d2ae31e7c40e4a4f1@oss.nttdata.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jun 24, 2026 at 11:35 PM Andrei Lepikhov <lepihov(at)gmail(dot)com>
wrote:
Thanks for your review!
> 2. GetCurrentQueryDesc, SetCurrentQueryDesc - need 'extern' in the .h
> file
> 3. ExplainPrintPlan, ExplainPrintTriggers - have exact duplicates in
> the explain.h
> 4. ExplainStringAssemble() has '0' input for a boolean parameter.
> 5. ExecSetExecProcNodeRecurse: I don't see PlanState::initPlan. Is this
> intentional?
> 6. INJECTION_POINT("log-query-interrupt", ...) - it is called inside a
> signal-safe zone. In tests, the caller might do a lot of things in the
> attached
> routine, causing hidden problems later.
I agree that these points should be addressed.
I'll post an updated patch.
I've also received some off-list feedback, and I'm planning to
incorporate
those comments into the updated patch as well.
On 2026-06-26 21:50, Robert Haas wrote:
> On Thu, Jun 25, 2026 at 6:33 AM Andrei Lepikhov <lepihov(at)gmail(dot)com>
> wrote:
>> Ok, so just mention this behaviour in the documentation - let people
>> know that
>> they should potentially wait for a minute or two more to see the
>> EXPLAIN.
>> Real-life with huge queries and big machines provide us with examples
>> where a
>> backend might delay a response to a signal for quite a substantial
>> time.
>>
>> Sometimes nothing happens after such an async operation, and we need
>> to identify
>> the problem: has the backend stalled, is the ‘logging plan’ algorithm
>> ineffective, or is something else happening?
>
> I don't think we document this in other, similar cases. Many things
> can be delayed if the machine is overloaded, but unless I am missing
> something, having to wait over a minute for this to work would be
> *extremely* unusual and only happen in case of a machine that is
> absolutely being crushed by the load. And in that case everything else
> will be slow, too.
I think a delay caused by signal handling can also occur with the
existing
functions, such as pg_cancel_backend() and
pg_log_backend_memory_contexts().
Since the documentation does not specifically mention such delays for
those
functions, I thought it may not be necessary to document this point for
pg_log_query_plan() either.
On the other hand, since pg_log_query_plan() emits the log output at the
next time ExecProcNodeFirst() is executed, if there is a delay before
ExecProcNodeFirst() is called, there would also be a delay before the
plan is logged.
For example, this can happen with a query such as SELECT pg_sleep(100),
or when executing a time-consuming sort operation like the following:
BEGIN;
DECLARE sort_probe NO SCROLL CURSOR FOR
SELECT g, md5(g::text) AS k
FROM generate_series(1, 100000000) AS g
ORDER BY k;
FETCH 1 FROM sort_probe;
I think this behavior is specific to pg_log_query_plan(), so I am
starting to wonder whether something should be mentioned in the
documentation.
Thanks,
--
Atsushi Torikoshi
Seconded from NTT DATA CORPORATION to SRA OSS K.K.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-06-29 04:27:54 | Re: Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server |
| Previous Message | Peter Smith | 2026-06-29 04:20:41 | Re: Make the logical replication conflict messages more like each other |