From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Atsushi Torikoshi <torikoshia(dot)tech(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, samimseih(at)gmail(dot)com, destrex271(at)gmail(dot)com |
Subject: | Re: RFC: Logging plan of the running query |
Date: | 2025-10-01 09:11:23 |
Message-ID: | 7c7ca3bc345b28fa4b903bd49fdce519@oss.nttdata.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2025-09-25 01:34, Robert Haas wrote:
> On Fri, Sep 19, 2025 at 8:42 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>> > Yes, it seems as though we should set a flag to prevent reentrancy
>> > here -- if we are already in the code path where we log the query
>> > plan, we shouldn't accept an interrupt telling us to log the query
>> > plan. I haven't looked at the updated patch so maybe that's not
>> > necessary for some reason, but in general reentrancy is a concern with
>> > features of this type.
>>
>> Agreed. In the attached patch added a flag.
>
> This doesn't look safe. If we error out of the section where the flag
> is set to true, it will remain true forever.
Ugh.. Updated the patch to reset the flag even when an error occurs,
using PG_FINALLY(), similar to [1].
[1]
https://www.postgresql.org/message-id/27da56de-17c4-4af2-8032-3c58ae0c7b00%40oss.nttdata.com
>> So I think it would be fine to remove the PID from
>> pg_log_backend_memory_contexts()’s log message when this patch is
>> merged.
>
> Let's leave the PID in there for now for consistency with the existing
> message. We can discuss changing it at another time.
Agreed.
>> To detect “when the current (sub)transaction is (sub)aborted,”
>> attached
>> patch uses IsTransactionState().
>> With that, the logging code is not executed when the transaction state
>> is anything other than TRANS_INPROGRESS, and in some cases this works
>> as
>> expected.
>> On the other hand, when calling pg_log_query_plan() repeatedly during
>> make installcheck, I still encountered cases where segfaults occurred.
>>
>> Specifically, this seems to happen when, after a (sub)abort, the next
>> query starts in the same session and a CFI() occurs after
>> StartTransaction() has been executed but before Executor() runs. In
>> that case, the transaction state has already been changed to
>> TRANS_INPROGRESS by StartTransaction(), but the QueryDesc from the
>> aborted transaction is still present, which causes the problem.
>>
>> To address this, attached patch initializes QueryDesc within
>> CleanupTransaction().
>> Since this function already resets various members of
>> CurrentTransactionState, I feel it’s not so unreasonable to initialize
>> QueryDesc there as well.
>> Does this approach make sense, or is it problematic?
>
> Hmm, I think it looks quite odd. I believe it's quite intentional that
> the very last thing that CleanupTransaction does is reset s->state, so
> I don't think we should be doing this after that. But also, this leads
> to an odd inconsistency between CleanupTransaction and
> CleanupSubTransaction.
Make sense.
> What I'm wondering is whether we should instead
> reset s->queryDesc inside AbortTransaction() and
> AbortSubTransaction(). Perhaps if we do that, then we don't need an
> InTransactionState() test elsewhere. What do you think?
Resetting QueryDesc in AbortTransaction() and AbortSubTransaction() was
what we did in an earlier version of the patch (before v46), so that
doesn’t feel strange to me.
Updated the patch and confirmed that we can still log the parent’s plan
in cases where subtransactions continuously abort, following the same
steps as [2].
[2]
https://www.postgresql.org/message-id/6143f00af4bfdba95a85cf866f4acb41%40oss.nttdata.com
> \> > > Changed it to the following query:
>> > >
>> > > WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
>> > > SELECT * FROM t;
>> >
>> > I don't see why that wouldn't have a race condition?
>>
>> The idea is that (1) pg_log_query_plan() in the WITH clause wraps
>> ExecProcNode, and then (2) SELECT * FROM t invokes ExecProcNode, which
>> causes the plan to be logged. I think that’s what currently happens
>> on
>> HEAD.
>> When you mention a "race condition", do you mean that if the timing of
>> the wrapping in step (1) -- that is, when CFI() is called -- changes
>> in
>> the future, then the logging might not work?
>
> I don't think we're guaranteed that the signal is delivered instantly,
> so it seems possible to me that (1) would happen after (2).
True, but I haven’t found a reliable way to trigger actual logging
within a single backend query.
I was also considering using an isolation test and injection points,
like in the attached PoC patch. The main steps are:
In session1, set an injection point to wait during query execution.
In session1, run a query that waits at the injection point.
In session2, call pg_log_query_plan().
In session2, wake up session1.
However, as you pointed out, since the signals are not guaranteed to be
delivered instantly, I’m worried that the query could complete before
the signal is delivered, causing test instability.
Additionally, for a similar function that logs information via signal
delivery, pg_log_backend_memory_contexts(), the existing tests only
verify that the function succeeds and that permission checks work as
described in the comment below; I'm a bit concerned that this might be
overkill.
-- pg_log_backend_memory_contexts()
--
-- Memory contexts are logged and they are not returned to the
function.
-- Furthermore, their contents can vary depending on the timing.
However,
-- we can at least verify that the code doesn't fail, and that the
-- permissions are set properly.
If we align with that test strategy, just executing SELECT
pg_log_query_plan(pg_backend_pid()) and verifying that it succeeds would
be enough.
However, unlike pg_log_backend_memory_contexts(), this function doesn’t
log anything for this query. I’m not sure if that’s acceptable.
I’d appreciate any thoughts or suggestions on what kind of test coverage
would be appropriate here.
>> BTW I haven’t looked into this in detail yet, but I’m a little curious
>> whether the absence of T_CteScan in functions like
>> planstate_tree_walker_impl() is intentional.
>
> I don't think that function needs any special handling for T_CteScan.
> planstate_tree_walker_impl() and other functions need special handling
> for cases where a node has children that are not in the lefttree,
> righttree, initPlan list, or subPlan list; but a CteScan has no extra
> Plan pointer:
>
> typedef struct CteScan
> {
> Scan scan;
> /* ID of init SubPlan for CTE */
> int ctePlanId;
> /* ID of Param representing CTE output */
> int cteParam;
> } CteScan;
Thanks for looking into this! understood.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Attachment | Content-Type | Size |
---|---|---|
v49-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch | text/x-diff | 34.9 KB |
PoC-injection-point-test-for-pg_log_query_plan.txt | text/x-diff | 3.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Anthonin Bonnefoy | 2025-10-01 09:25:00 | Infinite loop in pgbench when running COPY command |
Previous Message | Dmitry Dolgov | 2025-10-01 09:10:28 | Re: Changing shared_buffers without restart |