Re: Auto explain after query timeout

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Gurjeet <singh(dot)gurjeet(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Auto explain after query timeout
Date: 2022-09-20 19:05:59
Message-ID: CA+TgmoYW_rSOW4JMQ9_0Df9PKQ=sQDOKUGA4Gc9D8w4wui8fSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 20, 2022 at 2:35 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> Either I'm missing something (and/or this was fixed in a later PG
> version), but I don't think this is how it works. We have this
> specific problem now: we set auto_explain.log_min_duration to 200 (ms)
> and statement_timeout set to 30s, but when a statement times out we do
> not get the plan logged with auto-explain.

I think you're correct. auto_explain uses the ExecutorEnd hook, but
that hook will not be fired in the case of an error. Indeed, if an
error has already been thrown, it would be unsafe to try to
auto-explain anything. For instance -- and this is just one problem of
probably many -- ExplainTargetRel() performs catalog lookups, which is
not OK in a failed transaction.

To make this work, I think you'd need a hook that fires just BEFORE
the error is thrown. However, previous attempts to introduce hooks
into ProcessInterrupts() have not met with a wam response from Tom, so
it might be a tough sell. And maybe for good reason. I see at least
two problems. The first is that explaining a query is a pretty
complicated operation that involves catalog lookups and lots of
complicated stuff, and I don't think that it would be safe to do all
of that at any arbitrary point in the code where ProcessInterrupts()
happened to fire. What if one of the syscache lookups misses the cache
and we have to open the underlying catalog? Then
AcceptInvalidationMessages() will fire, but we don't currently assume
that any old CHECK_FOR_INTERRUPTS() can process invalidations. What if
the running query has called a user-defined function or procedure
which is running DDL which is in the middle of changing catalog state
for some relation involved in the query at the moment that the
statement timeout arrives? I feel like there are big problems here.

The other problem I see is that ProcessInterrupts() is our mechanism
for allowing people to escape from queries that would otherwise run
forever by hitting ^C. But what if the explain code goes crazy and
itself needs to be interrupted, when we're already inside
ProcessInterrupts()? Maybe that would work out OK... or maybe it
wouldn't. I'm not really sure. But it's another reason to be very,
very cautious about putting complicated logic inside
ProcessInterrupts().

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet 2022-09-20 19:10:07 Re: Auto explain after query timeout
Previous Message Tom Lane 2022-09-20 18:56:47 Re: A question about wording in messages