Re: RFC: Logging plan of the running query

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, jtc331(at)gmail(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, rafaelthca(at)gmail(dot)com, jian(dot)universality(at)gmail(dot)com
Subject: Re: RFC: Logging plan of the running query
Date: 2024-02-15 18:59:11
Message-ID: 20240215185911.v4o6fo444md6a3w7@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-02-15 14:42:11 +0530, Robert Haas wrote:
> I think the issue is very general. We have lots of subsystems that
> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
> If we process an interrupt while that code is in the middle of
> manipulating its global variables and then again re-enter that code,
> bad things might happen. We could try to rule that out by analyzing
> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
> appear, but that's very difficult. Suppose we took the alternative
> approach recommended by Andrey Lepikhov in
> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
> and instead set a flag that gets handled in some suitable place in the
> executor code, like ExecProcNode(). If we did that, then we would know
> that we're not in the middle of a syscache lookup, a catcache lookup,
> a heavyweight lock acquisition, an ereport, or any other low-level
> subsystem call that might create problems for the EXPLAIN code.
>
> In that design, the hack shown above would go away, and we could be
> much more certain that we don't need any other similar hacks for other
> subsystems. The only downside is that we might experience a slightly
> longer delay before a requested EXPLAIN plan actually shows up, but
> that seems like a pretty small price to pay for being able to reason
> about the behavior of the system.

I am very wary of adding overhead to ExecProcNode() - I'm quite sure that
adding code there would trigger visible overhead for query times.

If we went with something like tht approach, I think we'd have to do something
like redirecting node->ExecProcNode to a wrapper, presumably from within a
CFI. That wrapper could then implement the explain support, without slowing
down the normal execution path.

> I don't *think* there are any cases where we run in the executor for a
> particularly long time without a new call to ExecProcNode().

I guess it depends on what you call a long time. A large sort, for example,
could spend a fair amount of time inside tuplesort, similarly, a gather node
might need to wait for a worker for a while etc.

> It's really hard for me to accept that the heavyweight lock problem
> for which the patch contains a workaround is the only one that exists.
> I can't see any reason why that should be true.

I suspect you're right.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maiquel Grassi 2024-02-15 19:47:05 RE: Psql meta-command conninfo+
Previous Message Corey Huinker 2024-02-15 18:41:57 Re: Document efficient self-joins / UPDATE LIMIT techniques.