Re: RFC: Logging plan of the running query

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, jtc331(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: RFC: Logging plan of the running query
Date: 2023-11-06 06:32:12
Message-ID: CAExHW5t+xh95_LZjkUZARDbV=gS5pBJghG8x=Xnjb2psWr0R6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> I have following questions related to the functionality. (Please point
> me to the relevant discussion if this has been already discussed.)
>
> 1. When a backend is running nested queries, we will see the plan of
> the innermost query. That query may not be the actual culprit if the
> user query is running slowly. E.g a query being run as part of inner
> side nested loop when the nested loop itself is the bottleneck. I
> think it will be useful to print plans of all the whole query stack.

To further explain this point, consider following scenario

-- A completely useless function which executes two SQL statements
which take small amount individually
#create function multstmt() returns int
language sql as $$
select count(*) from pg_class where reltype = 12345;
select count(*) from pg_type limit 10;
$$;

-- force a suboptimal plan
#set enable_hashjoin to false;
#set enable_mergejoin to false;

-- A completely useless but long running query
#select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid)
= multstmt(t.oid);

This take a few minutes on my laptop.

In a separate session query pg_stat_activity. We will see the original query
#select pid, query, backend_type from pg_stat_activity where pid = 349330;
pid | query
| backend_type
--------+-----------------------------------------------------------------------------------------+----------------
349330 | select c.oid, t.oid from pg_class c, pg_type t where
multstmt(c.oid) = multstmt(t.oid); | client backend
(1 row)

Run the plan output function a few times
#select pg_log_query_plan(349330);

You will observe different plans based on which of the innermost query
is runnning
LOG: query plan running on backend with PID 349330 is:
Query Text:
select count(*) from pg_class where reltype = typeid;
select count(*) from pg_type where oid = typeid;

Query Parameters: $1 = '600'
Aggregate (cost=18.18..18.19 rows=1 width=8)
Output: count(*)
-> Seq Scan on pg_catalog.pg_class (cost=0.00..18.18 rows=2 width=0)
Output: oid, relname, relnamespace, reltype,
reloftype, relowner, relam, relfilenode, reltablespace, relpages,
reltuples, relallvisible, reltoastrelid, relhasindex, relisshared,
relpersistence, relkind, relnatts, relchecks, relhasrules,
relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity,
relispopulated, relreplident, relispartition, relrewrite,
relfrozenxid, relminmxid, relacl, reloptions, relpartbound
Filter: (pg_class.reltype = $1)
Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
2023-11-06 11:52:25.610 IST [349330] LOG: query plan running on
backend with PID 349330 is:
Query Text:
select count(*) from pg_class where reltype = typeid;
select count(*) from pg_type where oid = typeid;

Query Parameters: $1 = '2203'
Aggregate (cost=4.30..4.31 rows=1 width=4)
Output: count(*)
-> Index Only Scan using pg_type_oid_index on
pg_catalog.pg_type (cost=0.28..4.29 rows=1 width=0)
Output: oid
Index Cond: (pg_type.oid = $1)
Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

Individual pieces are confusing here since the query run by the
backend is not the one being EXPLAINed. A user may not know that the
queries being EXPLAINed arise from the function call multstmt(). So
they won't be able to stitch the pieces together unless they see plan
of the outermost query with loops and costs. What might help if we
explain each query in the hierarchy.

I think we can start with what auto_explain is doing. Always print the
plan of the outermost query; the query found in pg_stat_activity. In a
later version we might find a way to print plans of all the queries in
the stack and do so in a readable manner.

This makes tracking activeQueryDesc a bit tricky. My guess is that the
outermost query's descriptor will be available through ActivePortal
most of the time. But there are cases when ExecutorRun is called by
passing a queryDesc directly. So may be there are some cases where
that's not true.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-11-06 06:35:28 Re: Add recovery to pg_control and remove backup_label
Previous Message Shlok Kyal 2023-11-06 06:19:13 Re: POC: Extension for adding distributed tracing - pg_tracing