Re: RFC: Logging plan of the running query

From: Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
To: torikoshia(at)oss(dot)nttdata(dot)com
Cc: Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, jtc331(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com
Subject: Re: RFC: Logging plan of the running query
Date: 2023-12-06 23:33:56
Message-ID: CAG0ozMqmjpUG65sDQ+RA41mhr_1+Wf2Kk88V4_3MDCy9Xk4-+w@mail.gmail.com
Lists: pgsql-hackers

Hello hackers,

Last Saturday I submitted a patch to the pgsql-hackers list with the title
"Proposal: In-flight explain logging", proposing a feature very similar to
the one being worked on in this thread. I should have searched the
commitfest more carefully before implementing something from scratch.

So, as recommended by Ashutosh, I am sending an incremental patch containing
an additional feature I personally think we should include: logging the plan
with instrumentation details, when instrumentation is enabled.

When targeting a query running with instrumentation, PostgreSQL should log
the complete EXPLAIN ANALYZE plan with the current row count and, if
enabled, timing for each node. This gives the user not only the plan itself
but also what has been executed so far, which is extremely useful when
troubleshooting queries that never finish.

Since the query is still in progress, the output includes the marker
(never executed) for nodes that have not been reached yet (and may never
be). This marker is already present in the current ExplainNode
implementation.

I added a new marker, (in progress), for nodes that are currently being
executed, i.e., where InstrStartNode has been called and the clock is still
ticking.

Back-reading this thread, I saw the discussion about whether to extend
pg_log_query_plan to parallel workers. I included that in my incremental
patch, since the new ability to log instrumentation makes it useful to see
what parallel workers are currently doing.

# DEMONSTRATION:

postgres=# select pid, backend_type, pg_log_query_plan(pid)
postgres-# from pg_stat_activity
postgres-# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres-#    or backend_type = 'parallel worker';
  pid  |  backend_type   | pg_log_query_plan
-------+-----------------+-------------------
 33833 | client backend  | t
 47202 | parallel worker | t
 47203 | parallel worker | t
(3 rows)

2023-12-06 23:14:41.756 UTC [33833] LOG:  query plan running on backend with PID 33833 is:
        Query Text: explain (analyze, buffers) select *
        from t2 a
        inner join t1 b on a.c1=b.c1
        inner join t1 c on a.c1=c.c1
        inner join t1 d on a.c1=d.c1
        inner join t1 e on a.c1=e.c1;
        Gather  (cost=70894.63..202643.27 rows=1000000 width=20) (never executed)
          Output: a.c1, b.c1, c.c1, d.c1, e.c1
          Workers Planned: 2
          Workers Launched: 2
          ->  Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
                Output: a.c1, b.c1, c.c1, d.c1, e.c1
                Hash Cond: (a.c1 = e.c1)
                ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
                      Output: a.c1, b.c1, c.c1, d.c1
                      Hash Cond: (a.c1 = c.c1)
                      ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.914..2625.922 rows=250412 loops=1) (in progress)
                            Output: a.c1, b.c1
                            Inner Unique: true
                            Hash Cond: (b.c1 = a.c1)
                            Buffers: shared hit=755 read=2175, temp read=1473 written=1860
                            ->  Parallel Seq Scan on public.t1 b  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..20.904 rows=331492 loops=1)
                                  Output: b.c1
                                  Buffers: shared hit=602 read=865
                            ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.107..1745.107 rows=330638 loops=1)
                                  Output: a.c1
                                  Buffers: shared hit=153 read=1310, temp written=868
                                  ->  Parallel Seq Scan on public.t2 a  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.042..27.695 rows=330638 loops=1)
                                        Output: a.c1
                                        Buffers: shared hit=153 read=1310
                      ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.489..2450.489 rows=407941 loops=1)
                            Output: c.c1, d.c1
                            Buffers: shared hit=1141 read=1833, temp read=1938 written=2836
                            ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1323.422..1575.245 rows=407941 loops=1)
                                  Output: c.c1, d.c1
                                  Hash Cond: (c.c1 = d.c1)
                                  Buffers: shared hit=1141 read=1833, temp read=1938 written=1864
                                  ->  Parallel Seq Scan on public.t1 c  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.026..22.223 rows=336238 loops=1)
                                        Output: c.c1
                                        Buffers: shared hit=590 read=898
                                  ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.306..653.306 rows=335836 loops=1)
                                        Output: d.c1
                                        Buffers: shared hit=551 read=935, temp written=872
                                        ->  Parallel Seq Scan on public.t1 d  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..23.127 rows=335836 loops=1)
                                              Output: d.c1
                                              Buffers: shared hit=551 read=935
                ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=590.086..590.086 rows=343696 loops=1)
                      Output: e.c1
                      Buffers: shared hit=519 read=1002, temp written=896
                      ->  Parallel Seq Scan on public.t1 e  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..21.797 rows=343696 loops=1)
                            Output: e.c1
                            Buffers: shared hit=519 read=1002
2023-12-06 23:14:41.757 UTC [47203] LOG:  query plan running on backend with PID 47203 is:
        Query Text: explain (analyze, buffers) select *
        from t2 a
        inner join t1 b on a.c1=b.c1
        inner join t1 c on a.c1=c.c1
        inner join t1 d on a.c1=d.c1
        inner join t1 e on a.c1=e.c1;
        Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
          Output: a.c1, b.c1, c.c1, d.c1, e.c1
          Hash Cond: (a.c1 = e.c1)
          ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
                Output: a.c1, b.c1, c.c1, d.c1
                Hash Cond: (a.c1 = c.c1)
                ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2464.367..2628.476 rows=258293 loops=1)
                      Output: a.c1, b.c1
                      Inner Unique: true
                      Hash Cond: (b.c1 = a.c1)
                      Buffers: shared hit=704 read=2213, temp read=1497 written=1860
                      ->  Parallel Seq Scan on public.t1 b  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.247 rows=330412 loops=1)
                            Output: b.c1
                            Buffers: shared hit=594 read=868
                      ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.093..1745.093 rows=328830 loops=1)
                            Output: a.c1
                            Buffers: shared hit=110 read=1345, temp written=868
                            ->  Parallel Seq Scan on public.t2 a  (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.141..30.128 rows=328830 loops=1)
                                  Output: a.c1
                                  Buffers: shared hit=110 read=1345
                ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2449.694..2449.694 rows=295462 loops=1)
                      Output: c.c1, d.c1
                      Buffers: shared hit=1161 read=1844, temp read=1971 written=2872
                      ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1378.678..1577.182 rows=295462 loops=1)
                            Output: c.c1, d.c1
                            Hash Cond: (c.c1 = d.c1)
                            Buffers: shared hit=1161 read=1844, temp read=1971 written=1856
                            ->  Parallel Seq Scan on public.t1 c  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.031..22.459 rows=336288 loops=1)
                                  Output: c.c1
                                  Buffers: shared hit=590 read=898
                            ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=651.048..651.048 rows=328378 loops=1)
                                  Output: d.c1
                                  Buffers: shared hit=507 read=946, temp written=860
                                  ->  Parallel Seq Scan on public.t1 d  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.034..24.462 rows=328378 loops=1)
                                        Output: d.c1
                                        Buffers: shared hit=507 read=946
          ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=575.417..575.417 rows=327926 loops=1)
                Output: e.c1
                Buffers: shared hit=522 read=929, temp written=860
                ->  Parallel Seq Scan on public.t1 e  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.786..23.114 rows=327926 loops=1)
                      Output: e.c1
                      Buffers: shared hit=522 read=929
2023-12-06 23:14:41.758 UTC [47202] LOG:  query plan running on backend with PID 47202 is:
        Query Text: explain (analyze, buffers) select *
        from t2 a
        inner join t1 b on a.c1=b.c1
        inner join t1 c on a.c1=c.c1
        inner join t1 d on a.c1=d.c1
        inner join t1 e on a.c1=e.c1;
        Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
          Output: a.c1, b.c1, c.c1, d.c1, e.c1
          Hash Cond: (a.c1 = e.c1)
          ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
                Output: a.c1, b.c1, c.c1, d.c1
                Hash Cond: (a.c1 = c.c1)
                ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.138..2631.855 rows=254125 loops=1)
                      Output: a.c1, b.c1
                      Inner Unique: true
                      Hash Cond: (b.c1 = a.c1)
                      Buffers: shared hit=749 read=2254, temp read=1483 written=1896
                      ->  Parallel Seq Scan on public.t1 b  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.027..22.224 rows=338096 loops=1)
                            Output: b.c1
                            Buffers: shared hit=628 read=868
                      ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1744.616..1744.616 rows=340532 loops=1)
                            Output: a.c1
                            Buffers: shared hit=121 read=1386, temp written=892
                            ->  Parallel Seq Scan on public.t2 a  (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.138..30.003 rows=340532 loops=1)
                                  Output: a.c1
                                  Buffers: shared hit=121 read=1386
                ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.470..2450.470 rows=296597 loops=1)
                      Output: c.c1, d.c1
                      Buffers: shared hit=1186 read=1813, temp read=1610 written=2872
                      ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1380.286..1561.412 rows=296597 loops=1)
                            Output: c.c1, d.c1
                            Hash Cond: (c.c1 = d.c1)
                            Buffers: shared hit=1186 read=1813, temp read=1610 written=1848
                            ->  Parallel Seq Scan on public.t1 c  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.170 rows=327474 loops=1)
                                  Output: c.c1
                                  Buffers: shared hit=548 read=901
                            ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.297..653.297 rows=335786 loops=1)
                                  Output: d.c1
                                  Buffers: shared hit=574 read=912, temp written=868
                                  ->  Parallel Seq Scan on public.t1 d  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..24.210 rows=335786 loops=1)
                                        Output: d.c1
                                        Buffers: shared hit=574 read=912
          ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=574.553..574.553 rows=328378 loops=1)
                Output: e.c1
                Buffers: shared hit=495 read=958, temp written=864
                ->  Parallel Seq Scan on public.t1 e  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.717..23.270 rows=328378 loops=1)
                      Output: e.c1
                      Buffers: shared hit=495 read=958

The thread of my initial patch contains all the important implementation
details. Sharing the ones relevant to this incremental patch here:

- Safely printing signaled plans with instrumentation

A plan string is built in the function ExplainNode (
https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/commands/explain.c#L1178
),
which is called at the end of query execution when EXPLAIN is used.
That function does its work using the PlanState (part of the QueryDesc) of
the running query and an ExplainState.

The main challenge is that ExplainNode calls InstrEndLoop, which changes
values in the Instrumentation struct. That is fine for a regular EXPLAIN,
where the query has already completed, but not for the new feature, where
the plan is logged while the query is still running.

So the new code has custom logic to clone the Instrumentation instance of
the current node. The cloned object can be written to safely.

The function InstrEndLoop has a safety check here (
https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/executor/instrument.c#L148
)
that prevents adjusting instrumentation details on a running node. That
situation never arises with the current code paths, but with the new
feature it will happen very often.

I didn't want to remove this safety check, since InstrEndLoop is called
from other places too (even from auto_explain), so the solution was to keep
InstrEndLoop and add a new InstrEndLoopForce for signaled plan logging with
instrumentation. Both InstrEndLoop and InstrEndLoopForce call a new
internal InstrEndLoopInternal to avoid duplicating code.

- Memory management

Since pg_log_query_plan() creates its own memory context before calling
ExplainAssembleLogOutput, the new logic that allocates memory to clone
instrumentation doesn't need to free anything explicitly.

# FINAL CONSIDERATIONS

Let me know what you think about this incremental patch. If you think it is
useful but needs adjustments, I will be happy to make the changes.

Kind Regards,

Rafael Castro.

Attachment Content-Type Size
v34-0001-Add-function-to-log-the-plan-of-the-query.patch application/octet-stream 30.3 KB
v34-0002-Log-plan-along-with-instrumentation-details.patch application/octet-stream 11.2 KB
