Re: RFC: Logging plan of the running query

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Logging plan of the running query
Date: 2021-09-07 03:39:46
Message-ID: 3d345c49561f31a1d8ff60ea86f88840@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-08-20 01:12, Fujii Masao wrote:
> On 2021/08/11 21:14, torikoshia wrote:
>> As far as I looked into, pg_log_current_plan() can call InstrEndLoop()
>> through ExplainNode().
>> I added a flag to ExplainState to avoid calling InstrEndLoop() when
>> ExplainNode() is called from pg_log_current_plan().
>
> Thanks for updating the patch!
> I tried to test the patch again and encountered two issues.

Thanks for finding these issues!

>
> (1)
> The following WITH RECURSIVE query failed with the error
> "ERROR: failed to find plan for CTE sg" when I ran
> pg_log_current_query_plan() against the backend executing that query.
> Is this a bug?
>
> create table graph0( f int, t int, label text );
> insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
>
> with recursive search_graph(f, t, label, i) as (
> select *, 1||pg_sleep(1)::text from graph0 g
> union distinct
> select g.*,1||pg_sleep(1)::text
> from graph0 g, search_graph sg
> where g.f = sg.t
> ) search breadth first by f, t set seq
> select * from search_graph order by seq;

This ERROR occurred without applying the patch and just calling
EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST.

I'm going to make another thread to discuss it.

> (2)
> When I ran pg_log_current_query_plan() while "make installcheck" test
> was running, I got the following assertion failure.
>
> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
>
> 0 postgres 0x000000010ec23557
> ExceptionalCondition + 231
> 1 postgres 0x000000010e9eff15
> LockAcquireExtended + 1461
> 2 postgres 0x000000010e9ed14d
> LockRelationOid + 61
> 3 postgres 0x000000010e41251b
> relation_open + 91
> 4 postgres 0x000000010e509679 table_open +
> 25
> 5 postgres 0x000000010ebf9462
> SearchCatCacheMiss + 274
> 6 postgres 0x000000010ebf5979
> SearchCatCacheInternal + 761
> 7 postgres 0x000000010ebf566c
> SearchCatCache + 60
> 8 postgres 0x000000010ec1a9e0
> SearchSysCache + 144
> 9 postgres 0x000000010ec1ae03
> SearchSysCacheExists + 51
> 10 postgres 0x000000010e58ce35
> TypeIsVisible + 437
> 11 postgres 0x000000010ea98e4c
> format_type_extended + 1964
> 12 postgres 0x000000010ea9900e
> format_type_with_typemod + 30
> 13 postgres 0x000000010eb78d76
> get_const_expr + 742
> 14 postgres 0x000000010eb79bc8
> get_rule_expr + 232
> 15 postgres 0x000000010eb8140f
> get_func_expr + 1247
> 16 postgres 0x000000010eb79dcd
> get_rule_expr + 749
> 17 postgres 0x000000010eb81688
> get_rule_expr_paren + 136
> 18 postgres 0x000000010eb7bf38
> get_rule_expr + 9304
> 19 postgres 0x000000010eb72ad5
> deparse_expression_pretty + 149
> 20 postgres 0x000000010eb73463
> deparse_expression + 83
> 21 postgres 0x000000010e68eaf1
> show_plan_tlist + 353
> 22 postgres 0x000000010e68adaf ExplainNode
> + 4991
> 23 postgres 0x000000010e688b4b
> ExplainPrintPlan + 283
> 24 postgres 0x000000010e68e1aa
> ProcessLogCurrentPlanInterrupt + 266
> 25 postgres 0x000000010ea133bb
> ProcessInterrupts + 3435
> 26 postgres 0x000000010e738c97
> vacuum_delay_point + 55
> 27 postgres 0x000000010e42bb4b
> ginInsertCleanup + 1531
> 28 postgres 0x000000010e42d418
> gin_clean_pending_list + 776
> 29 postgres 0x000000010e74955a
> ExecInterpExpr + 2522
> 30 postgres 0x000000010e7487e2
> ExecInterpExprStillValid + 82
> 31 postgres 0x000000010e7ae83b
> ExecEvalExprSwitchContext + 59
> 32 postgres 0x000000010e7ae7be ExecProject
> + 78
> 33 postgres 0x000000010e7ae4e9 ExecResult +
> 345
> 34 postgres 0x000000010e764e42
> ExecProcNodeFirst + 82
> 35 postgres 0x000000010e75ccb2 ExecProcNode
> + 50
> 36 postgres 0x000000010e758301 ExecutePlan
> + 193
> 37 postgres 0x000000010e7581d1
> standard_ExecutorRun + 609
> 38 auto_explain.so 0x000000010f1df3a7
> explain_ExecutorRun + 247
> 39 postgres 0x000000010e757f3b ExecutorRun
> + 91
> 40 postgres 0x000000010ea1cb49
> PortalRunSelect + 313
> 41 postgres 0x000000010ea1c4dd PortalRun +
> 861
> 42 postgres 0x000000010ea17474
> exec_simple_query + 1540
> 43 postgres 0x000000010ea164d4 PostgresMain
> + 2580
> 44 postgres 0x000000010e91d159 BackendRun +
> 89
> 45 postgres 0x000000010e91c6a5
> BackendStartup + 565
> 46 postgres 0x000000010e91b3fe ServerLoop +
> 638
> 47 postgres 0x000000010e918b9d
> PostmasterMain + 6717
> 48 postgres 0x000000010e7efd43 main + 819
> 49 libdyld.dylib 0x00007fff6a46e3d5 start + 1
> 50 ??? 0x0000000000000003 0x0 + 3
>
> LOG: server process (PID 61512) was terminated by signal 6: Abort
> trap: 6
> DETAIL: Failed process was running: select
> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;

As far as I understand, since explaining plans can acquire heavyweight
lock for example to get column names, when page lock is held at the time
of the interrupt, this assertion error occurs.

The attached patch tries to avoid this by checking each LocalLock entry
and when finding even one, giving up logging the plan.

Thoughts?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment Content-Type Size
v9-0001-log-running-query-plan.patch text/x-diff 24.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2021-09-07 03:41:02 EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails
Previous Message Fujii Masao 2021-09-07 03:30:00 Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)