Re: RFC: Logging plan of the running query

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Logging plan of the running query
Date: 2021-09-08 12:06:26
Message-ID: c8b2b9f005e3d7ed51f6d70abfd641c4@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-09-07 12:39, torikoshia wrote:
> On 2021-08-20 01:12, Fujii Masao wrote:
>> On 2021/08/11 21:14, torikoshia wrote:
>>> As far as I looked into, pg_log_current_plan() can call
>>> InstrEndLoop() through ExplainNode().
>>> I added a flag to ExplainState to avoid calling InstrEndLoop() when
>>> ExplainNode() is called from pg_log_current_plan().
>>
>> Thanks for updating the patch!
>> I tried to test the patch again and encountered two issues.
>
> Thanks for finding these issues!
>
>>
>> (1)
>> The following WITH RECURSIVE query failed with the error
>> "ERROR: failed to find plan for CTE sg" when I ran
>> pg_log_current_query_plan() against the backend executing that query.
>> Is this a bug?
>>
>> create table graph0( f int, t int, label text );
>> insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
>> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
>>
>> with recursive search_graph(f, t, label, i) as (
>> select *, 1||pg_sleep(1)::text from graph0 g
>> union distinct
>> select g.*,1||pg_sleep(1)::text
>> from graph0 g, search_graph sg
>> where g.f = sg.t
>> ) search breadth first by f, t set seq
>> select * from search_graph order by seq;
>
> This ERROR occurred without applying the patch and just calling
> EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST.
>
> I'm going to make another thread to discuss it.
>
>> (2)
>> When I ran pg_log_current_query_plan() while "make installcheck" test
>> was running, I got the following assertion failure.
>>
>> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
>> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
>>
>> 0 postgres 0x000000010ec23557
>> ExceptionalCondition + 231
>> 1 postgres 0x000000010e9eff15
>> LockAcquireExtended + 1461
>> 2 postgres 0x000000010e9ed14d
>> LockRelationOid + 61
>> 3 postgres 0x000000010e41251b
>> relation_open + 91
>> 4 postgres 0x000000010e509679 table_open
>> + 25
>> 5 postgres 0x000000010ebf9462
>> SearchCatCacheMiss + 274
>> 6 postgres 0x000000010ebf5979
>> SearchCatCacheInternal + 761
>> 7 postgres 0x000000010ebf566c
>> SearchCatCache + 60
>> 8 postgres 0x000000010ec1a9e0
>> SearchSysCache + 144
>> 9 postgres 0x000000010ec1ae03
>> SearchSysCacheExists + 51
>> 10 postgres 0x000000010e58ce35
>> TypeIsVisible + 437
>> 11 postgres 0x000000010ea98e4c
>> format_type_extended + 1964
>> 12 postgres 0x000000010ea9900e
>> format_type_with_typemod + 30
>> 13 postgres 0x000000010eb78d76
>> get_const_expr + 742
>> 14 postgres 0x000000010eb79bc8
>> get_rule_expr + 232
>> 15 postgres 0x000000010eb8140f
>> get_func_expr + 1247
>> 16 postgres 0x000000010eb79dcd
>> get_rule_expr + 749
>> 17 postgres 0x000000010eb81688
>> get_rule_expr_paren + 136
>> 18 postgres 0x000000010eb7bf38
>> get_rule_expr + 9304
>> 19 postgres 0x000000010eb72ad5
>> deparse_expression_pretty + 149
>> 20 postgres 0x000000010eb73463
>> deparse_expression + 83
>> 21 postgres 0x000000010e68eaf1
>> show_plan_tlist + 353
>> 22 postgres 0x000000010e68adaf ExplainNode
>> + 4991
>> 23 postgres 0x000000010e688b4b
>> ExplainPrintPlan + 283
>> 24 postgres 0x000000010e68e1aa
>> ProcessLogCurrentPlanInterrupt + 266
>> 25 postgres 0x000000010ea133bb
>> ProcessInterrupts + 3435
>> 26 postgres 0x000000010e738c97
>> vacuum_delay_point + 55
>> 27 postgres 0x000000010e42bb4b
>> ginInsertCleanup + 1531
>> 28 postgres 0x000000010e42d418
>> gin_clean_pending_list + 776
>> 29 postgres 0x000000010e74955a
>> ExecInterpExpr + 2522
>> 30 postgres 0x000000010e7487e2
>> ExecInterpExprStillValid + 82
>> 31 postgres 0x000000010e7ae83b
>> ExecEvalExprSwitchContext + 59
>> 32 postgres 0x000000010e7ae7be ExecProject
>> + 78
>> 33 postgres 0x000000010e7ae4e9 ExecResult
>> + 345
>> 34 postgres 0x000000010e764e42
>> ExecProcNodeFirst + 82
>> 35 postgres 0x000000010e75ccb2
>> ExecProcNode + 50
>> 36 postgres 0x000000010e758301 ExecutePlan
>> + 193
>> 37 postgres 0x000000010e7581d1
>> standard_ExecutorRun + 609
>> 38 auto_explain.so 0x000000010f1df3a7
>> explain_ExecutorRun + 247
>> 39 postgres 0x000000010e757f3b ExecutorRun
>> + 91
>> 40 postgres 0x000000010ea1cb49
>> PortalRunSelect + 313
>> 41 postgres 0x000000010ea1c4dd PortalRun +
>> 861
>> 42 postgres 0x000000010ea17474
>> exec_simple_query + 1540
>> 43 postgres 0x000000010ea164d4
>> PostgresMain + 2580
>> 44 postgres 0x000000010e91d159 BackendRun
>> + 89
>> 45 postgres 0x000000010e91c6a5
>> BackendStartup + 565
>> 46 postgres 0x000000010e91b3fe ServerLoop
>> + 638
>> 47 postgres 0x000000010e918b9d
>> PostmasterMain + 6717
>> 48 postgres 0x000000010e7efd43 main + 819
>> 49 libdyld.dylib 0x00007fff6a46e3d5 start + 1
>> 50 ??? 0x0000000000000003 0x0 + 3
>>
>> LOG: server process (PID 61512) was terminated by signal 6: Abort
>> trap: 6
>> DETAIL: Failed process was running: select
>> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;
>
> As far as I understand, since explaining plans can acquire heavyweight
> lock for example to get column names, when page lock is held at the
> time of the interrupt, this assertion error occurs.
>
> The attached patch tries to avoid this by checking each LocalLock
> entry and when finding even one, giving up logging the plan.
>
> Thoughts?

Regression tests failed on windows.
Updated patch attached.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment Content-Type Size
v10-0001-log-running-query-plan.patch text/x-diff 25.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-09-08 12:16:37 Re: row filtering for logical replication
Previous Message vignesh C 2021-09-08 11:45:50 Re: Added schema level support for publication.