Re: BUG #15677: Crash while deleting from partitioned table

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: infernorb(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #15677: Crash while deleting from partitioned table
Date: 2019-03-11 03:02:42
Message-ID: 3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
Lists: pgsql-bugs

Hi,

On 2019/03/08 16:29, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15677
> Logged by: Norbert Benkocs
> Email address: infernorb(at)gmail(dot)com
> PostgreSQL version: 11.2
> Operating system: CentOS Linux release 7.4.1708 (Core)
> Description:
>
> Version: PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-36), 64-bit
> OS: CentOS Linux release 7.4.1708 (Core)
>
> Hello,
>
> We have an insert/update/delete query on a partitioned table (multiple
> CTE-s) that causes our PostgreSQL server to crash once every few days. We
> haven't been able to reproduce this crash so far, and re-running the same
> query with the same parameters didn't result in a crash either. The table in
> question is updated thousands of times each hour, and most of these work
> fine.
> Previously this table was not partitioned, we started seeing the crash after
> partitioning the table.

Thanks for the report and for providing detailed information which was
useful for diagnosing the bug.

I looked at this:

> (gdb) bt
> #0 ExecInitModifyTable (node=node@entry=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> nodeModifyTable.c:2327
> #1 0x000000000060af88 in ExecInitNode (node=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> execProcnode.c:174
> #2 0x0000000000606fdd in EvalPlanQualStart (epqstate=0x3773848,
> epqstate=0x3773848, planTree=0x36c3f08, parentestate=0xa6) at
> execMain.c:3257

Note that ExecInitModifyTable() is being called (via ExecInitNode()) from
EvalPlanQualStart().

and:

> (gdb) p *mtstate
> $4 = {ps = {type = T_ModifyTableState, plan = 0x2568180, state = 0x35f1440,
> ExecProcNode = 0x626e30 <ExecModifyTable>, ExecProcNodeReal = 0x0,
> instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual
> = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,
> chgParam = 0x0, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x0,
> ps_ProjInfo = 0x0, scandesc = 0x0}, operation = CMD_DELETE, canSetTag =
> false, mt_done = false, mt_plans = 0x39c8088, mt_nplans = 15, mt_whichplan =
> 0, resultRelInfo = 0x35f3f78, rootResultRelInfo = 0xc0, mt_arowmarks =

Note: rootResultRelInfo = 0xc0, which is not a valid pointer.

and:

> (gdb) p *estate
> $7 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot =
> 0x208ba70, es_crosscheck_snapshot = 0x0, es_range_table = 0x282af48,
> es_plannedstmt = 0x2829e98, es_sourceText = 0x0, es_junkFilter = 0x0,
> es_output_cid = 0, es_result_relations = 0x35f3378, es_num_result_relations
> = 34, es_result_relation_info = 0x0,
> es_root_result_relations = 0x0, es_num_root_result_relations = 0,

Note: es_root_result_relations = 0x0 (NULL) and
es_num_root_result_relations = 0.

From the above, I conclude that EvalPlanQualStart() does not copy the
value of es_root_result_relations from the parent EState into the EState
it builds. As a result, ExecInitModifyTable(), when called as part of
EvalPlanQual() checking, starts with es_root_result_relations set to NULL,
so the value it computes for rootResultRelInfo of the ModifyTableState
it's initializing is bogus (0xc0 as seen above).
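
As a rough illustration of how a NULL array base can turn into a small
bogus address, here is a toy model in plain C. It is not PostgreSQL source:
the Toy* types, the function name, the 192-byte struct size and the index 1
are all assumptions made for the sketch, chosen only so that the result
echoes the 0xc0 seen in the gdb output. The message does not state the real
struct size or index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for ResultRelInfo; the 192-byte size is an assumption. */
typedef struct ToyResultRelInfo
{
    char        pad[192];
} ToyResultRelInfo;

/* Toy stand-in holding only the EState fields named in the gdb output. */
typedef struct ToyEState
{
    ToyResultRelInfo *es_result_relations;
    int         es_num_result_relations;
    ToyResultRelInfo *es_root_result_relations;
    int         es_num_root_result_relations;
} ToyEState;

/*
 * Compute the address of root result relation number root_index as
 * "array base + index".  Integer arithmetic is used so that the toy stays
 * well defined even when the base pointer is NULL.
 */
uintptr_t
toy_root_result_rel_addr(const ToyEState *estate, int root_index)
{
    assert(estate != NULL);
    assert(root_index >= 0);

    return (uintptr_t) estate->es_root_result_relations +
        (uintptr_t) root_index * sizeof(ToyResultRelInfo);
}
```

With a properly filled parent state the function returns the address of a
real array element. With a child state whose es_root_result_relations is
NULL, the same call returns just the byte offset, which looks like a pointer
but points at nothing.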

To reproduce, follow these steps (two sessions are needed, because
EvalPlanQual is only invoked when a target row has been concurrently
updated):

Setup:

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
insert into p values (1);

Session 1:

begin;
update p set a = a;

Session 2:

with u as (update p set a = a returning p.*) update p set a = u.a from u;
<blocks>

Session 1:

commit;

Session 2:

<invokes-EvalPlanQual-and-crashes>
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

This can be fixed by the attached patch, which modifies EvalPlanQualStart
to copy the value of es_root_result_relations from its parent EState.
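
The shape of that fix can be sketched with a toy model in plain C. This is
not the attached patch, whose contents are not shown in this message. The
Toy* types and toy_epq_start() are hypothetical names, and the assumption
that the non-root fields are already copied rests only on the gdb output
above, where es_result_relations is populated in the same EState.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for ResultRelInfo (hypothetical, one field only). */
typedef struct ToyResultRelInfo
{
    int         relid;
} ToyResultRelInfo;

/* Toy stand-in holding only the EState fields named in the gdb output. */
typedef struct ToyEState
{
    ToyResultRelInfo *es_result_relations;
    int         es_num_result_relations;
    ToyResultRelInfo *es_root_result_relations;
    int         es_num_root_result_relations;
} ToyEState;

/*
 * Build a child state for a recheck from its parent.  The arrays are
 * shared with the parent, not duplicated.
 */
void
toy_epq_start(ToyEState *child, const ToyEState *parent)
{
    assert(child != NULL && parent != NULL);

    memset(child, 0, sizeof(*child));

    /* Assumed to be copied already, judging from the gdb output. */
    child->es_result_relations = parent->es_result_relations;
    child->es_num_result_relations = parent->es_num_result_relations;

    /* The step described as missing: carry over the root relations too. */
    child->es_root_result_relations = parent->es_root_result_relations;
    child->es_num_root_result_relations =
        parent->es_num_root_result_relations;
}
```

After toy_epq_start(), code that indexes es_root_result_relations through
the child state sees the same array as the parent, so it no longer starts
from a NULL base.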

Thanks,
Amit

Attachment: EvalPlanQualStart-bug-partition-resultrel-init.patch (text/plain, 1.6 KB)
