Re: [BUG] segfault during delete

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] segfault during delete
Date: 2021-02-24 09:11:44
Message-ID: 1240b37c-6515-07c6-3779-cdb33d6c4701@amazon.com
Lists: pgsql-hackers

Hi Amit,

On 2/24/21 9:12 AM, Amit Langote wrote:
> Hi Bertrand,
>
> On Wed, Feb 24, 2021 at 5:56 AM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
>> Hi hackers,
>>
>> Here is a scenario that segfaults during delete (with version >= 12):
>>
>> create table parent (
>> col1 text primary key
>> );
>>
>> create table child (
>> col1 text primary key,
>> FOREIGN KEY (col1) REFERENCES parent(col1) on delete cascade
>> );
>>
>> CREATE or replace FUNCTION trigger_function()
>> RETURNS TRIGGER
>> LANGUAGE PLPGSQL
>> AS $$
>> BEGIN
>> raise notice 'trigger = %, old table = %',
>> TG_NAME,
>> (select string_agg(old_table::text, ', ' order by col1) from old_table);
>> return NULL;
>> END;
>> $$
>> ;
>>
>> create trigger bdt_trigger after delete on child REFERENCING OLD TABLE AS old_table for each statement EXECUTE function trigger_function();
>>
>> alter table child add column col2 text not null default 'tutu';
>> insert into parent(col1) values ('1');
>> insert into child(col1) values ('1');
>> insert into parent(col1) values ('2');
>> insert into child(col1) values ('2');
>> delete from parent;
>>
>> produces:
>>
>> CREATE TABLE
>> CREATE TABLE
>> CREATE FUNCTION
>> CREATE TRIGGER
>> ALTER TABLE
>> INSERT 0 1
>> INSERT 0 1
>> INSERT 0 1
>> INSERT 0 1
>> psql:./segfault.repro.sql:35: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> psql:./segfault.repro.sql:35: fatal: connection to server was lost
>>
>> The column being added to the child relation needs to have a default value for the segfault to be triggered.
>>
>> The stack is the following:
>>
>> #0 0x000000000049aa59 in execute_attr_map_slot (attrMap=0x11736c8, in_slot=0x1179c90, out_slot=0x1179da8) at tupconvert.c:193
>> #1 0x0000000000700ec0 in AfterTriggerSaveEvent (estate=0x1171820, relinfo=0x1171cc8, event=1, row_trigger=true, oldslot=0x1179c90, newslot=0x0, recheckIndexes=0x0, modifiedCols=0x0, transition_capture=0x1172438) at trigger.c:5488
>> #2 0x00000000006fc6a8 in ExecARDeleteTriggers (estate=0x1171820, relinfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, fdw_trigtuple=0x0, transition_capture=0x1172438) at trigger.c:2565
>> #3 0x0000000000770794 in ExecDelete (mtstate=0x1171a90, resultRelInfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, oldtuple=0x0, planSlot=0x1173630, epqstate=0x1171b88, estate=0x1171820, processReturning=true, canSetTag=true, changingPart=false, tupleDeleted=0x0, epqreturnslot=0x0)
>> at nodeModifyTable.c:1128
>> #4 0x00000000007724cc in ExecModifyTable (pstate=0x1171a90) at nodeModifyTable.c:2259
>>
>> I had a look at it, and it looks to me like the slot being reused in AfterTriggerSaveEvent() (when (map != NULL) and not (!storeslot)) has previously been damaged by FreeExecutorState() in standard_ExecutorEnd().
> Right.

Thanks for reviewing the issue and the patch!

>
>> Then I ended up with the enclosed patch proposal, which stops the repro from segfaulting and still passes make check.
>>
>> Not sure that what is being done in the patch is the right/best approach to solve the issue though.
> Hmm, I don't think we should be *freshly* allocating the
> TupleTableSlot every time. Not having to do that is why the commit
> ff11e7f4b9ae0 seems to have invented AfterTriggersTableData.storeslot
> in the first place, that is, to cache the slot once created.
Oh, ok thanks for the explanation.
>
> Problem with the current way as you've discovered is that the slot
> gets allocated in the execution-span memory context, whereas the
> AfterTriggersTableData instance, of which the slot is a part, is
> supposed to last the entire (sub-)transaction.
Right.
> So the correct fix I
> think is to allocate the slot to be stored in
> AfterTriggersTableData.storeslot in the transaction-span memory
> context as well.
Right, that makes more sense than what my patch was doing.
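
Just to check that I follow, the pattern would then be something like the
following (a rough sketch on my side only; taking CurTransactionContext as
the transaction-span context is my assumption, not something taken from
your patch):

    MemoryContext oldcxt;

    /* allocate in a context that lives for the rest of the (sub-)transaction */
    oldcxt = MemoryContextSwitchTo(CurTransactionContext);
    /* ... create the slot to be cached in AfterTriggersTableData here ... */
    MemoryContextSwitchTo(oldcxt);
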
> Having taken care of that, another problem with the current way is
> that it adds the slot to es_tupleTable by calling
> ExecAllocTableSlot(), which means that the slot will be released when
> ExecResetTupleTable() is called on es_tupleTable as part of
> ExecEndPlan(). That would defeat the point of allocating the slot in
> the transaction-span context. So let's use MakeSingleTupleTableSlot() to
> allocate a standalone slot to be stored in
> AfterTriggersTableData.storeslot and have AfterTriggerFreeQuery() call
> ExecDropSingleTupleTableSlot() to release it.
>
+1
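
To make sure I understand how the two pieces fit together, here is how I
picture it (only an illustrative sketch on my side, not your patch; the
choice of CurTransactionContext, the TTSOpsVirtual slot type and the local
names "table" and "tupdesc" are my assumptions):

    /* In AfterTriggerSaveEvent(): build a standalone slot, not tracked in
     * es_tupleTable, under a context that outlives the query. */
    if (table->storeslot == NULL)
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(CurTransactionContext);

        table->storeslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual);
        MemoryContextSwitchTo(oldcxt);
    }

    /* In AfterTriggerFreeQuery(): drop the standalone slot explicitly,
     * since ExecResetTupleTable() will no longer do it for us. */
    if (table->storeslot)
    {
        ExecDropSingleTupleTableSlot(table->storeslot);
        table->storeslot = NULL;
    }
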
>> PS: the same repro with a foreign key with update cascade, an update trigger and an update on the col1 column would segfault too (but does not with the enclosed patch proposal).
> Actually, we also need to fix similar code in the block for populating
> the transition NEW TABLE, because without doing so that block too can
> crash similarly:
>
> if (!TupIsNull(newslot) &&
> ((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
> (event == TRIGGER_EVENT_UPDATE && update_new_table)))
>
> I've attached a patch with my suggested fixes and also test cases.
> Please take a look.

I had a look and it looks good to me. The new regression tests also exercise
the right cases: they do segfault when run without your patch applied.

That's all good to me.

This fix should be back-patched to 12 and 13, do you agree?

Thanks a lot for your help and explanations!

Bertrand
