Re: a misbehavior of partition row movement (?)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Arne Roland <A(dot)Roland(at)index(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a misbehavior of partition row movement (?)
Date: 2021-01-20 10:03:56
Message-ID: CA+HiwqFYaTZL9_950e-8cVL7FUwk5EX_dJVTPRJcjBT=pUVg2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
> On 2021-01-08 09:54, Amit Langote wrote:
> >>> I don't quite recall if the decision to implement it like this was
> >>> based on assuming that this is what users would like to see happen in
> >>> this case or the perceived difficulty of implementing it the other way
> >>> around, that is, of firing AFTER UPDATE triggers in this case.
> >> I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?
> > It was discussed here:
> >
> > https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
> >
> > It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
> > relevant emails. You might notice that the developers who
> > participated in that discussion gave various opinions and what we have
> > today got there as a result of a majority of them voting for the
> > current approach. Someone also said this during the discussion:
> > "Regarding the trigger issue, I can't claim to have a terribly strong
> > opinion on this. I think that practically anything we do here might
> > upset somebody, but probably any halfway-reasonable thing we choose to
> > do will be OK for most people." So what we've got is that
> > "halfway-reasonable" thing, YMMV. :)
>
> Could you summarize here what you are trying to do with respect to what
> was decided before? I'm a bit confused, looking through the patches you
> have posted. The first patch you posted hard-coded FK trigger OIDs
> specifically, other patches talk about foreign key triggers in general
> or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire
row-level triggers of a partitioned table can lead to some unexpected
behaviors of the foreign keys pointing to that partitioned table
during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a
cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig1 before update on p for each row execute function
report_trig_details();
create trigger trig2 after update on p for each row execute function
report_trig_details();
create trigger trig3 before delete on p for each row execute function
report_trig_details();
create trigger trig4 after delete on p for each row execute function
report_trig_details();
create trigger trig5 before insert on p for each row execute function
report_trig_details();
create trigger trig6 after insert on p for each row execute function
report_trig_details();

insert into p values (1);
update p set a = 2;
NOTICE: BEFORE UPDATE on p1
NOTICE: BEFORE DELETE on p1
NOTICE: BEFORE INSERT on p2
NOTICE: AFTER DELETE on p1
NOTICE: AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE: BEFORE UPDATE on p2
NOTICE: AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined
and internal triggers, which has implications for the foreign key
enforcing triggers on partitions. Consider the following example:

create table q (a bigint references p);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that
occurs as part of the row movement. One might make the updates
cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets
fired. If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
NOTICE: AFTER DELETE on p2
NOTICE: AFTER INSERT on p1
UPDATE 1
select * from q;
a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result
would've been for q to be updated to change 2 to 1.

This shows that the way we've made these triggers behave in general
can cause some unintended behaviors for foreign keys during
cross-partition updates. I started this thread to do something about
that and sent a patch to prevent cross-partition updates at all when
there are foreign keys pointing to it. As others pointed out, that's
not a great long-term solution to the problem, but that's what we may
have to do in the back-branches if anything at all.

So I wrote another patch targeting the dev branch to make the
cross-partition updates work while producing a sane foreign key
behavior. The idea of the patch is to tweak the firing of AFTER
triggers such that unintended RI triggers don't get fired, that is,
those corresponding to DELETE and INSERT occurring internally as part
of a cross-partition update. Instead we now fire the AFTER UPDATE
triggers, passing the root table as the target result relation (not
the source partition because the new row doesn't belong to it). This
results in the same foreign key behavior as when no partitioning is
involved at all.

Then, Arne came along and suggested that we do this kind of firing for
*all* triggers, not just the internal RI triggers, or at least that's
what I understood Arne as saying. That however would be changing the
original design of cross-partition updates and will change the
documented user-visible trigger behavior. Changing this for internal
triggers like the patch does changes no user-visible behavior, AFAIK,
other than fixing the foreign key annoyance. So I said if we do want
to go the way that Arne is suggesting, it should be its own discussion
and that's that.

Sorry for a long "summary", but I hope it helps clarify things somewhat.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-01-20 10:17:01 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Greg Nancarrow 2021-01-20 09:57:41 Re: Parallel INSERT (INTO ... SELECT ...)