Re: a misbehavior of partition row movement (?)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: a misbehavior of partition row movement (?)
Date: 2021-12-20 13:10:30
Message-ID: CA+HiwqEi9S2_1qcG-v98jPEEoNhs+Goonfz+Cjb4nqwf4CkcpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 14, 2021 at 6:00 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Mon, Sep 20, 2021 at 3:32 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > The problem was that the tuplestore
> > (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> > use to store the AFTER trigger tuples of a partitioned table that is
> > the target of an cross-partition update lives only for the duration of
> > a given query. So that approach wouldn't work if the foreign key
> > pointing into that partitioned table is marked INITIALLY DEFERRED. To
> > fix, I added a List field to AfterTriggersData that stores the
> > tuplestores to store the tuples of partitioned tables that undergo
> > cross-partition updates in a transaction and are pointed to by
> > INITIALLY DEFERRED foreign key constraints. I couldn't understand one
> > comment about why using a tuplestore for such cases *might not* work,
> > which as follows:
> >
> > * Note that we need triggers on foreign tables to be fired in exactly the
> > * order they were queued, so that the tuples come out of the tuplestore in
> > * the right order. To ensure that, we forbid deferrable (constraint)
> > * triggers on foreign tables. This also ensures that such triggers do not
> > * get deferred into outer trigger query levels, meaning that it's okay to
> > * destroy the tuplestore at the end of the query level.
> >
> > I tried to break the approach using various test cases (some can be
> > seen added by the patch to foreign_key.sql), but couldn't see the
> > issue alluded to in the above comment. So I've marked the comment
> > with an XXX note as follows:
> >
> > - * Note that we need triggers on foreign tables to be fired in exactly the
> > - * order they were queued, so that the tuples come out of the tuplestore in
> > - * the right order. To ensure that, we forbid deferrable (constraint)
> > - * triggers on foreign tables. This also ensures that such triggers do not
> > - * get deferred into outer trigger query levels, meaning that it's okay to
> > - * destroy the tuplestore at the end of the query level.
> > + * Note that we need triggers on foreign and partitioned tables to be fired in
> > + * exactly the order they were queued, so that the tuples come out of the
> > + * tuplestore in the right order. To ensure that, we forbid deferrable
> > + * (constraint) triggers on foreign tables. This also ensures that such
> > + * triggers do not get deferred into outer trigger query levels, meaning that
> > + * it's okay to destroy the tuplestore at the end of the query level.
> > + * XXX - update this paragraph if the new approach, whereby tuplestores in
> > + * afterTriggers.deferred_tuplestores outlive any given query, can be proven
> > + * to not really break any assumptions mentioned here.
> >
> > If anyone reading this can think of the issue the original comment
> > seems to be talking about, please let me know.
>
> I brought this up in an off-list discussion with Robert and he asked
> why I hadn't considered not using tuplestores to remember the tuples
> in the first place, specifically pointing out that it may lead to
> unnecessarily consuming a lot of memory when such updates move a bunch
> of rows around. Like, we could simply remember the tuples to be
> passed to the trigger function using their CTIDs as is done for normal
> (single-heap-relation) updates, though in this case also remember the
> OIDs of the source and the destination partitions of a particular
> cross-partition update.
>
> I had considered that idea before but I think I had overestimated the
> complexity of that approach so didn't go with it. I tried and the
> resulting patch doesn't look all that complicated, with the added
> bonus that the bulk update case shouldn't consume so much memory with
> this approach like the previous tuplestore version would have.
>
> Updated patches attached.

Patch 0001 conflicted with some pg_dump changes that were recently
committed, so rebased.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v11-0002-Enforce-foreign-key-correctly-during-cross-parti.patch application/octet-stream 58.0 KB
v11-0001-Create-foreign-key-triggers-in-partitioned-table.patch application/octet-stream 41.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-12-20 13:17:16 Re: simplifying foreign key/RI checks
Previous Message Masahiko Sawada 2021-12-20 12:59:15 Re: parallel vacuum comments