Re: Problem with transition tables on partitioned tables with foreign-table partitions

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Problem with transition tables on partitioned tables with foreign-table partitions
Date: 2025-07-02 13:05:09
Message-ID: CA+HiwqGAEP8auzZmP4Q7e9p7uR=J+WqMhyzYyZqO+xLc4s_YUw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 2, 2025 at 7:05 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Tue, Jul 1, 2025 at 4:42 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > On Tue, Jul 1, 2025 at 11:55 AM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > > While working on something else, I noticed that while we disallow
> > > transition tables on foreign tables, we allow transition tables on
> > > partitioned tables with foreign-table partitions, which produces
> > > incorrect results. Here is an example using postgres_fdw:
> > >
> > > create table parent (a text, b int) partition by list (a);
> > > create table loct (a text, b int);
> > > create foreign table child (a text, b int)
> > > server loopback options (table_name 'loct');
> > > alter table parent attach partition child for values in ('AAA');
> > >
> > > create function dump_insert() returns trigger language plpgsql as
> > > $$
> > > begin
> > > raise notice 'trigger = %, new table = %',
> > > TG_NAME,
> > > (select string_agg(new_table::text, ', ' order by a)
> > > from new_table);
> > > return null;
> > > end;
> > > $$;
> > > create trigger parent_insert_trig
> > > after insert on parent referencing new table as new_table
> > > for each statement execute procedure dump_insert();
> > >
> > > create function intercept_insert() returns trigger language plpgsql as
> > > $$
> > > begin
> > > new.b = new.b + 1000;
> > > return new;
> > > end;
> > > $$;
> > > create trigger intercept_insert_loct
> > > before insert on loct
> > > for each row execute procedure intercept_insert();
> > >
> > > insert into parent values ('AAA', 42);
> > > NOTICE: trigger = parent_insert_trig, new table = (AAA,42)
> > > INSERT 0 1
> > >
> > > The trigger shows the original tuple created by the core, not the
> > > actual tuple inserted into the foreign-table partition, as
> > > postgres_fdw does not collect the actual tuple, of course!
> >
> > Maybe I'm missing something, but given that the intercept_insert()
> > function is applied during the "remote" operation, isn't it expected
> > that the parent table's trigger for a "local" operation shows the
> > original tuple?
>
> That is the question of how we define the after image of a row
> inserted into a foreign table, but consider the case where the
> partition is a plain table:
>
> create table parent (a text, b int) partition by list (a);
> create table child partition of parent for values in ('AAA');
> create trigger intercept_insert_child
> before insert on child
> for each row execute procedure intercept_insert();
> insert into parent values ('AAA', 42);
> NOTICE: trigger = parent_insert_trig, new table = (AAA,1042)
> INSERT 0 1
>
> The trigger shows the final tuple, not the original tuple. So from a
> consistency perspective, I thought it would be good if the trigger
> does so even in the case where the partition is a foreign table.

Ok, but if you define the trigger on the foreign table partition
(child) as follows, you do get what I think is the expected result?

create trigger intercept_insert_foreign_child
before insert on child
for each row execute procedure intercept_insert();

insert into parent values ('AAA', 42);
NOTICE: trigger = parent_insert_trig, new table = (AAA,1042)

-- 2042, because row modified by both triggers
table parent;
a | b
-----+------
AAA | 2042
(1 row)

Or perhaps you're saying that the row returned by this line in ExecInsert():

/*
* insert into foreign table: let the FDW do it
*/
slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
resultRelInfo,
slot,
planSlot);

is not the expected "after image", and thus should not be added to the
parent's transition table?

IIUC, to prevent that, we now hit the following error in:

void
ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
TupleTableSlot *slot, List *recheckIndexes,
TransitionCaptureState *transition_capture)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;

if (relinfo->ri_FdwRoutine && transition_capture &&
transition_capture->tcs_insert_new_table)
{
Assert(relinfo->ri_RootResultRelInfo);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot collect transition tuples from child
foreign tables")));
}

> > > So I would
> > > like to propose to fix this by the following: 1) disable using direct
> > > modify to modify foreign-table partitions if there are any
> > > transition-table triggers on the partitioned table, and then 2) throw
> > > an error in ExecARInsertTriggers()/ExecARUpdateTriggers()/ExecARDeleteTriggers()
> > > if they collects transition tuple(s) from a foreign-table partition.
> >
> > Is (2) intended to catch cases that occur during a foreign insert and
> > foreign/non-direct update/delete?
>
> That is right; the patch forces the FDW to perform ExecForeign*
> functions, and then throws an error in ExecAR* functions. One good
> thing about this is that we are able to avoid throwing the error when
> local/remote row-level BEFORE triggers return NULL.

Given my question above, I’m not sure I fully understand the intention
behind adding these checks.

--
Thanks, Amit Langote

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-07-02 13:24:49 Re: Don't keep closed WAL segment in page cache after replay
Previous Message Ivan Kush 2025-07-02 12:45:30 Re: [PoC] Federated Authn/z with OAUTHBEARER