Re: bug in update tuple routing with foreign partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in update tuple routing with foreign partitions
Date: 2019-04-11 05:33:57
Message-ID: 390bfb5d-4b82-18e8-8e42-dca94cd55341@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujita-san,

Thanks for the review.

On 2019/04/10 17:38, Etsuro Fujita wrote:
> (2019/03/06 18:33), Amit Langote wrote:
>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>> to use for partition routing targets.  Specifically, the bug occurs when
>> UPDATE targets include a foreign partition that is locally modified (as
>> opposed to being modified directly on the remove server) and its
>> ResultRelInfo (called subplan result rel in the source code) is reused for
>> tuple routing:
>
>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>> subplan result rel if its ri_FdwState is already set.
>
> I'm not sure that is a good idea, because that requires to create a new
> ResultRelInfo, which is not free; I think it requires a lot of work.

After considering this a bit more and studying your proposed solution, I
tend to agree. Beside the performance considerations you point out that I
too think are valid, I realize that going with my approach would mean
embedding some assumptions in the core code about ri_FdwState; in this
case, assuming that a subplan result rel's ri_FdwState is not to be
re-purposed for tuple routing insert, which is only based on looking at
postgres_fdw's implementation. Not to mention the ensuing complexity in
the core tuple routing code -- it now needs to account for the fact that
not all subplan result rels may have been re-purposed for tuple-routing,
something that's too complicated to handle in PG 11.

IOW, let's call this a postgres_fdw bug and fix it there as you propose.

> Another solution to avoid that would be to store the fmstate created by
> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
> ri_FdwState, like the attached.  This would not need any changes to the
> core, and this would not cause any overheads either, IIUC.  What do you
> think about that?

+1. Just one comment:

+ /*
+ * If the given resultRelInfo already has PgFdwModifyState set, it means
+ * the foreign table is an UPDATE subplan resultrel; in which case, store
+ * the resulting state into the PgFdwModifyState.
+ */

The last "PgFdwModifyState" should be something else? Maybe,
sub-PgModifyState or sub_fmstate?

>> I've also added a
>> test in postgres_fdw.sql to exercise this test case.
>
> Thanks again, but the test case you added works well without any fix:

Oops, I should really adopt a habit of adding a test case before code.

> +-- Check case where the foreign partition is a subplan target rel and
> +-- foreign parttion is locally modified (target table being joined
> +-- locally prevents a direct/remote modification plan).
> +explain (verbose, costs off)
> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
> returning *;
> +                                  QUERY PLAN
> +------------------------------------------------------------------------------
>
> + Update on public.utrtest
> +   Output: remp.a, remp.b, "*VALUES*".column1
> +   Foreign Update on public.remp
> +     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
> a, b
> +   Update on public.locp
> +   ->  Hash Join
> +         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
> +         Hash Cond: (remp.a = "*VALUES*".column1)
> +         ->  Foreign Scan on public.remp
> +               Output: remp.b, remp.ctid, remp.a
> +               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
> +         ->  Hash
> +               Output: "*VALUES*".*, "*VALUES*".column1
> +               ->  Values Scan on "*VALUES*"
> +                     Output: "*VALUES*".*, "*VALUES*".column1
> +   ->  Hash Join
> +         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
> +         Hash Cond: (locp.a = "*VALUES*".column1)
> +         ->  Seq Scan on public.locp
> +               Output: locp.b, locp.ctid, locp.a
> +         ->  Hash
> +               Output: "*VALUES*".*, "*VALUES*".column1
> +               ->  Values Scan on "*VALUES*"
> +                     Output: "*VALUES*".*, "*VALUES*".column1
> +(24 rows)
>
> In this test case, the foreign partition is updated before ri_FdwState is
> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
> routing code, so it would work without any fix.  So I modified this test
> case as such.

Thanks for fixing that. I hadn't noticed that foreign partition remp is
updated before local partition locp; should've added a locp0 for values
(0) so that remp appears second in the ModifyTable's subplans. Or, well,
make the foreign table remp appear later by making it a partition for
values in (3) as your patch does.

BTW, have you noticed that the RETURNING clause returns the same row twice?

+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a | b | x
+---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+(3 rows)

You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
I checked the behavior with moving into a local partition just to be sure
and the changed row is returned only once.

create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
insert into p values (1, 'p1'), (2, 'p2');

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
(2 rows)

update p set a = 2 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p2 │ 2 │ p1
p2 │ 2 │ p2
(2 rows)

Add a foreign table partition in the mix and that's no longer true.

create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p3base (a int check (a = 3), b text);
create foreign table p3 partition of p for values in (3) server loopback
options (table_name 'p3base');
delete from p;
insert into p values (1, 'p1'), (2, 'p2'), (3, 'p3');

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
p3 │ 3 │ p3
(3 rows)

update p set a = 3 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p1
p3 │ 3 │ p2
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(5 rows)

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(3 rows)

I'm not sure if it's a bug that ought to be fixed, but thought I should
highlight it just in case.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-04-11 05:40:54 Re: Should we add GUCs to allow partition pruning to be disabled?
Previous Message Amit Langote 2019-04-11 05:09:14 Re: speeding up planning with partitions