From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: bug in update tuple routing with foreign partitions |
Date: | 2019-04-11 05:33:57 |
Message-ID: | 390bfb5d-4b82-18e8-8e42-dca94cd55341@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Fujita-san,
Thanks for the review.
On 2019/04/10 17:38, Etsuro Fujita wrote:
> (2019/03/06 18:33), Amit Langote wrote:
>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>> to use for partition routing targets. Specifically, the bug occurs when
>> UPDATE targets include a foreign partition that is locally modified (as
>> opposed to being modified directly on the remove server) and its
>> ResultRelInfo (called subplan result rel in the source code) is reused for
>> tuple routing:
>
>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>> subplan result rel if its ri_FdwState is already set.
>
> I'm not sure that is a good idea, because that requires to create a new
> ResultRelInfo, which is not free; I think it requires a lot of work.
After considering this a bit more and studying your proposed solution, I
tend to agree. Beside the performance considerations you point out that I
too think are valid, I realize that going with my approach would mean
embedding some assumptions in the core code about ri_FdwState; in this
case, assuming that a subplan result rel's ri_FdwState is not to be
re-purposed for tuple routing insert, which is only based on looking at
postgres_fdw's implementation. Not to mention the ensuing complexity in
the core tuple routing code -- it now needs to account for the fact that
not all subplan result rels may have been re-purposed for tuple-routing,
something that's too complicated to handle in PG 11.
IOW, let's call this a postgres_fdw bug and fix it there as you propose.
> Another solution to avoid that would be to store the fmstate created by
> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
> ri_FdwState, like the attached. This would not need any changes to the
> core, and this would not cause any overheads either, IIUC. What do you
> think about that?
+1. Just one comment:
+ /*
+ * If the given resultRelInfo already has PgFdwModifyState set, it means
+ * the foreign table is an UPDATE subplan resultrel; in which case, store
+ * the resulting state into the PgFdwModifyState.
+ */
The last "PgFdwModifyState" should be something else? Maybe,
sub-PgModifyState or sub_fmstate?
>> I've also added a
>> test in postgres_fdw.sql to exercise this test case.
>
> Thanks again, but the test case you added works well without any fix:
Oops, I should really adopt a habit of adding a test case before code.
> +-- Check case where the foreign partition is a subplan target rel and
> +-- foreign parttion is locally modified (target table being joined
> +-- locally prevents a direct/remote modification plan).
> +explain (verbose, costs off)
> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
> returning *;
> + QUERY PLAN
> +------------------------------------------------------------------------------
>
> + Update on public.utrtest
> + Output: remp.a, remp.b, "*VALUES*".column1
> + Foreign Update on public.remp
> + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
> a, b
> + Update on public.locp
> + -> Hash Join
> + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
> + Hash Cond: (remp.a = "*VALUES*".column1)
> + -> Foreign Scan on public.remp
> + Output: remp.b, remp.ctid, remp.a
> + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
> + -> Hash
> + Output: "*VALUES*".*, "*VALUES*".column1
> + -> Values Scan on "*VALUES*"
> + Output: "*VALUES*".*, "*VALUES*".column1
> + -> Hash Join
> + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
> + Hash Cond: (locp.a = "*VALUES*".column1)
> + -> Seq Scan on public.locp
> + Output: locp.b, locp.ctid, locp.a
> + -> Hash
> + Output: "*VALUES*".*, "*VALUES*".column1
> + -> Values Scan on "*VALUES*"
> + Output: "*VALUES*".*, "*VALUES*".column1
> +(24 rows)
>
> In this test case, the foreign partition is updated before ri_FdwState is
> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
> routing code, so it would work without any fix. So I modified this test
> case as such.
Thanks for fixing that. I hadn't noticed that foreign partition remp is
updated before local partition locp; should've added a locp0 for values
(0) so that remp appears second in the ModifyTable's subplans. Or, well,
make the foreign table remp appear later by making it a partition for
values in (3) as your patch does.
BTW, have you noticed that the RETURNING clause returns the same row twice?
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a | b | x
+---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+(3 rows)
You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
I checked the behavior with moving into a local partition just to be sure
and the changed row is returned only once.
create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
insert into p values (1, 'p1'), (2, 'p2');
select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
(2 rows)
update p set a = 2 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p2 │ 2 │ p1
p2 │ 2 │ p2
(2 rows)
Add a foreign table partition in the mix and that's no longer true.
create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p3base (a int check (a = 3), b text);
create foreign table p3 partition of p for values in (3) server loopback
options (table_name 'p3base');
delete from p;
insert into p values (1, 'p1'), (2, 'p2'), (3, 'p3');
select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
p3 │ 3 │ p3
(3 rows)
update p set a = 3 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p1
p3 │ 3 │ p2
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(5 rows)
select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(3 rows)
I'm not sure if it's a bug that ought to be fixed, but thought I should
highlight it just in case.
Thanks,
Amit
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2019-04-11 05:40:54 | Re: Should we add GUCs to allow partition pruning to be disabled? |
Previous Message | Amit Langote | 2019-04-11 05:09:14 | Re: speeding up planning with partitions |