Re: bug in update tuple routing with foreign partitions

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in update tuple routing with foreign partitions
Date: 2019-04-10 08:38:14
Message-ID: 5CADAB76.9060506@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/03/06 18:33), Amit Langote wrote:
> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
> to use for partition routing targets. Specifically, the bug occurs when
> UPDATE targets include a foreign partition that is locally modified (as
> opposed to being modified directly on the remove server) and its
> ResultRelInfo (called subplan result rel in the source code) is reused for
> tuple routing:
>
> -- setup
> create extension postgres_fdw ;
> create server loopback foreign data wrapper postgres_fdw;
> create user mapping for current_user server loopback;
> create table p (a int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2base (a int check (a = 2));
> create foreign table p2 partition of p for values in (2) server loopback
> options (table_name 'p2base');
> insert into p values (1), (2);
>
> -- see in the plan that foreign partition p2 is locally modified
> explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
> s.x returning *;
> QUERY PLAN
>
> ─────────────────────────────────────────────────────────────────────────────────
> Update on public.p (cost=0.05..236.97 rows=50 width=42)
> Output: p1.a, "*VALUES*".column1
> Update on public.p1
> Foreign Update on public.p2
> Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
> -> Hash Join (cost=0.05..45.37 rows=26 width=42)
> Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
> Hash Cond: (p1.a = "*VALUES*".column1)
> -> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
> Output: p1.ctid, p1.a
> -> Hash (cost=0.03..0.03 rows=2 width=32)
> Output: "*VALUES*".*, "*VALUES*".column1
> -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
> width=32)
> Output: "*VALUES*".*, "*VALUES*".column1
> -> Hash Join (cost=100.05..191.59 rows=24 width=42)
> Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
> Hash Cond: (p2.a = "*VALUES*".column1)
> -> Foreign Scan on public.p2 (cost=100.00..182.27 rows=2409
> width=10)
> Output: p2.ctid, p2.a
> Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
> -> Hash (cost=0.03..0.03 rows=2 width=32)
> Output: "*VALUES*".*, "*VALUES*".column1
> -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
> width=32)
> Output: "*VALUES*".*, "*VALUES*".column1
>
>
> -- as opposed to directly on remote side (because there's no local join)
> explain verbose update p set a = 2 returning *;
> QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────
> Update on public.p (cost=0.00..227.40 rows=5280 width=10)
> Output: p1.a
> Update on public.p1
> Foreign Update on public.p2
> -> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
> Output: 2, p1.ctid
> -> Foreign Update on public.p2 (cost=100.00..191.90 rows=2730 width=10)
> Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
> (8 rows)
>
> Running the first update query crashes:
>
> update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
> tableoid::regclass, *;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> The problem seems to occur because ExecSetupPartitionTupleRouting thinks
> it can reuse p2's ResultRelInfo for tuple routing. In this case, it can't
> be used, because its ri_FdwState contains information that will be needed
> for postgresExecForeignUpdate to work, but it's still used today. Because
> it's assigned to be used for tuple routing, its ri_FdwState will be
> overwritten by postgresBeginForeignInsert that's invoked by the tuple
> routing code using the aforementioned ResultRelInfo. Crash occurs when
> postgresExecForeignUpdate () can't find the information it's expecting in
> the ri_FdwState.

Agreed, as I said before in another thread.

> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
> subplan result rel if its ri_FdwState is already set.

I'm not sure that is a good idea, because that requires to create a new
ResultRelInfo, which is not free; I think it requires a lot of work.
Another solution to avoid that would be to store the fmstate created by
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
ri_FdwState, like the attached. This would not need any changes to the
core, and this would not cause any overheads either, IIUC. What do you
think about that?

> I have attached 2 patches, one for PG 11 where the problem first appeared
> and another for HEAD. The patch for PG 11 is significantly bigger due to
> having to handle the complexities of mapping of subplan result rel indexes
> to leaf partition indexes. Most of that complexity is gone in HEAD due to
> 3f2393edefa5, so the patch for HEAD is much simpler.

Thanks for the patches!

> I've also added a
> test in postgres_fdw.sql to exercise this test case.

Thanks again, but the test case you added works well without any fix:

+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
returning *;
+ QUERY PLAN

+------------------------------------------------------------------------------
+ Update on public.utrtest
+ Output: remp.a, remp.b, "*VALUES*".column1
+ Foreign Update on public.remp
+ Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1
RETURNING a, b
+ Update on public.locp
+ -> Hash Join
+ Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (remp.a = "*VALUES*".column1)
+ -> Foreign Scan on public.remp
+ Output: remp.b, remp.ctid, remp.a
+ Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Hash Join
+ Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+ Hash Cond: (locp.a = "*VALUES*".column1)
+ -> Seq Scan on public.locp
+ Output: locp.b, locp.ctid, locp.a
+ -> Hash
+ Output: "*VALUES*".*, "*VALUES*".column1
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)

In this test case, the foreign partition is updated before ri_FdwState
is overwritten by postgresBeginForeignInsert() that's invoked by the
tuple routing code, so it would work without any fix. So I modified
this test case as such.

Sorry for the long delay, again.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
HEAD-bug-update-tuple-routing-with-foreign-parts-efujita.patch text/x-patch 6.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2019-04-10 09:13:10 Re: Transaction commits VS Transaction commits (with parallel) VS query mean time
Previous Message Kyotaro HORIGUCHI 2019-04-10 08:30:51 Re: Problem with default partition pruning