Re: bug in update tuple routing with foreign partitions

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug in update tuple routing with foreign partitions
Date: 2019-04-11 11:31:06
Message-ID: 5CAF257A.9050604@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/04/11 14:33), Amit Langote wrote:
> On 2019/04/10 17:38, Etsuro Fujita wrote:
>> (2019/03/06 18:33), Amit Langote wrote:
>>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>>> to use for partition routing targets. Specifically, the bug occurs when
>>> UPDATE targets include a foreign partition that is locally modified (as
>>> opposed to being modified directly on the remove server) and its
>>> ResultRelInfo (called subplan result rel in the source code) is reused for
>>> tuple routing:
>>
>>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>>> subplan result rel if its ri_FdwState is already set.
>>
>> I'm not sure that is a good idea, because that requires to create a new
>> ResultRelInfo, which is not free; I think it requires a lot of work.
>
> After considering this a bit more and studying your proposed solution, I
> tend to agree. Beside the performance considerations you point out that I
> too think are valid, I realize that going with my approach would mean
> embedding some assumptions in the core code about ri_FdwState; in this
> case, assuming that a subplan result rel's ri_FdwState is not to be
> re-purposed for tuple routing insert, which is only based on looking at
> postgres_fdw's implementation.

Yeah, that assumption might not apply to some other FDWs.

> Not to mention the ensuing complexity in
> the core tuple routing code -- it now needs to account for the fact that
> not all subplan result rels may have been re-purposed for tuple-routing,
> something that's too complicated to handle in PG 11.

I think so too.

> IOW, let's call this a postgres_fdw bug and fix it there as you propose.

Agreed.

>> Another solution to avoid that would be to store the fmstate created by
>> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
>> ri_FdwState, like the attached. This would not need any changes to the
>> core, and this would not cause any overheads either, IIUC. What do you
>> think about that?
>
> +1. Just one comment:
>
> + /*
> + * If the given resultRelInfo already has PgFdwModifyState set, it means
> + * the foreign table is an UPDATE subplan resultrel; in which case, store
> + * the resulting state into the PgFdwModifyState.
> + */
>
> The last "PgFdwModifyState" should be something else? Maybe,
> sub-PgModifyState or sub_fmstate?

OK, will revise. My favorite would be the latter.

>>> I've also added a
>>> test in postgres_fdw.sql to exercise this test case.
>>
>> Thanks again, but the test case you added works well without any fix:
>
> Oops, I should really adopt a habit of adding a test case before code.
>
>> +-- Check case where the foreign partition is a subplan target rel and
>> +-- foreign parttion is locally modified (target table being joined
>> +-- locally prevents a direct/remote modification plan).
>> +explain (verbose, costs off)
>> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
>> returning *;
>> + QUERY PLAN
>> +------------------------------------------------------------------------------
>>
>> + Update on public.utrtest
>> + Output: remp.a, remp.b, "*VALUES*".column1
>> + Foreign Update on public.remp
>> + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
>> a, b
>> + Update on public.locp
>> + -> Hash Join
>> + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
>> + Hash Cond: (remp.a = "*VALUES*".column1)
>> + -> Foreign Scan on public.remp
>> + Output: remp.b, remp.ctid, remp.a
>> + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
>> + -> Hash
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Values Scan on "*VALUES*"
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Hash Join
>> + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
>> + Hash Cond: (locp.a = "*VALUES*".column1)
>> + -> Seq Scan on public.locp
>> + Output: locp.b, locp.ctid, locp.a
>> + -> Hash
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Values Scan on "*VALUES*"
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> +(24 rows)
>>
>> In this test case, the foreign partition is updated before ri_FdwState is
>> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
>> routing code, so it would work without any fix. So I modified this test
>> case as such.
>
> Thanks for fixing that. I hadn't noticed that foreign partition remp is
> updated before local partition locp; should've added a locp0 for values
> (0) so that remp appears second in the ModifyTable's subplans. Or, well,
> make the foreign table remp appear later by making it a partition for
> values in (3) as your patch does.
>
> BTW, have you noticed that the RETURNING clause returns the same row twice?

I noticed this, but I didn't think it hard. :(

> +update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
> returning *;
> + a | b | x
> +---+-------------------+---
> + 3 | qux triggered ! | 2
> + 3 | xyzzy triggered ! | 3
> + 3 | qux triggered ! | 3
> +(3 rows)
>
> You can see that the row that's moved into remp is returned twice (one
> with "qux triggered!" in b column), whereas it should've been only once?

Yeah, this is unexpected behavior, so will look into this. Thanks for
pointing that out!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2019-04-11 11:52:40 Re: Pluggable Storage - Andres's take
Previous Message Anastasia Lubennikova 2019-04-11 11:30:20 Re: Failure in contrib test _int on loach