Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2016-02-08 13:41:56
Message-ID: CAFjFpRfgbfK9Mfv_4x2UCH6twjdROaV78qQZCWzzyYCG2PqL_g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 8, 2016 at 4:15 PM, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

> On 2016/02/05 17:50, Ashutosh Bapat wrote:
>
>>> Btw, IIUC, I think the patch fails to adjust the targetlist of the
>>> top plan created that way, to output the fdw_scan_tlist, as
>>> discussed in [1] (ie, I think the attached patch is needed, which is
>>> created on top of your patch pg_fdw_join_v8.patch).
>>
>> fdw_scan_tlist represents the output fetched from the foreign server and
>> is not necessarily the output of ForeignScan. ForeignScan node's output
>> is represented by the tlist argument to make_foreignscan():
>>
>>     return make_foreignscan(tlist,
>>                             local_exprs,
>>                             scan_relid,
>>                             params_list,
>>                             fdw_private,
>>                             fdw_scan_tlist,
>>                             remote_exprs,
>>                             outer_plan);
>>
>> This tlist is built using build_path_tlist() for all join plans. IIUC,
>> all of them output the same targetlist. We don't need to make sure that
>> targetlist match as long as we are using the targetlist passed in by
>> create_scan_plan(). Do you have a counter example?
>>
>
> Maybe my explanation was not correct, but I'm saying that the targetlist
> of the above outer_plan should be set to the fdw_scan_tlist, to avoid
> misbehavior. Here is such an example (add() in the example is a
> user-defined function that simply adds its two arguments, defined by: create
> function add(integer, integer) returns integer as '/path/to/func', 'add'
> language c strict):
>
> postgres=# create foreign table foo (a int) server myserver options
> (table_name 'foo');
> postgres=# create foreign table bar (a int) server myserver options
> (table_name 'bar');
> postgres=# create table tab (a int, b int);
> postgres=# insert into foo select a from generate_series(1, 1000) a;
> postgres=# insert into bar select a from generate_series(1, 1000) a;
> postgres=# insert into tab values (1, 1);
> postgres=# analyze foo;
> postgres=# analyze bar;
> postgres=# analyze tab;
>
> [Terminal 1]
> postgres=# begin;
> BEGIN
> postgres=# update tab set b = b + 1 where a = 1;
> UPDATE 1
>
> [Terminal 2]
> postgres=# explain verbose select tab.* from tab, foo, bar where foo.a =
> bar.a and add(foo.a, bar.a) > 0 limit 1 for update;
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Limit  (cost=100.00..107.70 rows=1 width=70)
>    Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
>    ->  LockRows  (cost=100.00..2663.48 rows=333 width=70)
>          Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
>          ->  Nested Loop  (cost=100.00..2660.15 rows=333 width=70)
>                Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
>                ->  Foreign Scan  (cost=100.00..2654.97 rows=333 width=56)
>                      Output: foo.*, bar.*
>                      Filter: (add(foo.a, bar.a) > 0)
>                      Relations: (public.foo) INNER JOIN (public.bar)
>                      Remote SQL: SELECT ROW(r2.a), ROW(r3.a), r2.a, r3.a FROM (public.foo r2 INNER JOIN public.bar r3 ON (TRUE)) WHERE ((r2.a = r3.a)) FOR UPDATE OF r2 FOR UPDATE OF r3
>                      ->  Hash Join  (cost=247.50..301.25 rows=333 width=56)
>                            Output: foo.*, bar.*
>                            Hash Cond: (foo.a = bar.a)
>                            Join Filter: (add(foo.a, bar.a) > 0)
>                            ->  Foreign Scan on public.foo  (cost=100.00..135.00 rows=1000 width=32)
>                                  Output: foo.*, foo.a
>                                  Remote SQL: SELECT a FROM public.foo FOR UPDATE
>                            ->  Hash  (cost=135.00..135.00 rows=1000 width=32)
>                                  Output: bar.*, bar.a
>                                  ->  Foreign Scan on public.bar  (cost=100.00..135.00 rows=1000 width=32)
>                                        Output: bar.*, bar.a
>                                        Remote SQL: SELECT a FROM public.bar FOR UPDATE
>                ->  Materialize  (cost=0.00..1.01 rows=1 width=14)
>                      Output: tab.a, tab.b, tab.ctid
>                      ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
>                            Output: tab.a, tab.b, tab.ctid
> (27 rows)
>
> postgres=# select tab.* from tab, foo, bar where foo.a = bar.a and
> add(foo.a, bar.a) > 0 limit 1 for update;
>
> [Terminal 1]
> postgres=# commit;
> COMMIT
>
> [Terminal 2] (After the commit in Terminal 1, Terminal 2 will show the
> following.)
> a | b
> ---+---
> (0 rows)
>
> This is wrong. (Note that since the SELECT FOR UPDATE doesn't impose any
> condition on a tuple from the local table tab, the EvalPlanQual recheck
> executed should succeed.) The reason for that is that the targetlist of
> the local join plan is the same as for the ForeignScan, which outputs
> neither foo.a nor bar.a required as an argument of the function add().
>
>
I see what you are trying to say now. In ExecScan, ExecScanFetch will
execute the outer plan for the EvalPlanQual check, and then, at line 208,
    if (!qual || ExecQual(qual, econtext, false))
it will try to evaluate the local conditions. Those conditions need foo.a
and bar.a, neither of which is part of the projected output of the
ForeignScan or of the outer plan.
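
To make the failure mode concrete, here is a toy Python model of that
recheck. It is not PostgreSQL code and only a sketch under simplifying
assumptions: tuples are plain dicts, a column that the outer plan did not
project reads as None, and the helper names (project, local_qual, recheck)
are made up for illustration. The two targetlists mirror the "Output:" line
of the Foreign Scan and the select list of its Remote SQL in the plan above.

```python
# Toy model only: tuples are dicts, and a column that the outer plan did not
# project reads as None (standing in for "not available to the qual").

def add(x, y):
    """Mimics a STRICT function: NULL (None) in, NULL out."""
    if x is None or y is None:
        return None
    return x + y

def project(row, targetlist):
    """Emit only the columns named in targetlist, as a plan node would."""
    return {col: row[col] for col in targetlist}

def local_qual(tup):
    """The local condition add(foo.a, bar.a) > 0; NULL counts as not true."""
    result = add(tup.get("foo.a"), tup.get("bar.a"))
    return result is not None and result > 0

def recheck(joined_row, outer_plan_tlist):
    """Run the 'outer plan', then apply the scan-level qual to its output."""
    tup = project(joined_row, outer_plan_tlist)
    return local_qual(tup)

# One joined row of foo and bar, with whole-row values and plain columns.
joined = {"foo.*": (1,), "bar.*": (1,), "foo.a": 1, "bar.a": 1}

scan_output_tlist = ["foo.*", "bar.*"]                 # ForeignScan's Output
fdw_scan_tlist = ["foo.*", "bar.*", "foo.a", "bar.a"]  # what Remote SQL fetches

print(recheck(joined, scan_output_tlist))  # False: the row is dropped
print(recheck(joined, fdw_scan_tlist))     # True: the row survives
```

With the narrower targetlist the qual never sees foo.a or bar.a, so the
recheck rejects a row that should pass, which matches the "(0 rows)" result
in the example.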

But then aren't the local conditions being evaluated twice, once by the
outer plan and then again by ExecScan? Is this OK? What happens when the
local conditions have side effects? We should probably delete them from the
outer_plan's quals.
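
The double evaluation can be illustrated with another toy Python model.
Again this is only a sketch, not executor code: noisy_add, outer_plan and
scan_recheck are hypothetical names, and the "side effect" is simply a
recorded call.

```python
# Toy model only: count how often a qual with a side effect is evaluated
# when both the outer plan and the scan node carry the same local condition.

calls = []

def noisy_add(x, y):
    """Stand-in for a function with a side effect: it records every call."""
    calls.append((x, y))
    return x + y

def qual(row):
    """The local condition add(foo.a, bar.a) > 0."""
    return noisy_add(row["foo.a"], row["bar.a"]) > 0

def outer_plan(rows, quals):
    """The local join plan: emit rows that satisfy its own quals."""
    for row in rows:
        if all(q(row) for q in quals):
            yield row

def scan_recheck(rows, outer_quals, scan_quals):
    """Fetch from the outer plan, then apply the scan node's quals on top."""
    return [row for row in outer_plan(rows, outer_quals)
            if all(q(row) for q in scan_quals)]

def count_calls(outer_quals, scan_quals):
    """Return (rows returned, number of times the qual function ran)."""
    calls.clear()
    rows = [{"foo.a": 1, "bar.a": 1}]
    out = scan_recheck(rows, outer_quals, scan_quals)
    return len(out), len(calls)

print(count_calls([qual], [qual]))  # (1, 2): qual ran in both places
print(count_calls([], [qual]))      # (1, 1): qual removed from the outer plan
```

Both variants return the same row, but only the second evaluates the
condition once, which is the reason for dropping the local conditions from
the outer_plan's quals.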

The patch attached fixes the targetlist as per mail from Robert and the
quals as explained above.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_join_pd_v11.patch application/x-download 165.6 KB
