Re: WIP: Join push-down for foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Join push-down for foreign tables
Date: 2011-10-09 11:35:56
Message-ID: CADyhKSXS3+WQEzx7XyR=SnziOBzRpSU-dBBNOj6BEJO9wHzatw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hanada-san,

The proposed patch put an invocation of PlanForeignJoin on the
create_foreignjoin_path() being also called by match_unsorted_outer().
Is it a suitable position to make a decision whether a join can be
pushed-down?

I think; it needs an additional functionality to provide higher priority
on the foreign-join plan that other plans, when fdw determind a particular
join can be pushed-down.
(Sorry, I have no idea right now.)

Let's see the following result.

postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x
AND ft1.a = lt3.s;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Merge Join (cost=205.08..758.83 rows=30750 width=108)
Merge Cond: (ft1.a = lt3.s)
-> Merge Join (cost=119.66..199.66 rows=5000 width=72)
Merge Cond: (ft1.a = ft2.x)
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft1.a
-> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL
CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft2.x
-> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_1 SCROLL
CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: lt3.s
-> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36)
(15 rows)

Then, I turned off the enable_mergejoin.

postgres=# EXPLAIN SELECT * FROM ft1 , ft2, lt3 WHERE ft1.a = ft2.x
AND ft1.a = lt3.s;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Hash Join (cost=37.67..1126.42 rows=30750 width=108)
Hash Cond: (ft1.a = lt3.s)
-> Foreign Scan on multiple foreign tables (cost=0.00..0.00
rows=5000 width=72)
Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)
-> Hash (cost=22.30..22.30 rows=1230 width=36)
-> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36)
(6 rows)

Probably, the basic design is correct. However, the planner gives
higher priority on the join plan between
local and foreign than pushing-down foreign relations.

Does it make sense not to consider any other possible plans when FDW
decided a particular join can be
pushed down?

Thanks,

2011年10月7日18:06 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> 2011年10月4日12:08 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
>>> In my opinion, FdwRoutine should have an additional API to inform the core its
>>> supported features; such as inner-join, outer-join, order-by,
>>> group-by, aggregate
>>> functions, insert, update, delete, etc... in the future version.
>>
>> Sure, so in my design PlanForeignJoin is optional.
>>
>> The lack of capability is informed from FDW with setting function
>> pointer in FdwRoutine to NULL. If PlanForeignJoin was NULL, core
>> (planner) will give up to consider join push-down, and use one of local
>> join methods such as NestLoop and MergeJoin for those foreign tables.
>> As you say, other push-down-able features would also have optional
>> handler function for each.
>>
> Sorry, I overlooked it was already implemented at create_foreignjoin_path().
>
> I additionally tried several cases using pgsql_fdw.
> In some cases, it seems to me the planner don't push down the join tree
> as you probably expected.
> Please see the following example:
>
> I defined three foreign tables: ft1(a int, b text), ft2(x int, y
> text), ft3(s int, t text),
> and lt1, lt2, lt3 are regular local tables.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join ft3 on a = s);
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
> Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=25000 width=108)
> Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y, ft3.s, ft3.t FROM
> public.ft1 ft1, public.ft2 ft2, public.ft3 ft3 WHERE (ft1.a = ft3.s)
> AND (ft1.a = ft2.x)
> (2 rows)
>
> It works good.
> (P.S. I noticed that pgsql_fdw has incorrect Assert(). Please fix
> pgsql_fdw.c:730)
>
> However, an existence of local relation makes planner confused.
> It seems to me you expect "ft1 join ft2 on a = x"
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join lt3 on a = s);
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=205.08..758.83 rows=30750 width=108)
> Merge Cond: (ft1.a = lt3.s)
> -> Merge Join (cost=119.66..199.66 rows=5000 width=72)
> Merge Cond: (ft1.a = ft2.x)
> -> Sort (cost=59.83..62.33 rows=1000 width=36)
> Sort Key: ft1.a
> -> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36)
> Remote SQL: DECLARE pgsql_fdw_cursor_2 SCROLL
> CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
> -> Sort (cost=59.83..62.33 rows=1000 width=36)
> Sort Key: ft2.x
> -> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36)
> Remote SQL: DECLARE pgsql_fdw_cursor_3 SCROLL
> CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
> -> Sort (cost=85.43..88.50 rows=1230 width=36)
> Sort Key: lt3.s
> -> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36)
> (15 rows)
>
> What is the reason why the foreign join is not pushed down?
> Maybe, injected Sort plan prevent the planner to consider both side of
> relations being foreign scan owned by same server? I'm still
> investigating the reason.
>
> I hope comments from committers. :-(
>
> A collateral evidence is below.
> If we try to sort the result by a key being not used to join, the both
> of foreign scan gets pushed down.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by y;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Sort (cost=307.19..319.69 rows=5000 width=72)
> Sort Key: ft2.y
> -> Foreign Scan on multiple foreign tables (cost=0.00..0.00
> rows=5000 width=72)
> Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
> ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)
> (4 rows)
>
> However, when I tried to sort by a key being used to join, the both of
> foreign scan was not pushed down.
>
> postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by a;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
> Merge Join (cost=119.66..199.66 rows=5000 width=72)
> Merge Cond: (ft1.a = ft2.x)
> -> Sort (cost=59.83..62.33 rows=1000 width=36)
> Sort Key: ft1.a
> -> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36)
> Remote SQL: DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR
> FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
> -> Sort (cost=59.83..62.33 rows=1000 width=36)
> Sort Key: ft2.x
> -> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36)
> Remote SQL: DECLARE pgsql_fdw_cursor_7 SCROLL CURSOR
> FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
> (10 rows)
>
> Thanks,
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-10-09 12:20:27 Re: libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable
Previous Message Florian Pflug 2011-10-09 10:45:08 Re: libpq, PQdescribePrepared -> PQftype, PQfmod, no PQnullable