Re: WIP: Join push-down for foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Join push-down for foreign tables
Date: 2011-10-07 16:06:48
Message-ID: CADyhKSUaL31GENeYgzfKshAi7NXh=yHBR8N2XKiu+B2fZmC-UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011年10月4日12:08 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
>> In my opinion, FdwRoutine should have an additional API to inform the core its
>> supported features; such as inner-join, outer-join, order-by,
>> group-by, aggregate
>> functions, insert, update, delete, etc... in the future version.
>
> Sure, so in my design PlanForeignJoin is optional.
>
> The lack of capability is informed from FDW with setting function
> pointer in FdwRoutine to NULL. If PlanForeignJoin was NULL, core
> (planner) will give up to consider join push-down, and use one of local
> join methods such as NestLoop and MergeJoin for those foreign tables.
> As you say, other push-down-able features would also have optional
> handler function for each.
>
Sorry, I overlooked it was already implemented at create_foreignjoin_path().

I additionally tried several cases using pgsql_fdw.
In some cases, it seems to me the planner don't push down the join tree
as you probably expected.
Please see the following example:

I defined three foreign tables: ft1(a int, b text), ft2(x int, y
text), ft3(s int, t text),
and lt1, lt2, lt3 are regular local tables.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join ft3 on a = s);

QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=25000 width=108)
Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y, ft3.s, ft3.t FROM
public.ft1 ft1, public.ft2 ft2, public.ft3 ft3 WHERE (ft1.a = ft3.s)
AND (ft1.a = ft2.x)
(2 rows)

It works good.
(P.S. I noticed that pgsql_fdw has incorrect Assert(). Please fix
pgsql_fdw.c:730)

However, an existence of local relation makes planner confused.
It seems to me you expect "ft1 join ft2 on a = x"

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x join lt3 on a = s);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Merge Join (cost=205.08..758.83 rows=30750 width=108)
Merge Cond: (ft1.a = lt3.s)
-> Merge Join (cost=119.66..199.66 rows=5000 width=72)
Merge Cond: (ft1.a = ft2.x)
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft1.a
-> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_2 SCROLL
CURSOR FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft2.x
-> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_3 SCROLL
CURSOR FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: lt3.s
-> Seq Scan on lt3 (cost=0.00..22.30 rows=1230 width=36)
(15 rows)

What is the reason why the foreign join is not pushed down?
Maybe, injected Sort plan prevent the planner to consider both side of
relations being foreign scan owned by same server? I'm still
investigating the reason.

I hope comments from committers. :-(

A collateral evidence is below.
If we try to sort the result by a key being not used to join, the both
of foreign scan gets pushed down.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by y;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=307.19..319.69 rows=5000 width=72)
Sort Key: ft2.y
-> Foreign Scan on multiple foreign tables (cost=0.00..0.00
rows=5000 width=72)
Remote SQL: SELECT ft1.a, ft1.b, ft2.x, ft2.y FROM public.ft1
ft1, public.ft2 ft2 WHERE (ft1.a = ft2.x)
(4 rows)

However, when I tried to sort by a key being used to join, the both of
foreign scan was not pushed down.

postgres=# explain SELECT * FROM (ft1 join ft2 on a = x) order by a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Merge Join (cost=119.66..199.66 rows=5000 width=72)
Merge Cond: (ft1.a = ft2.x)
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft1.a
-> Foreign Scan on ft1 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_6 SCROLL CURSOR
FOR SELECT ft1.a, ft1.b FROM public.ft1 ft1
-> Sort (cost=59.83..62.33 rows=1000 width=36)
Sort Key: ft2.x
-> Foreign Scan on ft2 (cost=10.00..10.00 rows=1000 width=36)
Remote SQL: DECLARE pgsql_fdw_cursor_7 SCROLL CURSOR
FOR SELECT ft2.x, ft2.y FROM public.ft2 ft2
(10 rows)

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Usama Dar 2011-10-07 16:17:22 PQsendQuery/ PQgetResult Problem
Previous Message Robert Haas 2011-10-07 15:50:00 Re: alter table only ... drop constraint broken in HEAD