Re: Join push-down support for foreign tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join push-down support for foreign tables
Date: 2015-03-06 10:25:45
Message-ID: CAFjFpRf9YcUOwYOsvtGXWZn5inJQf_VcEfWyu4kjK+YH7f4ZwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kaigai-san, Hanada-san,
Attached please find a patch to print the column names prefixed by the
relation names. I haven't tested the patch fully. The same changes will be
needed for CustomPlan node specific code.

Now I am able to make sense out of the Output information

postgres=# explain verbose select * from ft1 join ft2 using (val);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=12)
Output: ft1.val, ft1.val2, ft2.val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROM
public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r
(a_0, a_1)
ON ((r.a_0 = l.a_0))
(3 rows)

On Fri, Mar 6, 2015 at 6:41 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:

> > Actually val and val2 come from public.lt in "r" side, but as you say
> > it's too difficult to know that from EXPLAIN output. Do you have any
> > idea to make the "Output" item more readable?
> >
> A fundamental reason why we need to have symbolic aliases here is that
> postgres_fdw has remote query in cstring form. It makes implementation
> complicated to deconstruct/construct a query that is once constructed
> on the underlying foreign-path level.
> If ForeignScan keeps items to construct remote query in expression node
> form (and construction of remote query is delayed to beginning of the
> executor, probably), we will be able to construct more human readable
> remote query.
>
> However, I don't recommend to work on this great refactoring stuff
> within the scope of join push-down support project.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Shigeru Hanada
> > Sent: Thursday, March 05, 2015 10:00 PM
> > To: Ashutosh Bapat
> > Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
> > Subject: Re: [HACKERS] Join push-down support for foreign tables
> >
> > Hi Ashutosh, thanks for the review.
> >
> > 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat <
> ashutosh(dot)bapat(at)enterprisedb(dot)com>:
> > > In create_foreignscan_path() we have lines like -
> > > 1587 pathnode->path.param_info = get_baserel_parampathinfo(root,
> rel,
> > > 1588
> > > required_outer);
> > > Now, that the same function is being used for creating foreign scan
> paths
> > > for joins, we should be calling get_joinrel_parampathinfo() on a join
> rel
> > > and get_baserel_parampathinfo() on base rel.
> >
> > Got it. Please let me check the difference.
> >
> > >
> > > The patch seems to handle all the restriction clauses in the same way.
> There
> > > are two kinds of restriction clauses - a. join quals (specified using
> ON
> > > clause; optimizer might move them to the other class if that doesn't
> affect
> > > correctness) and b. quals on join relation (specified in the WHERE
> clause,
> > > optimizer might move them to the other class if that doesn't affect
> > > correctness). The quals in "a" are applied while the join is being
> computed
> > > whereas those in "b" are applied after the join is computed. For
> example,
> > > postgres=# select * from lt;
> > > val | val2
> > > -----+------
> > > 1 | 2
> > > 1 | 3
> > > (2 rows)
> > >
> > > postgres=# select * from lt2;
> > > val | val2
> > > -----+------
> > > 1 | 2
> > > (1 row)
> > >
> > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
> > > val | val2 | val | val2
> > > -----+------+-----+------
> > > 1 | 2 | 1 | 2
> > > 1 | 3 | |
> > > (2 rows)
> > >
> > > postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
> > > lt2.val2);
> > > val | val2 | val | val2
> > > -----+------+-----+------
> > > 1 | 2 | 1 | 2
> > > (1 row)
> > >
> > > The difference between these two kinds is evident in case of outer
> joins,
> > > for inner join optimizer puts all of them in class "b". The remote
> query
> > > sent to the foreign server has all those in ON clause. Consider foreign
> > > tables ft1 and ft2 pointing to local tables on the same server.
> > > postgres=# \d ft1
> > > Foreign table "public.ft1"
> > > Column | Type | Modifiers | FDW Options
> > > --------+---------+-----------+-------------
> > > val | integer | |
> > > val2 | integer | |
> > > Server: loopback
> > > FDW Options: (table_name 'lt')
> > >
> > > postgres=# \d ft2
> > > Foreign table "public.ft2"
> > > Column | Type | Modifiers | FDW Options
> > > --------+---------+-----------+-------------
> > > val | integer | |
> > > val2 | integer | |
> > > Server: loopback
> > > FDW Options: (table_name 'lt2')
> > >
> > > postgres=# explain verbose select * from ft1 left join ft2 on
> (ft1.val2 =
> > > ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
> > >
> > > QUERY PLAN
> > >
> > >
> >
> ----------------------------------------------------------------------------
> >
> ---------------------------------------------------------------------------
> > >
> >
> ----------------------------------------------------------------------------
> > --------
> > > Foreign Scan (cost=100.00..125.60 rows=2560 width=16)
> > > Output: val, val2, val, val2
> > > Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val,
> val2 FROM
> > > public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt)
> r (a
> > > _0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND
> ((r.a_1 =
> > > l.a_1))
> > > (3 rows)
> > >
> > > The result is then wrong
> > > postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2)
> where
> > > ft1.val + ft2.val > ft1.val2 or ft2.val is null;
> > > val | val2 | val | val2
> > > -----+------+-----+------
> > > 1 | 2 | |
> > > 1 | 3 | |
> > > (2 rows)
> > >
> > > which should match the result obtained by substituting local tables for
> > > foreign ones
> > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
> > > lt.val + lt2.val > lt.val2 or lt2.val is null;
> > > val | val2 | val | val2
> > > -----+------+-----+------
> > > 1 | 3 | |
> > > (1 row)
> > >
> > > Once we start distinguishing the two kinds of quals, there is some
> > > optimization possible. For pushing down a join it's essential that all
> the
> > > quals in "a" are safe to be pushed down. But a join can be pushed
> down, even
> > > if quals in "a" are not safe to be pushed down. But more clauses one
> pushed
> > > down to foreign server, lesser are the rows fetched from the foreign
> server.
> > > In postgresGetForeignJoinPath, instead of checking all the restriction
> > > clauses to be safe to be pushed down, we need to check only those
> which are
> > > join quals (class "a").
> >
> > The argument restrictlist of GetForeignJoinPaths contains both
> > conditions mixed, so I added extract_actual_join_clauses() to separate
> > it into two lists, join_quals and other clauses. This is similar to
> > what create_nestloop_plan and siblings do.
> >
> >
> > >
> > > Following EXPLAIN output seems to be confusing
> > > ft1 and ft2 both are pointing to same lt on a foreign server.
> > > postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2
> where
> > > ft1.val + ft1.val2 = ft2.val;
> > >
> > > QUERY PLAN
> > >
> > >
> >
> ----------------------------------------------------------------------------
> >
> ---------------------------------------------------------------------------
> > > --------------------------
> > > Foreign Scan (cost=100.00..132.00 rows=2560 width=8)
> > > Output: (val + val2)
> > > Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM
> public.lt) l
> > > (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1)
> ON ((
> > > (r.a_0 + r.a_1) = l.a_0))
> > >
> > > Output just specified val + val2, it doesn't tell, where those val and
> val2
> > > come from, neither it's evident from the rest of the context.
> > >
> >
> > Actually val and val2 come from public.lt in "r" side, but as you say
> > it's too difficult to know that from EXPLAIN output. Do you have any
> > idea to make the "Output" item more readable?
> >
> > --
> > Shigeru HANADA
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
explain_relnames.patch text/x-patch 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2015-03-06 11:23:55 Re: MD5 authentication needs help
Previous Message Fabien COELHO 2015-03-06 09:41:03 extend pgbench expressions with functions