Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Date: 2016-08-02 13:02:47
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8012120ED@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp]
> Sent: Tuesday, August 02, 2016 9:36 PM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans
>
> On 2016/08/01 20:15, Etsuro Fujita wrote:
> > I thought about the Relations line a bit more and noticed that there are
> > cases where the table reference names for joining relations in the
> > Relations line are printed incorrectly. Here is an example:
> >
> > postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,
> > ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2
> > where t1.a = t2.a) as t(t1a, t2a);
> > QUERY PLAN
> >
> ----------------------------------------------------------------------------------
> ----------------------------------
> >
> > Unique (cost=204.12..204.13 rows=2 width=8)
> > Output: t1.a, t2.a
> > -> Sort (cost=204.12..204.12 rows=2 width=8)
> > Output: t1.a, t2.a
> > Sort Key: t1.a, t2.a
> > -> Append (cost=100.00..204.11 rows=2 width=8)
> > -> Foreign Scan (cost=100.00..102.04 rows=1 width=8)
> > Output: t1.a, t2.a
> > Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
> > Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> > INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
> > -> Foreign Scan (cost=100.00..102.04 rows=1 width=8)
> > Output: t1_1.a, t2_1.a
> > Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
> > Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> > INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
> > (14 rows)
> >
> > The table reference names for ft1 and ft2 in the Relations line for the
> > second Foreign Scan should be t1_1 and t2_1 respectively.
> >
> > Another concern about the Relations line is, that represents just an
> > internal representation of a pushed-down join, so that would not
> > necessarily match a deparsed query shown in the Remote SQL line. Here
> > is an example, which I found when working on supporting pushing down
> > full outer join a lot more, by improving the deparsing logic so that
> > postgres_fdw can build a remote query that involves subqueries [1],
> > which I'll work on for 10.0:
> >
> > + -- full outer join with restrictions on the joining relations
> > + EXPLAIN (COSTS false, VERBOSE)
> > + SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND
> > 60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON
> > (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
> > + QUERY
> > PLAN
> > +
> >
> ----------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------
> ---------------------------------
> >
> > + Foreign Scan
> > + Output: ft4.c1, ft5.c1
> > + Relations: (public.ft4) FULL JOIN (public.ft5)
> > + Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"
> > WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM
> > "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =
> > ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
> > + (4 rows)
> >
> > "(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not
> > exactly match the deparsed query in the Remote SQL line, which I think
> > would be rather confusing for users. (We may be able to print more
> > exact information in the Relations line so as to match the depaserd
> > query, but I think that that would make the Relations line redundant.)
> >
> > Would we really need the Relations line? If joining relations are
> > printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"
> > as proposed upthread, we can see those relations from that, not the
> > Relations line. Also we can see the join tree structure from the
> > deparsed query in the Remote SQL line. The Relations line seems to be
> > not that useful anymore, then. What do you think about that?
>
> I removed the Relations line. Here is an updated version of the patch.
>
> * As I said upthread, I left the upper-relation handling for another
> patch. Currently, the patch prints "Foreign Scan" with no relations in
> that case.
>
> * I kept custom joins as-is. We would need discussions about how to
> choose relations we print in EXPLAIN, so I'd also like to leave that for
> yet another patch.
>
Please don't rely on fs_relids bitmap to pick up relations to be printed.
It just hold a set of underlying relations, but it does not mean all of
these relations are actually scanned inside of the ForeignScan.

You didn't answer the following scenario I pointed out in the upthread.

| Please assume an enhancement of postgres_fdw that reads a small local table (tbl_1)
| and parse them as VALUES() clause within a remote query to execute remote join
| with foreign tables (ftbl_2, ftbl_3).
| This ForeignScan node has three underlying relations; tbl_1, ftbl_2 and ftbl_3.
| Likely, tbl_1 will be scanned by SeqScan, not ForeignScan itself.
| In this case, which relations shall be printed around ForeignScan?
| Is it possible to choose proper relation names without hint by the extension?
| ^^^^^^^^^^^^

To care about these FDW usage, you should add an alternative bitmap rather
than fs_relids/custom_relids. My suggestion is, having explain_relids for
the purpose.
Also, the logic to print "Foreign (Scan|Insert|Update|Delete)" is different
from what I suggested. I'm suggesting to allow extension giving a label
to fill up "Foreign %s" format.

Please explain why your choice is better than my proposition.

At least, my proposition is available to apply on both of foreign-scan and
custom-scan, and no need to future maintenance if and when FDW gets support
remote Aggregation, Sort or others.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-08-02 13:31:47 Re: Tracking wait event for latches
Previous Message Robert Haas 2016-08-02 12:56:35 Re: Wrong defeinition of pq_putmessage_noblock since 9.5