Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Date: 2016-08-01 11:15:03
Message-ID: edd1ee88-d7c9-583e-6c0a-bc7d50e40522@lab.ntt.co.jp
Lists: pgsql-hackers

On 2016/07/29 13:05, Etsuro Fujita wrote:
> In a foreign-join case,
> however, we can't see such relations from the EXPLAIN printed *by core*.
> postgres_fdw avoids this issue by adding such relations to the EXPLAIN
> using ExplainForeignScan as shown in the below example, but since such
> relations are essential, I think that information should be shown by
> core itself.
>
> postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
> ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
> left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
> QUERY PLAN
> ----------------------------------------------------------------
> Hash Full Join (cost=202.06..204.12 rows=1 width=8)
> Hash Cond: (ft1.a = ft3.a)
> -> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
> Relations: (public.ft1) LEFT JOIN (public.ft2)
> -> Hash (cost=102.05..102.05 rows=1 width=4)
> -> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
> Relations: (public.ft3) LEFT JOIN (public.ft4)
> (7 rows)
>
> From the Relations line shown by postgres_fdw, we can see which foreign
> join joins which foreign tables, but if no such lines, we couldn't.

I thought about the Relations line a bit more and noticed that there are
cases where the table reference names for joining relations in the
Relations line are printed incorrectly. Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2
where t1.a = t2.a) as t(t1a, t2a);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Unique (cost=204.12..204.13 rows=2 width=8)
Output: t1.a, t2.a
-> Sort (cost=204.12..204.12 rows=2 width=8)
Output: t1.a, t2.a
Sort Key: t1.a, t2.a
-> Append (cost=100.00..204.11 rows=2 width=8)
-> Foreign Scan (cost=100.00..102.04 rows=1 width=8)
Output: t1.a, t2.a
Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
-> Foreign Scan (cost=100.00..102.04 rows=1 width=8)
Output: t1_1.a, t2_1.a
Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The table reference names for ft1 and ft2 in the Relations line for the
second Foreign Scan should be t1_1 and t2_1 respectively.
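To illustrate the names I would expect, here is a minimal Python sketch. It is only a simplified model, not the actual core code: it assumes that a reference name already in use elsewhere in the query is made unique by appending a numeric suffix, which is what the Output lines above suggest (t1_1.a, t2_1.a). The function name assign_unique_names is made up for illustration.

```python
def assign_unique_names(aliases):
    """Return one unique reference name per alias, in order.

    Simplified model (an assumption, not PostgreSQL's real code): the first
    use of an alias keeps it as-is; later uses get _1, _2, ... appended
    until the name is unused.
    """
    used = set()
    result = []
    for alias in aliases:
        name = alias
        suffix = 0
        while name in used:
            suffix += 1
            name = f"{alias}_{suffix}"
        used.add(name)
        result.append(name)
    return result


# Aliases in the order they appear in the UNION query above:
# first branch (t1, t2), then second branch (t1, t2).
names = assign_unique_names(["t1", "t2", "t1", "t2"])
print(names)  # ['t1', 't2', 't1_1', 't2_1']
```

Under that model the second Foreign Scan joins t1_1 and t2_1, which matches its Output line but not its Relations line.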

Another concern about the Relations line is that it shows just an
internal representation of a pushed-down join, so it would not
necessarily match the deparsed query shown in the Remote SQL line. Here
is an example, which I found while working on pushing down full outer
joins in more cases by improving the deparsing logic so that
postgres_fdw can build a remote query that involves subqueries [1],
which I'll work on for 10.0:

+ -- full outer join with restrictions on the joining relations
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND
60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON
(t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+                                   QUERY PLAN
+ ------------------------------------------------------------------------------
+ Foreign Scan
+ Output: ft4.c1, ft5.c1
+ Relations: (public.ft4) FULL JOIN (public.ft5)
+ Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"
WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM
"S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =
ss2.c1)))) ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST
+ (4 rows)

"(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not
exactly match the deparsed query in the Remote SQL line, which I think
would be rather confusing for users. (We may be able to print more
exact information in the Relations line so that it matches the deparsed
query, but I think that would make the Relations line redundant.)

Do we really need the Relations line? If the joining relations are
printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"
as proposed upthread, we can see those relations from that, not from the
Relations line. We can also see the join tree structure from the
deparsed query in the Remote SQL line. So the Relations line no longer
seems that useful. What do you think about that?
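To make the proposed output concrete, here is a rough Python sketch of how such a node label could be assembled from the joining relations. It is only an illustration, not a patch: the helper foreign_join_label and its tuple layout are hypothetical, and the rule of printing the reference name only when it differs from the table name is an assumption.

```python
def foreign_join_label(relations):
    """Build a label like the one proposed upthread.

    relations: list of (schema, table, refname) tuples. This helper and
    its input layout are hypothetical, for illustration only.
    """
    parts = []
    for schema, table, refname in relations:
        qualified = f"{schema}.{table}"
        # Assumption: show the reference name only if it differs
        # from the table name itself.
        if refname != table:
            qualified += f" {refname}"
        parts.append(qualified)
    return "Foreign Join on " + ", ".join(parts)


label = foreign_join_label([
    ("public", "ft1", "t1_1"),
    ("public", "ft2", "t2_1"),
])
print(label)  # Foreign Join on public.ft1 t1_1, public.ft2 t2_1
```

With a label like this, the relations and their unique reference names are visible on the plan node itself.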

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5710D7E2.7010302%40lab.ntt.co.jp
