From: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Oddity in EXPLAIN for foreign/custom join pushdown plans |
Date: | 2016-07-29 04:05:07 |
Message-ID: | 741173c5-af93-00b4-e367-df28fee5910a@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2016/07/28 22:11, Kouhei Kaigai wrote:
I wrote:
>> That may be so, but my point is that the target relations involved in
>> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
>> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.
> Why? According to your rule, Hash Join should take "on t0,t1,t2".
>
> postgres=# explain select id from t0 natural join t1 natural join t2;
> QUERY PLAN
> -----------------------------------------------------------------------------
> Hash Join (cost=6370.00..4560826.24 rows=98784048 width=4)
> Hash Cond: (t0.aid = t1.aid)
> -> Hash Join (cost=3185.00..3199360.58 rows=98784048 width=8)
> Hash Cond: (t0.bid = t2.bid)
> -> Seq Scan on t0 (cost=0.00..1833334.80 rows=100000080 width=12)
> -> Hash (cost=1935.00..1935.00 rows=100000 width=4)
> -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
> -> Hash (cost=1935.00..1935.00 rows=100000 width=4)
> -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
> (9 rows)
I don't think it needs "on t0,t1,t2", because we can see joining
relations from inner/outer plans in that case. In a foreign-join case,
however, we can't see such relations from the EXPLAIN printed *by core*.
postgres_fdw avoids this issue by adding such relations to the EXPLAIN
using ExplainForeignScan as shown in the below example, but since such
relations are essential, I think that information should be shown by
core itself.
postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
QUERY PLAN
----------------------------------------------------------------
Hash Full Join (cost=202.06..204.12 rows=1 width=8)
Hash Cond: (ft1.a = ft3.a)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft1) LEFT JOIN (public.ft2)
-> Hash (cost=102.05..102.05 rows=1 width=4)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft3) LEFT JOIN (public.ft4)
(7 rows)
From the Relations line shown by postgres_fdw, we can see which foreign
join joins which foreign tables, but if no such lines, we couldn't.
I wrote:
>> Probably something like this:
>>
>> Foreign Processing
>> Remote Operations: ...
>>
>> In the Remote Operations line, the FDW/extension could print any info
>> about remote operations, eg, "Scan/Join + Aggregate".
> "Foreign" implies this node is processed by FDW, but "Procesing" gives us
> no extra information; seems to me redundant.
I intentionally chose that word and thought we could leave detailed
descriptions about remote operations to the FDW/extension; a broader
word like "Processing" seems to work well because we allow various kinds
of operations to the remote side, in addition to scans/joins, to be
performed in that one Foreign Scan node indicated by "Foreign
Processing", such as aggregation, window functions, distinct, order by,
row locking, table modification, or combinations of them.
> Prior to the new invention, please explain why you don't want to by my
> suggestion first? Annoying is a feel of you, but not a logic to persuade
> others.
I'm not saying that the idea I proposed is better than your suggestion.
Just brain storming. I want to know what options we have and the pros
and cons of each approach.
>>> postgres=# explain select id from t0 natural join t1 natural join t2;
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Custom Scan (GpuJoin) (cost=12385.67..291245.35 rows=9815465 width=4)
>>> GPU Projection: t0.id
>>> Depth 1: GpuHashJoin, HashKeys: (t0.bid)
>>> JoinQuals: (t0.bid = t2.bid)
>>> Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>> Depth 2: GpuHashJoin, HashKeys: (t0.aid)
>>> JoinQuals: (t0.aid = t1.aid)
>>> Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>> -> Seq Scan on t0 (cost=0.00..183333.96 rows=9999996 width=12)
>>> -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
>>> -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
>>> (11 rows)
> My largest concern for you proposition is, ForeignScan/CustomScan node is
> enforced to print name of underlying relations, regardless of its actual
> behavior. The above GpuJoin never scans tables at least, thus, it mislead
> users if we have no choice to print underlying relation names.
OK, I understand we would need special handling for such custom joins.
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2016-07-29 04:28:22 | Re: Oddity in EXPLAIN for foreign/custom join pushdown plans |
Previous Message | Michael Paquier | 2016-07-29 03:47:53 | Re: Wrong defeinition of pq_putmessage_noblock since 9.5 |