Re: Oddity in EXPLAIN for foreign/custom join pushdown plans

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Date: 2016-07-29 04:05:07
Message-ID: 741173c5-af93-00b4-e367-df28fee5910a@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/07/28 22:11, Kouhei Kaigai wrote:

I wrote:
>> That may be so, but my point is that the target relations involved in
>> the foreign join (ie, ft1 and ft2) should be printed somewhere in the
>> EXPLAIN output by core, as in EXPLAIN for a simple foreign table scan.

> Why? According to your rule, Hash Join should take "on t0,t1,t2".
>
> postgres=# explain select id from t0 natural join t1 natural join t2;
> QUERY PLAN
> -----------------------------------------------------------------------------
> Hash Join (cost=6370.00..4560826.24 rows=98784048 width=4)
> Hash Cond: (t0.aid = t1.aid)
> -> Hash Join (cost=3185.00..3199360.58 rows=98784048 width=8)
> Hash Cond: (t0.bid = t2.bid)
> -> Seq Scan on t0 (cost=0.00..1833334.80 rows=100000080 width=12)
> -> Hash (cost=1935.00..1935.00 rows=100000 width=4)
> -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
> -> Hash (cost=1935.00..1935.00 rows=100000 width=4)
> -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
> (9 rows)

I don't think it needs "on t0,t1,t2", because we can see joining
relations from inner/outer plans in that case. In a foreign-join case,
however, we can't see such relations from the EXPLAIN printed *by core*.
postgres_fdw avoids this issue by adding such relations to the EXPLAIN
using ExplainForeignScan as shown in the below example, but since such
relations are essential, I think that information should be shown by
core itself.

postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
QUERY PLAN
----------------------------------------------------------------
Hash Full Join (cost=202.06..204.12 rows=1 width=8)
Hash Cond: (ft1.a = ft3.a)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft1) LEFT JOIN (public.ft2)
-> Hash (cost=102.05..102.05 rows=1 width=4)
-> Foreign Scan (cost=100.00..102.05 rows=1 width=4)
Relations: (public.ft3) LEFT JOIN (public.ft4)
(7 rows)

From the Relations line shown by postgres_fdw, we can see which foreign
join joins which foreign tables, but if no such lines, we couldn't.

I wrote:
>> Probably something like this:
>>
>> Foreign Processing
>> Remote Operations: ...
>>
>> In the Remote Operations line, the FDW/extension could print any info
>> about remote operations, eg, "Scan/Join + Aggregate".

> "Foreign" implies this node is processed by FDW, but "Procesing" gives us
> no extra information; seems to me redundant.

I intentionally chose that word and thought we could leave detailed
descriptions about remote operations to the FDW/extension; a broader
word like "Processing" seems to work well because we allow various kinds
of operations to the remote side, in addition to scans/joins, to be
performed in that one Foreign Scan node indicated by "Foreign
Processing", such as aggregation, window functions, distinct, order by,
row locking, table modification, or combinations of them.

> Prior to the new invention, please explain why you don't want to by my
> suggestion first? Annoying is a feel of you, but not a logic to persuade
> others.

I'm not saying that the idea I proposed is better than your suggestion.
Just brain storming. I want to know what options we have and the pros
and cons of each approach.

>>> postgres=# explain select id from t0 natural join t1 natural join t2;
>>> QUERY PLAN
>>> ---------------------------------------------------------------------------
>>> Custom Scan (GpuJoin) (cost=12385.67..291245.35 rows=9815465 width=4)
>>> GPU Projection: t0.id
>>> Depth 1: GpuHashJoin, HashKeys: (t0.bid)
>>> JoinQuals: (t0.bid = t2.bid)
>>> Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>> Depth 2: GpuHashJoin, HashKeys: (t0.aid)
>>> JoinQuals: (t0.aid = t1.aid)
>>> Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
>>> -> Seq Scan on t0 (cost=0.00..183333.96 rows=9999996 width=12)
>>> -> Seq Scan on t2 (cost=0.00..1935.00 rows=100000 width=4)
>>> -> Seq Scan on t1 (cost=0.00..1935.00 rows=100000 width=4)
>>> (11 rows)

> My largest concern for you proposition is, ForeignScan/CustomScan node is
> enforced to print name of underlying relations, regardless of its actual
> behavior. The above GpuJoin never scans tables at least, thus, it mislead
> users if we have no choice to print underlying relation names.

OK, I understand we would need special handling for such custom joins.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-07-29 04:28:22 Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Previous Message Michael Paquier 2016-07-29 03:47:53 Re: Wrong defeinition of pq_putmessage_noblock since 9.5