Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-09 01:48:44
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8010CD77E@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hanada-san,

> In addition to your comments, I removed useless code that retrieves ForeignPath
> from outer/inner RelOptInfo and store them into ForeignPath#fdw_private. Now
> postgres_fdw’s join pushd-down is free from existence of ForeignPath under the
> join relation. This means that we can support the case Robert mentioned before,
> that whole "(huge JOIN large) JOIN small” can be pushed down even if “(huge JOIN
> large)” is dominated by another join path.
>
Yes, it's definitely reasonable design, and fits intention of the interface.
I should point out it from the beginning. :-)

> > "l" of the first SELECT represents a whole-row reference.
> > However, underlying SELECT contains system columns in its target-
> > list.
> >
> > Is it available to construct such a query?
> > SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
> > ^^^^^^^^^^
>
> Simple relation reference such as "l" is not sufficient for the purpose, yes.
> But putting columns in parentheses would not work when a user column is referenced
> in original query.
>
> I implemented deparseProjectionSql to use ROW(...) expression for a whole-row
> reference in the target list, in addition to ordinary column references for
> actually used columns and ctid.
>
> Please see the test case for mixed use of ctid and whole-row reference to
> postgres_fdw’s regression tests. Now a whole-row reference in the remote query
> looks like this:
>
It seems to me that deparseProjectionSql() works properly.

> -- ctid with whole-row reference
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
> t1.c1 OFFSET 100 LIMIT 10;
>
> ----------------------------------------------------------------------------
> ----------------------------------------------------------------------------
> -------------------------------------------------------------------------
> Limit
> Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
> -> Sort
> Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
> Sort Key: t1.c3, t1.c1
> -> Foreign Scan
> Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
> Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7,
> ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM
> (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
> (8 rows)
>
> In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred data,
> but IMO this would simplify the code for deparsing.
>
I agree. Even if we can reduce networking cost a little, tuple
construction takes CPU cycles. Your decision is fair enough for
me.

> > * merge_fpinfo()
> > It seems to me fpinfo->rows should be joinrel->rows, and
> > fpinfo->width also should be joinrel->width.
> > No need to have special intelligence here, isn't it?
>
>
> Oops. They are vestige of my struggle which disabled SELECT clause optimization
> (omit unused columns). Now width and rows are inherited from joinrel. Besides
> that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use simple
> summary, not average.
>
Does fpinfo->fdw_startup_cost represent a cost to open connection to remote
PostgreSQL, doesn't it?

postgres_fdw.c:1757 says as follows:

/*
* Add some additional cost factors to account for connection overhead
* (fdw_startup_cost), transferring data across the network
* (fdw_tuple_cost per retrieved row), and local manipulation of the data
* (cpu_tuple_cost per retrieved row).
*/

If so, does a ForeignScan that involves 100 underlying relation takes 100
times heavy network operations on startup? Probably, no.
I think, average is better than sum, and max of them will reflect the cost
more correctly.
Also, fdw_tuple_cost introduce the cost of data transfer over the network.
I thinks, weighted average is the best strategy, like:
fpinfo->fdw_tuple_cost =
(fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_o->fdw_tuple_cost +
(fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_i->fdw_tuple_cost;

That's just my suggestion. Please apply the best way you thought.

> > * explain output
> >
> > EXPLAIN output may be a bit insufficient to know what does it
> > actually try to do.
> >
> > postgres=# explain select * from ft1,ft2 where a = b;
> > QUERY PLAN
> > --------------------------------------------------------
> > Foreign Scan (cost=200.00..212.80 rows=1280 width=80)
> > (1 row)
> >
> > Even though it is not an immediate request, it seems to me
> > better to show users joined relations and remote ON/WHERE
> > clause here.
> >
>
> Like this?
>
> Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80
> rows=1280 width=80)
> …
>
No. This line is produced by ExplainScanTarget(), so it requires the
backend knowledge to individual FDW.
Rather than the backend, postgresExplainForeignScan() shall produce
the output.

> It might produce a very long line in a case of joining many tables because it
> contains most of remote query other than SELECT clause, but I prefer detailed.
> Another idea is to print “Join Cond” and “Remote Filter” as separated EXPLAIN
> items.
>
It is good, if postgres_fdw can generate relations name involved in
the join for each level, and join cond/remote filter individually.

> Note that v8 patch doesn’t contain this change yet!
>
It is a "nice to have" feature. So, I don't think the first commit needs
to support this. Just a suggestion in the next step.

* implementation suggestion

At the deparseJoinSql(),

+ /* print SELECT clause of the join scan */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(var, &context);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+
+ i++;
+ }

The tle is a copy of the original target-entry, and var-node is also
copied one. Why is the tle copied on lappend() again?
Also, NULL as acceptable as 3rd argument of makeTargetEntry.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2015-04-09 02:33:44 Re: Removal of FORCE option in REINDEX
Previous Message Tom Lane 2015-04-09 01:31:52 Re: Failure to coerce unknown type to specific type