|From:||Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>|
|To:||Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>|
|Cc:||Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>|
|Subject:||Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Sorry for my late response.
The attached patch reflects your comments.
> Here are few comments on latest patch:
> make/make check is fine, however I am getting regression failure in
> postgres_fdw contrib module (attached regression.diff).
> Please investigate and fix.
It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the cost
for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local sort
is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan eventually
scanned all the remote tuples. It made correct results but not optimal
from the viewpoint of performance.)
The v3 patch estimates the cost with remote LIMIT clause only if supplied
pathkey strictly matches with the final output order of the query, thus,
no local sorting is expected.
Some of the regression test cases still have different plans but due to
the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.
Potential one big change is the test case below.
-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to run
on the remote side.
> + *
> + * MEMO: root->limit_tuples is not attached when query
> + * grouping-clause or aggregate functions. So, we don's adjust
> + * rows even if LIMIT <const> is supplied.
> Can you please explain why you are not doing this for grouping-clause or
> aggregate functions.
See grouping_planner() at optimizer/plan/planner.c
It puts an invalid value on the root->limit_tuples if query has GROUP BY clause,
so we cannot know number of tuples to be returned even if we have upper limit
* Figure out whether there's a hard limit on the number of rows that
* query_planner's result subplan needs to return. Even if we know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
if (parse->groupClause ||
root->limit_tuples = -1.0;
root->limit_tuples = limit_tuples;
> don's => don't
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
|Next Message||Heikki Linnakangas||2016-12-05 08:11:42||Re: pgcrypto compilation error due to stack-allocated EVP_CIPHER_CTX|
|Previous Message||Amit Kapila||2016-12-05 07:46:38||Re: Write Ahead Logging for Hash Indexes|