Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "Andres Freund" <andres(at)anarazel(dot)de>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Subject: Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Date: 2017-03-01 00:33:23
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8012B8A9D@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

The attached patch is a rebased version of the pass-down LIMIT clause patch,
which I forgot to register in the last CF.

It allows the required number of rows to be passed down to sub-plans, not
only to the ones we have individually handled in pass_down_bound().
Its primary target is to let postgres_fdw control the number of remote
tuples transferred over the network connection.

According to the past discussion, we add a new field, ps_numTuples, to
PlanState to represent the required number of tuples.
The Limit node assigns a particular number to this field of its sub-plan;
the sub-plan then knows that its upper node does not require the entire
tuple set and can adjust its execution strategy.
Like MergeAppend, the sub-plan can also pass the bound down to its own
sub-plans, if that makes sense and works correctly.

This feature is also a potential basis for GPU-based sorting on top of
CustomScan, which has an advantage for top-N workloads when the data is
small enough to load onto GPU RAM.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kohei KaiGai
> Sent: Tuesday, January 03, 2017 12:07 PM
> To: Kaigai Kouhei(海外 浩平) <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>; Robert Haas
> <robertmhaas(at)gmail(dot)com>; pgsql-hackers(at)postgresql(dot)org; Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>; Andres Freund <andres(at)anarazel(dot)de>
> Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
>
> Oops, I overlooked that this patch was marked as "returned with feedback",
> not "moved to the next CF".
>
> Its status has not changed since the last update. (The code was revised
> according to Jeevan's last comment, but CF-Nov had already closed by that
> time.)
>
> How should I handle the patch?
>
> 2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> > Hello,
> >
> > Sorry for my late response.
> > The attached patch reflects your comments.
> >
> >> Here are few comments on latest patch:
> >>
> >>
> >> 1.
> >> make/make check is fine, however I am getting regression failure in
> >> postgres_fdw contrib module (attached regression.diff).
> >> Please investigate and fix.
> >>
> > It was an incorrect interaction when postgres_fdw tries to push down
> > sorting to the remote side. We cannot attach a LIMIT clause to the plain
> > scan path across a SORT; however, the previous version estimated the
> > cost of the plain scan with the LIMIT clause even when local sorting was
> > needed. If the remote scan may return just 10 rows, the estimated cost
> > of the local sort is very light, so this unreasonable path was chosen.
> > (On the other hand, the query execution results were correct, because
> > ps_numTuples is not delivered across the Sort node, so the ForeignScan
> > eventually scanned all the remote tuples. The results were correct but
> > not optimal from the viewpoint of performance.)
> >
> > The v3 patch estimates the cost with the remote LIMIT clause only if the
> > supplied pathkeys strictly match the final output order of the query, so
> > no local sorting is expected.
> >
> > Some of the regression test cases still have different plans, but that
> > is due to the new optimization enabled by the remote LIMIT clause.
> > Without the remote LIMIT clause, some regression test cases preferred
> > remote-JOIN + local-SORT, then local-LIMIT.
> > Once we have the remote-LIMIT option, the cost of the remote-SORT can be
> > discounted by choosing top-k heap sorting.
> > That changed the optimizer's decision on some test cases.
> >
> > One potentially big change is the test case below.
> >
> > -- CROSS JOIN, not pushed down
> > EXPLAIN (VERBOSE, COSTS OFF)
> > SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1,
> > t2.c1 OFFSET 100 LIMIT 10;
> >
> > It was assumed that the CROSS JOIN would not be pushed down due to the
> > cost of network traffic; however, the remote LIMIT reduced the estimated
> > number of tuples to be moved. So the whole CROSS JOIN + ORDER BY + LIMIT
> > now runs on the remote side.
> >
> >> 2.
> >> + *
> >> + * MEMO: root->limit_tuples is not attached when query contains
> >> + * grouping-clause or aggregate functions. So, we don's adjust
> >> + * rows even if LIMIT <const> is supplied.
> >>
> >> Can you please explain why you are not doing this for grouping clauses
> >> or aggregate functions?
> >>
> > See grouping_planner() in optimizer/plan/planner.c. It puts an invalid
> > value in root->limit_tuples if the query has a GROUP BY clause, so we
> > cannot know the number of tuples to be returned even when we actually
> > have an upper limit.
> >
> >     /*
> >      * Figure out whether there's a hard limit on the number of rows that
> >      * query_planner's result subplan needs to return.  Even if we know a
> >      * hard limit overall, it doesn't apply if the query has any
> >      * grouping/aggregation operations, or SRFs in the tlist.
> >      */
> >     if (parse->groupClause ||
> >         parse->groupingSets ||
> >         parse->distinctClause ||
> >         parse->hasAggs ||
> >         parse->hasWindowFuncs ||
> >         parse->hasTargetSRFs ||
> >         root->hasHavingQual)
> >         root->limit_tuples = -1.0;
> >     else
> >         root->limit_tuples = limit_tuples;
> >
> >> 3.
> >> Typo:
> >>
> >> don's => don't
> >>
> > Fixed,
> >
> > best regards,
> > ----
> > PG-Strom Project / NEC OSS Promotion Center KaiGai Kohei
> > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To
> > make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
>
>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

Attachment Content-Type Size
passdown-limit-fdw.v4.patch application/octet-stream 17.8 KB
