PassDownLimitBound for ForeignScan/CustomScan [take-2]

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>
Subject: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Date: 2016-11-24 10:21:31
Message-ID: CAM2+6=VBkoPE54U_jGr8iSuR7Xe+BnwesiUK1tye3hckcy_AyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:

> Hello,
>
> The attached patch is a revised version of pass-down LIMIT to FDW/CSP.
>
> Below is the updates from the last version.
>
> 'ps_numTuples' of PlanState was declared as uint64, instead of long
> to avoid problems on 32bits machine when a large LIMIT clause is
> supplied.
>
> 'ps_numTuples' is re-interpreted; 0 means that its upper node wants
> to fetch all the tuples. It allows to eliminate a boring initialization
> on ExecInit handler for each executor node.
>
> Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
> adjusts number of rows if foreign-path is located on top-level of
> the base-relations and LIMIT clause takes a constant value.
> It will make more adequate plan as follows:
>
> * WITHOUT this patch
> --------------------
> postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
> and t_a.x < t_b.x LIMIT 100;
> QUERY PLAN
> ------------------------------------------------------------
> ----------------------------
> Limit (cost=261.17..274.43 rows=100 width=88)
> Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
> -> Hash Join (cost=261.17..581.50 rows=2416 width=88)
> Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
> Hash Cond: (t_a.id = t_b.id)
> Join Filter: (t_a.x < t_b.x)
> -> Foreign Scan on public.t_a (cost=100.00..146.12 rows=1204
> width=44)
> Output: t_a.id, t_a.x, t_a.y
> Remote SQL: SELECT id, x, y FROM public.t
> -> Hash (cost=146.12..146.12 rows=1204 width=44)
> Output: t_b.id, t_b.x, t_b.y
> -> Foreign Scan on public.t_b (cost=100.00..146.12
> rows=1204 width=44)
> Output: t_b.id, t_b.x, t_b.y
> Remote SQL: SELECT id, x, y FROM public.t
> (14 rows)
>
> * WITH this patch
> -----------------
> postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
> and t_a.x < t_b.x LIMIT 100;
>
> QUERY PLAN
> ------------------------------------------------------------
> --------------------------------------------------
> Limit (cost=100.00..146.58 rows=100 width=88)
> Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
> -> Foreign Scan (cost=100.00..146.58 rows=100 width=88)
> Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
> Relations: (public.t_a) INNER JOIN (public.t_b)
> Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM
> (public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id =
> r2.id))))
> (6 rows)
>
>
> That's nice.

> On the other hands, I noticed it is not safe to attach LIMIT clause at
> the planner stage because root->limit_tuples is declared as double.
> Even if LIMIT clause takes a constant value, it is potentially larger
> than 2^53 which is the limitation we can represent accurately with
> float64 data type but LIMIT clause allows up to 2^63-1.
> So, postgres_fdw now attaches LIMIT clause on the remote query on
> execution time only.
>

I think, it's OK.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

2.
+ *
+ * MEMO: root->limit_tuples is not attached when query contains
+ * grouping-clause or aggregate functions. So, we don's adjust
+ * rows even if LIMIT <const> is supplied.

Can you please explain why you are not doing this for grouping-clause or
aggregate functions.

3.
Typo:

don's => don't

Rest of the changes look good to me.

Thanks

> Thanks,
> ----
> PG-Strom Project / NEC OSS Promotion Center
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> > Sent: Thursday, November 10, 2016 3:08 AM
> > To: Kaigai Kouhei(海外 浩平) <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > Cc: pgsql-hackers(at)postgresql(dot)org; Jeevan Chalke
> > <jeevan(dot)chalke(at)enterprisedb(dot)com>; Etsuro Fujita
> > <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>; Andres Freund <andres(at)anarazel(dot)de>
> > Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> > [take-2]
> >
> > On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > wrote:
> > > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > > if it is set. If and when remote ORDER BY is pushed down, the latest
> > > code tries to sort the entire remote table because it does not know
> > > how many rows to be returned. Thus, it took larger execution time.
> > > On the other hands, the patched one runs the remote query with LIMIT
> > > clause according to the ps_numTuples; which is informed by the Limit
> > > node on top of the ForeignScan node.
> >
> > So there are two cases here. If the user says LIMIT 12, we could in
> theory
> > know that at planner time and optimize accordingly. If the user says
> LIMIT
> > twelve(), however, we will need to wait until execution time unless
> twelve()
> > happens to be capable of being simplified to a constant by the planner.
> >
> > Therefore, it's possible to imagine having two mechanisms here. In the
> > simple case where the LIMIT and OFFSET values are constants, we could
> > implement a system to get hold of that information during planning and
> > use it for whatever we like. In addition, we can have an
> > execution-time system that optimizes based on values available at
> execution
> > (regardless of whether those values were also available during planning).
> > Those are, basically, two separate things, and this patch has enough to
> > do just focusing on one of them.
> >
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> > Company
>

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
regression.diffs application/octet-stream 35.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-11-24 10:29:10 Functions Immutable but not parallel safe?
Previous Message Etsuro Fujita 2016-11-24 10:04:46 Re: Push down more full joins in postgres_fdw