Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Date: 2016-11-21 08:29:33
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80125F306@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

The attached patch is a revised version of pass-down LIMIT to FDW/CSP.

Below is the updates from the last version.

'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.

'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.

Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:

* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------
Limit (cost=261.17..274.43 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Hash Join (cost=261.17..581.50 rows=2416 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Hash Cond: (t_a.id = t_b.id)
Join Filter: (t_a.x < t_b.x)
-> Foreign Scan on public.t_a (cost=100.00..146.12 rows=1204 width=44)
Output: t_a.id, t_a.x, t_a.y
Remote SQL: SELECT id, x, y FROM public.t
-> Hash (cost=146.12..146.12 rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
-> Foreign Scan on public.t_b (cost=100.00..146.12 rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
Remote SQL: SELECT id, x, y FROM public.t
(14 rows)

* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Foreign Scan (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Relations: (public.t_a) INNER JOIN (public.t_b)
Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM (public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id = r2.id))))
(6 rows)

On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Thursday, November 10, 2016 3:08 AM
> To: Kaigai Kouhei(海外 浩平) <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> Cc: pgsql-hackers(at)postgresql(dot)org; Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com>; Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>; Andres Freund <andres(at)anarazel(dot)de>
> Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
>
> On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> wrote:
> > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > if it is set. If and when remote ORDER BY is pushed down, the latest
> > code tries to sort the entire remote table because it does not know
> > how many rows to be returned. Thus, it took larger execution time.
> > On the other hands, the patched one runs the remote query with LIMIT
> > clause according to the ps_numTuples; which is informed by the Limit
> > node on top of the ForeignScan node.
>
> So there are two cases here. If the user says LIMIT 12, we could in theory
> know that at planner time and optimize accordingly. If the user says LIMIT
> twelve(), however, we will need to wait until execution time unless twelve()
> happens to be capable of being simplified to a constant by the planner.
>
> Therefore, it's possible to imagine having two mechanisms here. In the
> simple case where the LIMIT and OFFSET values are constants, we could
> implement a system to get hold of that information during planning and
> use it for whatever we like. In addition, we can have an
> execution-time system that optimizes based on values available at execution
> (regardless of whether those values were also available during planning).
> Those are, basically, two separate things, and this patch has enough to
> do just focusing on one of them.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company

Attachment Content-Type Size
passdown-limit-fdw.v2.patch application/octet-stream 11.1 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-11-21 08:33:47 Re: Re: BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled
Previous Message Craig Ringer 2016-11-21 08:21:47 Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?