Re: pgsql_fdw, FDW for PostgreSQL server

From: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-03-29 15:46:03
Message-ID: 4F7483BB.90903@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached are latest version of pgsql_fdw patches. Note that
pgsql_fdw_analyze.patch is only for test the effect of local statistics.
Please apply patches in the order below:

(1) pgsql_fdw_v18.patch
(2) pgsql_fdw_pushdown_v11.patch
(3) pgsql_fdw_analyze.patch (if you want to try local stats)

(2012/03/27 20:49), Albe Laurenz wrote:
>> How about getting # of rows estimate by executing EXPLAIN for
>> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
> and
>> estimate selectivity of local filter on the basis of the statistics
>> which are generated by FDW via do_analyze_rel() and FDW-specific
>> sampling function? In this design, we would be able to use quite
>> correct rows estimate because we can consider filtering stuffs done on
>> each side separately, though it requires expensive remote EXPLAIN for
>> each possible path.
>
> That sounds nice.
> How would that work with a query that has one condition that could be
> pushed down and one that has to be filtered locally?
>
> Would you use the (local) statistics for the full table or can you
> somehow account for the fact that rows have already been filtered
> out remotely, which might influence the distribution?

First of all, pgsql_fdw sorts conditions into three groups:

(A) can be pushed down, and contains no Param node
(B) can be pushed down, and contains Param node
(C) can not be pushed down

Then pgsql_fdw generates SELECT statement which contains only (A), and
execute EXPLAIN for that query to get rough estimate. Then, pgsql
estimates selectivity of (B) and (C) by calling
clauselist_selectivity(). Finally pgsql_fdw multiply rough estimate,
selectivity of (B) and (C). Thus we can get estimate of # of rows
returned by the scan.

Aside that, before executing actual query, pgsql_fdw appends (B) to the
remote query string to use external parameter. Conditions in (C) are
evaluated on local side anyway.

>> You are right, built-in equality and inequality operators don't cause
>> collation problem. Perhaps allowing them would cover significant
> cases
>> of string comparison, but I'm not sure how to determine whether an
>> operator is = or != in generic way. We might have to hold list of oid
>> for collation-safe operator/functions until we support ROUTINE MAPPING
>> or something like that... Anyway, I'll fix pgsql_fdw to allow = and
> !=
>> for character types.
>
> I believe that this covers a significant percentage of real-world cases.
> I'd think that every built-in operator with name "=" or "<>" could
> be pushed down.

I've fixed pgsql_fdw to push down "=" and "<>" even if they take
collatable type as operands. Now collation check is done for only input
type of OpExpr and ScalarArrayOpExpr. In addition to the two, operators
below seem safe to push down.

concatenate : ||
LIKE : ~~, ~~*, !~~, !~~*
POSIX regex : ~, ~*, !~, !~*
Text Search : @@

Regards,
--
Shigeru HANADA

Attachment Content-Type Size
pgsql_fdw_v18.patch text/plain 113.0 KB
pgsql_fdw_pushdown_v11.patch text/plain 57.5 KB
pgsql_fdw_analyze.patch text/plain 32.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2012-03-29 15:49:44 Re: Command Triggers patch v18
Previous Message Tom Lane 2012-03-29 15:42:10 Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)